@qt4cg statuses in 2023
This page displays status updates about the QT4 CG project from 2023.
See also recent statuses.
Issue #920 created #created-920
The rules for the "tail position" of a sequence constructor need to take account of xsl:switch
Under xsl:iterate, there are rules defining what it means for an instruction to be in a tail position in a sequence constructor. In these rules xsl:switch should be treated the same way as xsl:choose.
Issue #919 created #created-919
Should predicate callbacks use EBV?
Currently predicate callback functions used by things like fn:filter have to return a boolean; they can't rely on EBV semantics. For example you have to write fn{boolean(self::p)} rather than fn{self::p}.
It's probably with self tests that this is most noticeable, because self is so often used in a boolean context.
Of course the current rule gives stricter type-checking which will presumably catch some user errors. But it seems an unnecessary inconsistency.
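The difference between the two rules can be sketched in Python. This is only a model, not part of the proposal: sequences are Python lists, `Node`, `self_p`, `filter_strict` and `filter_ebv` are hypothetical names, and `ebv` implements a simplified version of the effective boolean value rules (empty sequence is false, a sequence starting with a node is true, singleton booleans, strings and numbers are tested directly, anything else is an error).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for an element node."""
    name: str


def ebv(seq):
    """Simplified effective boolean value of a sequence (a Python list)."""
    if len(seq) == 0:
        return False
    first = seq[0]
    if isinstance(first, Node):
        return True
    if len(seq) == 1:
        if isinstance(first, bool):  # check bool before int: bool is an int subclass
            return first
        if isinstance(first, str):
            return len(first) > 0
        if isinstance(first, (int, float)):
            return not (first == 0 or math.isnan(first))
    raise TypeError("FORG0006: no effective boolean value")


def self_p(node):
    """Models the step self::p: the node itself if it is a p, else empty."""
    return [node] if node.name == "p" else []


def filter_strict(items, predicate):
    """Current rule: the callback must return an actual boolean."""
    kept = []
    for item in items:
        result = predicate(item)
        if not isinstance(result, bool):
            raise TypeError("XPTY0004: predicate must return xs:boolean")
        if result:
            kept.append(item)
    return kept


def filter_ebv(items, predicate):
    """Alternative rule: the callback result is reduced to its EBV."""
    kept = []
    for item in items:
        result = predicate(item)
        seq = result if isinstance(result, list) else [result]
        if ebv(seq):
            kept.append(item)
    return kept


nodes = [Node("p"), Node("div"), Node("p")]
# Strict rule: the caller has to wrap the test in an explicit boolean conversion.
strict_result = filter_strict(nodes, lambda n: ebv(self_p(n)))
# EBV rule: the bare self::p test is enough.
ebv_result = filter_ebv(nodes, self_p)
```

Under the strict model, passing `self_p` directly raises a type error, which is the stricter checking mentioned above; under the EBV model both spellings select the same nodes.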
Pull request #918 created #created-918
Minor cx through chap. 14
- fn:splice examples expanded to illustrate integer steps other than 1.
- fn:unparsed-text-lines updated to reflect recent decisions about line handling in fn:unparsed-text.
- Other clarifications or corrections.
Let me know if any of these edits are misfires.
Issue #917 created #created-917
Better support for typed maps
Edit (2023-01-04): See https://github.com/qt4cg/qtspecs/issues/917#issuecomment-1875712638 for the most promising suggestion resulting from the discussion in this thread.
Inspired by #720 and concerns regarding usability and performance, it may be a big step, but couldn’t we define records as subtypes of maps?
- The main difference would be that updates on records are only allowed as long as the resulting map matches a record definition.
- This would allow us to return much better error messages, and to prevent users from deconstructing their own data structures.
- We could still benefit from the existing map functions… provided that we believe it's an advantage. A stricter solution would be to disallow optional map entries completely (and treat records as a separate type).
- From a technical point of view, data with a fixed structure can be optimized much better than a structure that changes dynamically.
Pull request #916 created #created-916
720 Allow methods in maps with access to $this
This proposal allows functions within maps to access the containing map using the variable $this.
The proposal needs editorial work to integrate it fully into the text, but it is intended to be sufficiently complete to enable a full technical review.
Fix #720
Issue #915 created #created-915
[Editorial] Incorrect terminology: function implementation is now function body
In 4.6.2.5 Inline Function Expressions we refer to the "implementation" property of a function item; this property has been renamed as the "function body".
Pull request #914 created #created-914
XQFO minor edits
From reading up through chap. 13. The change in the title of chap. 10.2 is more accurate, and avoids the repetition of the title of chap. 10 itself.
Issue #913 created #created-913
XQFO: under/unused variable apparatus
In the preamble of XQFO chap. 13, several paragraphs are spent introducing a tree structure example, defining variables $po, $item1, $item2, and $item3. The prose leads the reader to expect frequent invocation of this tree example, but chap. 13 never uses it.
Chapter 13's tree example is referred to (sans link) and summarized in the preamble of chap. 14. In that chapter it is used only once, in a fn:count example that really doesn't rely upon anything special in the tree example.
Variables $item1 and $item2 are invoked only once more: in chap. 4, for fn:number. It's rather out of the blue, because neither the function definition nor chapter 4's preamble says anything about what the variables mean.
My recommendation is to drop this material altogether, and for the functions fn:count and fn:number replace the examples with simpler examples.
OTOH, I might have come across an incomplete implementation, and the editors might prefer to make more thorough use of this tree example. I don't know.
Pull request #912 created #created-912
XQFO: Minor edits
Editorial; examples fixed
Issue #297 closed #closed-297
Lookup in deeply nested JSON, an abbreviated syntax for map:find
Issue #20 closed #closed-20
Highlight EBNF grammar differences in the diff versions of the specs
Issue #51 closed #closed-51
Generalize lookup operator for function items
Issue #705 closed #closed-705
Function Coercion: Function Arities
Issue #707 closed #closed-707
Dynamic Function Calls: Processing Empty Sequences
Issue #892 closed #closed-892
XPDY0002: Misleading examples
Issue #903 closed #closed-903
892 XPDY0002: Misleading examples
Issue #902 closed #closed-902
900 fn:sort, array:sort: Parameter names
Issue #900 closed #closed-900
fn:sort, array:sort: Parameter names
Issue #894 closed #closed-894
Errors in forming function items
Issue #897 closed #closed-897
894 - errors in forming function items
Issue #866 closed #closed-866
fn:sort, and XSLT and XQuery sorting, should use transitive comparisons
Issue #881 closed #closed-881
866 Introduce and exploit new numeric-compare() function
QT4 CG meeting 059 draft minutes #minutes-12-19
Draft minutes published.
Issue #911 created #created-911
Type "Promotion" in the coercion rules
I have an open action: QT4CG-052-06: MK to consider the editorial question of “promotion” for the symmetric relations.
I think the point that led to this was the fact that the word "promotion" seems inappropriate for cases like (string/uri) where the implicit conversion can take place in either direction.
I'd like to propose a fix to this that is not merely editorial. I propose that we allow any cast from one numeric type to another in the coercion rules. For example, if the required type is decimal, then a double or float can be supplied. Since, for many implementations of xs:decimal, this can be done losslessly, it makes at least as much sense to convert from double to decimal as from decimal to double.
The word "promotion" is in fact used (in relation to the coercion rules) only in a table heading, and we can change this heading to "implicit casting".
Appendix B.1 currently says:
B.1 Type Promotion
[Definition: Under certain circumstances, an atomic value can be promoted from one type to another.] Type promotion is used in a number of contexts:
- It forms part of the process described by the [coercion rules], invoked for example when a value of one type is supplied as an argument of a function call where the required type of the corresponding function parameter is declared with a different type.
- It forms part of the process described in [B.2 Operator Mapping], which selects the implementation of a binary operator based on the types of the supplied operands.
- It is invoked (by explicit reference) in a number of other situations, for example when computing an average of a sequence of numeric values (in the fn:avg function).
and I suggest we retain the term only for the second case, operator mapping. This differs from the coercion rules in that there are two operands and the effect is always to convert one to the type of the other. This affects numeric types only (not string/uri or binary), and it will continue to promote decimal to double, decimal to float, and float to double.
Where functions (fn:avg, fn:sum, math:pow) refer to the promotion rules, I suggest that we spell out the conversions that happen explicitly, since it's not entirely obvious how the rules should be extrapolated. (For example, fn:avg doesn't make it entirely clear what should happen if the first item is a decimal, the second is a float, and the third is a double. Are you expected to "look ahead" to see what types are present, rather than evaluating the average incrementally?)
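The "look ahead" question can be made concrete with a small Python model. It is a sketch under stated assumptions: values are `(type name, value)` pairs, the `RANK` table encodes only the promotion order decimal, float, double, and both function names are hypothetical. It does not claim which behaviour the specification requires; it only shows that the two strategies perform their intermediate arithmetic in different types.

```python
# Promotion order assumed for this sketch: decimal -> float -> double.
RANK = {"xs:decimal": 0, "xs:float": 1, "xs:double": 2}


def widest_type(values):
    """Look-ahead strategy: scan every item first, then pick the widest type."""
    return max((type_name for type_name, _ in values), key=RANK.__getitem__)


def incremental_types(values):
    """Incremental strategy: the accumulator type after each item is consumed."""
    current = values[0][0]
    trace = [current]
    for type_name, _ in values[1:]:
        if RANK[type_name] > RANK[current]:
            current = type_name  # promote the running total on demand
        trace.append(current)
    return trace


# The case described in the issue: a decimal, then a float, then a double.
values = [("xs:decimal", "1.1"), ("xs:float", "2.2"), ("xs:double", "3.3")]
lookahead = widest_type(values)
stepwise = incremental_types(values)
```

With look-ahead, all three items are converted to the widest type before any addition; with the incremental strategy the first addition happens after promoting only as far as xs:float, so rounding can differ between the two.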
QT4 CG meeting 059 draft agenda #agenda-12-19
Draft agenda published.
Issue #910 created #created-910
Introduce a Kollection object with functions that operate on all types of items that can be containers of unlimited number of "members"
The base for this issue is the email sent by @ndw to public-xslt-40@w3.org on Dec. 13th 2023, fully quoted below:
Hello all,
After a couple of weeks of discussion[1][2] about naming things, there seem to be some quite different perspectives on the problem.
As background, let’s remember that we have a language (or a set of languages) that evolved over time. We couldn’t anticipate in version 1.0 what we would have in 4.0. We added new features in 2.0 and 3.0 that weren’t anticipated in previous versions either.
We live with decisions (some the result of long and hard battles within the working group(s)) like the fact that sequences don’t nest so all individual items are also sequences of length one.
The context for each addition to the language has been roughly: how can we add new, useful features with a minimum of backwards incompatibility.
It’s a natural consequence of this sort of evolution that there are rough edges. Why does fn:count return the number of items in a sequence but always return 1 if the argument is an array? Because an array is an item and an item is a sequence of length one.
(It doesn’t help that the vision of what the X* languages should be has changed over time. What started out envisioned as a tool for transforming documents from one format to another for presentation on the web or in print has grown into something that at least some members of the group view as first class, functional programming languages. That’s not bad, but it puts entirely different stresses on the design, I think.)
As we add new functions (specifically, in the case of recent discussions, but I expect the same perspectives apply more generally), I think one perspective is roughly this:
How can we name and organize the functions so that users are least likely to be surprised and most likely to be able to figure out how to solve a particular problem?
Taken to an extreme, this perspective isn’t about changing the semantics of the functions at all, it’s “just” about naming them. Is fn:get() better (easier to understand, less confusing) than fn:items-at?
I think another perspective is roughly this:
We have a messy design. It would be better if we could refactor the design so that it was more harmonious and logical. We don’t need four different, closely related functions to get items out of different sorts of data structures, we need a set of abstractions that make it obvious that only one function is necessary.
Taken to an extreme, this perspective is about reshaping the whole language so that a single, obvious set of function names emerges naturally from the carefully constructed abstractions.
I don’t think anyone holds exactly one perspective (discussions about renaming often involve some level of discussion about semantics, for example) and I’m attempting to polarize the perspectives a little bit in an effort to shine light on a larger problem, not to be divisive.
With my chair’s hat on, the main problem I see with the first perspective is that naming is hard, often personal and emotional, and will never be wholly logical (so there will always be more to discuss, so the “problem” is never resolved). It’s not quite fair to say it’s a distraction from the “bigger” issues we need to resolve, but it does take a lot of time.
I see the appeal of the second perspective. If we had a green field, we’d do things differently. I think we might all agree that, ideally, fn:count should return the number of items in a sequence, the number of items in an array, and the number of key-value pairs in a map. But it doesn’t and it can’t without fundamentally breaking things. I don’t think we’d get agreement to break fn:count, so what can we do?
A proposal to fundamentally redesign the data model would be a tough sell, I think.
One thing we could do is define a new namespace “gn” with functions that work more logically, that treat sequences, arrays, and maps, as collections and operate on them uniformly.
I suppose we could reconstruct the whole set of functions in this new namespace and focus our efforts there, perhaps going so far as to deprecate the current fn: namespace in favor of this new one. But could we get consensus to do that? Would users thank us?
I dunno. Innovations welcome.
Be seeing you, norm
This issue addresses the 2nd alternative formulated briefly by Norm as:
One thing we could do is define a new namespace “gn” with functions that work more logically, that treat sequences, arrays, and maps, as collections and operate on them uniformly.
I suppose we could reconstruct the whole set of functions in this new namespace and focus our efforts there, perhaps going so far as to deprecate the current fn: namespace in favor of this new one. But could we get consensus to do that? Would users thank us?
Here are some of the obvious advantages of having a uniform Kollection concept that covers arrays, sequences, maps, ... and possibly future new, specific, collection-like datatypes such as sets:
- Uniform definition and understanding of a single data type - the Kollection.
- O(N) functions only, compared to O(M * N) at present. Here N is the number of functions needed for each of the current collection-like data types (Arrays, Sequences and Maps) and M is the number of collection-like data types (currently 3).
- The users will need to know about and understand just the single Kollection data type and its functions, not 3 or more similar collection-like data types and 3 or more sets of similar (but different) functions. Minimizing by a factor of 3 the amount of factual knowledge that a user needs is something HUGE and extremely positive.
- Allowing users to say "Good Bye" to the unclear and treacherous flat-sequence concepts we have as legacy from XPath 1.0.
- Staying aligned to the examples of other modern programming languages such as C# with its IEnumerable interface. It is good to know that this has already been done in other shining programming languages, thus a nay-sayer will not be able to argue that this is not doable or, if done, would be negative to the language and its users.
- Freeing enormous resources and time for the members of the Community Group so that they can spend this on more valuable avenues than trying to find similar and best names for M similar functions, each defined for one of the M current collection-like data types.
Now, to dispel some plausible myths before they start circulating here:
- Myth 1: This will break backwards-compatibility? No, as proposed by Norm, all the functions operating on the generalized collection data type can be in a separate, new namespace and thus no existing user-code is affected.
- Myth 2: If a sequence containing a single Kollection still has count() of 1, then what is the use of the Kollection data type? Actually, as proposed by Norm, the Kollection data type and its functions reside in their own namespace. Doing things using only functions from this new namespace eliminates the possibility of using fn:count as it resides in the different, currently existing standard function namespace.
- Myth 3: This will be too-complex for the users and the users will not embrace it, so let us not waste time designing it. Wow, there were such prophets saying exactly the same about LINQ in 2005. As it often happens, the future proved them wrong. Users clearly and overwhelmingly "voted with their code" incorporating LINQ in almost all everyday applications and code repositories.
- Myth 4: Banning the current functions operating on sequences, arrays and maps would be a huge burden to the users, and would intervene negatively with their programming. In fact, nobody would be banning any of the existing functions. Users can continue to use them forever. The acceptance of the uniform and generalized Kollection data type can happen gradually with time, as was the case with the addition of LINQ to C#.
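As a rough illustration of the behaviour being contrasted here, the following Python sketch models the existing fn:count next to a uniform size function of the kind the proposal describes. Everything in it is an assumption made for illustration: sequences are Python lists, maps are dicts, `XArray` is a hypothetical wrapper for an XDM array, and `k_size` is an invented name, not a function proposed in the issue.

```python
class XArray:
    """Hypothetical stand-in for an XDM array: one item holding members."""

    def __init__(self, members):
        self.members = list(members)


def as_sequence(value):
    """XDM view: a single item is a sequence of length one."""
    return value if isinstance(value, list) else [value]


def fn_count(value):
    """Models the existing fn:count: counts items in the flat sequence."""
    return len(as_sequence(value))


def k_size(value):
    """Hypothetical uniform size over sequences, arrays and maps."""
    if isinstance(value, XArray):
        return len(value.members)   # number of array members
    if isinstance(value, dict):
        return len(value)           # number of key-value pairs
    return len(as_sequence(value))  # number of items in a sequence


array = XArray([10, 20, 30])
count_of_array = fn_count(array)  # the array is a single item
size_of_array = k_size(array)     # the uniform function looks inside it
```

Because the two functions have different names (standing in for different namespaces), code that uses only `k_size` never encounters the single-item result of `fn_count`.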
Pull request #909 created #created-909
893 fn:compare: Support for arbitrary atomic types
Issue #908 created #created-908
Function identity: documentation, nondeterminism
In #520, the concept of function identities was introduced. This is what the current draft says:
XDM, 2.9.4 Function Items
identity: an abstract property that can be used to test whether two variables refer to the same function or to different functions. This property is exposed only for this purpose.
Note: Currently, the concept of function identity is used for two purposes: firstly, when functions appear in the arguments supplied to the fn:deep-equal function; and secondly, in establishing whether the arguments and results of a function are "the same" when deciding whether the function is deterministic.
Note: Function identity is not currently defined for maps and arrays, because in the circumstances where function identity would otherwise be used, maps and arrays are compared by examining their content.
XQFO, 1.8.4 Properties of functions
- […] the two function items have the same function identity. The concept of function identity is explained in Section 2.9.4 Function Items.
XQFO, 14.2.8 fn:deep-equal
c. $i1 and $i2 have the same function identity. The concept of function identity is explained in Section 2.9.4 Function Items.
XQFO, 17.1.1 fn:function-lookup
The function identity is determined in the same way as for a named function reference. Specifically, if there is no context dependency, two calls on fn:function-lookup with the same name and arity must return the same function.
While I definitely believe in the concept, I believe the documentation is still cryptic, or even impossible, to understand, at least without reading #520 or consuming the existing QT4 test cases. Here are some questions that I’m trying to answer:
- Does “abstract property” mean that the property will not be materialized in an implementation, or does it mean that the property is too vague to be precisely defined?
- We should try to specify what “refer to the same function” means. Are there function properties that allow us to at least safely identify a subset of functions as the same? For example, will true#0 and true#0 always be identical? The test cases imply this, whereas #520 doesn’t.
- In XQFO 17.1.1, there’s a hint that context-dependency influences the decision if functions are identified as equal. Does this mean that name#0 and name#0 cannot be equal? Or can, or will, they be equal if the context is identical?
- The term “deterministic” does not appear anywhere else in the XDM spec, so one is inclined to think of the XQFO nondeterminism. It is then unclear whether two instances of, for example, map:entries and fn:parse-xml can be “the same” if the parameters are the same.
- In XQFO 17.1.1, “must return the same function.” is also unclear: What exactly is meant by “same function”? Is it a function that creates an identical result (thus, excluding nondeterministic functions like fn:parse-xml)?
- We should explain better what was the motivation to include the context-dependency of functions in the definition. It would certainly be more intuitive if both deep-equal(name#0, name#0) and deep-equal(name#1, name#1) returned true.
I’m sorry for not offering good answers in return. I could try to describe what we’ve implemented so far – mostly inspired by the test cases – but I’m not sure if it meets the requirements.
Related: #333
Pull request #907 created #created-907
906 fn:deep-equal: unordered → ordered
Issue #906 created #created-906
fn:deep-equal: unordered → ordered
As already suggested in https://github.com/qt4cg/qtspecs/pull/798#pullrequestreview-1709271106 (and by first user feedback), the upcoming PR renames the option unordered to ordered, with true as default.
Issue #339 closed #closed-339
The constraints on document-uri are too...constraining
Pull request #905 created #created-905
898 - relax the constraints on document-uri
Fix #898
Changes non-normative text for the doc() and document-uri() functions to make it clearer that the consequences of the normative rules are not quite as previously stated.
Pull request #904 created #created-904
821 Annotations: Make default namespace explicit
In addition: Lexicographic order; formatting.
Issue #129 closed #closed-129
Context item → Context value?
Issue #608 closed #closed-608
Formatting Monospace (II)
Pull request #903 created #created-903
892 XPDY0002: Misleading examples
Pull request #902 created #created-902
900 fn:sort, array:sort: Parameter names
Closes #900. In addition, the equivalent expression for array:sort was fixed.
Pull request #901 created #created-901
895 Parameters with default values: allow empty sequences
Closes #895
Issue #900 created #created-900
fn:sort, array:sort: Parameter names
The sort functions now accept multiple collations, keys, and orders, and this needs to be reflected in the parameter names (which are still singular).
Issue #899 created #created-899
Simplifying the language - types have behaviour.
I may misunderstand something but I always find the use of types and "as" to be counter intuitive (I'd prefer to be able to run an xslt 3+ script in some sort of 'strict' mode that was a bit more rigid, but thus simpler) e.g.
consider
<xsl:variable name="foo1">
<foo/>
</xsl:variable>
question - what is the type of foo1?
answer - (according to my saxon/oxygen setup the answer is) "document-node"
consider
<xsl:variable name="foo2" as="element(foo)">
<foo/>
</xsl:variable>
question - is this code valid then (I would as someone not used to xslt 2+ assume not)?
answer - yes
but surely this code is identical to foo1, so the 'type' of variable is actually changing the interpretation of the expression.
For me that's quite confusing
It would appear that these 2 values are not two different views (interfaces) of the same underlying value, else this
<xsl:variable name="foo3" as="element(foo)" select="$foo1"/>
would be valid.
It isn't (i.e. this doesn't appear to be some subtle OO style scenario where an evaluation can have multiple interfaces, here 'document-node' and 'element' are presumably disjoint types).
For me conceptually types are descriptions of expressions, they have no behaviour, yet here they appear (to me) to affect the interpretation, not simply describe it.
For me, I'd prefer a 'strict' mode where either
<xsl:variable name="foo2" as="element(foo)">
<foo/>
</xsl:variable>
is a type error, because the expression is clearly a document-node OR some other mechanism to clarify the ambiguity without this conceptual wrinkle.
An expression should either have 1 interpretation, or if it's ambiguous, that should be an error, I don't think the language should default to prefer one over another.
P.S. why doesn't this work? I genuinely don't know how to explicitly declare something as a document-node.
<xsl:variable name="foo1" as=document-node()>
<foo/>
</xsl:variable>
Issue #898 created #created-898
Drop the requirement for document-uri() uniqueness
The specification of document-uri() states that the returned URI must be useable as input to the doc() function and that it always round-trips, so doc(document-uri($X)) is $X is guaranteed true.
A consequence of this rule is that you can't have two documents in the same execution scope with the same document-uri() property.
Enforcing this rule causes a lot of trouble. For example at the API level, the user can set two stylesheet or query parameters to two different documents that are associated with the same URI. As another example, two XSLT packages in the same stylesheet can call doc() on the same URI and get different documents back because they have set different validation and whitespace-stripping options. Use of fn:transform() causes further complications when documents are passed across the boundary (in fact that's where we first encountered the problem). And collection() brings further complications.
In order to conform to the rule in the spec, we changed Saxon a while back so the only documents that are guaranteed to have a document-uri() property are those that were read using the doc() function - and even then, things like validation and whitespace variations are troublesome. This causes users much confusion, partly because of the change from earlier Saxon releases, but more because there are many situations where they would expect document-uri() to give a useful result and it doesn't.
I think we can fix this simply by removing the guarantee. That will cause less inconvenience to users than the current rule. We could perhaps modify it to say that in the case of a document returned from the doc() function, its document-uri() property will be such that a call to doc() with that URI will return the same document, but in the case of documents derived from other sources (for example collection(), or a result from fn:transform) there is no such guarantee.
Issue #887 closed #closed-887
Trivial syntax error under "named function references"
Issue #896 closed #closed-896
887 - fix simple typo in example
Issue #862 closed #closed-862
Examples needed for "Implausible Expressions"
Issue #884 closed #closed-884
862 Add explanations and examples of implausible expressions
Issue #844 closed #closed-844
New sequence functions: names
Issue #879 closed #closed-879
844 New sequence functions: names
Issue #875 closed #closed-875
XQFO, chap. 9 minor edits
Issue #865 closed #closed-865
Need to explain change in numeric comparison semantics
Issue #873 closed #closed-873
865 Improve explanation of equality comparisons
Issue #867 closed #closed-867
Signature notation in F+O: default values
Issue #870 closed #closed-870
867 Explain defaults in function signatures
Issue #864 closed #closed-864
$position argument in fold-right
Issue #742 closed #closed-742
xsl:function-library: keep, drop, or refine?
Issue #863 closed #closed-863
742 Drop xsl:function-library declaration
Issue #847 closed #closed-847
build-uri() - is {"port":()} legal?
Issue #849 closed #closed-849
847 Allow uri-structure-record keys to have empty sequence values
Issue #479 closed #closed-479
fn:deep-equal: Input order
Issue #798 closed #closed-798
479: fn:deep-equal: Input order
QT4 CG meeting 058 draft minutes #minutes-12-12
Draft minutes published.
Pull request #897 created #created-897
894 - errors in forming function items
Fix #894
Pull request #896 created #created-896
887 - fix simple typo in example
Fix #887
Issue #895 created #created-895
Parameters with default values: allow empty sequences
We need a consistent approach for defining types of optional function arguments. In most current cases, if a function argument is supplied, it must be non-empty:
map:get(
$map as map(*),
$key as xs:anyAtomicType,
$fallback as function(xs:anyAtomicType) as item()* := void#1
) as item()*
fn:starts-with-sequence(
$input as item()*,
$subsequence as item()*,
$compare as function(item(), item()) as xs:boolean := fn:deep-equal#2
) as xs:boolean
In some cases, it’s optional:
fn:replace(
$value as xs:string?,
$pattern as xs:string,
$replacement as xs:string? := (),
$flags as xs:string? := '',
$action as (function(xs:untypedAtomic, xs:untypedAtomic*) as item()?)? := ()
) as xs:string
(: #874 :)
fn:subsequence(
$input as item()*,
$start as xs:double? := (),
$length as xs:double? := (),
$from as (function(item(), xs:integer) as xs:boolean)? := (),
$while as (function(item(), xs:integer) as xs:boolean)? := (),
$until as (function(item(), xs:integer) as xs:boolean)? := ()
) as item()*
As a result, map:get($map, fallback := ()) is invalid, while replace($string, $pattern, action := ()) would be valid.
I think it’s better to enforce non-empty arguments (provided that a single item is expected).
Issue #894 created #created-894
Errors in forming function items
Follow-up to issue #888
There are a number of situations in which error behaviour is insufficiently specified:
- Consider the partial function application fn:contains(?, 23), where the second argument is an integer, not a string. Does the partial application fail, or does it result in a function item that fails when dynamically evaluated? The spec does not say.
- Consider the expression function-lookup(fn:name, 0) evaluated when there is no context item in the dynamic context. Does the call on function-lookup fail, or does it return a function item that fails when dynamically evaluated? The spec does not say, though there are numerous test cases in the function-lookup test set that suggest the latter.
- The same is true for the expression fn:name#0 evaluated when there is no context item in the dynamic context.
- Consider a user-defined function my:f with a parameter that has a default value of "."; consider both a function reference my:f#0 and a partial application my:f(?) evaluated when there is no context item. Again, I think the spec is unclear on the error behaviour.
It feels to me that the right thing to do in all these cases is to raise the error early: that is, to fail at the point where a function item is being constructed, not at the point where the function item is subsequently evaluated. However, this disagrees with QT3 expected test results for tests such as fn-function-lookup-267 and function-literal-267.
Issue #893 created #created-893
fn:compare: Support for arbitrary atomic types
Inspired by #866:
We should extend fn:compare to support arbitrary atomic types. The comparison rules…
- would be unchanged for strings,
- would rely on fn:numeric-compare for numbers, and
- would rely on the existing op: functions for the remaining types.
For example, the rule for dates would be:
0 is returned if op:date-equal(A, B) is true,
-1 is returned if op:date-less-than(A, B) is true,
1 is returned otherwise.
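The date rule above can be sketched in Python as follows. It is a model under assumptions: `compare_with` and `compare_dates` are hypothetical names, and Python's `==` and `<` on `datetime.date` stand in for op:date-equal and op:date-less-than (timezone handling is ignored).

```python
import operator
from datetime import date


def compare_with(equal, less_than):
    """Build a three-way comparison from an equality and a less-than function."""
    def compare(a, b):
        if equal(a, b):
            return 0
        if less_than(a, b):
            return -1
        return 1
    return compare


# == and < stand in for op:date-equal and op:date-less-than in this sketch.
compare_dates = compare_with(operator.eq, operator.lt)

same = compare_dates(date(2023, 12, 19), date(2023, 12, 19))
earlier = compare_dates(date(2023, 12, 12), date(2023, 12, 19))
later = compare_dates(date(2023, 12, 19), date(2023, 12, 12))
```

The same helper could be reused for any other type that has a pair of equality and less-than operators.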
Some types will be rejected (xs:duration, xs:QName, xs:NOTATION, Gregorian types).
In addition, I would vote for making fn:numeric-compare and fn:atomic-equal private. I don’t see a benefit to expose them; I rather expect people to be confused.
Issue #892 created #created-892
XPDY0002: Misleading examples
Related to #888. The examples in 4.6.2.2 Evaluating Dynamic Function Calls…
…are misleading; all of them raise XPDY0002 if no element is bound to the global context value. Maybe shop could be replaced with $shop or doc('wares.xml')/shop.
QT4 CG meeting 058 draft agenda #agenda-12-12
Draft agenda published.
Issue #891 closed #closed-891
Cleanup the post-diff-hacking hack
Pull request #891 created #created-891
Cleanup the post-diff-hacking hack
Improved, I think.
Issue #890 closed #closed-890
Stop fussing with merge base branch
Pull request #890 created #created-890
Stop fussing with merge base branch
Trying to track down the spurious diffs that we see in PRs.
I'll have to merge this to test it, so there will be a few random merges here. Sorry.
Issue #889 created #created-889
Rename "Named Function Reference"
The term "named function reference" is used for a construct like name#1.
Although we define it as "A named function reference is an expression (written name#arity) which evaluates to a [function item]", the term "named function reference" perpetuates the incorrect assumption that it is some kind of literal or constant denoting a function item.
Of course, in many cases it can be treated as just that. But not when the function is context-dependent, for example name#0 or lang#1.
The term is also questionable because one would assume that a "named function reference" is a reference to a "named function", but there is no such concept as a "named function".
So what might be a better name? What the expression actually does (when evaluated) is to search the static context for a function definition whose name and arity range correspond, and then construct a function item that captures the relevant part of the dynamic context in its closure. It's hard to encapsulate all of that in a simple name for the construct, but I would suggest named function generator. This is sufficiently close to the current term to be recognisable, but tries to capture the fact that it's not just a constant or literal, it's an expression that actively does something when evaluated; and it's reasonably accurate in that the result of the evaluation is a function item that has a non-absent name.
Issue #888 created #created-888
Reclassify XPDY0002 as a type error
I propose to reclassify XPDY0002 (context item is absent) as a type error rather than a dynamic error.
I don't propose to change the error code.
The only practical distinction is that this will allow the error to be reported statically when it can be detected statically, for example if the user writes something like
function($x as node()) {
starts-with(name(), 'x')
}
At present Saxon will give you a compile-time warning for this, followed by a run-time error if the code is actually executed; this is the required behaviour for dynamic errors.
The change does mean that in a case like this example, it will no longer be possible to catch the error using try/catch. However, type errors can only be reported statically if the code is bound to fail at run-time, and catching errors that occur every time is not especially useful.
Issue #887 created #created-887
Trivial syntax error under "named function references"
In the XPath/XQuery book, §4.6.2.4,
let $f := <foo/>/fn:name#0 return <bar>/$f()
should be
let $f := <foo/>/fn:name#0 return <bar/>/$f()
Digging a bit deeper, this reveals that we are not properly tagging and syntax-checking code examples in the spec.
Issue #886 created #created-886
Binary map keys
We have made xs:hexBinary
and xs:base64Binary
comparable and we now allow implicit coercion between the two types.
I've been assuming, though I'm not sure we ever discussed it, that this automatically means that the two types can be "atomic equal" from the point of view of entries in maps: that is, a hexBinary representation of a particular binary value can no longer coexist in a map with a base64Binary representation of the same binary value.
If we were starting from scratch this would clearly make sense, but it has some messy implications:
- It's a backwards incompatibility; in 3.1 you could construct maps that you can no longer construct in 4.0
- It potentially affects interoperability of 3.1 and 4.0 applications. For example, an XQuery 4.0 application invoking an XSLT 3.0 transformation via
fn:transform
might get back a map that's not a valid map in 4.0.
In effect, this is not just a change to the behaviour of one function/operator, it is a data model change, because it changes the value space of the map(*)
data type.
And more parochially, I freely admit, there's a lot of internal complexity trying to maintain a code base that supports both the 3.1 and 4.0 rules simultaneously.
Is this a feature that benefits users sufficiently to justify the transition complexities? Note that we can still support "eq" between the two data types without supporting fn:atomic-equal
.
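The collision described in this issue can be illustrated outside XPath. The following Python sketch is only an analogy: Python has no xs:hexBinary or xs:base64Binary types, and the tuple-based "typed" keys are an assumption made for illustration. Two lexical forms denote the same octets, so a map keyed on the binary value alone can hold only one entry, whereas a map that also distinguishes the type can hold both.

```python
import base64

# Two lexical representations of the same octets (the ASCII bytes of "Hello").
hex_form = "48656c6c6f"   # stands in for an xs:hexBinary value
b64_form = "SGVsbG8="     # stands in for an xs:base64Binary value

hex_value = bytes.fromhex(hex_form)
b64_value = base64.b64decode(b64_form)

# Keys that distinguish the type (roughly the 3.1 situation): both entries coexist.
typed_keys = {
    ("hexBinary", hex_value): "from hex",
    ("base64Binary", b64_value): "from base64",
}

# Keys compared on the binary value alone (the assumed 4.0 behaviour):
# the second entry replaces the first.
value_keys = {}
value_keys[hex_value] = "from hex"
value_keys[b64_value] = "from base64"
```

A map like typed_keys, produced by a 3.1 processor, is the kind of value that would have no valid counterpart under the value-only rule.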
Issue #885 created #created-885
fn:uuid
…to create a random universally unique identifier (UUID), represented as a 128-bit value.
Should ideally be nondeterministic, or we may need to do something that’s similar to fn:random-number-generator
.
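The two options mentioned here (a nondeterministic function versus something seeded, in the spirit of fn:random-number-generator) might look as follows. This is a Python sketch, not a proposal for the fn:uuid signature; the function names random_uuid and seeded_uuid are hypothetical.

```python
import random
import uuid

def random_uuid() -> uuid.UUID:
    """Nondeterministic: a fresh version-4 UUID on every call."""
    return uuid.uuid4()

def seeded_uuid(seed: int) -> uuid.UUID:
    """Deterministic alternative: derive the 128 bits from a seeded generator."""
    bits = random.Random(seed).getrandbits(128)
    # Passing version=4 makes the constructor set the version and variant bits.
    return uuid.UUID(int=bits, version=4)
```

With the seeded variant, repeated calls with the same seed return the same identifier, which is the property a deterministic function would need.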
Pull request #884 created #created-884
862 Add explanations and examples of implausible expressions
Fix #862
Issue #883 created #created-883
Improve return type for fn:load-xquery-module()
The return type is given as map(*)
. We could make it more precise with a record type.
The same goes for a number of other function signatures that currently use map(*) as an argument or result type.
Perhaps we should also define a more precise type for options parameters.
Issue #882 created #created-882
fn:chain or fn:compose
I thought I had a great opportunity to use fn:chain
the other day, and then found it didn't do what I wanted.
I wanted to negate a predicate: in pseudo-code
items-where($seq, not contains(?, "e"))
and I thought I could do this by chaining contains
and not
. But it doesn't work that way: fn:chain
applies a sequence of functions to an argument, it doesn't compose a sequence of functions to yield a new function.
I wonder if a function that composes functions would be more useful, so I could write
items-where($seq, compose((contains(?, "e"), not#1)))
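The distinction between the two behaviours can be sketched in Python. The names chain, compose and items_where below are stand-ins for the functions discussed in this issue, not their specified definitions: chain applies a list of functions to a value, whereas compose returns a new function that can be passed as a predicate.

```python
from functools import reduce
from operator import not_

def chain(value, functions):
    """Apply each function in turn to a value."""
    return reduce(lambda acc, f: f(acc), functions, value)

def compose(functions):
    """Return a new function that applies each function in turn."""
    return lambda value: chain(value, functions)

def items_where(seq, predicate):
    """Stand-in for the pseudo-code items-where: keep items satisfying the predicate."""
    return [item for item in seq if predicate(item)]

def contains_e(s):
    return "e" in s

words = ["red", "blue", "pink", "gold"]

# The negated predicate is built once and then used as a callback.
without_e = items_where(words, compose([contains_e, not_]))
```

Here chain("red", [contains_e, not_]) yields a boolean immediately, which is why it cannot serve as the predicate argument.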
Pull request #881 created #created-881
866 Introduce and exploit new numeric-compare() function
Fix #866
The proposal introduces a new fn:numeric-compare function that differs from lt/eq primarily in that decimals are compared retaining their full precision, rather than converting them to doubles which may lose precision. This makes the comparison fully transitive which makes it safe to use in all sorting algorithms.
The new comparison semantics are exploited in max(), min(), and sort(), and indirectly in highest() and lowest(); they are also referenced for comparing numeric values in XSLT xsl:sort
(and therefore indirectly in xsl:merge
) and in XQuery order by
.
An effect of the change is that max() and min() applied to a sequence of integers now return an integer, not a double.
Pull request #880 created #created-880
872 Symmetry: fn:items-at → fn:get
Closes #872.
Pull request #879 created #created-879
844 New sequence functions: names
Closes #844. The items
keyword in the function names (excluding items-at
) has been changed to subsequence
.
See #878 for the controversial discussion on what to do with subsequence-(after|before|starting-where|ending-where)
.
Issue #855 closed #closed-855
844 New sequence functions: names
Issue #869 closed #closed-869
Incorrect example: for-each-pair
Issue #878 created #created-878
Proposed extension to subsequence
Copied from https://github.com/qt4cg/qtspecs/issues/844#issuecomment-1841415417:
I'm thinking again about integrating the items-* quartet into a heavily overloaded subsequence
function.
Must supply zero or one of:
- $start - the start position
- $from - a predicate, such that the start is the first item to match the predicate
defaulting to 1
And zero or one of:
- $length - the number of items to include
- $while - a predicate, the subsequence takes items so long as the predicate is true
- $until - a predicate, the subsequence takes items up to and including the first for which the predicate is false
defaulting to the end of the sequence.
This omits the "items-after" combination, but that one is easily achieved using tail(subsequence(from:="xxx"))
.
Issue #877 created #created-877
Inconsistency in XQFO comparator functions/operators with recursive rules
The rules for op:hexBinary-less-than() appear to define a recursive octet-by-octet operation, but I think it flounders in rule 3, where it does not ask for rule 2 to be applied seriatim to each octet pair, but asks for an en masse comparison of two octet sequences.
Compare to 5.3.2, Unicode Codepoint Collation, which describes a similar recursive item-for-item comparison. Interesting formal differences (e.g., unordered list versus ordered list).
fn:deep-equal() is similar, but it is also much more complex. Nevertheless, the way it breaks down the problem at the outset, to dispense immediately with the recursive factor, and deal simply with the rules for equality, is IMO admirable.
It would be nice if there were a bit more consistency in the prose and presentation of recursive rules. Do others agree, and are there other functions/operations that should be considered in this question? I'm thinking immediately only of comparator functions/operations, not functions that use recursion to filter or create. (There may be parallels, but let's start with those functions that are most similar.)
Issue #876 created #created-876
Placement of fn:in-scope-namespaces(), fn:in-scope-prefixes(), fn:namespace-uri-for-prefix()
Currently fn:in-scope-namespaces()
, fn:in-scope-prefixes()
, and fn:namespace-uri-for-prefix
are filed under XQFO chapter 10, which purports to deal exclusively with QNames. But these three functions have no direct bearing on QNames in either input or output.
Two options occur to me:
- Move sections 10.2.6-8 to fall after 13.3.
- Rename chapter 10 to "Functions related to QNames and namespaces". Create a 10.3 that pertains exclusively to namespaces. Move to this new subchapter 10.2.4, 10.2.6-8, as well as 13.3
fn:namespace-uri()
.
Or some variant of the above.
At any rate, I think the current placement doesn't properly expose these functions to the browsing reader.
Pull request #875 created #created-875
XQFO, chap. 9 minor edits
Hopefully nothing controversial here. Edits are motivated by consistency and clarity.
Issue #624 closed #closed-624
XPath function definition clarification
Issue #616 closed #closed-616
XDM: X Node vs. x node
Issue #464 closed #closed-464
Serialization sequence normalization step 3 needs clarification
Pull request #874 created #created-874
878 Proposed extension to subsequence
Following discussion under issue #844, I decided to explore the possibility of extending subsequence() with optional parameters, with the aim of making the quartet of items-before/after/starting-with/ending-with unnecessary.
This is the spec that results. I feel it's a good trade-off; by adding three optional parameters to fn:subsequence
, we can eliminate 4 functions that we are having trouble finding names for. The examples feel to me to be intuitive and readable; and there is more capability in the new function than we had before, for example by combining a predicate for the start position with an integer for the length.
I haven't explored arity-2 callbacks - these certainly need some notes and examples.
Issue #822 closed #closed-822
XQuery, XQFO: Edits (pool)
Issue #851 closed #closed-851
822: XQuery, XQFO: Edits (pool)
QT4 CG meeting 057 draft minutes #minutes-12-05
Draft minutes published.
Pull request #873 created #created-873
865 Improve explanation of equality comparisons
Fix #865
This PR:
- Adds a non-normative appendix to XPath and XQuery comparing and contrasting the different ways of doing equality comparisons
- Changes fn:atomic-equal so it no longer refers to fn:deep-equal (the recursion terminated, but was confusing to follow)
- Removes text in XQuery describing the non-transitivity of group by clauses, which is now a solved problem
- Corrects the description of backwards incompatibilities relating to numeric comparisons in the F+O spec.
Issue #872 created #created-872
Symmetry: fn:items-at → fn:get
I think that fn:items-at
should be changed to fn:get
:
- In #843, we try to harmonize the function names across sequences, maps, and arrays. We have array:get and map:get to retrieve single entries of the input, but we have fn:items-at for sequences.
- fn:items-at allows you to supply more than a single position, but items-at($seq, (1, 3, 2)) can easily be rewritten to (1, 3, 2) ! get($seq, .) – similar to (1, 3, 2) ! array:get($array, .) and (1, 3, 2) ! map:get($map, .).
- With #844, fn:items-at would be the only function left with items in its name.
The function signature would be as simple as:
fn:get(
$input as item()*,
$at as xs:integer
) as item()
Obviously, most people will still use $input[$at]
– but the same applies to arrays and maps (and other functions like fn:head
). One of the advantages of fn:get
is that you can pass on the context item as position argument.
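The rewrite suggested in this issue (mapping a single-position get over a sequence of positions) can be sketched in Python. The functions get and items_at are hypothetical analogues using 1-based positions; how the real functions treat out-of-range positions is not modelled here, so the sketch simply raises an error.

```python
def get(seq, at):
    """Analogue of the proposed fn:get: the item at a 1-based position."""
    if not 1 <= at <= len(seq):
        raise IndexError(f"position {at} is out of range")
    return seq[at - 1]

def items_at(seq, positions):
    """Analogue of fn:items-at for in-range positions, in the order requested."""
    return [get(seq, position) for position in positions]

seq = ["a", "b", "c"]
picked = items_at(seq, [1, 3, 2])
```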
Pull request #871 created #created-871
Action qt4 cg 027 01 next match
Pull request #870 created #created-870
867 Explain defaults in function signatures
Fix #867
Issue #869 created #created-869
Incorrect example: for-each-pair
The fourth example of fn:for-each-pair
is wrong:
for-each-pair(
(1, 8, 2),
(3, 4, 3),
fn($item1, $item2, $pos) {
$pos || ': ' || max(($item1, $item2))
}
)
Result:
("1: 1", "2: 4", "3: 2")
The results as given return the min of the pair, not the max.
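The discrepancy is easy to check by hand. The following Python sketch mimics the example with a hypothetical for_each_pair helper that passes a 1-based position as the third argument; it is an analogy, not the specified definition of fn:for-each-pair.

```python
def for_each_pair(seq1, seq2, action):
    """Call action(item1, item2, position) for each pair, with 1-based positions."""
    return [action(a, b, pos) for pos, (a, b) in enumerate(zip(seq1, seq2), start=1)]

with_max = for_each_pair([1, 8, 2], [3, 4, 3], lambda a, b, pos: f"{pos}: {max(a, b)}")
with_min = for_each_pair([1, 8, 2], [3, 4, 3], lambda a, b, pos: f"{pos}: {min(a, b)}")
```

Using max gives "1: 3", "2: 8", "3: 3"; the result quoted in the spec is what min produces.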
QT4 CG meeting 057 draft agenda #agenda-12-05
Draft agenda published.
Issue #868 created #created-868
fn:intersperse → fn:join, array:join($arrays, $separator)
With string-join
, you can create a string from multiple strings, optionally interspersed with a separator. array:join
can be used to create an array from multiple arrays.
fn:intersperse
, which has been added to the XQuery 4 draft, does something very similar, and early feedback indicates that the function is useful, but easy to overlook due to its name.
I propose to unify the functions by…
- renaming fn:intersperse to fn:join;
- adding a parameter to array:join: $separator as array(*)* := (); and
- allowing a separator sequence for fn:string-join: $separator as xs:string* := ().
Examples
Query | Result | Info
-- | -- | --
string-join(('1','2','3'), '-') | '1-2-3' | existing syntax
array:join([[1],[2],[3]], ['-']) | [1,'-',2,'-',3] | new
join((1,2,3), '-') | (1,'-',2,'-',3) | now: intersperse((1,2,3),'-')
string-join(('1','2','3')) | '123' | existing syntax
array:join([[1],[2],[3]]) | [1,2,3] | existing syntax
join((1,2,3)) | (1,2,3) | now: intersperse((1,2,3)) (or just (1,2,3))
string-join(('1','2','3'), ('-','+')) | '1-+2-+3' | new
array:join([[1],[2],[3]], ['-','+']) | [1,'-','+',2,'-','+',3] | new
join((1,2,3), ('-','+')) | (1,'-','+',2,'-','+',3) | now: intersperse((1,2,3),('-','+'))
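The sequence rows of the table above can be reproduced with a few lines of Python. This is a sketch of the proposed behaviour as read from the examples, using a hypothetical join function over lists; it is not the specified definition of fn:intersperse or fn:join.

```python
def join(items, separator=()):
    """Insert the whole separator sequence between each pair of adjacent items."""
    result = []
    for index, item in enumerate(items):
        if index > 0:
            result.extend(separator)
        result.append(item)
    return result

single = join([1, 2, 3], ["-"])
multiple = join([1, 2, 3], ["-", "+"])
no_separator = join([1, 2, 3])

# The string case with a separator sequence concatenates the same interleaving.
as_string = "".join(join(["1", "2", "3"], ["-", "+"]))
```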
Issue #867 created #created-867
Signature notation in F+O: default values
Section 1.5 of F+O introduces the signature proforma notation, and indicates that default values may be included for parameters.
It does not however say how the default value is interpreted. For example, with the signature
fn:starts-with-sequence(
$input as item()*,
$subsequence as item()*,
$compare as function(item(), item()) as xs:boolean := fn:deep-equal#2
) as xs:boolean
There is nothing to tell us that the expression fn:deep-equal#2 is evaluated with the static and dynamic context of the caller (or of the function reference).
Note that this is different from the similar notation used for function declarations in XQuery, where the static context for the default fn:deep-equal#2 would be taken from the function declaration in the Query prolog.
Issue #866 created #created-866
fn:sort, and XSLT and XQuery sorting, should use transitive comparisons
We have addressed the question of non-transitivity of equality matching in distinct-values()
, and in XSLT and XQuery grouping, but the same issue exists for sorting. Currently fn:sort
, as well as XSLT and XQuery sorting, rely on the "lt" operator for comparing values including mixed numerics such as doubles and decimals. Because this promotes to double, it is capable of losing precision, and is therefore non-transitive. Most sort algorithms rely on the supplied comparison function being transitive, and if it isn't, then undefined failures may occur including non-termination.
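The loss of transitivity can be demonstrated with three values. The Python sketch below is an analogy: Decimal stands in for xs:decimal and float for xs:double, and the function names promote_lt, exact_lt and same are hypothetical. With promotion to double, two distinct decimals both compare as "neither less nor greater" against the double 1.0, yet one is less than the other.

```python
from decimal import Decimal

def promote_lt(a, b):
    """'lt' with promotion: a mixed decimal/double pair is compared as doubles."""
    if isinstance(a, Decimal) and isinstance(b, Decimal):
        return a < b
    return float(a) < float(b)

def exact_lt(a, b):
    """Comparison without precision loss: doubles are converted exactly to decimals."""
    return Decimal(a) < Decimal(b)

def same(lt, a, b):
    """Two values tie under an ordering if neither is less than the other."""
    return not lt(a, b) and not lt(b, a)

small = Decimal("1.00000000000000000001")
one = 1.0
large = Decimal("1.00000000000000000002")

ties_small_one = same(promote_lt, small, one)    # tie after promotion
ties_one_large = same(promote_lt, one, large)    # tie after promotion
small_lt_large = promote_lt(small, large)        # but these two are ordered
```

A sort routine that assumes ties are transitive can therefore be given inconsistent answers; under exact_lt the three values are strictly ordered as one, small, large.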
One particular quirk (which led me here) is that fn:highest
and fn:lowest
start by using fn:sort
semantics to put the values in order, and then rely on fn:deep-equal
semantics to find the values that are "equal highest" or "equal lowest". But fn:sort
and fn:deep-equal
have different ways of deciding whether two values are equal: decimal 1.2 and double 1.2 are equal for fn:sort
, but not for fn:deep-equal
.
Issue #865 created #created-865
Need to explain change in numeric comparison semantics
We need to explain more clearly that we now have different rules for comparing numeric values in different circumstances. My understanding of the situation is:
For eq
, =
, etc, we continue to use the XPath 2.0/3.0/3.1 rules for backwards compatibility reasons: for example comparison between decimal and double is done by converting the decimal to a double. This has known problems in terms of transitivity, but we have retained the rules because we identified that too many compatibility problems would be introduced by changing them.
For deep-equal, distinct-values, XSLT and XQuery grouping, etc, we have switched to the rules that were introduced for comparing map keys in 3.0, now available through the fn:atomic-equal function. Under these rules, doubles are promoted to decimals for comparison.
We should probably include a table showing which rules are used where.
A good example to use is (1e-3 = 0.001). This is true in both 3.1 and 4.0. But under the rules for maps in 3.1, and the new rules for distinct-values in 4.0, these two values are considered distinct.
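The two outcomes for this example can be reproduced in Python, with float standing in for xs:double and Decimal for xs:decimal. This is an analogy under that assumption, not an implementation of either rule set.

```python
from decimal import Decimal

double_value = 1e-3                # stands in for the xs:double literal 1e-3
decimal_value = Decimal("0.001")   # stands in for the xs:decimal literal 0.001

# Rule used by eq and =: convert the decimal to a double, then compare.
equal_as_doubles = float(decimal_value) == double_value

# Rule used for map keys and fn:atomic-equal: convert the double exactly
# to a decimal, then compare. The double nearest to 0.001 is not exactly 0.001.
equal_as_decimals = Decimal(double_value) == decimal_value
```

The first comparison is true and the second is false, which matches the behaviour described above for = and for distinct-values respectively.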
Issue #864 created #created-864
$position argument in fold-right
In fn:fold-right (and thus array:fold-right) it's not clear how the position parameter works.
It appears to start at 1, and then to be decremented, which seems a little weird.
Working out what happens seems to involve reverse engineering the code, which isn't ideal. It's useful to have a formal definition of the function using code, but it shouldn't be necessary to reverse engineer 20 lines of difficult recursive code in order to get a feel for what the function does.
The only example given doesn't add any clarification.
Issue #470 closed #closed-470
369: add fixed-prefixes attribute in XSLT
Issue #412 closed #closed-412
409, QT4CG-027-01: xsl:next-match
Issue #856 closed #closed-856
Spec for deep-equal() still references FOTY0015
Issue #857 closed #closed-857
856 Drop reference to obsolete error condition in deep-equal()
Pull request #863 created #created-863
742 Drop xsl:function-library declaration
Fix #742
Issue #862 created #created-862
Examples needed for "Implausible Expressions"
In XPath 4.0 there is a new concept of Implausible Expressions.
There are several different sections about different types of implausible expressions:
- 2.4.6 Implausible Expressions - only a single example, and it is not in an Examples sub-section and is difficult to locate.
- 3.8.1 Implausible Coercions - has 3 examples and seems OK.
- 4.7.4.3 Implausible Axis Steps - has no visible examples.
- 4.15.3.4 Implausible Lookup Expressions - has no examples.
Another problem is that the definition of "implausible" seems imprecise and subjective (what is the meaning of "there is a high probability that they were written incorrectly"?):
"implausible Certain expressions, while not erroneous, are classified as being implausible, because there is a high probability that they were written incorrectly."
Proposed fixing actions:
- Provide a more precise and non-subjective definition of the concept.
- Provide many examples of implausible expressions - both in the central section 2.4.6 and in all other sections dealing with more specific types of implausible expressions.
Issue #169 closed #closed-169
Handling of duplicate keys in xsl:map
QT4 CG meeting 056 draft minutes #minutes-11-28
Draft minutes published.
QT4 CG meeting 056 draft agenda #agenda-11-28
Draft agenda published.
Issue #858 closed #closed-858
fn:identity: accept 2 arguments, ignore second
Issue #861 created #created-861
Precise meaning of $E??KS
I don't think that the semantics of the expression $E??KS
are clearly enough defined.
The effect of the deep lookup expression E??KS is obtained by evaluating E, establishing its recursive content C, removing any item that is not a map or array to yield a sequence D, and then evaluating the shallow lookup expression D?KS, but with one exception: if evaluation of any shallow lookup fails, then the error is not propagated, but instead its result is taken to be an empty sequence.
- The definition of "recursive content" needs to be tightened up.
- It needs to be more clearly stated which errors we ignore, and which we don't. For example, what if the key specifier evaluates to a non-singleton sequence?
- In the case of $E??* in particular, I don't think it makes much sense to exclude items that are not maps or arrays. I think the expectation in this case is to return the full recursive content.
- As currently defined, if $M is a map, the result of $M??* includes the map $M itself. I don't think this matches expectations. Certainly, with the parallel expression $node//*, the result does not include $node.
Issue #860 created #created-860
Unary Lookup when the context value is a sequence
We have added the text
If the context value is anything other than a single item, the semantics of the expression ?KS are defined to be equivalent to the expression . ! ?KS. The remainder of this section therefore explains the semantics on the assumption that the [context value] is a single item, referred to as the context item.
Consider the case where the context value is a sequence of two maps (map{'x':1, 1:'p', 2:'q'}, map{'x':2, 1:'P', 2:'Q'})
and the expression is ?(?x)
. What is the context for evaluation of the key specifier (?x)
? I would have expected that we evaluate the key specifier once, in the outer context, so the key specifier value is (1,2)
and we therefore take the entries with keys 1 and 2 in both maps, giving a result of ('p', 'q', 'P', 'Q'). But the cited paragraph suggests we evaluate KS separately for each item in the context value, and thus return entry 1 of map 1 and entry 2 of map 2, giving a result of ('p', 'Q').
I know it's an edge case and it's unlikely in practice that people will write context-dependent key specifiers, but the rules need to be clear. I thought we had previously decided that the key specifier expression should be evaluated in the outer context.
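The two readings can be made concrete with a Python sketch, using dictionaries as stand-ins for the two maps. It only mirrors the example in this issue and does not model lookup semantics in general.

```python
maps = [
    {"x": 1, 1: "p", 2: "q"},
    {"x": 2, 1: "P", 2: "Q"},
]

# Reading 1: evaluate the key specifier (?x) once, in the outer context,
# then look up every resulting key in every map.
outer_keys = [m["x"] for m in maps]
outer_result = [m[key] for m in maps for key in outer_keys]

# Reading 2: treat ?KS as . ! ?KS, so (?x) is evaluated separately for each map.
per_item_result = [m[key] for m in maps for key in [m["x"]]]
```

The first reading yields the four values 'p', 'q', 'P', 'Q'; the second yields only 'p' and 'Q'.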
Issue #859 created #created-859
Syntax problem with type-qualified wildcards in lookup expressions
The new syntax for type-qualified wildcards has problems when used in a chained lookup expression, for example
[[1,2], [3,4], 5, 6]?*::array(*)?1
because the "?" that follows array(*)
is interpreted as an occurrence indicator. This can be avoided by using parentheses, but it's too much of an elephant trap - a better solution is needed.
Issue #858 created #created-858
fn:identity: accept 2 arguments, ignore second
Apart from id
, Haskell has const
, which accepts 2 arguments, but only returns the first. Thanks to the introduction of default arguments, it’s straightforward to extend fn:identity
to be able to accept 2 arguments:
fn:identity(
$input as item()*,
$ignored as item()* := ()
) as item()*
Pull request #857 created #created-857
856 Drop reference to obsolete error condition in deep-equal()
Fix #856
Issue #856 created #created-856
Spec for deep-equal() still references FOTY0015
The errors section of fn:deep-equal still says
A type error is raised [[err:FOTY0015] if either input sequence contains a function item that is not a map or array.
This is no longer the case (and error FOTY0015 is now obsolete and should be removed from the appendix)
Pull request #855 created #created-855
844 New sequence functions: names
Closes #844
Issue #852 closed #closed-852
Typo in XQuery equivalent for fn:transitive-closure
Issue #853 closed #closed-853
852 Fix typo in transitive-closure
Issue #854 created #created-854
Need more discussion and explanation of deep-lookup operator
During discussion of the ??
operator it was pointed out that we need more examples and explanation, especially of how to handle cases where the "flattening" behaviour of the operator is inconvenient. This applies equally to paths using the existing ?
operator - $x?y?z
and $x??z
both have this problem. For example, there is no way of filtering the result of $x?y?z
or $x??z
to select only members of size 3.
Issue #57 closed #closed-57
The item-type(T) syntax is not defined
Issue #172 closed #closed-172
Record Tests
Issue #233 closed #closed-233
Declare the result type of a mode, via @as
Issue #698 closed #closed-698
GitHub: Line Endings
Issue #730 closed #closed-730
Equivalence of map and function types
Issue #840 closed #closed-840
Wrong example in fn:seconds-from-duration
Pull request #853 created #created-853
852 Fix typo in transitive-closure
Fix #852
Issue #852 created #created-852
Typo in XQuery equivalent for fn:transitive-closure
tc-inclusive($node/$step(.)), $step)
should read
tc-inclusive($node/$step(.), $step)
Issue #837 closed #closed-837
297 Deep Lookup Operator "??" and wildcard qualifier "::"
QT4 CG meeting 055 draft minutes #minutes-11-21
Draft minutes published.
Issue #848 closed #closed-848
More fo spec examples corrections
Issue #833 closed #closed-833
Fix the line endings, force a single lf in text files
Issue #841 closed #closed-841
840: Typo in fn:seconds-from-duration example
Issue #845 closed #closed-845
Quantified expressions and "binding tuples" (Editorial)
Issue #846 closed #closed-846
845 Drop mention of tuples
Issue #842 closed #closed-842
Improve stylesheet for generating keyword tests
Pull request #851 created #created-851
822: XQuery, XQFO: Edits (pool)
Editorial; closes #822 (visit this issue for a list of the changes)
Issue #850 created #created-850
fn:parse-html: Finalization
Now that fn:parse-html
has been added to the specification, we need test cases for all provided options and input types (including binary input).
Looking at the current set of test cases, it seems unrealistic to use older libraries such as TagSoup for this function. I wonder if we should support ·implementation-defined· parsing algorithms at all. What do others think?
Next, is there any implementation available that supports all given method/html-version variants?
Issue #783 closed #closed-783
Editorial: errors are raised (not reported, signaled, generated, or thrown).
Pull request #849 created #created-849
847 Allow uri-structure-record keys to have empty sequence values
Fix #847
The description of build-uri
was written with the expectation that if a key was in the map, its value should be used. I don't really want to replace every occurrence of
if `x` is present in the map
with
if `x` is present in the map and its value is not the empty sequence
So I've attempted to justify that globally with the following paragraph at the beginning of the description:
The components are derived from the contents of the $parts map. To simplify the description below, any key whose value is the empty sequence is ignored; this is equivalent to the key not being present in the map.
I'm not hugely proud of that bit of prose though. Suggestions for improvements most welcome.
Pull request #848 created #created-848
More fo spec examples corrections
- Adds tagging for fos:test elements so we know which tests depend on XQuery (rather than XPath).
- Corrects expected results for some tests
Issue #847 created #created-847
build-uri() - is {"port":()} legal?
The (only) example in the spec of a build-uri() call specifies {"port":()}
. But uri-structure-record
has
port? as xs:string,
which means that the empty sequence is not a valid value. Either the record structure should be changed to specify the type as xs:string?
, or the example should be changed.
QT4 CG meeting 055 draft agenda #agenda-11-21
Draft agenda published.
Pull request #846 created #created-846
845 Drop mention of tuples
Fix #845
Issue #845 created #created-845
Quantified expressions and "binding tuples" (Editorial)
The section of the XPath specification on Quantified Expressions contains a paragraph starting
"The order in which test expressions are evaluated for the various binding tuples"
This is the only place in which "binding tuples" are mentioned in connection with quantified expressions, and in XPath (as distinct from XQuery) it is the only place where tuples are mentioned at all. The paragraph could easily be rewritten to avoid introducing a new concept.
Issue #844 created #created-844
New sequence functions: names
Observations:
A. What about renaming fn:contains-sequence
, fn:starts-with-sequence
and fn:ends-with-sequence
to fn:contains-items
, fn:starts-with-items
and fn:ends-with-items
, in alignment with fn:items-at
, fn:items-before
, etc.? If we add equivalent functions for arrays, it could be array:contains-members
, etc.
B. It seems confusing to have fn:items-starting-where
and fn:items-after
, instead of fn:items-starting-after
. Maybe we can think of alternative (shorter) names for fn:items-starting-where
and fn:items-ending-where
? – I know we’ve discussed before; I raised it again due to user feedback.
Issue #843 created #created-843
Standard, array & map functions: Equivalencies
In many threads (#135, others), we have discussed how to align the functions for sequences, arrays, and maps. This is an attempt to summarize the status quo, and I hope to keep it up-to-date in the coming weeks.
The 4.0 functions are the ones with the keyword new attached. If the function is followed by a question mark, there may be an existing issue for its addition, or it may be consistent to add it.
Please note that the data types have fundamental differences, so it’s not always possible to present or provide exact symmetries.
To be discussed
Functions | Array Functions | Map Functions
--- | --- | ---
fn:contains-subsequence new: #94, #844 | array:contains-subarray ? | map:contains
fn:ends-with-subsequence new: #96, #844 | array:ends-with-subarray ? | –
fn:starts-with-subsequence new: #96, #844 | array:starts-with-subarray ? | –
fn:distinct-values | array:distinct-members ? | –
fn:duplicate-values new: #123 | array:duplicate-members ? | –
fn:empty | array:empty new: #229 | map:empty ? #827
fn:exists | array:exists new: #229 | map:exists ? #827
fn:every new | array:every ? | map:every ?
fn:some new | array:some ? | map:some ?
fn:highest new | array:highest ? | –
fn:lowest new | array:lowest ? | –
fn:index-of | array:index-of ? #260 | –
fn:index-where new | array:index-where new: #114 | map:keys($m, $pred) new: #467
fn:items-at new: #213 → fn:get ? #872 | array:members-at ? #825; array:get | map:get
fn:intersperse new: #2 → fn:join ? #868 | array:join | –
fn:subsequence-where new: #878 | array:subarray-where ? | –
fn:substitute ? #553, #583 | array:replace new; array:substitute ? #583 | map:replace new; map:substitute ? #583
fn:slice new | array:slice new | –
– | array:split new | map:entries new
– | array:values new | map:keys; map:values new
– | array:entries ? #826 | map:entries new
– | array:merge ? #826 | map:merge
– | – | map:entry
– | array:members new → keep? #826 | map:pairs new → keep? #826
– | array:of-members new → keep? #826 | map:of-pairs new → keep? #826
– | – | map:pair new: #508 → keep? #826
Settled
Functions | Array Functions | Map Functions
--- | --- | ---
fn:count | array:size | map:size
fn:filter | array:filter | map:filter
fn:fold-left | array:fold-left | –
fn:fold-right | array:fold-right | –
fn:for-each-pair | array:for-each-pair | –
fn:for-each | array:for-each | map:for-each
fn:head | array:head | –
fn:insert-before | array:insert-before | –
fn:remove | array:remove | map:remove
fn:reverse | array:reverse | –
fn:sort | array:sort | –
fn:subsequence | array:subarray | –
fn:tail | array:tail | –
– | array:put; array:append | map:put
fn:foot new: #250 | array:foot new: #250 | –
fn:trunk new: #250 | array:trunk new: #250 | –
Issue #91 closed #closed-91
name of map:substitute
Issue #104 closed #closed-104
name of map:replace/array:replace
Issue #699 closed #closed-699
GitHub: Signing
Pull request #842 created #created-842
Improve stylesheet for generating keyword tests
Improves the stylesheet that generates the test set misc/BuiltInKeywords.xml
; specifically, it's smarter about generating acceptable callback functions that won't trigger an unwanted error.
(The generated test calls each function twice, once with positional arguments and once with keyword arguments, and checks that the two results are deep-equal).
Issue #838 closed #closed-838
Collations in F&O examples for functions such as fn:contains()
Issue #839 closed #closed-839
838 Fix collation variable references
Pull request #841 created #created-841
840: Typo in fn:seconds-from-duration example
Addresses #840
Issue #840 created #created-840
Wrong example in fn:seconds-from-duration
The newly added example
seconds-from-duration(
xs:duration("P1Y1D")
)
Result:
1
Looks clearly wrong.
Joel, what did you have in mind?
Pull request #839 created #created-839
838 Fix collation variable references
Fix #838. Editorial.
Issue #838 created #created-838
Collations in F&O examples for functions such as fn:contains()
The examples for a number of functions, such as fn:contains()
, use a UCA collation URI bound to the variable $coll
.
Two problems:
(a) These sections also contain prose saying "The collation used in these examples, http://example.com/CollationA is a collation in which both - and * are ignorable collation units." - but this is not the collation URI actually used. This error was already present in the published 3.1 Recommendation.
(b) The examples that use the variable $coll
do not have a use
attribute, which means that the test cases generated in the test suite do not declare the variable, which means that the tests fail.
Pull request #837 created #created-837
297 Deep Lookup Operator "??" and wildcard qualifier "::"
Adds support for a deep lookup operator "??" as a transitive equivalent of "?", and allows a wildcard lookup X?* or X??* to be qualified with the required type of value, for example X??*::record(from, to).
Issue #836 created #created-836
Add support for CSV 'dialect' features covered by the OKFN's Frictionless Data CSV spec in `fn:parse-csv` and related functions
The OKFN's Frictionless Data project's CSV Standard specifies some additional things we should take into account for fn:parse-csv and related functions.
Most important is the option to specify a comment line character, whose presence at the start of a line will cause it to be treated as a comment. Because of the way that rows can span lines, post-processing to extract comments might be impossible in some cases.
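To make the concern about rows spanning lines concrete, here is a small Python sketch using the standard `csv` module. The input data and the helper `strip_comment_lines` are hypothetical; the point is only that a line-based comment filter cannot tell a real comment from a continuation line inside a quoted field.

```python
import csv
import io

# Hypothetical input: line 2 is a comment; the quoted field that starts on
# line 3 continues on a line that merely begins with '#'.
DATA = 'id,note\n# a comment line\n1,"first line\n# not a comment"\n'


def strip_comment_lines(text, comment_char="#"):
    """Naive line filter that knows nothing about quoted fields."""
    kept = [line for line in text.splitlines(keepends=True)
            if not line.startswith(comment_char)]
    return "".join(kept)


def parse(text):
    """Parse CSV text into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


# Reference result: only the real comment (line 2) removed by hand.
expected = parse('id,note\n1,"first line\n# not a comment"\n')
naive = parse(strip_comment_lines(DATA))

print(expected[1])
print(naive == expected)
```

The naive filter also deletes the second half of the quoted field, which is why comment handling would have to live inside the parser itself.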
Issue #835 created #created-835
Review names of record types
The use of record types as a thing that users will interact with seems to be increasing. There are a number of new types proposed and we should review their names to ensure that we are using a coherent naming scheme for all spec-defined record types, and that the names are good enough - that they explain what they are, and aren't too unwieldy.
Issue #834 created #created-834
Add creation function for `csv-row-record` type
The csv-row-record type used by the CSV XDM mapping provides its fields and an accessor function that can perform field lookup by index or column name. The column names are set when the CSV is parsed. For users who want to make use of csv-row-record (when needing to parse CSVs that fn:parse-csv itself cannot handle out-of-the-box, perhaps) we should provide a creation function that accepts the name-to-index map and the fields for the row and creates a csv-row-record with a correctly functioning field function entry.
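As a rough illustration of what such a creation function would do, here is a Python model. The name `make_csv_row_record`, the dictionary layout, and the 1-based column positions are assumptions made for this sketch, not the proposed signature.

```python
def make_csv_row_record(column_index, fields):
    """Build a row record: the fields plus a `field` accessor.

    column_index maps column names to 1-based positions (an assumption,
    chosen to match XPath indexing); fields holds the values of the row.
    """
    values = list(fields)

    def field(key):
        # Accept either a column name or a 1-based index.
        position = column_index[key] if isinstance(key, str) else key
        if not 1 <= position <= len(values):
            raise IndexError(f"no field at position {position}")
        return values[position - 1]

    return {"fields": values, "field": field}


row = make_csv_row_record({"name": 1, "city": 2}, ["Ada", "London"])
print(row["field"]("city"), row["field"](1))
```

The accessor closes over both the name map and the row values, which is the part a user cannot easily reproduce without a dedicated constructor.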
Pull request #833 created #created-833
Fix the line endings, force a single lf in text files
Hello,
Per discussion on the list, this PR changes the way git handles line endings so that text files will always, exclusively have line endings delimited by a single lf. This should be a largely transparent change for Mac/Unix users. On Windows, it means that checked out files will have lf line termination. If your favorite editing tool can handle this, then you don't have to care. Even if your editor saves files with cr/lf line endings, git will turn them back into single lf endings when you commit your changes.
Hat tip to @ChristianGruen for keeping focus on this issue.
This PR has no technical changes, and affects a relatively small number of files. I'll leave it here for a day or two, then I'm going to be inclined to merge it. Object now, if you object :-)
Pull request #832 created #created-832
77 Add map:deep-update and array:deep-update
Note that unlike many of the functions we have added, these are non-trivial: they cannot easily be implemented in XSLT or XQuery.
This is a first cut and I expect some refinement will be needed, but reviews are invited.
I might subsequently propose layering some XSLT syntax on top of this for convenience.
Issue #831 closed #closed-831
Fixed a couple of markup errors
Pull request #831 created #created-831
Fixed a couple of markup errors
I don't really know how these crept in. Perhaps I was negligent in confirming that #828 passed tests in PR? Too late to tell now, but I think I've fixed them.
Issue #516 closed #closed-516
Add position argument to HOF callbacks
Issue #828 closed #closed-828
516 Add position argument to HOF callbacks
Issue #736 closed #closed-736
730: Clarify (and correct) rules for maps as instances of function types
Issue #719 closed #closed-719
413: Spec for CSV-related functions
Issue #554 closed #closed-554
The Transitive Closure function produces an incomplete result, completeness/success and number of actual iterations must also be returned
Issue #754 closed #closed-754
fn:transitive-closure: signature; remarks; too specific?
Issue #761 closed #closed-761
554/754 Simplify the new transitive-closure function
Issue #216 closed #closed-216
fn:unparsed-text: End-of-line characters
Issue #794 closed #closed-794
216: fn:unparsed-text: End-of-line characters
Issue #712 closed #closed-712
array:sort: to be aligned with fn:sort
Issue #823 closed #closed-823
712 Extend array:sort to align with fn:sort
QT4 CG meeting 054 draft minutes #minutes-11-14
Draft minutes published.
Issue #738 closed #closed-738
FO: Why is fn:op under section "17.3 Dynamic loading"
Issue #799 closed #closed-799
Errors in F&O spec examples
Issue #824 closed #closed-824
799 errors in examples; 738 section heading for fn:op
Issue #747 closed #closed-747
QName literals
Issue #743 closed #closed-743
Extend enumeration types to allow values other than strings
QT4 CG meeting 054 draft agenda #agenda-11-14
Draft agenda published.
Issue #830 created #created-830
Revise appendix D.4 of F+O: Illustrative user-written functions
Many of the functions in this non-normative appendix are no longer needed, or can be expressed more concisely using new 4.0 language features.
Issue #829 created #created-829
fn:boolean: EBV support for more item types
In #817, it was discussed that the current EBV semantics have been inspired a lot by XPath 1.0. Today, we have numerous other data types apart from strings, doubles, booleans, and nodes, and I believe it’s time to do justice to this by getting rid of the error for unsupported data types for fn:boolean.
We currently have:
Type | Rule to compute boolean value
--- | ---
node() | true()
xs:boolean | $item
xs:numeric | $item != 0 and not(is-NaN($item))
xs:untypedAtomic, xs:string, xs:anyURI | $item != ''
I have two options in mind:
- The easiest solution, which would come closest to JavaScript, would be to return true() for all other items. This would allow us to do simple checks like:
declare function local:byte-length($data as xs:base64Binary?) as xs:integer {
(: instead of exists($data); utilizes the EXPath Binary Module :)
if($data) then bin:length($data) else 0
};
- If we want to be more fine-grained, we could do justice to the specifics of 4 more types:
Type | Rule to compute boolean value
--- | ---
array(*) | array:size($item) != 0
map(*) | map:size($item) != 0
xs:base64Binary, xs:hexBinary | bin:length($item) != 0 or not($item = (xs:hexBinary(''), xs:base64Binary('')))
It would then be possible to write:
- if($map) instead of map:size($map) != 0 or map:exists($map) (see #827 for the naming controversy). Note that if($map) will also return false() if $map is an empty sequence.
- if($func) { $func(1, 2) } instead of exists($func)
In the last comments of #817, it was pointed out that the behavior of existing code may change if errors are replaced by results. I hope we can live with that, as I cannot think of cases in which the EBV computation makes sense for items that always raise an error.
Which option do some of you prefer?
Issue #817 closed #closed-817
EBV 4.0
Pull request #828 created #created-828
516 Add position argument to HOF callbacks
I have added positional parameters to the following functions:
array:filter
array:fold-left
array:fold-right
array:for-each
array:for-each-pair
array:index-where
fn:every
fn:filter
fn:fold-left
fn:fold-right
fn:for-each
fn:for-each-pair
fn:index-where
fn:items-after
fn:items-before
fn:items-ending-where
fn:items-starting-where
fn:iterate-while
fn:partition
fn:some
Comments:
- For fn:every and fn:some, the additional parameter seemed useful to me, as a positional variable has been requested for quantifier expressions in the past.
- I’ve also added positional variables to folds.
- I’ve unified and simplified the formal XPath/XQuery equivalencies in the rule sets.
- I’ve dropped some XSLT equivalencies because I felt that the XQuery representations are more concise (I certainly won't mind if they're added back).
Closes #516.
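For readers who want a feel for the change, the following Python sketch models a filter and a left fold whose callbacks receive the 1-based position as an extra argument. The function names are hypothetical and the code is only an analogy for the XPath functions listed above.

```python
def filter_with_position(items, predicate):
    """Keep items for which predicate(item, position) is true (1-based)."""
    return [item for pos, item in enumerate(items, start=1)
            if predicate(item, pos)]


def fold_left_with_position(items, zero, action):
    """Left fold whose action also receives the 1-based position."""
    result = zero
    for pos, item in enumerate(items, start=1):
        result = action(result, item, pos)
    return result


letters = ["a", "b", "c", "d"]
# Keep every item at an odd position.
print(filter_with_position(letters, lambda item, pos: pos % 2 == 1))
# Prefix every item with its position while folding.
print(fold_left_with_position(letters, "", lambda acc, item, pos: acc + f"{pos}{item}"))
```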
Issue #827 created #created-827
map:empty, map:exists ← array:empty, array:exists
We have array:empty and array:exists, but no equivalent functions for maps.
I think we have decided to live with the ambiguity (discussed in #229) that map:exists(map {}) will return false although the “map exists”. Same for arrays.
Issue #826 created #created-826
Arrays: Representation of single members of an array
When introducing the new array features to some users, the for member syntax was welcomed by everyone.
However, there was some confusion (again, see my past feedback to the mailing list) about what the QT4 group considers to be “members of an array”, and about value records.
In particular, the “value record” representation of arrays led to questions that I didn’t have a good answer for. People didn’t understand why an array member was returned as a map, and why that map is (again) called “array member” or “value record”, a term no one associated with arrays (at least for now… which is not too surprising, as it has just been introduced).
Next, due to atomization (as mentioned before), array:split allows us to omit the explicit ?value lookups that are required for array:members:
sum(array:members($array)?value)
sum(array:split($array))
I suppose I have been biased in my presentation, but I’ve failed to give good arguments to justify the current solution in the spec. The questions that I think need to be answered are:
- How will people benefit from the (usually intermediate) map representation for array members?
- What exactly do we win with array:members and array:of-members instead of using the existing array:join function, combined with the new array:split function?
Out of interest, I have rewritten the formal equivalencies for the array functions with array:split/array:join:
array:append
array:of-members((array:members($array), map{'value':$member}))
array:join((array:split($array), array { $member }))
array:build
array:of-members($input ! map { 'value': $action(.) })
array:join($input ! array { $action(.) })
array:filter
array:of-members(array:members($array) => filter(function($m) { $predicate($m?value) }))
array:join(array:split($array) => filter(function($m) { $predicate($m?*) }))
array:for-each
array:of-members(array:members($array) ! map { 'value': $action(?value) })
array:join(array:split($array) ! array { $action(?*) })
array:for-each-pair
array:of-members(
for-each-pair(array:members($array1),
array:members($array2),
function($m, $n) {map{'value': $action($m?value, $n?value)}}))
array:join(
for-each-pair(array:split($array1), array:split($array2),
function($m, $n) { array { $action($m?*, $n?*) } }))
array:insert-before
array:of-members(array:members($array) => insert-before($position, map{'value':$member}))
array:join(array:split($array) => insert-before($position, array { $member }))
array:remove
array:of-members(array:members($array) => remove($positions))
array:join(array:split($array) => remove($positions))
array:reverse
array:of-members(array:members($array) => reverse())
array:join(array:split($array) => reverse())
array:slice
array:of-members(array:members($array) => slice($start, $end, $step))
array:join(array:split($array) => slice($start, $end, $step))
array:sort
array:of-members(array:members($array) => sort($collation, function($x) { $key($x?value) }))
array:join(array:split($array) => sort($collation, function($x) { $key($x?*) }))
array:subarray
array:of-members(array:members($array) => subsequence($start, $length))
array:join(array:split($array) => subsequence($start, $length))
array { $sequence }
array:of-members($sequence ! map { 'value': . })
array:join($sequence ! array { . })
[E1, E2, E3, ..., En]
array:of-members((map { 'value': E1 }, map { 'value': E2 }, map { 'value': E3 }, ... map { 'value': En }))
array:join((array { E1 }, array { E2 }, array { E3 }, ... array { En }))
$array?*
array:members($array) ! ?value
array:split($array) ! ?*
$array?$N / $array($N)
array:members($array)[$N]?value
array:split($array)[$N]?*
(or array:get($array, $N))
As a side note, I noticed that the equivalence given for array:join must be buggy:
(: current equivalence presented in the spec :)
array:of-members($arrays ! array:members(.))
(: returns [ 1, 2, 3 ] :)
let $arrays := ([ 1 ], [ 2, 3 ])
return array:of-members($arrays ! array:members(.))
In conclusion, if I could choose, I would tend to drop array:members and array:of-members and rename array:split to array:members.
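The two representations compared in this issue can be modelled in a few lines of Python. This is only a sketch under stated assumptions: an XDM array is modelled as a list whose members are tuples (sequences), a value record as a dict with a single "value" key, and the four function names mirror the XPath functions without claiming to reproduce their exact semantics.

```python
def members(array):
    """array:members analogue: one value record per member."""
    return [{"value": member} for member in array]


def of_members(records):
    """array:of-members analogue: rebuild an array from value records."""
    return [record["value"] for record in records]


def split(array):
    """array:split analogue: one single-member array per member."""
    return [[member] for member in array]


def join(arrays):
    """array:join analogue: concatenate the members of several arrays."""
    return [member for array in arrays for member in array]


# Three members: a singleton, a two-item sequence, and an empty sequence.
array = [(1,), (2, 3), ()]

# Both round trips give back the original array.
print(of_members(members(array)) == array, join(split(array)) == array)

# array:reverse expressed both ways, as in the list of equivalences.
print(of_members(list(reversed(members(array)))))
print(join(list(reversed(split(array)))))
```

In this model the intermediate map adds a wrapping step without changing the result, which is the substance of the question raised above.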
Issue #825 created #created-825
array:members-at
The title says it all.
We have fn:slice and array:slice. We also have fn:items-at, but we somehow neglected to add the corresponding array:members-at function.
We could even think of two functions map:entries-at and map:values-for-keys. The first would return all map entries whose key is one of the keys supplied as an argument. The second would return the values of those entries.
Here is a complete XPath 3.1 implementation:
let $members-at := function(
$input as array( *),
$indexes as xs:integer*
) as array(*)*
{
for $ind in $indexes
return [$input($ind)]
}
Example:
Evaluating this expression:
let $members-at := function(
$input as array( *),
$indexes as xs:integer*
) as array(*)*
{
for $ind in $indexes
return [$input($ind)]
}
return
$members-at([1, (2, 3), (4, 5, 6)], (1, 3) )
produces the wanted result:
[1], [(4,5,6)]
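The same behaviour can be checked with a small Python model. This is a sketch, not part of the proposal: arrays are modelled as lists, members as tuples standing in for sequences, and the bounds check is an assumption about how an out-of-range index would be treated.

```python
def members_at(array, indexes):
    """Return one single-member array per requested 1-based index."""
    result = []
    for index in indexes:
        # Assumed behaviour: an out-of-range index is an error.
        if not 1 <= index <= len(array):
            raise IndexError(f"array index {index} out of bounds")
        result.append([array[index - 1]])
    return result


print(members_at([(1,), (2, 3), (4, 5, 6)], (1, 3)))
```

As in the XPath version, the result is a sequence of single-member arrays in the order of the requested indexes.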
Issue #771 closed #closed-771
British vs. American English
Pull request #824 created #created-824
799 errors in examples; 738 section heading for fn:op
Fix #799 Fix #738
Pull request #823 created #created-823
712 Extend array:sort to align with fn:sort
Fix #712
Issue #480 closed #closed-480
Allow type promotion of xs:string to xs:anyURI
Issue #822 created #created-822
XQuery, XQFO: Edits (pool)
XQuery spec:
- [x] 4.3.4 Context Value Reference → 4.3.4 Context Value References (plural, in alignment with the other expressions)
- [x] Move 4.3.4 before 4.3.3 Parenthesized Expressions (and after 4.3.2 Variable References)
- [x] 4.19 Switch Expression → 4.19 Switch Expressions
- [x] Try/Catch Expressions: There’s no CatchErrorList anymore
XQFO spec:
- [x] Unify representation of equivalent examples, implementations, …see https://github.com/qt4cg/qtspecs/pull/828#issuecomment-1807222990
- [x] Add History sections for new functions
- [x] 9^XXX outputs "4451" → "3185"
- [x] errors are signaled → raised (#783)
Issue #820 closed #closed-820
FLWOR: Variable Bindings, coercion
Issue #821 created #created-821
Annotations: Make default namespace explicit
In XQuery, the default namespace for annotations is http://www.w3.org/2012/. It’s the only namespace for which no prefix exists, and I think we should change that. ann feels like a reasonable choice to me as we tend to have short prefixes (such as err for errors).
Issue #820 created #created-820
FLWOR: Variable Bindings, coercion
In the current XQuery 4, the coercion rules are applied to variable bindings of FLWOR expressions:
https://qt4cg.org/specifications/xquery-40/xquery-40-diff.html#id-binding-rules
I believe this is yet another feature that needs to be formally accepted. If that has already been done, this issue can be closed again immediately (maybe with a reference to the related issue, or the associated QT4 meeting).
Issue #65 closed #closed-65
Support using different input/output element namespaces
Issue #238 closed #closed-238
Support Invisible XML
Issue #789 closed #closed-789
Serialization spec: terminology
Issue #807 closed #closed-807
789 Serialization terminology [editorial]
Issue #791 closed #closed-791
238: First draft of an fn:invisible-xml function
Issue #130 closed #closed-130
New super/union type xs:binary?
Issue #815 closed #closed-815
130,480 Binary Promotion
Issue #772 closed #closed-772
Revise the fn:parse-html rules to make them clearer to follow.
Issue #809 closed #closed-809
Placement of fn:atomic-equal in the specification
Issue #813 closed #closed-813
809 Move fn:atomic-equal to section 14.2
Issue #806 closed #closed-806
566 A few minor fixes for parse-uri
Issue #804 closed #closed-804
Minor edits, XQFO chh. 7, 8
Issue #651 closed #closed-651
fn:log → fn:message
Issue #803 closed #closed-803
651: fn:log → fn:message
Issue #801 closed #closed-801
nondeterministic vs non-deterministic
Issue #802 closed #closed-802
801: non-deterministic → nondeterministic
Issue #660 closed #closed-660
Static functions, default parameters
Issue #800 closed #closed-800
660: Static functions, default parameters, XPST0017
Issue #797 closed #closed-797
Edits to parse-uri()
Issue #704 closed #closed-704
Context Value Expression → Context Value Reference
Issue #793 closed #closed-793
704: Context Value Expression → Context Value Reference
Issue #819 closed #closed-819
Fix markup error in example
Pull request #819 created #created-819
Fix markup error in example
Issue #792 closed #closed-792
783 XSLT: errors are raised
Issue #790 closed #closed-790
129 XSLT40 and SER40 changes for context item -> value
Issue #775 closed #closed-775
517: Reflected Christian Gruen's remarks
QT4 CG meeting 053 draft minutes #minutes-11-07
Draft minutes published.
Issue #756 closed #closed-756
JSON serialization - number formatting
Issue #818 created #created-818
Foxpath integration
This is a placeholder issue for Syd Bauman’s suggestion on Slack to integrate Foxpath, or parts of it, in the standard.
Issue #817 created #created-817
EBV 4.0
Yes, I dare to question the semantics of effective boolean values. The reason is that I never learned to fully like them. It seems obvious where the rules come from, and why they have been reasonable in previous versions of the language. From today’s perspective, I think there’s really some need to simplify and unify the rules, and I believe it’s possible with little effort and without endangering backward compatibility (provided that we are willing to drop errors and return results).
Some examples for the somewhat strange nature of the current rules:
- boolean((<_>x</_>, <_>y</_>)) returns true, whereas boolean(('x', 'y')) raises an error.
- boolean(xs:NCName('x')) returns true, whereas boolean(xs:QName('x')) raises an error.
- boolean((<a/>, 1)) and boolean((1, <a/>)) may either return true or raise an error, depending on the implementation.
I believe it will make much more sense to
- check all values of the input equally (in analogy to the existential semantics of general comparisons), and
- use existence checks for more types instead of raising a clueless error.
The semantics would be tidied up a lot; it could look like this…
declare function ebv($input as item()*) as xs:boolean {
some $item in $input satisfies typeswitch($item) {
case xs:untypedAtomic | xs:string | xs:anyURI return $item != ''
case xs:numeric return $item != 0
case xs:boolean return $item
default return true()
}
};
…or, if we include more types, like this:
declare function ebv($input as item()*) as xs:boolean {
some $item in $input satisfies typeswitch($item) {
case xs:untypedAtomic | xs:string | xs:anyURI return $item != ''
case xs:numeric return $item != 0
case xs:boolean return $item
case xs:base64Binary return $item != xs:base64Binary('')
case xs:hexBinary return $item != xs:hexBinary('')
case array(*) return array:size($item) != 0
case map(*) return map:size($item) != 0
default return true()
}
};
(If we believe that it’s too progressive to accept all types, we could still raise an error for some specific types… although I don’t think that anyone would benefit from this choice).
As a result, EBV checks could also be used to check more than one item:
(: true if at least one tokenized string is non-empty :)
if(tokenize('a/', '/')) then ...
(: true if at least one number is unequal to 0 :)
if($numbers) then ...
(: true if at least one Boolean is true :)
if(false(), true(), true()) then ...
Nothing would change for the classical EBV checks: if($node/*), if($x = $y), if($ok), …
Regarding “1. check all values of the input equally”, one could argue that this might affect performance. I don’t actually think so: For node sequences, it will still be sufficient to retrieve only the first item. For mixed-type sequences, errors were raised in the past.
The resulting EBV could be easily combined with revised predicate semantics (#816).
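To experiment with the extended rule set sketched above, here is a Python model. The mapping of XDM types to Python types (str for string-like atomics, bytes for the binary types, list for arrays, dict for maps) is an assumption of this sketch, and the numeric branch mirrors `$item != 0` literally.

```python
def ebv(items):
    """True if at least one item is truthy under the extended rules."""
    def item_ebv(item):
        if isinstance(item, bool):             # xs:boolean (checked before int)
            return item
        if isinstance(item, (int, float)):     # xs:numeric, mirrors $item != 0
            return item != 0
        if isinstance(item, str):              # xs:untypedAtomic, xs:string, xs:anyURI
            return item != ""
        if isinstance(item, (bytes, list, dict)):  # binary, array(*), map(*)
            return len(item) != 0
        return True                            # default branch: any other item

    # Existential semantics: one truthy item is enough.
    return any(item_ebv(item) for item in items)


print(ebv(["a", ""]))           # models if(tokenize('a/', '/'))
print(ebv([False, True, True]))
print(ebv([0, 0.0]))
print(ebv([]))
```

Because the check is existential, an empty sequence is false and a mixed sequence no longer depends on which item comes first.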
Issue #816 created #created-816
Predicates: Support for numeric sequences
Predicates provide a compact syntax for positional access to sequences but only single numbers are supported.
It would be handy to allow E[1, 2, 3] (E[3, 2, 1], E[1 to 3], etc.) as a shortcut for E[position() = (1, 2, 3)] (MarkLogic offers this possibility, if I remember correctly). We shouldn’t change the EBV syntax, and we should continue raising an error if the predicate sequence contains items other than numbers. Examples:
Expression | Result
--- | ---
(1 to 5)[2, 3] | 2, 3
(1 to 5)[3, 2] | 2, 3
(1 to 5)[2 to 3] | 2, 3
(1 to 5)[6, 'x'] | error
(1 to 5)[1 to 5, 'x'] | 1, 2, 3, 4, 5 or error (up to the implementation, similar to sequences that start with a node)
I bet this has already been discussed in the past…
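The table above follows from reading E[p1, p2, ...] as E[position() = (p1, p2, ...)]. A Python sketch of that reading is shown below; `select_positions` is a hypothetical name, and this model always raises an error for non-numeric items, which is only one of the outcomes the last table row allows.

```python
def select_positions(items, positions):
    """Model E[position() = positions]: keep items in their original order."""
    for position in positions:
        # Only (integer) numbers are accepted in this sketch.
        if isinstance(position, bool) or not isinstance(position, int):
            raise TypeError("predicate sequence must contain only numbers")
    wanted = set(positions)
    return [item for pos, item in enumerate(items, start=1) if pos in wanted]


one_to_five = list(range(1, 6))
print(select_positions(one_to_five, [2, 3]))
print(select_positions(one_to_five, [3, 2]))
```

Because the predicate acts as a membership test on position(), the order of the numbers in the predicate does not affect the order of the result.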
QT4 CG meeting 053 draft agenda #agenda-11-07
Draft agenda published.
Issue #538 closed #closed-538
480: Attempt to allow xs:string to be 'promoted to' xs:anyURI
Pull request #815 created #created-815
130,480 Binary Promotion
Introduces mutual promotion between xs:base64Binary and xs:hexBinary. Fix #130.
Reorganises the material on type promotion. Supersedes PR #538.
Issue #814 created #created-814
XSLT: Rules for on-no-match="shallow-copy-all"
There are some details missing for the error handling of the built-in template processing for shallow-copy-all.
When the built-in template processes an array, the built-in rule constructs a value record for each member, and applies templates to this value record, expecting the result to be a value record.
- It's not stated whether the "value record" is extensible, that is, whether it can contain fields other than "value".
- No error code is given, and it's not identified explicitly as a type error.
Similarly when a map is processed, the result is expected to be a key-value record, but the details of the error are not spelled out.
Pull request #813 created #created-813
809 Move fn:atomic-equal to section 14.2
Fix #809
Issue #812 created #created-812
Coercion Rules: Unifications
It has always been a challenge to teach the differences between conversion, coercion, promotion, casts and treats. Of course, we cannot get rid of the complexity, but I think it’s a good step forward that the conversion rules have recently been unified in the specification.
Maybe we can push it even further. I would suggest…
- renaming “Coercion Rules” to “Type Coercion” (…most other sections contain rules as well),
- making “Function Coercion” a subsection of “Type Coercion”, and
- making “Type Promotion” another subsection of “Type Coercion”.
I’m the wrong person to decide this, but maybe the full section can be moved to the Appendix, as it’s referenced all around the documents.
Issue #811 closed #closed-811
Highlight changed functions in the ToCs and headings
Pull request #811 created #created-811
Highlight changed functions in the ToCs and headings
Obviously, I should have done icons for functions that have changed as well. So now I have. I'm not convinced that the 🆙 emoji is sufficiently different from the 🆕 emoji. Maybe I should try to make them different colors or something. But it's a start.
Issue #810 closed #closed-810
Fix line endings
Pull request #810 created #created-810
Fix line endings
This PR adds a .gitattributes file that identifies some files explicitly as text files (and others explicitly as binary files).
After this PR is merged, I believe it will be the case that end-of-line handling will be correct on a per-platform basis. That is, if a Windows user checks out the repository, all the files will have PC-style line endings (CR followed by LF). If a Mac or Unix user checks out the repository, all the files will have Unix-style line endings (LF).
Commits should "do the right thing" to preserve the line endings appropriately.
Fingers crossed!
Issue #809 created #created-809
Placement of fn:atomic-equal in the specification
On the XML.com slack, Pieter Lamers observes that fn:atomic-equal looks a little out of place in chapter 18 with the other map functions.
Issue #808 closed #closed-808
Tweaks to highlight new functions in F&O
Pull request #808 created #created-808
Tweaks to highlight new functions in F&O
Following a discussion on the XML.com slack, this is a little lunchtime hackery...
Any function listed in the new-functions section of the changes appendix or identified with an ednote that contains the string New in 4.0 will be marked as 🆕 in the specification. The 🆕 occurs in the ToC, the drop-down function list, and to the left of the section title.
(I'm just going to merge this as it has no spec changes and the effect won't be visible in the PR anyway.)
Pull request #807 created #created-807
789 Serialization terminology [editorial]
"Instance of the data model" generally becomes "input tree"
An instance of the data model used to hold serialization parameters is now referred to as a "parameter document".
Fix #789
Pull request #806 created #created-806
566 A few minor fixes for parse-uri
As CG observes in issue 566, there are still a couple of small problems with fn:parse-uri().
- The regular expressions used to parse the fragment identifier and query are incorrect. The URI specification allows ? to appear in a query string and # to appear in a fragment identifier, so the expressions have been rewritten to match everything after the first ? and #, respectively, even if they contain additional ? or # characters.
- The description of how to interpret the regular expression for parsing a Windows file path preceded by slashes was incorrect. It caused the leading "/" to be lost. That's been corrected.
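The "everything after the first delimiter" behaviour described in the first point can be sketched in Python as follows. This is not the regular expression used in the specification; `split_query_and_fragment` is a hypothetical helper, and the choice to look for the query only in the part before the first # is an assumption of this sketch.

```python
def split_query_and_fragment(uri):
    """Split a URI string into (rest, query, fragment) at the first delimiters."""
    # The fragment is everything after the first '#', further '#' included.
    before_fragment, hash_found, fragment = uri.partition("#")
    # The query is everything after the first '?' that precedes the fragment.
    rest, question_found, query = before_fragment.partition("?")
    return (rest,
            query if question_found else None,
            fragment if hash_found else None)


print(split_query_and_fragment("http://example.com/a?x=1?y=2#top#more"))
```

Splitting on the first occurrence only keeps later ? and # characters inside the query and fragment instead of truncating them.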
Issue #805 closed #closed-805
Improve formatting of FO examples
Pull request #805 created #created-805
Improve formatting of FO examples
Stylesheet changes to improve the formatting of F&O examples. (But they still aren't perfect...!)
- Variables used by the examples are pulled out under a separate heading.
- If any examples include long lines, then single-column ("wide") format is used automatically.
- In two-column layout, if the left-hand column uses "eg" format, then so does the right-hand column. (But it would be nice to get rid of the grey background for this case).
Pull request #804 created #created-804
Minor edits, XQFO chh. 7, 8
In addition to a minor typo, note the following:
- rearrangement of a rule for $input (the original syntax, mixing types and derived types, confused me)
- examples in X-from-Y duration functions to help illustrate the problem when mixing the two components of a duration.
Issue #753 closed #closed-753
65: Allow xmlns="xxx" to NOT change the default namespace for NameTests
QT4 CG meeting 052 draft minutes #minutes-10-31
Draft minutes published.
Issue #770 closed #closed-770
566: Use fn:decode-from-uri in fn:parse-uri
Issue #469 closed #closed-469
array:of-members, map:of-pairs: Signatures, Examples
Issue #782 closed #closed-782
469: array:of-members, map:of-pairs: Signatures, Examples
Issue #778 closed #closed-778
XQFO edits 5.4-5.6
Issue #784 closed #closed-784
fos xsd
Issue #785 closed #closed-785
777: updated history
Issue #786 closed #closed-786
695: Added xref to fn:slice()
Issue #787 closed #closed-787
783(part) - Editorial changes to Serialization spec
Pull request #803 created #created-803
651: fn:log → fn:message
Closes #651
Pull request #802 created #created-802
801: non-deterministic → nondeterministic
Closes #801
Issue #801 created #created-801
nondeterministic vs non-deterministic
In the specs, both “nondeterministic” and “non-deterministic” can be found. The first one is more frequent, so I guess it’s the one that’s preferred.
Issue #359 closed #closed-359
fn:void: Absorb result of evaluated argument
Pull request #800 created #created-800
660: Static functions, default parameters, XPST0017
I’ve chosen XPST0017 over XPST0003 (it felt more intuitive to me).
Closes #660.
Issue #799 created #created-799
Errors in F&O spec examples
In fn:expanded-QName, http:/example.com should be http://example.com.
In map:merge, there's a stray "(" in
map:merge((
($week, map { 7: "Unbekannt" })
)
Pull request #798 created #created-798
479: fn:deep-equal: Input order
I decided to follow my initial suggestion and add unordered to fn:deep-equal instead of adding a separate function for it. My motivation:
- It’s convenient to be able to use the new option in combination with the other options.
- It will be used more often than many other options we’ve added recently.
- It seemed pretty straightforward to add, both in terms of documentation and implementation.
Closes #479
Pull request #797 created #created-797
Edits to parse-uri()
Attn @ndw -- these edits pertain to 6.6.1 fn:parse-uri and require your careful review.
- hierarchical option appears in the rules of the function but not its preamble
- "This function is described..." moved to anticipate the actual narrative.
- I was a bit thrown by the phrase "This approach...is not implementation advice" because "approach" is vague and the reader is left with the impression that a good deal of something that follows is non-normative. We offer non-normative prose in the rules, but that informal prose is normally followed by an equivalent description that is normative. I did not make any edits in this area, but I would recommend clarification as to exactly what parts of the narrative are normative and which are non-normative.
- The period-to-colon change at 27573 is to make sure the reader remains aware of the governing "If..." clause.
- I deleted a paragraph that attempted to distill the rather complicated description of fn:decode-from-uri. A mere xref provides a cleaner narrative here and a more accurate description of uri decoding, and the xref is within the same document, so should not pose a burden on the reader.
- For the query-parameters there was a discrepancy in the data model presented (in the preamble versus in the function rules), and I opted for the array-of-maps model. If it's supposed to be the other way (simple map), let me know and I'll revise.
- query-segments versus query-parameters; I went with the latter.
Let me know where things ain't right.
Issue #796 created #created-796
allow explicit type expressions in XPath variable bindings
It would be useful to be able to write
let $n as xs:integer := some_expr ....
and also maybe
for $s as xs:string, $p as my:name return....
This might occur e.g. in the body of a function, for example... and would help type safety and debugging.
Pull request #795 created #created-795
655 fn:sort-with
Closes #655
Pull request #794 created #created-794
216: fn:unparsed-text: End-of-line characters
Closes #216. See this issue for more details on the proposed change.
Pull request #793 created #created-793
704: Context Value Expression → Context Value Reference
Closes #704
Pull request #792 created #created-792
783 XSLT: errors are raised
See issue #783.
Errors are raised, not signalled or reported or generated or thrown.
Pull request #791 created #created-791
238: First draft of an fn:invisible-xml function
Pursuant to action QT4CG-051-02, here is a proposal for fn:invisible-xml.
There are a number of different design choices that could be made. For example, one could argue that a function of the form fn:invisible-xml($grammar as xs:anyURI, $input as xs:anyURI) would be the easiest thing for users in many cases. But not in all cases. You might want versions where either the grammar or the input were strings instead of URIs.
I've attempted to craft the smallest proposal that could get the job done.
Pull request #790 created #created-790
129 XSLT40 and SER40 changes for context item -> value
Proposes XSLT 4.0 and Serialization 4.0 changes resulting from the generalization of context item to context value in XPath 4.0.
The serialization changes are purely editorial.
In XSLT, we acknowledge the introduction of the context value in XPath but don't take advantage of it; at the XSLT level, the context value for an instruction is still always a single item. The only technical change is that we allow xsl:evaluate to pass any value as the context value to the dynamically evaluated expression (while retaining the name of the relevant attribute "context-item").
Fulfils action QT4CG-046-01
QT4 CG meeting 052 draft agenda #agenda-10-31
Draft agenda published.
Issue #789 created #created-789
Serialization spec: terminology
The serialization spec makes extensive use of the phrase "an instance of the data model". This phrase is defined to be a synonym of "value" or "sequence", and it seems to add a lot of words without adding any clarity. In many cases the context makes clear that it is actually referring to a tree rooted at a document node.
In section 3.1 "Setting Serialization Parameters by Means of a Data Model Instance" it would be helpful to use a more specific phrase, for example "a parameter document".
Elsewhere the phrase is often used to mean "the value being serialized", and again, it would be helpful to use a more specific phrase, perhaps a term that is quite distinctive such as "the payload". (It's sometimes referred to as "the result tree", but that assumes that the value being serialized is the result of a query or stylesheet.)
Issue #788 created #created-788
New function fn:annotate()
I propose a function fn:annotate() which will add annotations to a function item. It will create a new function item that differs from the original only in its annotations.
Currently annotations are a dynamic property of function items, but they can only be set in very limited ways. Allowing them to be set dynamically creates a lot of opportunities. For example, there is currently no way to set annotations on a map or array, but one could define annotations, for example, to indicate that a map should hold entries ordered by key, or that an array should use a "sparse" implementation; there is scope both for spec-defined and vendor-defined (and perhaps even user-defined) annotations.
Some of the possibilities this would open up have been outlined in other issues. At present, though, we have a capability in the data model that is underexploited, and an fn:annotate()
function is a conceptually simple addition, which can be justified purely on the grounds of completeness - if something exists in the data model, surely it should have getters and setters?
Issue #635 closed #closed-635
451: Schema compatibility
Issue #765 closed #closed-765
XQuery version declaration: upgrade to 4.0
Issue #766 closed #closed-766
765 Update version references etc to 4.0 status
Pull request #787 created #created-787
783(part) - Editorial changes to Serialization spec
- Errors are raised, not signaled
- Cross-references point to 4.0 specs rather than 3.0/3.1
- Some internal markup changes for tidiness
Partial fix for #783 as it affects the serialization spec.
Issue #695 closed #closed-695
Step in RangeExpression
Pull request #786 created #created-786
695: Added xref to fn:slice()
This closes #695 .
Issue #777 closed #closed-777
new replace() parameter $action
Pull request #785 created #created-785
777: updated history
Closes issue #777
Pull request #784 created #created-784
fos xsd
xsd:assert is illegal in schema version 1.0; updated to specify version 1.1.
Issue #783 created #created-783
Editorial: errors are raised (not reported, signaled, generated, or thrown).
The most common phrase we use when describing an error condition is "A (dynamic|static|type) error is raised if ...".
There are other cases where we use the verb "reported" or "signaled" (and occasionally, "thrown" or "generated"). We should avoid these verbs, partly in the interests of consistency, and partly because they are misleading: an error is not reported if it is caught by a try/catch, but it is still raised.
XSLT also commonly uses the phrase "It is a (dynamic|static|type) error if ... " which is also acceptable.
Pull request #782 created #created-782
469: array:of-members, map:of-pairs: Signatures, Examples
I’ve eventually renamed $pairs
to $input
(I didn’t rename $input
to $members
as initially suggested, as we have $member
parameters in other functions that are of type item()*
, not record(value as item()*)
).
Closes #469
Issue #776 closed #closed-776
/etc/XT40 does not get built
Issue #781 closed #closed-781
Fix etc_XT40 output file
Pull request #781 created #created-781
Fix etc_XT40 output file
Fix #776
You'll need to pull and rebase master
after I merge this.
Issue #764 closed #closed-764
XQuery: Simplify module imports
Issue #780 created #created-780
format-number() etc incompatibility
We have changed format-number() and some XSLT functions including system-property(), function-available() etc so that the QName-valued argument is now declared with type union(xs:string, xs:QName) rather than xs:string. I believe that there are edge cases where this is an incompatible change -- for example if the supplied value is an xs:anyURI value. I think the edge cases are probably sufficiently obscure and unlikely to occur in practice that it is sufficient to document them.
Issue #779 created #created-779
Hash/checksum function
I propose a new function for the core XPath functions, here called fn:hash()
for the sake of discussion. The goal is to give XPath users access to CRC, checksum, and cryptographic hash functions.
Rationale
Simple checksum functions, such as those from the Fletcher family, are relatively easy to write in a host language such as XSLT. More complex ones are far more challenging to write, and may incur serious performance penalties. For example, from the TAN function library, see the MD5 checksum/hash functions. (Yes, one day I thought it would be fun to try to implement the MD5 algorithm in XSLT.) Most programming languages in which an implementation is written have access to cryptographic libraries that are highly performant.
Hash functions are widely used, and certainly important in XML-based workflows, whether as filenames, database fields, etc.
The closest comparable existing function is generate-id()
, but this was designed as an identifier for nodes.
In short, I believe there is a significant need that outstrips current functionality.
In the draft below, I have adopted only the MD-5 algorithm as a core requirement, to catalyze discussion. I have assumed the user wants the string form of the output, not the raw bits. I have not tried to flesh out prose that would warn users away from security complacency.
For discussion, here is a list of relevant algorithms. I look forward to community feedback.
fn:hash
Draft Specification
Summary
This function takes as input a string or octet sequence and returns a string representation of the results from a specified hash, checksum, or cyclic redundancy check function.
Signature
fn:hash(
$value as union(xs:string, xs:hexBinary,
xs:base64Binary)? := fn:string(.),
$algorithm as xs:string := "md5"
) as xs:string?
Properties
The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.
The one- and two-argument forms of this function are deterministic, context-independent, and focus-independent.
Rules
If the zero-argument version of the function is used, the result is the same as calling the one-argument version, with $value
set to fn:string(.)
.
If the one-argument version of the function is used, the result is the same as calling the two-argument version, with $algorithm
set to "md5".
The effective value of $algorithm
is the value of the expression fn:lower-case(fn:replace($algorithm, '\W+', ''))
.
If $value
is the empty sequence, or a string of zero length, the function returns the empty sequence.
If $value
is an instance of xs:string
it is cast to xs:hexBinary
on the basis of UTF-8 encoding. If $value
is an instance of xs:base64Binary
it is cast to xs:hexBinary
.
The function returns an xs:string
representation of the bytes returned by passing the xs:hexBinary
value of $value
as an octet sequence through the specified hash or checksum function.
Conforming implementations MUST support md5
and the associated MD5 Message-Digest algorithm defined by RFC 6151 (update to RFC 1321). They MAY support other checksum and hash functions with implementation-defined semantics.
Error Conditions
A dynamic error is raised [err:XXXXXXX] if the effective value of $algorithm
is not one of the values supported by the implementation.
Notes
- The MD5 algorithm is normally not used for cryptographic purposes. [More cautionary prose about not assuming that something can be trusted as secure.]
Examples
Expression | Result
--- | ---
fn:hash("abc") | 900150983cd24fb0d6963f7d28e17f72
fn:hash("ABC") | 902fbdd2b1df0c4f70b4a5d23525e932
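The draft rules above can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not a reference implementation: the name draft_hash is hypothetical, the empty sequence is modelled as None, the unspecified error code is modelled as a ValueError, and Python's \W only approximates the XPath regular-expression class of the same name.

```python
import hashlib
import re
from typing import Optional, Union


def draft_hash(value: Optional[Union[str, bytes]],
               algorithm: str = "md5") -> Optional[str]:
    """Sketch of the draft fn:hash rules (hypothetical helper, not normative)."""
    # Effective algorithm name: strip non-word characters, then lower-case.
    # Assumption: Python's \W stands in for the XPath regex \W.
    effective = re.sub(r"\W+", "", algorithm).lower()
    # The empty sequence (None here) or a zero-length string yields
    # the empty sequence.
    if value is None or value == "":
        return None
    # Strings become octets via UTF-8; binary input is used as-is.
    octets = value.encode("utf-8") if isinstance(value, str) else value
    if effective != "md5":
        # The draft leaves the error code open ([err:XXXXXXX]).
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    # Lower-case hexadecimal string form of the digest.
    return hashlib.md5(octets).hexdigest()
```

Calling draft_hash("abc") reproduces the first row of the examples table, and draft_hash("abc", "MD-5") gives the same result because the algorithm name is normalized before lookup.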
Pull request #778 created #created-778
XQFO edits 5.4-5.6
Light edits for consistency.
- Examples in char() have been reordered so that the most interesting and useful examples are at the top.
- Substring functions were difficult to read in the specs because of the overly long collation URI. The URI is now bound to a variable, to enhance legibility.
- Schema location fixed.
- Extra examples in string-length(), to help users recognize the role of combining characters.
Issue #777 created #created-777
new replace() parameter $action
In the current draft for replace() the history log states that the new parameter $action
has not yet had community review. This ticket provides a placeholder for attention and discussion.
If the CG has already reviewed and accepted $action
, comment here and I will update the function history accordingly.
Issue #776 created #created-776
/etc/XT40 does not get built
The serialisation spec has xspecref spec="XT31"
references that need to be updated to XT40
. But for some reason the /etc/XT40
file isn't being built, so these references fail to resolve.
Pull request #775 created #created-775
517: Reflected Christian Gruen's remarks
I deleted my fork, got a new one and applied the latest changes. Seems this fixed the issues.
Issue #773 closed #closed-773
QT4CG-051-04/05: Reflected today's meeting editorial suggestions
Issue #768 closed #closed-768
Details about decode-for-uri
Issue #769 closed #closed-769
768: details about decode-from-uri
Issue #517 closed #closed-517
fn:chain (before: fn:multi-compose)
Issue #758 closed #closed-758
XQFO UCA keyword strength, quaternary setting
Issue #686 closed #closed-686
XQFO presentation of diagnostic functions
Issue #774 created #created-774
What should be percent-encoded in a URI?
(This is related to fn:parse-uri
, fn:build-uri
, and fn:decode-from-uri
. I'm making it a distinct issue to call it out and see if we can get consensus on the right answer. I've come to the conclusion that what I've implemented isn't justified by any specific reading of the relevant specifications, so it's wrong.)
This question is slightly tricky because encoding (or not encoding) characters can change the meaning of the URI.
If you trace your way through the ABNF in RFC 3986 you eventually get to:
path-abempty = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty = 0<pchar>
The various segment
nonterminals boil down to some number of pchar
. (The segment-nz
form is used to forbid a zero length string before the first /
; the segment-nz-nc
form is used for a URI that does not begin with a scheme: it must have a non-zero length string before the first /
that additionally must not contain a :
.)
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
Where:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
I think it follows that all characters except the following must be encoded:
Upper- and lower-case A to Z, the digits 0 to 9, and the characters - . _ ~ % ! $ & ' ( ) * + , ; = : @
or, conversely, that any characters other than those must be encoded.
Observe that /
isn't among the characters that are not encoded. That's because the /
in hierarchical URIs divides the segments. It's part of the URI syntax. That's why a literal forward slash that appears somewhere in an actual path segment must be encoded as %2F
and it's why casually unencoding such a character changes the URI.
Observe also that there's no provision here for encoding a space with +
, even though it's fairly common. I've carried that error through to some of the test results for fn:build-uri
and fn:parse-uri
. I'll fix those tests.
I'm going to try changing my implementation to follow the rule above and see what happens.
If you think this analysis is incorrect, please explain where I went wrong.
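As a rough illustration of the rule proposed above, here is a Python sketch built on the standard urllib.parse.quote function. The helper name encode_path_segment and the PCHAR_SAFE constant are assumptions made for this example. Note that, following the rule literally, "%" is left untouched, so the sketch assumes any "%" in the input already introduces a valid escape.

```python
from urllib.parse import quote

# Characters the analysis leaves unencoded, besides ASCII letters and digits:
# unreserved, "%" (from pct-encoded), sub-delims, ":" and "@".
PCHAR_SAFE = "-._~%!$&'()*+,;=:@"


def encode_path_segment(segment: str) -> str:
    """Percent-encode everything outside the pchar set (hypothetical helper)."""
    # quote() never encodes ASCII letters, digits, "_", ".", "-" or "~";
    # `safe` adds the remaining pchar characters. "/" is deliberately absent,
    # so a literal slash inside a segment becomes %2F, and a space becomes
    # %20 rather than "+".
    return quote(segment, safe=PCHAR_SAFE)
```

With this helper, encode_path_segment("a/b c") gives "a%2Fb%20c", while a segment such as "x+y:z@w" passes through unchanged.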
Pull request #773 created #created-773
QT4CG-051-04/05: Reflected today's meeting editorial suggestions
All 3 editorial suggestions made in today's meeting are now reflected.
Pull request #772 created #created-772
Revise the fn:parse-html rules to make them clearer to follow.
This actions the comment in #767 that the rules for the function are unclear.
The SVG element name handling is correct in the spec per the comments in https://github.com/qt4cg/qt4tests/issues/57, so don't need changing.
QT4 CG meeting 051 draft minutes #minutes-10-24
Draft minutes published.
Issue #734 closed #closed-734
517: fn:chain
Issue #763 closed #closed-763
686: XQFO diagnostic function documentation
Issue #762 closed #closed-762
758: XQFO minor edits 3
Issue #653 closed #closed-653
XQuery - option to suppress entity expansion
Issue #749 closed #closed-749
653: Add string literals E".." and L".." to control entity expansion
Issue #647 closed #closed-647
XQuery: import schema with multiple location hints
Issue #659 closed #closed-659
647: schema location hints
Issue #383 closed #closed-383
fn:deep-equal: Order of child elements (unordered-elements)
Issue #771 created #created-771
British vs. American English
Purely editorial: Some words in the specification seem to be British English (organised, generalised, behaviour), whereas the majority of the text is American English. Should this be fixed?
I assume there are tools to get this straight? I am too rarely affected by it to know more about it…
Pull request #770 created #created-770
566: Use fn:decode-from-uri in fn:parse-uri
I think this is sufficient to close issue 566. @ChristianGruen could you let me know if you agree?
Pull request #769 created #created-769
768: details about decode-from-uri
Closes #768
Issue #768 created #created-768
Details about decode-for-uri
I'm unclear on how %XA
should be decoded. Is that %X
(an error) followed by A
, or is that %XA
(an error)?
Also, a couple of typos:
s/to an sequence/to a sequence/
s/that are no hexadecimal/that are not hexadecimal/
QT4 CG meeting 051 draft agenda #agenda-10-24
Draft agenda published.
Issue #767 created #created-767
parse-html(): case of SVG element names
Gunther Rademacher points out at https://github.com/qt4cg/qt4tests/issues/57 that the expected test results for parse-html() expect SVG elements to be output in lower case; and as far as I can see, this is consistent with the spec. But is it useful? It means we're producing XML that isn't valid according to the SVG schema, and that presumably might be rejected by tools that expect to handle valid SVG.
In passing, I note that it's very hard to work out from the parse-html() spec what the function actually does; it's somehow written as if it's obvious, but it never gets around to actually saying it explicitly. It also doesn't say what happens when the method
option is omitted, as it is in these tests.
In addition, it seems that we don't have any tests that exercise the various options of the function, e.g. the ability to consume binary as well as character input.
Pull request #766 created #created-766
765 Update version references etc to 4.0 status
Includes the following changes:
- Clarification of the syntax and semantics of the version number in the XQuery version declaration
- Updates to front and back matter to reflect the current status of the 4.0 specs
- Update cross-spec references to point to the 4.0 specs rather than 3.1 specs
- A DTD change from spec="SER40" to spec="SE40" to reflect the name of the /etc file generated by the gradle script.
Fix #765
Issue #765 created #created-765
XQuery version declaration: upgrade to 4.0
The text at §5.1 does not yet acknowledge version="4.0".
We should also be a bit more precise about the syntax, which currently says that the version number is two integers separated by a dot. We should give a regular expression to be absolutely clear that we mean decimal integers, no underscores allowed.
There are a few other version references to tidy up at the same time, e.g. "XQuery is designed to meet the requirements identified by the W3C XML Query Working Group [[XQuery 3.1 Requirements]]."
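One possible reading of "two decimal integers separated by a dot, no underscores" is sketched below in Python. The pattern and the name is_valid_version are assumptions for illustration only; the issue does not fix the actual regular expression.

```python
import re

# Assumed pattern: two decimal integers (ASCII digits only) separated by a dot.
# Underscores, signs, whitespace and extra components are all rejected.
VERSION_PATTERN = re.compile(r"[0-9]+\.[0-9]+")


def is_valid_version(version: str) -> bool:
    """Return True if the whole string matches the assumed version syntax."""
    return VERSION_PATTERN.fullmatch(version) is not None
```

Under this pattern "4.0" and "3.1" are accepted, while "4_0.0", "4" and "4.0.1" are not.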
Issue #764 created #created-764
XQuery: Simplify module imports
With XQuery, you can import library modules as follows:
import module namespace utils = 'http://project.org/utils' at 'org/project/utils.xq';
utils:action()
If an implementation has a mechanism to resolve a namespace URI, you can also do this:
import module namespace utils = 'http://project.org/utils';
utils:action()
It would be desirable to simplify this further, and be able to do one of…
(: already legal, but no namespace binding takes place :)
import module namespace 'http://project.org/utils';
(: not supported yet :)
import module namespace http://project.org/utils;
(: no URI rewriting needed for relative paths :)
import module namespace org/project/utils;
utils:action()
…and…
- rewrite namespace URIs to local paths (this is how we proceed: https://docs.basex.org/wiki/Java_Bindings#URI_Rewriting)
- extract the prefix from the path.
I can imagine that the current ways to simplify imports depend a lot on the used implementations. Suggestions are welcome.
Issue #759 closed #closed-759
Serialization: JSON Parameters
Pull request #763 created #created-763
686: XQFO diagnostic function documentation
I think what prompted my original ticket was the use of "output". That term is used scores of times in the specs, and it always implies the output of a function within the XPath expression. Although the term is qualified in the diagnostic functions with the adjective "trace" or "log", it's tempting to think of the function's actual output. Coming back a few weeks later, I think some gentle massaging (instead of a convoluted preamble) can help readers make the requisite adjustment, reflected in the present PR.
I have offered two examples that I hope show the practical side of these functions. I might have missed something though, so please chime in.
Pull request #762 created #created-762
758: XQFO minor edits 3
This covers edits to XQFO 4 through 5.3, and incorporates the suggestion in #758
Issue #746 closed #closed-746
break-when -> split-when in fn:partition
Pull request #761 created #created-761
554/754 Simplify the new transitive-closure function
Drops the $min
and $max
parameters, with the effect that this corresponds much more closely to the general computer-science definition of a transitive closure.
The function now computes the set of nodes delivered by the transitive closure of the supplied $step
function when applied to a given $start
node, rather than returning a function item that must then be applied to the chosen $start node. This is hopefully easier for most users to understand, and does not lose any useful functionality.
The $min
parameter of the old function is effectively forced to its default value of 1, and the $max value to its default of infinity.
Fix #754 Fix #554
This PR addresses the main points of #554 in making the function correspond more closely to the mathematical (or at least the computer-science) definition of transitive closure. It doesn't implement other ideas in #554, like returning the depth of search alongside the actual closure. That's because I believe in the principle that wherever possible a function should do one thing in as simple a way as possible.
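The simplified behaviour described above (every node reachable from the start node in one or more steps) can be sketched in Python as follows. This is an analogy only: nodes are modelled as hashable values, the name transitive_closure mirrors the function under discussion but is otherwise hypothetical, and the result is returned in order of discovery rather than in document order.

```python
from typing import Callable, Hashable, Iterable, List, Set


def transitive_closure(start: Hashable,
                       step: Callable[[Hashable], Iterable[Hashable]]
                       ) -> List[Hashable]:
    """Return every node reachable from `start` in one or more steps."""
    seen: Set[Hashable] = set()
    result: List[Hashable] = []
    # Begin with the nodes one step away, so `start` itself is only
    # included if some chain of steps leads back to it.
    frontier = list(step(start))
    while frontier:
        next_frontier: List[Hashable] = []
        for node in frontier:
            if node not in seen:
                seen.add(node)
                result.append(node)
                next_frontier.extend(step(node))
        frontier = next_frontier
    return result
```

For a graph held in a dictionary such as {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}, calling transitive_closure("a", lambda n: graph[n]) visits b, c and d exactly once each, and terminates on cyclic graphs because visited nodes are tracked.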
Issue #760 created #created-760
Serialize functions: consistency
We should be more ambitious about ensuring consistency when serializing data back to its original representation, and how to achieve it. This is the status quo:
Format | Serialize Function | Parse Function
--- | --- | ---
XML | fn:serialize($input) | fn:parse-xml
JSON | fn:serialize($input, map { 'method': 'json' }) | fn:parse-json
JSON | fn:xml-to-json($input) | fn:json-to-xml
XHTML | fn:serialize($input, map { 'method': 'xhtml' }) | fn:parse-html
HTML | fn:serialize($input, map { 'method': 'html' }) | fn:parse-html
CSV | still missing | fn:parse-csv and variants
In BaseX, the XML created by fn:json-to-xml
can already be serialized back to JSON as follows (related: #759):
serialize(
<map xmlns="http://www.w3.org/2005/xpath-functions">
<number key="A">1</number>
</map>,
map { 'method': 'json', 'json': map { 'format': 'basic' }
})
For CSV, we’ve introduced a CSV serialization method that supports all CSV flavors we support (see CSV Module for more details):
serialize(
<map xmlns="http://www.w3.org/2005/xpath-functions">
<number key="A">1</number>
</map>,
map { 'method': 'csv', 'csv': map { 'format': 'direct' }
})
Related (parse functions): #748
Issue #759 created #created-759
Serialization: JSON Parameters
We have more and more serialization parameters that are specific to JSON (related: #530, #756, #576, #641). It won’t get better with each new option we add, so maybe it’s time to introduce a custom serialization parameter for JSON (and possibly other methods):
serialize(
map { "abc": 123 },
map { 'method': 'json', 'json': map { 'escape-solidus': true(), 'number-format':'#' }
})
If we want to be able to use output options in the XQuery prolog, we additionally need to define a syntax for serializing the options to a single string:
declare namespace output = 'http://www.w3.org/2010/xslt-xquery-serialization';
declare option output:method 'json';
declare option output:json 'escape-solidus=yes,number-format=#';
map { "abc": 123 }
Both suggestions are inspired by our own implementation; we use it for our custom JSON and CSV options.
Issue #706 closed #closed-706
FLWOR: for member $m1 in $a1, member $m2 in $a2
Issue #752 closed #closed-752
706: Fix "for member" grammar problems
Issue #750 closed #closed-750
xsl:mode/@as and built-in template rules
Issue #751 closed #closed-751
QT4CG-048-01: xsl:mode/@as with built-in templates
Issue #740 closed #closed-740
QT4CG-047-01: Rename break-when to split-when, plus minor editorial cleanup
Issue #758 created #created-758
XQFO UCA keyword strength, quaternary setting
In the XQFO specs, 5.3.3, the description of keyword strength
, option quaternary
, is I think confusing/misleading:
quaternary considers spaces and punctuation that would otherwise be ignored (for example
data-base
=database
).
I propose the following:
"quaternary always considers as significant spaces and punctuation (data-base≠database; if maxVariable
is punct
or higher and alternate
is not non-ignorable
, lower strengths will treat data-base=database)."
That's a lot more words, but I think more accurate. And it may help the reader become familiar with the other two keywords.
Editors' input is welcome.
QT4 CG meeting 050 draft minutes #minutes-10-17
Draft minutes published.
Issue #757 created #created-757
Function families
We talked on the call today about the tension between defining multiple simple functions focussed on one task, and a small number of omnibus functions that have many different options.
I think we would all agree that multiple simple functions would be the better choice except for the problem that they all end up going into a single global namespace. So the question becomes, how can we better partition the name-space (using the term deliberately with a hyphen).
We're reluctant to use the namespace mechanism to partition our function library because namespaces are cumbersome and clutter the code with lots of boilerplate; declaring namespaces for binding function libraries also has side-effects for example on the semantics of element constructors.
One approach would be to build on the idea that @dnovatchev presented of using maps containing anonymous functions, so for example csv()?parse()
would first call fn:csv() to load a family (or library) of functions, of which one is then selected for execution. This works, but I don't think it's a perfect solution; for example static analysis becomes a lot more difficult, and we don't get the benefits of default parameters and keyword arguments.
Most languages use hierarchic names with "." as a separator. Although XML names allow "." as a regular character, none of our built-in function names currently use it as such. So it would be entirely possible to adopt a convention where names like csv.parse() etc are used to name functions in a function family referred to as "csv". This wouldn't by itself require any language changes.
But if we adopted this convention, we could build on it to provide usability tweaks that make a large function library easier to manage. For example, we could put the math functions into the fn namespace with names like fn:math.sin(x)
, and then provide a way of binding a namespace prefix to a subtree of the fn namespace, so math:sin becomes a synonym for fn:math.sin(). The immediate benefit is that the namespace prefix doesn't need to be declared unless people want to use it. We could also then consider defining an algorithm for searching the fn namespace for abbreviated names such as sin(x), perhaps with some form of "import functions" declaration that says which subtrees of the fn namespace are to be searched.
Issue #741 closed #closed-741
QT4CG-048-03: Fix copy and paste errors in describing type patterns
Issue #739 closed #closed-739
Apply review comment changes to the HTML DOM XDM mapping.
Issue #618 closed #closed-618
Symmetry: fn:html-doc, fn:csv-doc
Issue #756 created #created-756
JSON serialization - number formatting
We get a lot of complaints about the use of exponential notation when formatting large numbers with the JSON serializer.
I propose adding an option such as number-format="picture" to control this, where the picture is a subset of what is allowed for format-number().
And perhaps we should bring xml-to-json()
into line. The current capability of providing a callback function is more powerful, but difficult to align with serialization.
Issue #131 closed #closed-131
Expression for binding the Context Value
Issue #755 created #created-755
Expression for binding the Context Value
We have no expression yet to bind a value to the context value. Such an expression would be useful, among other things, to extend the focus function to sequences (fn { . }
, see #129).
Here are 3 possible constructs for that, ordered by my personal preference:
1. Value Map Expression
ValueExpr ::= ValidateExpr | ExtensionExpr | ValueMapExpr
ValueMapExpr ::= SimpleMapExpr ("~" SimpleMapExpr)*
SimpleMapExpr ::= PathExpr ("!" PathExpr)*
(: Example :)
//flower ~ (count(.) || ' flowers: ' || string-join(name, ', '))
The expression would be similar to the simple map expression (which we could rename to item map expression). The following equivalents would then exist for simple FLWOR expressions:
for $i in (1 to 5) return string($i) ≍ (1 to 5) ! string(.)
let $i := (1 to 5) return count($i) ≍ (1 to 5) ~ count(.)
fn { E }
could be rewritten to fn($c) { $c ~ E }
.
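The difference between the existing ! operator and the proposed ~ operator can be sketched by analogy in Python. The names item_map and value_map are invented for this illustration, and the analogy is loose: XPath's ! concatenates the per-item results into a flat sequence, whereas the list below is not flattened.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def item_map(seq: Sequence[T], expr: Callable[[T], R]) -> List[R]:
    """Analogue of `E1 ! E2`: evaluate expr once per item of E1."""
    return [expr(item) for item in seq]


def value_map(seq: Sequence[T], expr: Callable[[Sequence[T]], R]) -> R:
    """Analogue of the proposed `E1 ~ E2`: evaluate expr once,
    with the whole value of E1 as the context."""
    return expr(seq)
```

Here item_map(range(1, 6), str) corresponds to (1 to 5) ! string(.), producing one string per item, while value_map(range(1, 6), len) corresponds to (1 to 5) ~ count(.), producing a single count.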
2. Context Value Declaration
ContextExpr ::= "context" "{" Expr "}" EnclosedExpr
(: Example :)
context { //flower } {
count(.) || ' flowers: ' || string-join(name, ', ')
}
The result of the first expression defines the context value, the second expression can reference the context.
fn { E }
could be rewritten to fn($c) { context { $c } { E } }
.
3. Enhanced FLWOR expression (for the sake of completeness)
Similar to variables, the dot could be used to bind and reference the context:
LetBinding ::= ("." | ("$" VarName)) TypeDeclaration? ":=" ExprSingle
ForBinding ::= ("." | ("$" VarName)) TypeDeclaration? AllowingEmpty? PositionalVar? "in" ExprSingle
(: Example :)
let . := //flower
return count(.) || ' flowers: ' || string-join(name, ', ')
fn { E }
could be rewritten to fn($c) { let . := $c return E }
.
Assessment
- The first solution looks most appealing to me. I like the analogy with the existing syntax for single items.
- We could choose the second solution if we believe that the expression will be rarely used.
- I’ve backed away from the third solution; I think it would be too pervasive.
Issue #754 created #created-754
fn:transitive-closure: signature; remarks; too specific?
I have problems grasping why fn:transitive-closure
returns a function. Wouldn’t it be more consistent with the remaining function set, and easier, to pass on the input as the first argument and directly create the result?
fn:transitive-closure(
$node as node(),
$step as function(node()) as node()*,
$min as xs:nonNegativeInteger? := 1,
$max as xs:positiveInteger? := ()
) as node()*
It would also be easier then to use the function within chains:
$nodes =!> transitive-closure(fn { * }) => count()
Issue #744 closed #closed-744
XQFO Examples: minor fixes, formatting
Pull request #753 created #created-753
65: Allow xmlns="xxx" to NOT change the default namespace for NameTests
Fix issue #65. Basically, fix the bug whereby xmlns="xxx"
changes the default namespace for element NameTests, while retaining bug-compatibility.
QT4 CG meeting 050 draft agenda #agenda-10-17
Draft agenda published.
Issue #649 closed #closed-649
xsl:fallback
Issue #650 closed #closed-650
649: fix an xsl:fallback problem
Pull request #752 created #created-752
706: Fix "for member" grammar problems
Fix #706
Pull request #751 created #created-751
QT4CG-048-01: xsl:mode/@as with built-in templates
Fix #750.
Issue #750 created #created-750
xsl:mode/@as and built-in template rules
See ACTION QT4CG-048-01
The question arose, if an xsl:mode declaration specifies an expected type in @as, then all template rules in that mode are expected/required to deliver a value conforming to that type. But what about the default/fallback template rules? Surely they need to deliver a value of that type as well?
For example, suppose the mode specifies as="xs:boolean". Regardless of the value of xsl:on-no-match
, none of the built-in template rules is going to deliver a boolean. You're expecting a boolean result from xsl:apply-templates, and if none of the template rules match, you're going to get something other than a boolean.
I think the answer is to say that you get a type error if the built-in template rule for the mode returns a value that's not of the required type.
I don't think this error should ever be reported statically, because the compiler has no way of knowing whether the set of explicit template rules is sufficient to cover all cases that will actually arise in source documents.
Issue #571 closed #closed-571
XSLT: xsl:for-each-group/@break-when
Pull request #749 created #created-749
653: Add string literals E".." and L".." to control entity expansion
Allows for expressions interoperable between XPath and XQuery. Fix #653.
Issue #748 created #created-748
Parse functions: consistency
The functions for parsing input have been defined by different people, and the current state is quite inconsistent:
Function | Parameters
--- | ---
fn:parse-xml | $value as xs:string?
fn:doc | $href as xs:string?
fn:parse-json | $value as xs:string?, $options as map(*)
fn:json-doc | $href as xs:string?, $options as map(*)
fn:parse-html | $html as union(xs:string, xs:hexBinary, xs:base64Binary)?, $options as map(*)
fn:parse-csv | $csv as xs:string?, $options as map(*)
I believe there’s some need to unify the functions, and we could at least:
- introduce a fn:XYZ-doc($href, $options) function for each input format (with at least one encoding option), and
- restrict the type of the input parameter of fn:parse-XYZ to xs:string? and always name it $value.
And I wonder if we should tag all fn:XYZ-doc functions as ·nondeterministic· (if it’s not too late)?
Issue #747 created #created-747
QName literals
It's quite common to want to write a constant QName; I found myself doing this a lot, for example, in examples and test cases for the elements-to-maps() function. It's clumsy having to call xs:QName() or fn:QName() or parse-QName() for this purpose. It's particularly clumsy with map constructors where you want to write many QNames.
I propose we introduce the syntax Q"prefix:local".
The quotes can be either single or double; the prefix is optional. If there is no prefix, the result is a no-namespace QName.
The prefix (if present) must be bound to a namespace in the static context.
Character and entity references are not allowed.
Note that Q{uri}local is a NameTest, not a QName literal.
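The resolution rules proposed above can be sketched in Python. This is only an illustration under stated assumptions: the names QName and resolve_qname_literal are hypothetical, the static context is modelled as a plain prefix-to-URI mapping, and the sketch does not check that the parts are valid NCNames.

```python
from typing import Mapping, NamedTuple


class QName(NamedTuple):
    namespace_uri: str  # "" stands for "no namespace"
    local_name: str


def resolve_qname_literal(lexical: str, namespaces: Mapping[str, str]) -> QName:
    """Resolve the text between the quotes of a proposed Q"..." literal."""
    prefix, sep, local = lexical.partition(":")
    if not sep:
        # No prefix: the result is a no-namespace QName.
        return QName("", lexical)
    if prefix not in namespaces:
        # The prefix must be bound in the static context.
        raise ValueError(f"prefix {prefix!r} is not bound to a namespace")
    return QName(namespaces[prefix], local)
```

For instance, resolving "title" against any bindings gives a QName with an empty namespace, while resolving "p:x" with no binding for p is an error.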
Issue #746 created #created-746
break-when -> split-when in fn:partition
We decided to rename xsl:for-each-group/@break-when as @split-when. We should make the same change to the name of the second argument of fn:partition.
Issue #745 created #created-745
Support for inline (anonymous) xslt functions
I propose adding support for inline xslt functions.
Whilst XPath supports this, XPath functions are limited in what they can do and in how they look; for example, returning newly constructed elements isn't possible without parse-xml-fragment.
I would suggest the syntax would be basically the same as for xsl:function except with the name omitted, e.g.
<xsl:template name="apply-function" as="xs:integer">
  <xsl:param name="input" as="xs:integer"/>
  <xsl:param name="function" as="function(xs:integer) as xs:integer"/>
  <xsl:sequence select="$function($input)"/>
</xsl:template>
<xsl:template ....>
  <xsl:call-template name="apply-function">
    <xsl:with-param name="input" select="1"/>
    <xsl:with-param name="function">
      <xsl:function as="xs:integer">
        <xsl:param name="value" as="xs:integer"/>
        <result>
          <xsl:sequence select="$value * 2"/>
        </result>
      </xsl:function>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>
benefits
- less syntactic "noise" of named functions
- the ability to embed xslt functions inline inside maps (and other data types)
- functional parity with xpath (and more)
- natural generalisation to local function proposal
alternatives
- use reference to explicitly named XSLT function
- use XPath (though problematic when constructing new nodes)
Pull request #744 created #created-744
XQFO Examples: minor fixes, formatting
Editorial: Some XQuery equivalents were buggy, and the formatting was unified.
Issue #743 created #created-743
Extend enumeration types to allow values other than strings
In reviewing and accepting the spec for enumeration types, it was suggested that it might be useful to allow values other than strings.
- There's a difficulty in that not all atomic values can be represented by literals. We have the same problem with function annotations; perhaps we need to bite the bullet and define some kind of "constant atomic expression" construct.
- Aside from that, there don't seem to be any major obstacles.
We change
An EnumerationType has a value space consisting of a set of xs:string values. When matching strings against an enumeration type, strings are always compared using the Unicode codepoint collation.
to
An EnumerationType has a value space consisting of a set of atomic values. When matching values against an enumeration type, values are always compared using the fn:atomic-compare() function (as used for comparing map keys).
The subtyping rules (newly defined in terms of unions of singleton enumeration sets) seem to work in their current form, without change. enum("red", "green") is still a subtype of xs:string, because all the enumerated values are instances of xs:string.
QT4 CG meeting 049 draft minutes #minutes-10-10
Draft minutes published.
Issue #742 created #created-742
xsl:function-library: keep, drop, or refine?
The draft XSLT 4.0 specification (§5.3.2) proposes a new declaration xsl:function-library as a solution to the problem of having to qualify all function names except those in the core namespace. We have not reviewed this proposal.
Issue #688 closed #closed-688
Coercion rules for union types and enumeration types
Issue #691 closed #closed-691
688 Semantics of local union types, enumeration types, etc
Issue #372 closed #closed-372
Separate default namespace for elements from the default namespace for types
Issue #715 closed #closed-715
372 Rollback the default namespace changes
Issue #725 closed #closed-725
Clarification to load-xquery-module
Issue #727 closed #closed-727
725 Add clarification note for load-xquery-module
Issue #52 closed #closed-52
Allow record(*) based RecordTests
Issue #728 closed #closed-728
52 Allow record(*)
Issue #731 closed #closed-731
Capturing accumulators: a couple of minor errors/omissions
Issue #732 closed #closed-732
731 Capturing accumulators: Add error conditions, revise streaming rules
Pull request #741 created #created-741
QT4CG-048-03: Fix copy and paste errors in describing type patterns
Fulfils Action QT4CG-048-03.
Pull request #740 created #created-740
QT4CG-047-01: Rename break-when to split-when, plus minor editorial cleanup
Fulfils action QT4CG-047-01 : the CG decided to rename break-when as split-when. Also applies a few minor editorial corrections in the same general area.
Pull request #739 created #created-739
Apply review comment changes to the HTML DOM XDM mapping.
QT4 CG meeting 049 draft agenda #agenda-10-10
Draft agenda published.
Issue #738 created #created-738
FO: Why is fn:op under section "17.3 Dynamic loading"
FO: Why is fn:op under section "17.3 Dynamic loading"?
- Lexical substitution has little, if anything at all, to do with (dynamic) loading. Nothing is loaded from some external resource, as in the case of fn:load-xquery-module and fn:transform.
- There is nothing dynamic about having a predefined function that has a predefined set of possible values. In fact this could be defined as a strictly/statically defined map(xs:string, function(item()*, item()*) as item()*) with all allowed possible keys and, as their values, the corresponding functions.
Taking this into account it is suggested to move fn:op to a section where it truly belongs. Maybe have in this section also other features of the language that are merely lexical substitution, as for example, function(s) for the creation of type-aliases.
Pull request #737 created #created-737
295: Boost the capability of recursive record types
Fix issue #295
The main changes are:
- In place of the special self-reference syntax "..", we now allow recursive use of type aliases, allowing types to be mutually recursive
- We generalise the places that recursive references are allowed, for example the record type used by fn:random-number-generator is now legal
- Subtyping rules for recursive record types are now defined (this was previously a gap in the specification). Acknowledgements to a John Snelson blog post for pointing me in the right direction.
Pull request #736 created #created-736
730: Clarify (and correct) rules for maps as instances of function types
Fix issue #730
Note: the issue led to a wide-ranging discussion about possible enhancements to the type system, for example adding types for empty maps and arrays. I have ignored most of this, and have focussed on fixing the issue as raised (arising originally on the test suite), namely the incorrect use of V? to define a type that allows either an instance of the sequence type V or an empty sequence.
QT4 CG meeting 048 draft minutes #minutes-10-03
Draft minutes published.
Issue #735 created #created-735
Local functions in XSLT
I propose that we should add local functions to XSLT: specifically, allowing an xsl:function declaration to appear within a sequence constructor, declaring a named function that is available for use only within the sequence constructor.
At present this can be achieved by declaring a local variable bound to an anonymous function, but it's clumsy to have to use completely different syntax for local and global functions, and functions defined in this way cannot be mutually recursive.
I propose that such functions should shadow any global functions with the same name, in the same way as happens with local variable declarations. I have an open mind as to whether shadowing of functions in reserved namespaces should be allowed.
The main difficulty is the scoping rules. We don't want the problems JavaScript has with "hoisting". I propose that (a) all local function declarations must appear before any instructions (or local variable declarations, but not params) within the sequence constructor, and (b) these function declarations are in scope throughout the sequence constructor, including forward references from the body of other functions declared earlier within the same sequence constructor.
Pull request #734 created #created-734
517: fn:chain
Added fn:chain.
Took much effort to ensure Unix-style line-endings are used.
Issue #733 closed #closed-733
517: fn:chain
Pull request #733 created #created-733
517: fn:chain
I added fn:chain, which has been discussed in https://github.com/qt4cg/qtspecs/issues/517
Pull request #732 created #created-732
731 Capturing accumulators: Add error conditions, revise streaming rules
Minor tweaks to the spec for capture=yes accumulator rules.
Fix #731
Issue #731 created #created-731
Capturing accumulators: a couple of minor errors/omissions
Two little things in the spec for capturing accumulators that we agreed last week:
(a) We should define an error code for use when the capture attribute is present but phase="start".
(b) The streamability rules are too strict. They say that the select attribute must be motionless or consuming, but this is not necessary, because the select attribute is applied to a snapshot tree, which is instantiated in memory and therefore does not need to be streamable.
QT4 CG meeting 048 draft agenda #agenda-10-03
Draft agenda published.
Issue #211 closed #closed-211
XSLT streaming: capturing accumulators
Issue #717 closed #closed-717
211: add capturing accumulators to XSLT
Issue #730 created #created-730
Equivalence of map and function types
It is stated in XPath §3.6.4.2, and probably elsewhere, that
The function signature of a map matching type map(K, V), treated as a function, is function(xs:anyAtomicType) as V?
But V is a sequence type, not an item type, so you can't just tag a '?' onto the end of it. What is intended here by V? is a sequence type that is the union of V and empty-sequence().
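A small Python model may make the point concrete. It is a sketch only: XDM sequences are represented as Python lists, and map_as_function is a hypothetical helper, not anything defined by the specifications. Looking up an absent key yields the empty sequence, so the result of the function is "V or empty-sequence()" even when V itself is something like xs:integer+.

```python
from typing import Any, Callable, Hashable, List, Mapping


def map_as_function(m: Mapping[Hashable, List[Any]]) -> Callable[[Hashable], List[Any]]:
    """Treat a map as a one-argument function from key to associated value."""
    def lookup(key: Hashable) -> List[Any]:
        # An absent key produces the empty sequence rather than an error,
        # which is why the result type must admit empty-sequence().
        return list(m.get(key, []))
    return lookup
```

With a map whose values are all non-empty lists of integers, the returned function still produces an empty list for any key that is not present.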
Issue #729 created #created-729
xsi:schemaLocation
The specifications (XQuery and XSLT) should say something about the effect of requesting validation on a document that contains xsi:schemaLocation and/or xsi:noNamespaceSchemaLocation attributes. At present XQuery says nothing, and XSLT says very little.
XQuery 3.1 says: A validate expression can be used to validate a document node or an element node with respect to the [in-scope schema definitions], using the schema validation process defined in [[XML Schema 1.0]] or [[XML Schema 1.1]]. This doesn't really answer the question. The "with respect to" phrase could be read as implying that ONLY the in-scope schema definitions are used. Particular problems occur if xsi:schemaLocation refers to a schema document that attempts to override or redefine the schema components that have been statically imported.
XSLT 3.0 says nothing of interest about what schema (=set of schema components) is used when validation is requested, though it does mention in passing that xsi:schemaLocation attributes might be interpreted in some way by a schema processor.
If we look to the behaviour of Saxon as a reference implementation, then we'll quickly find fault. There's a configuration option to control whether xsi:schemaLocation attributes are considered or ignored; if they are considered, then the schema components referenced are added to a global pool of schema components which are used not only to validate the document in question, but to validate any subsequent documents. We're in the process of redesigning this to do something that makes more sense.
Pull request #728 created #created-728
52 Allow record(*)
Fix #52. Implements decision made at meeting 046.
Pull request #727 created #created-727
725 Add clarification note for load-xquery-module
Add a note to clarify the behaviour of load-xquery-module.
Fix #725
Issue #724 closed #closed-724
PR 717 with merge conflicts resolved
Issue #723 closed #closed-723
Updated PR for capturing accumulators
Issue #726 closed #closed-726
PR 723 with merge conflicts resolved
Pull request #726 created #created-726
PR 723 with merge conflicts resolved
Issue #725 created #created-725
Clarification to load-xquery-module
Add a clarification note to load-xquery-module, to correct a misunderstanding by a (very knowledgeable) user: see https://saxonica.plan.io/issues/6209
The function load-xquery-module does not modify the static or dynamic context in any way. In particular, the variables and functions that are loaded from the query module are not added to the static or dynamic context of the calling code. They are accessible only via the map that is returned from the function call.
Pull request #724 created #created-724
PR 717 with merge conflicts resolved
QT4 CG meeting 047 draft minutes #minutes-09-26
Draft minutes published.
Issue #722 closed #closed-722
This is a test. This is only a test.
Pull request #723 created #created-723
Updated PR for capturing accumulators
Updated to take account of comments
Pull request #722 created #created-722
This is a test. This is only a test.
Had this been a real emergency, we would have fled in terror and you would not have been informed.
DO NOT MERGE THIS! :-)
Issue #721 closed #closed-721
Attempt to fix the problem with PRE elements in autodiffs
Pull request #721 created #created-721
Attempt to fix the problem with PRE elements in autodiffs
Maybe I'm more cleverer today than I was last time I looked into this.
Issue #663 closed #closed-663
Calling xsl:original() with keywords
Issue #674 closed #closed-674
663: Describe how calls to xsl:original with keywords work
Issue #570 closed #closed-570
XSLT: Built-in template rules for maps and arrays
Issue #718 closed #closed-718
Add on-no-match="shallow-copy-all"
QT4 CG meeting 047 draft agenda #agenda-09-26
Draft agenda published.
Issue #720 created #created-720
From Records to Objects
It has become idiomatic to use maps, and record type definitions, to declare a collection of functions; so for example the random-number-generator object offers a "method" next() that can be called using the syntax $rng?next().
The problem is that it's not possible, within the XPath/XQuery language, to implement such a function with implicit access to the object on which it is invoked. The implementation of the function does not have access to any kind of $this variable.
This issue considers how we can move forwards from supporting simple records to introduce object capabilities, in an incremental and compatible way.
Here are four steps in that direction:
- Where a named record type is declared, also create a corresponding constructor function. So if you declare
declare item type my:loc as record(longitude as xs:double, latitude as xs:double)
you also get a constructor function allowing a call such as my:loc(180, 180), with either positional or keyword arguments corresponding to the field names.
- Allow default values to be defined in the record type, which act as default values for the parameters in the constructor function.
- Allow functions that are defined as part of a record type access to a variable $this. The constructor function provides an implicit binding of this variable to the record/map/object that is being instantiated.
- Allow self-reference to a named record type (and its constructor function) within the record definition.
So you can now do:
declare type my:counter as record (
value as xs:integer,
increment := fn() as my:counter {my:counter($this?value + 1)}
)
and then
let $x := my:counter(0)
return $x?increment()?value
which returns 1.
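The constructor-with-$this idea can be modelled in Python as a rough sketch. Everything here is an assumption made for illustration: records are plain dicts, record_constructor is a hypothetical helper, and the implicit $this binding is imitated with functools.partial. Default field values are not modelled.

```python
import functools
from typing import Any, Callable, Dict, Sequence

Record = Dict[str, Any]


def record_constructor(field_names: Sequence[str],
                       methods: Dict[str, Callable[..., Any]]) -> Callable[..., Record]:
    """Build a constructor accepting positional or keyword field values."""
    def construct(*args: Any, **kwargs: Any) -> Record:
        if len(args) > len(field_names):
            raise TypeError("too many positional arguments")
        record: Record = dict(zip(field_names, args))
        for name, value in kwargs.items():
            if name not in field_names or name in record:
                raise TypeError(f"unexpected or duplicate argument {name!r}")
            record[name] = value
        missing = [n for n in field_names if n not in record]
        if missing:
            raise TypeError(f"missing fields: {missing}")
        for name, method in methods.items():
            # Bind the record being built as the implicit $this.
            record[name] = functools.partial(method, record)
        return record
    return construct


# Counterpart of the my:counter example: increment returns a new counter.
counter = record_constructor(
    ["value"],
    {"increment": lambda this: counter(this["value"] + 1)},
)
```

Calling counter(0)["increment"]()["value"] mirrors $x?increment()?value in the example above; the original record is left unchanged because increment builds a new one.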
Pull request #719 created #created-719
413: Spec for CSV-related functions
This PR contains error fixes (typos, examples that contradicted the spec text), some (hopefully) improved language and one breaking change.
The current draft uses the type map(xs:integer, xs:string) for the column-names option to fn:csv-to-xdm and fn:csv-to-xml. This PR flips that to map(xs:string, xs:integer). It turns out that the examples were already using this, and it seems to me that having the names entry in the csv-columns-record record type be the transposed version of the column-names option that creates it, rather than be the same thing, is counterproductive.
I can think of some examples (a CSV split into several chunks, with only the first containing the headers) where being able to feed the names entry right back into another invocation of fn:csv-to-xdm would be useful. If nothing else it's confusing and not obvious, or I wouldn't have messed up the examples, and somebody would have noticed during the review process...
Pull request #718 created #created-718
Add on-no-match="shallow-copy-all"
Enable recursive descent transformation with template rules for maps and arrays.
Fix #570
Pull request #717 created #created-717
211: add capturing accumulators to XSLT
Adds the attribute capture="yes" to xsl:accumulator-rule. This has been available as a Saxon extension for some time and makes many accumulators much easier to implement.
Fix #211
Issue #716 created #created-716
Generators in XPath
What is a generator?
Generators are well known and provided out of the box in many programming languages. Per Wikipedia:
“In computer science, a generator is a routine that can be used to control the iteration behaviour of a loop. All generators are also iterators.[1] A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.”
The goal of this proposal (major use-cases)
A generator in XPath should be a tool to easily implement the solutions to the following use-cases:
- Processing a huge collection whose members may not all be needed. A generator will produce only the next member of the collection, and only on demand.
- Handling a collection containing an unknown or infinite number of members. When asked for the next member of the collection, the generator will always produce it, if the collection still contains any members. It is the responsibility of the caller to issue only as many requests as there are members actually needed.
What is achieved in both cases:
- A (next) member is produced only on request. No time is spent on producing all members of the collection.
- A (next) member is produced only on request. No memory is consumed to store all members of the collection.
A good problem that is based on these use-cases is to generate a collection of the first N members that have some wanted properties, and are generated from other collection(s), when it is not known what the size of the original input collections would be in order for the desired number of N members to be discovered.
For example: Produce the first 1 000 000 (1M) prime numbers.
Sometimes we may not even know if N such wanted members actually exist, for example: Produce the first 2 sequences of 28 prime numbers where the primes in each of the sequences form an arithmetic progression.
The Proposal
A generator is defined as (and synonym for):
let $generator as record
(initialized as xs:boolean,
endReached as xs:boolean,
getCurrent as function(..) as item()*,
moveNext as function(..) as .. ,
* )
A generator is an extensible record.
It has four fixed-named keys, and any other map-keys, as required to hold the internal state of that specific generator.
Here is the meaning of the four fixed/named keys:
- initialized is a boolean. When a generator $gen is initially instantiated, $gen?initialized is false(). Any call to $gen?getCurrent() raises an error. In order to get the first value of the represented collection, the caller must call $gen?moveNext().
- endReached is a boolean. If after a call to moveNext() the value of the returned generator's endReached key is true(), then calling moveNext() and/or getCurrent() on this generator raises an error.
- getCurrent is a function of zero arguments. It must only be called if the value of initialized is true() and the value of endReached is false(); otherwise an error must be raised. This function produces the current member of the collection after the last call to moveNext, if this call didn't return a generator whose endReached value was true().
- moveNext is a function of zero arguments. When called on a generator whose endReached value is false(), it produces the next (state of the) generator, including a possibly true() value of endReached. If this value is still false(), then calling getCurrent() produces the value of the next member of the collection.
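The protocol formed by these four keys can be modelled in Python as an immutable record. This is a sketch under stated assumptions, not an implementation of the proposal: the names Gen, GeneratorError, from_list and to_list are hypothetical, and the extra state-holding keys of the extensible record are replaced by Python closures.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


class GeneratorError(Exception):
    """Raised when getCurrent or moveNext is called in a forbidden state."""


def _fail() -> Any:
    raise GeneratorError("call not allowed in this state")


@dataclass(frozen=True)
class Gen:
    initialized: bool
    end_reached: bool
    get_current: Callable[[], Any]
    move_next: Callable[[], "Gen"]


def from_list(items: List[Any], index: int = -1) -> Gen:
    """Generator over a list; index == -1 models the uninitialized state."""
    if index >= len(items):
        # End reached: both functions raise an error.
        return Gen(True, True, _fail, _fail)
    if index < 0:
        # Freshly instantiated: getCurrent fails until moveNext is called.
        return Gen(False, False, _fail, lambda: from_list(items, 0))
    return Gen(True, False,
               lambda: items[index],
               lambda: from_list(items, index + 1))


def to_list(gen: Gen) -> List[Any]:
    """Drain a finite generator, following the calling rules above."""
    out = []
    if not gen.initialized:
        gen = gen.move_next()
    while not gen.end_reached:
        out.append(gen.get_current())
        gen = gen.move_next()
    return out
```

Note that move_next returns a new generator state rather than mutating the old one, matching the description of moveNext as producing "the next (state of the) generator".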
Examples of operations on generators
The following examples are written in pseudo-code as at the time of writing there was no available implementation of records. And also, the code for recursion in pure XPath makes any such example longer than necessary for grasping its meaning.
The Empty Generator
emptyGenerator() {
map{
initialized : true(),
endReached: true(),
getCurrent: function($this as map(*)) {error()},
moveNext: function($this as map(*)) {error()}
}
}
Take the first N members of the collection
take($gen as generator, $n as xs:integer) as generator
{
  let $gen := if (not($gen?initialized)) then $gen?moveNext()
              else $gen
  return
    if ($gen?endReached or $n eq 0) then emptyGenerator()
    else map{
      "initialized": true(),
      "endReached": false(),
      "getCurrent": $gen?getCurrent,
      (: wrapped in a function so that the rest is produced only on demand :)
      "moveNext": function() { take($gen?moveNext(), $n - 1) }
    }
}
Skip the first N members from the collection
skip($gen as generator, $n as xs:integer) as generator
{
  if ($n eq 0) then $gen
  else
  {
    let $gen := if (not($gen?initialized)) then $gen?moveNext()
                else $gen
    return
      if (not($gen?endReached)) then skip($gen?moveNext(), $n - 1)
      else $gen
  }
}
Subrange of size N starting at the M-th member
subrange($gen as generator, $m as xs:integer, $n as xs:integer) as generator
{
take(skip($gen, $m -1), $n)
}
Head of a generator
head($gen as generator)
{
take($gen, 1)?getCurrent()
}
Tail of a generator
tail($gen as generator)
{
skip($gen, 1)
}
At index N
at($gen as generator, $ind as xs:integer)
{
  subrange($gen, $ind, 1)?getCurrent()
}
For Each
for-each($gen as generator, $fun as function(*))
{
map:put($gen, "getCurrent", function() { $fun($gen?getCurrent()) } )
}
For Each Pair
for-each-pair($gen1 as generator, $gen2 as generator, $fun as function(*))
{
  let $gen1 := if (not($gen1?initialized)) then $gen1?moveNext()
               else $gen1,
      $gen2 := if (not($gen2?initialized)) then $gen2?moveNext()
               else $gen2
  return
    if ($gen1?endReached or $gen2?endReached) then map:put($gen1, "endReached", true())
    else map:put(map:put($gen1, "getCurrent", function() { $fun($gen1?getCurrent(), $gen2?getCurrent()) }),
                 "moveNext", function() { for-each-pair(skip($gen1, 1), skip($gen2, 1), $fun) }
         )
}
Filter
filter($gen as generator, $pred as function(item()*) as xs:boolean)
{
let $getNextGoodValue := function($gen as map(*), $pred as function(item()*) as xs:boolean)
{
let $mapResult := iterate-while(
$gen,
function($gen) { not($pred($gen?getCurrent($gen))) },
function($gen) { $gen?moveNext($gen) }
)
return $mapResult?getCurrent($mapResult)
},
$gen := if($gen?initialized) then $gen
else $gen?moveNext($gen)
return
map {
"initialized": true(),
"endReached": $gen?endReached,
"getCurrent": function($this as map(*)) { $getNextGoodValue($this?inputGen, $pred) },
"moveNext": function($this as map(*))
{ let $nextGoodValue := $getNextGoodValue($this?inputGen?moveNext($this?inputGen), $pred),
$nextGen := iterate-while(
$this?inputGen?moveNext($this?inputGen),
function($gen) { not($pred($gen?getCurrent($gen))) },
function($gen) { $gen?moveNext($gen) }
)
return
map {
"initialized": $nextGen?initialized,
"endReached": $nextGen?endReached,
"getCurrent" : function($x) {$nextGoodValue},
"moveNext" : $this?moveNext,
"inputGen" : $nextGen
}
},
"inputGen" : $gen
}
}
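To show how filter and take combine on an unbounded input, here is a Python sketch of the "first N primes" use-case mentioned earlier. It is an approximation under stated assumptions: the Gen record, EMPTY, count_from, gen_filter, take and to_list are hypothetical names, zero-argument closures replace the $this-passing style of the pseudo-code, and the filter advances eagerly to the next matching member when it is constructed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


def _fail() -> Any:
    raise RuntimeError("call not allowed in this state")


@dataclass(frozen=True)
class Gen:
    initialized: bool
    end_reached: bool
    get_current: Callable[[], Any]
    move_next: Callable[[], "Gen"]


EMPTY = Gen(True, True, _fail, _fail)


def count_from(n: int) -> Gen:
    """Infinite generator n, n+1, n+2, ... (already initialized)."""
    return Gen(True, False, lambda: n, lambda: count_from(n + 1))


def gen_filter(gen: Gen, pred: Callable[[Any], bool]) -> Gen:
    """Keep only members satisfying pred; the rest is computed on demand."""
    if not gen.initialized:
        gen = gen.move_next()
    # Skip rejected members. Never returns if an infinite input has no match.
    while not gen.end_reached and not pred(gen.get_current()):
        gen = gen.move_next()
    if gen.end_reached:
        return EMPTY
    return Gen(True, False, gen.get_current,
               lambda: gen_filter(gen.move_next(), pred))


def take(gen: Gen, n: int) -> Gen:
    """At most the first n members of gen."""
    if n <= 0:
        return EMPTY
    if not gen.initialized:
        gen = gen.move_next()
    if gen.end_reached:
        return EMPTY
    # Do not touch the input again once the last requested member is served.
    return Gen(True, False, gen.get_current,
               lambda: take(gen.move_next(), n - 1) if n > 1 else EMPTY)


def to_list(gen: Gen) -> List[Any]:
    out = []
    if not gen.initialized:
        gen = gen.move_next()
    while not gen.end_reached:
        out.append(gen.get_current())
        gen = gen.move_next()
    return out


def is_prime(k: int) -> bool:
    return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))


first_primes = to_list(take(gen_filter(count_from(1), is_prime), 5))
# first_primes == [2, 3, 5, 7, 11]
```

Only as many integers are examined as are needed to find the requested primes, which is the memory and time benefit the proposal describes.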
Here are some other useful functions on generators -- with just their signature and summary:
- concat($gen1 as generator, $gen2 as generator) - Produces a generator that behaves as $gen1 until $gen1.endReached becomes true(), and then behaves as $gen2.
- append($gen as generator, $value as item()*) - Produces a generator that behaves as $gen until $gen.endReached becomes true(), and then as a generator that has only the single value $value.
- prepend($gen as generator, $value as item()*) - Produces a generator whose first value is $value and which then behaves as $gen.
- some($gen as generator) as xs:boolean - Produces true() if $gen has at least one value, and false() otherwise.
- some($gen as generator, $pred as function(item()*) as xs:boolean) as xs:boolean - Produces true() if $gen has at least one value for which $pred($thisValue) is true(), and false() otherwise.
- ofType($gen as generator, $type as type) - Produces a new generator from $gen that contains all values from $gen of type $type -- for this we need to have added to the language the type object.
- skipWhile($gen as generator, $pred as function(item()*) as xs:boolean) - Produces a new generator from $gen by skipping all starting values for which $pred($theValue) is true().
- takeWhile($gen as generator, $pred as function(item()*) as xs:boolean) - Produces a new generator from $gen which contains all starting values of $gen for which $pred($theValue) is true().
- toArray($gen as generator) - Produces an array that contains all values that are contained in $gen.
- toSequence($gen as generator) - Produces a sequence that contains all values that are contained in $gen. Values of $gen that are sequences themselves are flattened.
- toMap($gen as generator) - If the values in $gen are all key-value pairs, produces a map that contains exactly all the key-value pairs from $gen.
These and many other useful functions on generators can and should be added to every generator upon construction.
Thus, it would be good to have an explicit constructor function for a generator:
construct-generator($record as
record( initialized as xs:boolean,
endReached as xs:boolean,
getCurrent as function(..) as item()*,
moveNext as function(..) as .. ,
)
) as generator
Pull request #715 created #created-715
372 Rollback the default namespace changes
Implements the CG decision to roll back the changes that introduced two separate default namespaces for elements and types.
Fix #372
Issue #714 created #created-714
Function annotations in XSLT
I propose that the following attributes on an xsl:function should be accessible as annotations, for example in a call on function-annotations:
- visibility
- streamability
- new-each-time
- cache
plus any extension attribute in a user-defined namespace, for example <xsl:function saxon:debug="yes"/> should have the annotation %saxon:debug("yes"). The value is always a single string, the actual attribute value as written.
Issue #703 closed #closed-703
129 (1): XPath and XQuery changes for introduction of context value
Issue #701 closed #closed-701
fn:concat: Support for 0 or more arguments
Issue #702 closed #closed-702
701: fn:concat: Support for 0 or more arguments
Issue #696 closed #closed-696
566: Rework query parameters on build-uri/parse-uri
Issue #694 closed #closed-694
XQFO minor edits, with new examples and notes, 2 through 4.6
Issue #687 closed #closed-687
Constructor functions for user-defined types
Issue #690 closed #closed-690
687 Clarify constructor functions for user-defined types
Issue #668 closed #closed-668
Definition of HTML case-insensitive collation
Issue #680 closed #closed-680
668 define case insensitive collation normatively
Issue #713 created #created-713
Annotations: Editorial notes
Copied from https://github.com/qt4cg/qtspecs/pull/710#pullrequestreview-1630129066:
For avoidance of doubt, we should say in XQuery 4.6.2.4 (Named Function References) that the function created by a named function reference has its annotations taken from the function definition. There are other places where we are not explicit about the annotations of a function item, for example with partial function application. We should add a note that in XPath and XSLT, it is not possible to define function annotations, so this function will always return an empty result. (However, we should consider giving user-defined functions in XSLT annotations based on their attributes, e.g. visibility and streamability).
…and my complementary note in the PR thread:
I decided to merge the PR without changes, and not add the reference to XPath and XSLT, because the function may also return results for XQuery functions imported via fn:load-xquery-module.
Additional notes from today’s meeting:
- For annotations without values, we could assign true() as a default value.
- Examples could be added to fn:function-annotations to demonstrate how to check for annotations without values (or with values whose EBV is false(): 0, "", etc.).
- Dimitre suggested restructuring the spec for features that are not available in XPath.
- Maybe annotations would also be helpful in XPath.
Feel free everyone to add more comments.
Issue #36 closed #closed-36
fn:function-annotations (Allow support for user-defined annotations)
Issue #710 closed #closed-710
36: fn:function-annotations
Issue #712 created #created-712
array:sort: to be aligned with fn:sort
Related: #623. And an editorial note:
https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-sort
I think the type for the 3rd argument of fn:sort …
$key as (function(item()) as xs:anyAtomicType*)+ := fn:data#1
…should be changed to zero-or-more:
$key as (function(item()) as xs:anyAtomicType*)* := fn:data#1
While it will be rare to encounter queries with key := (), there seems to be no urgent reason to enforce at least one sort key. The change would also be in alignment with the corresponding rule (with the current signature, $key cannot be empty), which says:
The number of sort key definitions is determined by the number of function items supplied in the $key argument. If the argument is absent or empty, the default is a single sort key definition using the function data#1.
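The quoted rule (zero or more key functions, with a single default key when none is supplied) can be illustrated with a short Python sketch. The function sort_by_keys is a hypothetical stand-in, the identity function plays the role of data#1, and collations and sort orders are not modelled.

```python
from typing import Any, Callable, Iterable, List, Sequence


def sort_by_keys(items: Iterable[Any],
                 keys: Sequence[Callable[[Any], Any]] = ()) -> List[Any]:
    """Sort items by a list of key functions, most significant first."""
    if not keys:
        # Absent or empty: fall back to one default sort key
        # (identity here, standing in for data#1).
        keys = (lambda item: item,)
    return sorted(items, key=lambda item: tuple(k(item) for k in keys))
```

With this reading, passing an empty list of keys behaves exactly like omitting the argument, which is the alignment the issue asks for.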
Issue #711 created #created-711
Using annotations for navigation of JSON trees
This issue develops ideas presented in issue #596, which itself is a continuation of ideas raised in issue #341, issue #350, and elsewhere. It's related to the requirements presented in issue #262 and issue #297.
Firstly, I propose a change to the data model so that annotations can be attached to any item [or perhaps any value?], not only to a function. The annotations are a map of type map(xs:QName, item()*). Some general principles:
- Annotations on an item do not affect the result of any operation on that item unless otherwise specified.
- Operations that are described as returning a result that contains items that are present in one of the operation's operands retain the annotations of those items, unless otherwise specified. (So for example $a[C] returns a sequence of items from $a in which the annotations are preserved).
- Operations that construct "new" items (for example $a + $b) return an item with no annotations, unless otherwise specified.
- The function annotations($x) (replacing function-annotations($x)) returns the annotations of an item.
- The function annotate($x, key, value) returns a "clone" of $x with an additional annotation. (A clone of an item differs from the original only in having different annotations. All operations other than annotation-sensitive operations produce exactly the same result on the clone and the original - including tests for node identity.)
- To avoid confusion, the term "type annotation" is replaced by "type label".
Secondly, we use annotations to aid navigation of JTrees (a term I use to describe trees of arrays and maps such as might be produced by parsing JSON).
We introduce a component of the static context tracked=true|false
, defaulting to false. The construct tracked{expression}
evaluates an expression and its subexpressions in tracked mode. In tracked mode any operator or function that performs selection within an array or map (for example the lookup operator, the map:get
and array:get
functions, using the map/array as a function item, or the array:head()
and array:foot()
and map:find()
functions) annotates the items in its result with two properties: "container" whose value is the map or array from which the item was selected, and "key" which is the key or array index of the selected value within that container. The effect is that if an item was found in a JTree using a tracked expression, the annotations on the resulting item can be used in effect to navigate upwards within the tree that was searched.
Note that this is not a new idea: in effect, the result of a tracked selection is a zipper data structure, as described in https://en.wikipedia.org/wiki/Zipper_(data_structure).
A further exploitation of the idea allows us to introduce deep update of JTree structures. For example, modify(root:=$a, selection:=fn{?x?y?z}, change:=fn{.+1})
can evaluate the selection
argument in tracked mode, apply the change
function to the resulting items, and then navigate back using the container
annotation to create modified versions of all traversed containing JTrees, eventually returning a modified version of the root
tree.
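A minimal Python sketch of tracked selection and deep update, under stated assumptions: the names `Tracked`, `lookup` and `modify` are hypothetical, maps are modelled as dicts and arrays as lists, and the "container" annotation holds the tracked parent so that navigation can continue upwards for more than one level.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass(frozen=True)
class Tracked:
    """A selected value plus the two annotations described above."""
    value: Any
    container: Optional["Tracked"] = None  # tracked parent map or array
    key: Any = None                        # key or index within the container

def lookup(node: Tracked, key: Any) -> Tracked:
    """Tracked lookup: remember where the selected value came from."""
    return Tracked(node.value[key], container=node, key=key)

def modify(node: Tracked, change: Callable[[Any], Any]) -> Any:
    """Apply change to the selected value, then rebuild every ancestor."""
    new_value = change(node.value)
    while node.container is not None:
        parent = node.container
        if isinstance(parent.value, dict):
            rebuilt = {**parent.value, node.key: new_value}
        else:  # anything else is treated as a list-like array
            rebuilt = list(parent.value)
            rebuilt[node.key] = new_value
        new_value, node = rebuilt, parent
    return new_value  # modified copy of the root; the original is untouched
```

The chain `lookup(lookup(lookup(Tracked(root), "x"), "y"), "z")` plays the role of `?x?y?z` evaluated in tracked mode; `modify` then walks the container annotations back to the root, copying each level.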
QT4 CG meeting 046 draft agenda #agenda-09-19
Draft agenda published.
Issue #673 closed #closed-673
HTML namespace changes
Pull request #710 created #created-710
36: fn:function-annotations
@michaelhkay Again, some hints might need to be added for XPath (?).
Issue #709 created #created-709
(Un)Checked Evaluation
Based on https://github.com/qt4cg/qtspecs/issues/707#issuecomment-1721596055 and the comments following in that thread:
I've been thinking recently about adding
checked{}
and unchecked{}
modes. For example, <xsl:apply-templates select="checked{.//item}"/>
would throw an error if there are no items. The mode of execution would propagate downwards, so
checked{a/b/c/d}
would be able to tell you that a b
was found, but it had no c
children.
checked{item[22]}
would have the effect of making sequences behave more like arrays, with bound checking; conversely, unchecked{item?22}
would return an empty sequence instead of throwing an error.
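As an illustration only, the checked/unchecked distinction for paths can be sketched in Python over nested dicts. The function `select` and its error wording are assumptions made for this sketch, not part of the proposal.

```python
from typing import Any, List

def select(roots: List[Any], path: str, checked: bool = False) -> List[Any]:
    """Follow a '/'-separated path of child names through nested dicts."""
    nodes = roots
    matched: List[str] = []
    for step in path.split("/"):
        nodes = [n[step] for n in nodes if isinstance(n, dict) and step in n]
        if not nodes:
            if checked:
                # Checked mode reports the first step that selected nothing.
                where = "/".join(matched) if matched else "the context"
                raise LookupError(
                    f"{where} was found, but it had no {step} children")
            return []  # unchecked: the empty result flows on silently
        matched.append(step)
    return nodes
```

In unchecked mode `select([doc], "a/b/c/d")` quietly yields an empty list; in checked mode the same call fails and names the step at which the selection became empty.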
Issue #708 created #created-708
Toward a design for generators
Motivation
The motivations for this are to explore the creation of sequences where the next item in the sequence is determined by evaluating a function on the current state of a system. These sequence generators have an initial starting value and state. They can also stop when some condition is met.
The motivating example here is the fn:random-number-generator function, where:
- let $rnd := fn:random-number-generator() initializes a new random number sequence with the default seed as its state;
- let $value := $rnd?number returns the current value of the sequence;
- let $rnd := $rnd?next() returns the state and value of the next item in the sequence.
This has all the properties needed for a forward (left-to-right) generating sequence. To make it a generalized generator sequence, the number
field should be renamed value
.
NOTE: The fn:random-number-generator
function defines an infinite sequence as it has no termination/end of sequence condition.
Sequence Generators as Record Types
Therefore, a forward generator sequence could look like this:
declare item-type sequence-generator as record(
value as item(),
next as function() as record(value, next, *)?,
*
);
declare function generated-sequence() as sequence-generator?;
If calling ?next()
on a sequence above returns the empty sequence then there are no more values in that sequence.
If the generated-sequence()
returns the empty sequence then there are no items in the sequence.
NOTE: This does not currently define reversible sequence support. A reversed
property could be provided that returns a sequence-generator
that operates on the sequence from right to left. I haven't figured out exactly how this should look, but reversed generator sequences may be better investigated as a separate issue.
NOTE: An implementation can process this sequence iteratively if needed.
Analysis
While this allows generator sequence types to be created, they are -- like fn:random-number-generator()
-- cumbersome to use. This does have the advantage of being backward compatible with fn:random-number-generator()
, though.
The problem comes when trying to make these work like sequences. The subtype and function coercion rules should be doable. The other sequence operations like filtering are more complicated to define properly.
However, this has the same issues that allowing array()
in fn:*
functions has -- how do you differentiate the use cases where the user is working on a sequence of generators, or the generated sequences?
NOTE: You can't extend the functions to take a
(sequence-type | item()*)
parameter as a sequence-type
is a subtype of item()
which matches item()*
. If the functions were to be extended to handle these, then fn:head(fn:random-number-generator())
would return a number instead of a map.
Sequence Generator Function
The Kotlin language has a generateSequence function that takes a next function, an optional seed value or construction function, and returns a lazy sequence over that. -- Internally, it is building a Java iterator that produces values from calling the next function. The sequence will terminate when the next value is null.
I propose that -- in addition to the sequence-generator
type above -- XPath defines the following function:
declare function fn:sequence($generator as sequence-generator) as item()*;
This solves the issues in the Analysis section above, and is analogous to array:values
. It is implementation defined how the sequence is constructed. -- This allows an implementer to appropriately map the generator to their internal sequence implementation in order to provide lazy evaluation and other operations.
There should also be the following helper function for random numbers:
declare function fn:random-numbers(
$seed as xs:anyAtomicType? := ()
) as xs:double* {
fn:random-number-generator($seed) => fn:sequence()
};
The user-defined sequences then become e.g.:
let $generator as sequence-generator := map {
value : 1,
next : function () { () }
}
return fn:sequence($generator)
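A minimal Python sketch of the proposed record type and of `fn:sequence`, assuming a dict with `value` and `next` entries and `None` for the empty sequence. The helper `count_from` is hypothetical and only serves as a sample generator.

```python
from itertools import islice
from typing import Any, Callable, Iterator, Optional, TypedDict

class SequenceGenerator(TypedDict):
    """Model of the proposed sequence-generator record."""
    value: Any
    next: Callable[[], Optional["SequenceGenerator"]]

def sequence(generator: Optional[SequenceGenerator]) -> Iterator[Any]:
    """Model of fn:sequence: lazily yield values until next() returns None."""
    while generator is not None:
        yield generator["value"]
        generator = generator["next"]()

def count_from(n: int, stop: Optional[int] = None) -> Optional[SequenceGenerator]:
    """Hypothetical generator: n, n+1, ... (infinite when stop is None)."""
    if stop is not None and n > stop:
        return None  # no items (left) in the sequence
    return {"value": n, "next": lambda: count_from(n + 1, stop)}

# Only the first five states of the infinite generator are ever built.
first_five = list(islice(sequence(count_from(1)), 5))
```

Because `sequence` is a Python generator function, the infinite case behaves like `fn:random-number-generator`: consumers decide how much of it is materialized.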
Issue #299 closed #closed-299
Short-circuiting functions, function-arity guards and lazy hints
Issue #707 created #created-707
Dynamic Function Calls: Processing Empty Sequences
A fundamental – and brilliant – property of XPath is that many operations tolerate empty sequences: Instead of throwing an error, the empty result is passed on unchanged to the next operation. While this is unrewardingly confusing for binary operations (() + 1
, () eq 5
), it’s wonderful for pipelines:
(: paths :)
$nodes / a / b / c
(: lookups :)
$data ? 1 ? 2 ? 3
(: simple map operators :)
$data ! do(.) ! something(.)
(: arrow operator works differently, but the syntax is similar: :)
$data => do() => something()
As far as I can judge, it would be a very simple and user-friendly addition if we extended dynamic function calls to return an empty sequence (instead of raising an error) if the base expression is an empty sequence. This way, the following expressions would all run through:
let $map := map { 'giovanni': map { 'city': 'roma' } }
return $map('andrea')('city'),
let $data := ()
return $data(1)(2)(3),
()(123),
()()
Many people use parentheses instead of the lookup operator for accessing maps & arrays, and the proposed change would make the syntax more interchangeable. I believe it would also be useful for function items in general.
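The proposed behaviour can be sketched in Python under a few assumptions: the empty sequence is modelled as the empty tuple, maps as dicts, arrays as lists with 1-based indexes, and `call` is a hypothetical stand-in for a dynamic function call.

```python
from typing import Any

EMPTY = ()  # models the empty sequence

def call(target: Any, *args: Any) -> Any:
    """Dynamic call that passes an empty base expression through unchanged."""
    if target == EMPTY:
        return EMPTY  # proposed rule: no error, just the empty sequence
    if isinstance(target, dict):   # map used as a function
        (key,) = args
        return target.get(key, EMPTY)
    if isinstance(target, list):   # array used as a function, 1-based
        (index,) = args
        return target[index - 1]
    return target(*args)           # ordinary function item

people = {"giovanni": {"city": "roma"}}
missing_city = call(call(people, "andrea"), "city")
```

Here `call(people, "andrea")` is empty, so the second call returns the empty sequence instead of failing, just as a chain of lookup operators would.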
Issue #706 created #created-706
FLWOR: for member $m1 in $a1, member $m2 in $a2
Currently, the member
keyword must always be placed directly after the for
clause:
(: valid :)
for $a in 1 to 10
for member $m in $array
(: invalid :)
for $a in 1 to 10, member $m in $array
In addition, the keyword applies to all other bindings in the same for
clause:
for member $m1 in $array1, $m2 in $array2
My feeling is that this syntax is a bit odd, as other keywords (allowing empty
, at
) only refer to the currently bound variable. Next, the member
syntax differs from that of XQuery Full Text: the score
keyword is placed before the variable name, and can be used more than once (or omitted):
let score $s1 := $data1, score $s2 := $data2
I think we should change this. It would also simplify the grammar:
InitialClause ::= ForClause | LetClause | WindowClause
ForClause ::= "for" ForBinding ("," ForBinding)*
ForBinding ::= (SimpleForBinding | ForMemberBinding) PositionalVar? "in" ExprSingle
SimpleForBinding ::= VarBinding AllowingEmpty?
ForMemberBinding ::= "member" VarBinding
AllowingEmpty ::= "allowing" "empty"
PositionalVar ::= "at" "$" VarName
VarBinding ::= "$" VarName TypeDeclaration?
And one more motivation for changing is that map bindings will be easier to define (see #31).
Issue #705 created #created-705
Function Coercion: Function Arities
In 4.6.4 Function Coercion, a rule was added to support functions with an arity lower than the expected one:
If F has lower arity than the expected type, then F is wrapped in a new function that declares and ignores the additional argument; the following steps are then applied to this new function.
If I got it right, this is the resulting 4.0 behavior:
Spoiler: I probably got it wrong, see the next comment.
declare function local:function($a) { };
declare variable $function := function($a) { };
(: now legal :)
filter (1984, true#0)
$function (1984, 'ignored')
fn { } (1984, 'ignored')
map { } (1984, 'ignored')
true#0 ('ignored')
sum(?, ())(1984, 'ignored')
(: still illegal :)
local:function(1984, 'ignored')
(: still legal: RHS items will be supplied one by one :)
map { }(1984, 'processed')
Maybe some more examples should be added in the corresponding sections that refer to function coercion.
The new rule is powerful and allows for greater flexibility (see #516 and other issues), but the behavior may also be unexpected. We should probably document that:
- It may go unnoticed that a passed on argument will be ignored. In other words, we reduce type safety by allowing users to supply more arguments than will be processed.
- It makes a difference whether the invoked function is static or dynamic (dynamic functions will now provide less type safety than static functions).
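The quoted coercion rule can be sketched in Python as a wrapper that accepts the expected number of arguments and silently drops the extras. The names `coerce_arity` and `filter_items` are hypothetical, and rejecting a function of higher arity is an assumption of this sketch.

```python
import inspect
from typing import Callable

def coerce_arity(f: Callable, expected: int) -> Callable:
    """Wrap f so that it can be called with `expected` arguments."""
    actual = len(inspect.signature(f).parameters)
    if actual > expected:
        raise TypeError(
            f"function of arity {actual} cannot be used where arity {expected} is expected")
    if actual == expected:
        return f

    def wrapper(*args):
        if len(args) != expected:
            raise TypeError(f"expected {expected} arguments, got {len(args)}")
        # The additional arguments are declared but ignored.
        return f(*args[:actual])
    return wrapper

def filter_items(items, predicate):
    """Model of fn:filter: the predicate is coerced to arity 1."""
    test = coerce_arity(predicate, 1)
    return [item for item in items if test(item)]
```

`filter_items([1984], lambda: True)` mirrors `filter(1984, true#0)`: the zero-arity function is accepted and the item it is given goes unnoticed, which is exactly the loss of type safety described above.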
Issue #704 created #created-704
Context Value Expression → Context Value Reference
The specification defines Variable References for accessing values bound to a variable, and we should rename the equivalent operation for accessing the context from “Context Value Expression” to “Context Value Reference” (even more so if we should decide to introduce a Context Value Declaration later on, as discussed in #755).
Related: https://github.com/qt4cg/qtspecs/pull/703#issuecomment-1719345430
Pull request #703 created #created-703
129 (1): XPath and XQuery changes for introduction of context value
Fix #129 This replaces the previous attempt from several months ago, which had too many conflicts to be salvageable.
This is a wide-ranging and pervasive change, and I would like the changes to be applied promptly and incrementally to reduce the risk of conflicts, even if further work is needed later. This first PR addresses the XQuery and XPath language specifications. Further changes (in subsequent PRs) are needed for F+O and for XSLT. There are also a couple of minor changes affecting Serialization (but none affecting the data model).
Issue #368 closed #closed-368
129: Context item generalized to context value
Pull request #702 created #created-702
701: fn:concat: Support for 0 or more arguments
Closes #701
Issue #701 created #created-701
fn:concat: Support for 0 or more arguments
With #161, we plan to introduce support for variadic functions.
The scope of this issue is much smaller and can be seen as a preparatory one; it’s about making the first two arguments of the function optional. I’ll create a little PR for it.
Issue #700 created #created-700
Operators for array mapping and filtering
With issue #129 generalising the context item to a context value, we have the opportunity to define context-based mapping and filtering operators for arrays that work in the same way as the A!B
and A[B]
operators for sequences.
I propose A!!B
as a mapping operator for arrays. Unlike !
, this does not flatten the result. The result is an array whose members correspond one-to-one with the members of A
, each member of the result array being formed by evaluating B
with the corresponding member of A
as the context value.
For an array whose members are singletons, the expression A!!B
has a similar effect to A?*!B
, but (a) it is clearer, (b) it returns an array rather than a sequence, and (c) it performs no flattening. For the more general case, it can be used in place of the higher-order function call A => array:for-each(fn{B})
.
I propose A?[B]
as a filter operator for arrays. Unlike A[B]
, this is not overloaded to perform index-based selection. The result is an array containing those members of A
for which B
has an effective boolean value of true, when evaluated with the corresponding member of A
as the context value. The expression $A?[B]
is equivalent to $A => array:filter(fn{B})
.
For example, $A?[exists(.)]
filters the array $A
to retain only those members that are non-empty.
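A rough Python model of the two proposed operators, assuming arrays are lists whose members are tuples (sequences), and that Python truthiness stands in for the effective boolean value. The function names are hypothetical.

```python
from typing import Callable, List, Tuple

Member = Tuple  # an array member is a sequence, modelled as a tuple
Array = List[Member]

def array_map(array: Array, body: Callable[[Member], Member]) -> Array:
    """Model of A!!B: one result member per input member, no flattening."""
    return [body(member) for member in array]

def array_filter(array: Array, predicate: Callable[[Member], bool]) -> Array:
    """Model of A?[B]: keep members for which B is true; never index-based."""
    return [member for member in array if predicate(member)]

A: Array = [(1, 2), (), (3,)]
doubled = array_map(A, lambda m: tuple(x * 2 for x in m))
non_empty = array_filter(A, lambda m: len(m) > 0)      # like $A?[exists(.)]
flattened = [x * 2 for member in A for x in member]    # like A?*!B
```

The contrast between `doubled` and `flattened` shows why the issue distinguishes `A!!B` from `A?*!B`: the first keeps the member boundaries (including the empty member), the second loses them.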
Issue #645 closed #closed-645
Editorial: Use `\n` instead of `\r\n` in XML documents
Issue #697 closed #closed-697
645: Use \n instead of \r\n in XML documents
Issue #699 created #created-699
GitHub: Signing
@ndw Sorry to keep you busy. I think we should disable the enforced signing of commits. We know the persons who send PRs, and currently we only have three persons (you, Reece, me) who sign their commits.
If signing is disabled, non-admins, for example, will also be able to merge those PRs.
Issue #698 created #created-698
GitHub: Line Endings
@ndw I’ve copied your suggestion from https://github.com/qt4cg/qtspecs/issues/645#issuecomment-1657056816 to a new issue:
we can tell git to fix the line endings in comments. I'll see about getting that setup.
In addition, it would be great if line endings were not changed when editing files in place.
Due to my changes in #645, all relevant files should now use Unix-style line endings, so this would be a sane default (e.g. if the modification of files leads to a mixture of \n and \r\n).
Pull request #697 created #created-697
645: Use \n instead of \r\n in XML documents
#645 (editorial)
Pull request #696 created #created-696
566: Rework query parameters on build-uri/parse-uri
Completes action QT4CG-042-02.
- Rework the query segments so that they're a simple map of key/value pairs.
- Rename query-segments to query-parameters.
Issue #692 closed #closed-692
Use sequences instead of arrays in fn:parse-uri output
Issue #695 created #created-695
Step in RangeExpression
In the XPath specs, it seems that a simple modification from...
[34] RangeExpr ::= AdditiveExpr ( "to" AdditiveExpr )?
...to...
[34] RangeExpr ::= AdditiveExpr ( "to" AdditiveExpr ("step" AdditiveExpr)? )?
...would be nonintrusive, and bring some nice benefits customary in other PLs, allowing expressions such as 1 to 9 step 2
and 100 to -100 step -4
.
Thoughts?
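The issue does not spell out the semantics, so the following Python sketch rests on assumptions: the end bound stays inclusive (as with the existing `to`), a negative step counts downwards, and a step of zero is treated as an error. The helper `range_to` is hypothetical.

```python
from typing import List

def range_to(start: int, end: int, step: int = 1) -> List[int]:
    """Model of `start to end step step` with an inclusive end bound."""
    if step == 0:
        raise ValueError("step must not be zero")  # assumption, not in the issue
    if step > 0:
        return list(range(start, end + 1, step))
    return list(range(start, end - 1, step))

odds = range_to(1, 9, 2)            # models: 1 to 9 step 2
countdown = range_to(100, -100, -4)  # models: 100 to -100 step -4
```

As with the current operator, a range whose bounds run against the direction of the step (for example `range_to(3, 1)`) is simply empty in this model.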
Pull request #694 created #created-694
XQFO minor edits, with new examples and notes, 2 through 4.6
Minor edits here are motivated by clarity or localized consistency. In one case, the change of 0
to 0
to 0
to 9
appears to address an important typo.
I have introduced select examples, to illustrate points in the corresponding rules.
The notes I have introduced need some context. Option 9 for the primary format token is somewhat vague, and the attached note of clarification stokes the imagination. I have trimmed that note for clarity (without substantive changes) but introduced a set of notes later, to help caution developers on unexpected behaviors with option 9, and secondarily to caution processor implementers on the challenges inherent in supporting this option. If I had my druthers, I would advocate deprecating option 9. It is--to use a technical term--squishy.
Spec editors: I am happy to pull back on any of this.
QT4 CG meeting 045 draft minutes #minutes-09-12
Draft minutes published.
Issue #160 closed #closed-160
Support named arguments on dynamic function calls
Issue #672 closed #closed-672
XFO minor edits, chap. 1
Issue #671 closed #closed-671
Switch expression without operand (analogous to XSLT choose)
Issue #678 closed #closed-678
671 switch sans operand
Issue #669 closed #closed-669
Typo in XSLT §26.4 - "appearing appearing"
Issue #679 closed #closed-679
669 - fix typo "appearing appearing"
Issue #665 closed #closed-665
Typo in fn:items-before and fn:items-ending-where
Issue #681 closed #closed-681
665: Fix typos in fn:items-XX functions
Issue #637 closed #closed-637
Annotation Values: Booleans
Issue #682 closed #closed-682
637: allow true() and false() as function annotation values
Issue #90 closed #closed-90
Simplified simplified stylesheets
Issue #599 closed #closed-599
90: Simplified stylesheets with no xsl:version
Issue #93 closed #closed-93
Support order by ascending/descending from a string value.
Issue #623 closed #closed-623
93: sort descending
Issue #600 closed #closed-600
fn:decode-from-uri: counterpart of fn-encode-to-uri
Issue #631 closed #closed-631
600: fn:decode-from-uri
Issue #693 created #created-693
QT4 Tests without counterpart in the specs
The following functions are not defined in the current spec:
- fn:unparcel, fn:parcel → dropped
- fn:xdm-to-json → #576
- fn:concat() → see #701
- fn:parts → see #463
- codepoints-to-string(), etc. (sequence-values arguments)
Pull request #692 created #created-692
Use sequences instead of arrays in fn:parse-uri output
Completes action QT4CG-042-01 on NW.
There is a corresponding PR against the test suite.
QT4 CG meeting 045 draft agenda #agenda-09-12
Draft agenda published.
Pull request #691 created #created-691
688 Semantics of local union types, enumeration types, etc
Fix #688.
This PR fleshes out the detailed semantics of local union types, enumeration types, and type aliases. It fills a number of gaps in the current specification but doesn't aim to change the overall intent.
Pull request #690 created #created-690
687 Clarify constructor functions for user-defined types
Clarifies the rules for constructor functions, especially for list and union types, and for types defined by means of type aliases rather than in an imported schema. Fix #687.
Issue #689 created #created-689
fn:stack-trace: keep, drop, replace with $err:stack-trace ?
The current specification contains a diagnostic function called fn:stack-trace
. Many other languages provide a similar function: The returned output can possibly help to understand which function calls led to an error during the evaluation of code.
Still, I have strong doubts that it is a good decision to include this function in the standard:
The specification gives you a vast amount of freedom how to implement and optimize things. As a result, it’s completely feasible and reasonable to rewrite the following code…
declare function local:double($f) {
$f * 2
};
(1 to 6) ! local:double(.)
…to (1 to 6) ! (. * 2)
at compile time. If a user adds a fn:stack-trace
call in the function body, s·he would expect to find the function invocation of the original code representation in the output. As always, there are technical solutions to achieve this (store additional information in the evaluation tree on the original query; suppress optimizations when fn:stack-trace
is found), but all of them can affect the runtime behavior and lead to different evaluation trees, hiding possible bugs in the implementation (which can be a reason to call fn:stack-trace
at all).
A standard should provide a minimum amount of assurance that a function behaves similarly across different implementations. At this time, I don’t believe we can give that guarantee.
Related: #55, #686
– As an alternative, a stack trace could optionally be created by an implementation when an error is triggered.
Issue #688 created #created-688
Coercion rules for union types and enumeration types
The coercion rules for enumeration types have not been defined (there is a TODO in the spec).
For union types (including both schema-defined and locally-defined union types), the rules appear to need some further work. Given types R1
and R2
that are defined by restriction from B1
and B2
, if an atomic value V
is an instance of B1
that conforms to the rules of R1
, then the relabelling coercion means that V will now be acceptable where the required type is R1
. But if the required type is union(R1, R2)
, the relabelling coercion is not invoked. This feels inconsistent, since union(R1, R2)
might be expected to accept anything that R1
accepts.
Test case FunctionCall-056 (currently failing) illustrates the problem.
Issue #687 created #created-687
Constructor functions for user-defined types
This is a deficiency in the 3.1 F+O specification.
Constructor functions for user-defined types are very poorly described:
- It's unclear how anonymous types are handled. The spec says there is a constructor function for every simple type in the static context. That would include anonymous types. But constructor functions for anonymous types, if they exist, are essentially useless, because their names are not known.
- The semantics of constructor functions for user-defined list and union types are described very vaguely, by analogy with built-in types; and the analogy points to the section on built-in atomic types which doesn't cover union and list types.
- For a union type U, it says that the return type of the constructor U(x) is defined as xs:anyAtomicType. Why not define it as U? Perhaps this predates the ability to use union types as return types.
Issue #686 created #created-686
XQFO presentation of diagnostic functions
From an informal discussion on Slack, I feel that clarity is needed in the Diagnostic tracing section. The problem is that fn:trace()
and fn:log()
introduce the term "trace output" with no definition or explanation; this is easily confused with the primary output defined by the function signatures, and affects how readers think about the determinism of the functions. "The serialization of the trace output..." implies that the processor will necessarily serialize something, but I doubt that can or should be presumed. More needs to be said about the responsibilities of the processor in the contract for these functions.
In my opinion, this section would benefit from a brief preamble, providing context to set the stage for the rules. Some draft text for us to discuss:
Diagnostic tracing functions provide a transfer of information, either from the processor to the dynamic context, or vice versa.
The function that transfers information from the processor to the dynamic context,
fn:stack-trace()
, returns a string that can be further processed and used in the XPath expression and elsewhere in a host language.
The functions that transfer information from the dynamic context to the processor,
fn:trace()
and fn:log()
, each have two effects. The first effect, the output, pertains to the returned values, defined by the function signature and essential to the XPath expression. Such output is always deterministic. The second effect, processor behavior, concerns the way the processor handles the values bound to the parameters, supplied for diagnostic tracing. Processor behavior is always directed toward the user or environment that invoked the processor. Actions may include sending messages, serializing the values and writing them to a log file or database, or something else. Unlike the output (the first effect), the results of processor behavior are implementation-defined and nondeterministic with respect to order of the parameter values.
The draft above attempts to avoid "output" to describe the processor-side diagnostics, so as to avoid potential confusion when dealing with the return-type defined in the signature. Where "trace output" appears in each of the rules, "processor behavior" can be used instead.
Questions:
- Any objections, corrections, or suggestions?
- Any other examples of how a processor might use trace diagnostics?
- Should a paragraph be added to explain briefly how trace() and log() differ from xsl:message? (E.g., a serialized tree should not be presumed.)
- In the informal Balisage birds-of-a-feather discussion this summer, reservations were expressed by participants about the name log(). Is it possible to drop the function and simply extend the arity of trace() with a parameter $return-input as xs:boolean? := true()?
Issue #685 closed #closed-685
Style fixes
Pull request #685 created #created-685
Style fixes
- Put the XSLT processor version comment at the end of the file instead of the beginning. Putting it before the <!DOCTYPE html> forces browsers into quirks mode.
- Improve the XPath Functions stylesheets so that they don't put div elements inside p elements when outputting examples.
Issue #684 closed #closed-684
Ignore this PR
Issue #683 closed #closed-683
XQFO context/focus in/dependent functions clarification note
Pull request #684 created #created-684
Ignore this PR
This is just norm hacking about
Pull request #683 created #created-683
XQFO context/focus in/dependent functions clarification note
Added note clarifying the relationship between context and focus, for the purpose of illustrating the relationships between focus/context in/dependent functions. Wrapped with companion <note>
s in a parent <notes>
.
Pull request #682 created #created-682
637: allow true() and false() as function annotation values
Fix #637
Issue #658 closed #closed-658
Constructor Function: Parameter Name, Zero-Arity
Pull request #681 created #created-681
665: Fix typos in fn:items-XX functions
Fix #665
Pull request #680 created #created-680
668 define case insensitive collation normatively
Fix #668
QT4 CG meeting 044 draft minutes #minutes-09-05
Draft minutes published.
Pull request #679 created #created-679
669 - fix typo "appearing appearing"
Fix #669
Pull request #678 created #created-678
671 switch sans operand
Fix #671
Issue #619 closed #closed-619
XDM ch. 6 minor edits
Issue #633 closed #closed-633
Edits ch. 4.1 through 4.15
Issue #601 closed #closed-601
fn:all → fn:every?
Issue #640 closed #closed-640
601: fn:all → fn:every?
Issue #675 created #created-675
XSLT streaming rules for new constructs
The XSLT spec has rules for the streamability of all system functions and XPath language constructs. These need updating for new 4.0 constructs.
Issue #664 closed #closed-664
663 xsl:original keywords
Pull request #674 created #created-674
663: Describe how calls to xsl:original with keywords work
Rework PR 664 (fix for 663) on new baseline
Fix #663.
Pull request #673 created #created-673
HTML namespace changes
This PR applies my action items for updating the HTML XDM mapping around namespaces and local names.
Note: this currently makes dm:namespace-nodes
return an empty sequence. I'm not currently sure what the best approach is here.
Pull request #672 created #created-672
XFO minor edits, chap. 1
Most substantive change is the trimming of prose held over from before the revision of the diagrams.
Issue #671 created #created-671
Switch expression without operand (analogous to XSLT choose)
By syntax analogy with switch
,
choose
test ($a < $b) return "lesser"
test ($a > $b) return "greater"
test ($a eq $b) return "equal"
default return "Getting the default is hard to explain"
Something like
ChooseExpr ::= "choose" ChooseTestClause+ "default" "return" ExprSingle
ChooseTestClause ::= "test" "(" Expr ")" "return" ExprSingle
I know I can do this by stringing if-then-else together, but would greatly appreciate the cleaner and more manageable syntax for those times when many tests are inescapable.
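Assuming the first-match semantics of XSLT's xsl:choose (the first test that is true wins, otherwise the default applies), the proposed expression can be modelled in Python. The helper names `choose` and `compare` are hypothetical; tests and results are passed as zero-argument functions so that only the selected branch is evaluated.

```python
from typing import Any, Callable, List, Tuple

Clause = Tuple[Callable[[], bool], Callable[[], Any]]

def choose(clauses: List[Clause], default: Callable[[], Any]) -> Any:
    """Evaluate tests in order; return the result of the first true one."""
    for test, result in clauses:
        if test():
            return result()
    return default()

def compare(a: float, b: float) -> str:
    """The example from the issue, written against the model above."""
    return choose(
        [
            (lambda: a < b, lambda: "lesser"),
            (lambda: a > b, lambda: "greater"),
            (lambda: a == b, lambda: "equal"),
        ],
        lambda: "default",
    )
```

In this model the default branch is reached only when no test holds, for instance when one operand is NaN and every comparison is false.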
Issue #670 created #created-670
The trouble with XPath’s fn:fold-right. A fix and Proposal for fn:fold-lazy
The trouble with XPath’s fn:fold-right.
Laziness in XPath.
This article discusses the standard XPath 3.1 function fn:fold-right, its definition in the official Spec, its lack of apparent use-cases and its utter failure to reproduce the (lazy) behavior of Haskell’s foldr , which is presumed to be the motivation behind fn:fold-right.
The 2nd part of the article introduces the implementation of short-circuiting and generators, which together unprecedentedly provide laziness in XPath. Based on these, a new XPath function, fn:fold-lazy, is implemented that utilizes laziness, similar to Haskell’s foldr. This behavior is demonstrated in specific examples.
Introduction
Higher order functions were introduced into XPath starting with version 3.0 in 2014 and later in version 3.1 in 2017.
The definition of the standard function fn:fold-right closely mimics that of Haskell’s foldr, and anyone acquainted with foldr can be left with the impression that fn:fold-right would have identical behavior (and hence use-cases) as Haskell’s foldr.
Unfortunately, there is a critical difference between the definitions of these two functions. The definition of foldr explicitly defines its behavior when provided with a function that is lazy in its 2nd (right) argument – from Haskell’s definition of foldr:
“… Note that since the head of the resulting expression is produced by an application of the operator to the first element of the list, given an operator lazy in its right argument, foldr can produce a terminating expression from an unbounded list.”
The XPath definition of fn:fold-right does not mention any laziness.
There is no official concept of “laziness” in XPath, thus fn:fold-right doesn’t cover some of the most important use-cases of Haskell’s foldr , which can successfully produce a result when passed an infinite (or having unlimited length) list.
This in fact makes fn:fold-right almost useless, and explains why even some of the members of the XPath 3.1 WG have stated on occasions that they do not see why the function was introduced.
fn:fold-right gone wrong – example
This Haskell code:
foldr (\x y -> (if x == 0 then 0 else x*y)) 1 (map (\x -> x - 15) [1 ..1000000])
foldr (\x y -> (if x == 0 then 0 else x*y)) 1 (map (\x -> x - 15) [1 ..10000000])
foldr (\x y -> (if x == 0 then 0 else x*y)) 1 (map (\x -> x - 15) [1 ..])
produces the product of all numbers in the following list, respectively:
[-14, -13, -12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, …, 999985]
[-14, -13, -12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, …, 9999985]
[-14, -13, -12, -11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, …, ] -- up to infinity.
Because all these 3
lists contain a zero as their 15th item, the expected result is 0
when evaluating any of these 3
expressions – even in the last case where the list provided as argument is infinite. And this is indeed what happens:
Not only does Haskell produce the correct result in all cases but, regardless of the list’s length, the result is produced instantaneously!
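To make the role of laziness concrete, here is a minimal Python sketch of a right fold whose second argument is passed as a thunk, so that the combining function decides whether the rest of the list is ever touched. The names `foldr_lazy` and `times_or_zero` are hypothetical; this models Haskell's behaviour and is not the fn:fold-lazy implementation the article goes on to propose.

```python
from itertools import count
from typing import Any, Callable, Iterable

def foldr_lazy(f: Callable[[Any, Callable[[], Any]], Any],
               zero: Any, items: Iterable[Any]) -> Any:
    """Right fold where f receives the fold of the rest as a thunk."""
    iterator = iter(items)

    def fold_rest() -> Any:
        try:
            head = next(iterator)
        except StopIteration:
            return zero
        cache = []

        def rest() -> Any:
            # Force the remaining fold at most once, and only on demand.
            if not cache:
                cache.append(fold_rest())
            return cache[0]

        return f(head, rest)

    return fold_rest()

def times_or_zero(x: int, rest: Callable[[], int]) -> int:
    # Short-circuit: when x is 0 the rest of the list is never evaluated.
    return 0 if x == 0 else x * rest()

# Infinite input -14, -13, ...: the fold stops at the 15th item, which is 0.
result = foldr_lazy(times_or_zero, 1, (n - 15 for n in count(1)))
```

The sketch recurses once per consumed item, so a very long input without a zero would hit Python's recursion limit; the point here is only that no item after the zero is ever requested.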
Now, let us evaluate this equivalent XPath expression with BaseX:
let $product := function($input as array(xs:integer)) as xs:integer
{
array:fold-right($input, 1, function($x as xs:integer, $y as xs:integer) as xs:integer
{if($x eq 0) then 0 else $x * $y})
},
$ar := array { (1 to 36) ! function($x as xs:integer) as xs:integer {$x -15}(.)}
return
$product($ar)
Here we are passing a list containing just 36 integers. The result is quite unexpected and spectacular:
Here is what happens:
- Even though the result is 0 when processing the 15th integer in the array, the XPath processor continues to evaluate the RHS (right-hand side) until the last member of the array (the 36th).
- On “its way back” the XPath processor multiplies: (21*20*19*18*17* …*6*5*4)*3, and the result of the right-most multiplication is bigger than the maximum integer (or decimal) that this XPath processor supports.
- C r r r a s s h … As seen in the screenshot above.
The root cause for this unfortunate behavior is that the XPath processor doesn’t support short-circuiting and laziness. And thus, fn:fold-right is useless even in the normal/trivial case of a collection (array) with only 36 members. Not to speak of collections containing millions of members, or even infinite ones…
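The wasted work of a strict right fold can be made visible by recording every call of the combining function. The following Python sketch uses hypothetical names and is not how either XPath processor is implemented; Python integers have arbitrary precision, so it shows the unnecessary multiplications rather than the overflow itself.

```python
def eager_fold_right(seq, zero, f):
    """Strict right fold: the whole tail is folded before f sees the head."""
    acc = zero
    for x in reversed(seq):
        acc = f(x, acc)
    return acc

intermediate = []  # every value returned by the combining function

def mult(x, y):
    r = 0 if x == 0 else x * y
    intermediate.append(r)
    return r

values = [n - 15 for n in range(1, 37)]  # -14 .. 21, with 0 as the 15th item
result = eager_fold_right(values, 1, mult)

print(result)                          # 0
print(len(intermediate))               # 36
print(max(intermediate) > 2**63 - 1)   # True
```

All 36 members are visited, and the running product temporarily exceeds the signed 64-bit range, before the zero finally resets the result to 0.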
Let us see what happens when evaluating similar expressions with another XPath processor: Saxon.
Saxon seems to produce the correct result; however, it takes much longer as the length of the passed array is increased, leading to this one:
It took 261 seconds for the evaluation to be done, but accessing the 15th member of the array and short-circuiting to 0 should be almost instantaneous…
So what happens in this case? The difference between BaseX and Saxon is that Saxon implements a “Big Integer” and thus can multiply almost 1 000 000 integers without getting a value that cannot be handled… But doing almost 1M multiplications of big integers obviously takes time…
What is common in these two examples? Obviously, neither BaseX nor Saxon detects and performs short-circuiting. What is the reason for this?
I asked a developer of BaseX if I could submit a bug about this behavior. His answer was shockingly unexpected: “This is not a bug, because no requirement in the Specification has been violated”.
Thus, the main cause of the way both XPath processors handle the evaluation of these examples is the specification of the function, which blatantly allows such crap to happen.
Now that we see this, let us try to provide the wanted, useful behavior writing our own function.
The fix: Step 1 – fn:fold-right in pure XPath
Before going in depth with our pure XPath solution, we need as a base a pure-XPath implementation of fn:fold-right .
let $fold-right-inner := function ($seq as item()*,
$zero as item()*,
$f as function(item(), item()*) as item()* ,
$self as function(*)
) as item()*
{
if(empty($seq)) then $zero
else
$f(head($seq), $self(tail($seq), $zero, $f, $self))
},
$fold-right := function ($seq as item()*,
$zero as item()*,
$f as function(item(), item()*) as item()*
) as item()*
{
$fold-right-inner($seq, $zero, $f, $fold-right-inner)
},
$fAdd := function($x, $y) {$x + $y},
$fMult := function($x, $y) {$x * $y}
return
$fold-right((1 to 6), 1, $fMult)
When we evaluate the above with either of the two XPath processors, the correct result is produced:
720
And we certainly do have exactly the same problems as with the built-in fn:fold-right when evaluating a similar example:
The fix: Step 2 – $fold-right-sc: detecting and performing short-circuiting
Now that we have $fold-right as a base, let us add code to it so that it will detect and perform short-circuiting. We will implement a function similar to $fold-right but having this signature:
$fold-right-sc := function ($seq as item()*,
$zero as item()*,
$f as function(item(), item()*) as item()*,
$fGetPartial as function(*)
) as item()*
The last of the function’s parameters, $fGetPartial, returns a new function that is the partial application of $f when its 1st argument is set to the current member of the input sequence $seq. The idea is that whenever short-circuiting is possible, $fGetPartial returns not a function having one argument (arity 1), but a constant – a function with 0 arguments (arity 0).
If the arity of the so produced partial application is 0, then our code will immediately return with the value $f($currentItem).
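The arity convention just described can be tried out in Python, used here only as a neutral illustration language. This is a sketch under assumptions: the names `arity`, `fold_right_sc`, `mult` and `mult_get_partial` are hypothetical counterparts of the XPath names in this article, and `inspect.signature` stands in for function-arity.

```python
import inspect

def arity(fn):
    """Number of parameters of a callable (stand-in for function-arity)."""
    return len(inspect.signature(fn).parameters)

def fold_right_sc(seq, zero, f, get_partial, i=0):
    if i == len(seq):
        return zero
    partial = get_partial(seq[i], zero)
    if arity(partial) == 0:
        # Constant: short-circuit without looking at the rest of the sequence.
        return partial()
    return f(seq[i], fold_right_sc(seq, zero, f, get_partial, i + 1))

def mult(x, y):
    return x * y

def mult_get_partial(x, _zero):
    if x == 0:
        return lambda: 0        # arity 0: the result is already known
    return lambda z: x * z      # arity 1: the rest of the fold is still needed

values = range(-2, 999_998)     # same values as (1 to 1000000), each minus 3
print(fold_right_sc(values, 1, mult, mult_get_partial))  # 0
```

The recursion stops at the third item, so the length of the input no longer matters once a zero has been seen.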
Here is the complete code of $fold-right-sc:
let $fold-right-sc-inner := function ($seq as item()*,
$zero as item()*,
$f as function(item(), item()*) as item()*,
$fGetPartial as function(*),
$self as function(*)
) as item()*
{
if(empty($seq)) then $zero
else
if(function-arity($fGetPartial(head($seq), $zero)) eq 0)
then $fGetPartial(head($seq), $zero) ()
else $f(head($seq), $self(tail($seq), $zero, $f, $fGetPartial, $self))
},
$fold-right-sc := function ($seq as item()*,
$zero as item()*,
$f as function(item(), item()*) as item()*,
$fGetPartial as function(*)
) as item()*
{
$fold-right-sc-inner($seq, $zero, $f, $fGetPartial, $fold-right-sc-inner)
},
$fAdd := function($x, $y) {$x + $y},
$fMult := function($x, $y) {if($x eq 0) then 0 else $x * $y},
$fMultGetPartial := function($x, $y)
{
if($x eq 0)
then function() {0}
else function($z) {$x * $z}
}
return
$fold-right-sc((1 to 1000000) ! function($x){$x - 3}(.), 1, $fMult, $fMultGetPartial)
Do note:
- If the current item (the head of the sequence) is 0, then $fMultGetPartial returns a function with 0 arguments (a constant) that produces 0.
- $fold-right-sc (inner) treats a partial application of arity 0 differently from a partial application of arity 1. In the former case it simply produces the expected constant value without recursing further. Here is the relevant code fragment:
if(empty($seq)) then $zero
else
if(function-arity($fGetPartial(head($seq), $zero)) eq 0)
then $fGetPartial(head($seq), $zero) ()
else $f(head($seq), $self(tail($seq), $zero, $f, $fGetPartial, $self))
And now BaseX has no problems with the evaluation, even though the input sequence is of size 1M. The complete evaluation takes just a fraction of a millisecond (0.04 ms):
With Saxon things are not so good. Even though Saxon produces the correct result, evaluating the expression with an input sequence of size 1M takes 0.5 seconds (half a second), and evaluating the expression with an input sequence of 10M takes 5 seconds (10 times as long):
What is happening?
Even though Saxon performs much faster than the previous 261 seconds, due to detecting the short-circuiting possibility and performing the short-circuit, Saxon still processes all 10M items when evaluating this subexpression (which obviously the more optimized BaseX doesn’t do in advance):
(1 to 10000000) ! function($x){$x - 3}(.)
Therefore, we have one remaining problem: How to prevent long sequences (or arrays) from being fully materialized before starting the evaluation of $fold-right-sc?
The fix: Step 3 – replacing collections with generators
Generators are well known and provided out of the box in many programming languages. Per Wikipedia:
“In computer science, a generator is a routine that can be used to control the iteration behaviour of a loop. All generators are also iterators.[1] A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.”
A full-fledged generator (such as implemented in C#) is an instance of a Finite State Machine (FSM), and implementing it in full generality goes beyond the topic and goals of this article. Expect another article soon that will provide this.
Here we will implement a simple kind of generator that, when passed an integer index $N, produces the $N-th item of a specific sequence. Although this is probably the simplest form of a generator, it can be useful in many cases and is a good illustrative solution to our current problem. The whole approach of replacing “something” with a function that must be called to produce this “something” is known as “lifting”.
First, we will add to our $fold-right just the use of generators, without the detection and performing of short-circuiting:
let $fold-right-lifted-inner := function ($seqGen as function(xs:integer) as array(*),
$index as xs:integer,
$zero as item()*,
$f as function(item(), item()*) as item()* ,
$self as function(*)
) as item()*
{
let $nextSeqResult := $seqGen($index),
$isEndOfSeq := $nextSeqResult(1),
$seqItem := $nextSeqResult(2)
return
if($isEndOfSeq) then $zero
else
$f($seqItem, $self($seqGen, $index+1, $zero, $f, $self))
},
$fold-right-lifted := function ($seqGen as function(xs:integer) as array(*),
$zero as item()*,
$f as function(item(), item()*) as item()*
) as item()*
{
$fold-right-lifted-inner($seqGen, 1, $zero, $f, $fold-right-lifted-inner)
},
$NaN := xs:double('NaN'),
$fSeq1ToN := function($ind as xs:integer, $indStart as xs:integer, $indEnd as xs:integer) as array(*)
{
if($ind lt $indStart or $ind gt $indEnd)
then array{true(), $NaN}
else array{false(), $ind}
},
$fSeq-1-6 := $fSeq1ToN(?, 1, 6),
$fAdd := function($x, $y) {$x + $y},
$fMult := function($x, $y) {$x * $y}
return
$fold-right-lifted($fSeq-1-6, 1, $fMult)
Here we see an example of a simple generator – the function $fSeq1ToN.
This function returns an array with two members: the 1st is a Boolean which, if true(), indicates the end of the sequence, and the 2nd is the current head of the simulated sequence.
The generator has two other parameters, which are the (inclusive) values of the start-index and the end-index. Whenever the passed value of $ind is outside of this specified range, $fSeq1ToN returns a result array with its first member set to true() (the 2nd member of the result must be ignored in this case), which indicates end-of-sequence.
Otherwise it returns array{false(), $ind}. It is the responsibility of the caller to stop calling the generator:
$fSeq1ToN := function($ind as xs:integer, $indStart as xs:integer, $indEnd as xs:integer) as array(*)
{
if($ind lt $indStart or $ind gt $indEnd)
then array{true(), $NaN}
else array{false(), $ind}
}
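The same index-based protocol can be sketched in Python (used only as a neutral illustration language; all names are hypothetical counterparts of the XPath ones). The generator returns a pair (end-of-sequence flag, item), and the fold asks for one index at a time instead of receiving a materialized sequence.

```python
import math

def seq_1_to_n(ind, ind_start, ind_end):
    """Index-based generator: returns (is_end_of_seq, item) for position ind."""
    if ind < ind_start or ind > ind_end:
        return (True, math.nan)   # the 2nd member must be ignored here
    return (False, ind)

def fold_right_lifted(seq_gen, zero, f, index=1):
    is_end, item = seq_gen(index)
    if is_end:
        return zero
    return f(item, fold_right_lifted(seq_gen, zero, f, index + 1))

def seq_1_6(ind):
    # Counterpart of the partial application $fSeq1ToN(?, 1, 6)
    return seq_1_to_n(ind, 1, 6)

print(fold_right_lifted(seq_1_6, 1, lambda x, y: x * y))  # 720
```

This version still visits every item; it only removes the need to build the input sequence up front.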
Evaluating the complete XPath expression above produces the correct result both in BaseX and in Saxon: the product of the integers 1 to 6:
Now that we have successfully implemented the last missing piece of our complete solution, let us put everything together:
The fix: Step 4 – putting it all together
Finally we can replace the input sequence in $fold-right-sc with a generator:
let $fold-right-sc-lifted-inner := function ($seqGen as function(xs:integer) as array(*),
$index as xs:integer,
$zero as item()*,
$f as function(item(), item()*) as item()* ,
$fGetPartial as function(*),
$self as function(*)
) as item()*
{
let $nextSeqResult := $seqGen($index),
$isEndOfSeq := $nextSeqResult(1),
$seqItem := $nextSeqResult(2)
return
if($isEndOfSeq) then $zero
else
if(function-arity($fGetPartial($seqItem, $zero)) eq 0)
then $fGetPartial($seqItem, $zero) ()
else $f($seqItem, $self($seqGen, $index+1, $zero, $f, $fGetPartial, $self))
},
$fold-right-sc-lifted := function ($seqGen as function(xs:integer) as array(*),
$zero as item()*,
$f as function(item(), item()*) as item()*,
$fGetPartial as function(*)
) as item()*
{
$fold-right-sc-lifted-inner($seqGen, 1, $zero, $f, $fGetPartial, $fold-right-sc-lifted-inner)
},
$NaN := xs:double('NaN'),
$fSeq1ToN := function($ind as xs:integer, $indStart as xs:integer, $indEnd as xs:integer) as array(*)
{
if($ind lt $indStart or $ind gt $indEnd)
then array{true(), $NaN}
else array{false(), $ind}
},
$fSeq-1-6 := $fSeq1ToN(?, 1, 6),
$fSeq-1-1M := $fSeq1ToN(?, 1, 1000000),
$fSeq-1-1M-minus-3 := function($n as xs:integer)
{
array{$fSeq-1-1M($n)(1), $fSeq-1-1M($n)(2) -3}
},
$fAdd := function($x, $y) {$x + $y},
$fMult := function($x, $y) {$x * $y},
$fMultGetPartial := function($x, $y)
{
if($x eq 0)
then function() {0}
else function($z) {$x * $z}
}
return
$fold-right-sc-lifted($fSeq-1-1M-minus-3, 1, $fMult, $fMultGetPartial)
Now this expression (and even one involving a sequence of 10M items) takes 0 seconds to be evaluated in both BaseX and Saxon, producing the correct result 0:
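Why the combined approach is independent of the input size can be checked by logging which indexes the fold actually requests. The Python sketch below is only an illustration under assumptions (hypothetical names, Python as a neutral language); it mirrors the structure of $fold-right-sc-lifted for a simulated sequence of 10M items.

```python
import inspect

def fold_right_sc_lifted(seq_gen, zero, f, get_partial, index=1):
    is_end, item = seq_gen(index)
    if is_end:
        return zero
    partial = get_partial(item, zero)
    if len(inspect.signature(partial).parameters) == 0:
        return partial()          # arity 0: short-circuit
    return f(item, fold_right_sc_lifted(seq_gen, zero, f, get_partial, index + 1))

requested = []                    # indexes the fold asked the generator for

def seq_1_to_10m_minus_3(ind):
    requested.append(ind)
    if ind < 1 or ind > 10_000_000:
        return (True, None)
    return (False, ind - 3)

def mult_get_partial(x, _zero):
    return (lambda: 0) if x == 0 else (lambda z: x * z)

result = fold_right_sc_lifted(
    seq_1_to_10m_minus_3, 1, lambda x, y: x * y, mult_get_partial)
print(result, requested)  # 0 [1, 2, 3]
```

Only three items are ever produced: the generator avoids materializing the input, and the arity check stops the recursion at the first zero.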
Summary
This article demonstrated the problems inherent to the standard XPath fn:fold-right and correctly determined the root causes for these problems: no short-circuiting and no collection generators.
Then a step-by-step solution was built that shows how to implement lazy evaluation in XPath based on short-circuiting and collection generators. This fixed the error raised by BaseX and dramatically reduced the evaluation time of Saxon from 261 seconds to 0 seconds.
The new function produced can be called $fold-lazy and is a good candidate for inclusion in the XPath 4.0 standard functions.
A complete design and implementation of a general collection-generator will be published in a separate article.
Issue #667 closed #closed-667
XPath minor edits, 4.16 through end
Issue #642 closed #closed-642
561: Editorial (abbreviation fn=function, drop lambda syntax)
Issue #646 closed #closed-646
508: Editorial, examples revised (array:split, array:slice, others)
Issue #627 closed #closed-627
624: Adjusted function category descriptions
Issue #662 closed #closed-662
658b: changes to constructor functions
Issue #656 closed #closed-656
Better return type for map pair
Issue #654 closed #closed-654
Add covers-40 attribute to generated tests
Issue #644 closed #closed-644
Adjusted CSS to target classes, not element + classes
Issue #643 closed #closed-643
414, 546: Adjusted XDM description of xs:string, added coding
Issue #669 created #created-669
Typo in XSLT §26.4 - "appearing appearing"
The word "appearing" is doubled.
Issue #668 created #created-668
Definition of HTML case-insensitive collation
The semantics of the collation URI http://www.w3.org/2005/xpath-functions/collation/html-ascii-case-insensitive are described (in F&O section 5.3.4) by reference to the HTML5 "living spec". The cross-reference to a changing spec is inevitably fragile and I suggest we make it non-normative. I also suggest that we define the ordering implied by this collation rather than leaving it implementation defined.
A sufficient definition is: the comparison of two strings A and B under this collation delivers the same result as the comparison of ascii-lower-case(A) to ascii-lower-case(B) under the Unicode codepoint collation, where the function ascii-lower-case($S) is translate($S, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz").
Perhaps we should also consider defining a collation URI that is unicode-case-blind, with the same definition except that ascii-lower-case(X) is replaced by fn:lower-case(X).
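The proposed definition can be sketched in a few lines of Python (a neutral illustration language; the function names are hypothetical). It assumes that Python's built-in string comparison, which orders strings by code point, is an adequate stand-in for the Unicode codepoint collation.

```python
# Only the 26 ASCII capitals are folded; everything else is left untouched.
ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                            "abcdefghijklmnopqrstuvwxyz")

def ascii_lower_case(s):
    return s.translate(ASCII_LOWER)

def html_ascii_ci_compare(a, b):
    """Returns -1, 0 or 1, comparing the ASCII-lower-cased strings by code point."""
    x, y = ascii_lower_case(a), ascii_lower_case(b)
    return (x > y) - (x < y)

print(html_ascii_ci_compare("HTML", "html"))   # 0
print(html_ascii_ci_compare("É", "é"))         # -1
```

Non-ASCII letters such as É and é remain distinct, which is the difference from a collation based on fn:lower-case.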
Pull request #667 created #created-667
XPath minor edits, 4.16 through end
Various small edits. I added two simple examples to help readers quickly understand the difference between => and =!> before going to more complex examples.
Issue #666 created #created-666
Polyfill function implementations
For transition purposes, it may be useful to provide or enable "polyfill" implementations of functions that are specified in QT40, but not yet available in all implementations. Currently this is not possible because the relevant namespaces are reserved.
I propose that we relax the rule on reserved function namespaces:
(a) in XQuery, if the function has an annotation along the lines of `%polyfill('http://www.w3.org/2005/xpath-functions')`, where the parameter indicates that the function should be injected into the specified namespace instead of the namespace of the containing module.
(b) in XSLT, if the attribute xsl:function/@override-extension-function="no" is present. A function is allowed to be in a reserved namespace if this attribute is present.
I've chosen syntax here that's already available in 3.0/3.1, to minimise the impact on existing processors. Of course, we can't retrospectively change the 3.0/3.1 specs to authorise older processors to make this work as intended. But we can suggest that they bend the rules.
In both cases, (a) if the annotation is present then the rules on reserved namespaces don't apply, and (b) the function declaration is ignored if the processor provides its own internal implementation of the function.
I'm proposing to publish polyfill implementations of many of the new functions (not the complex ones like parse-html!). But I don't intend to make this a QT4 deliverable, I'm thinking of doing it as an open source project in GitHub/Saxonica space.
Issue #665 created #created-665
Typo in fn:items-before and fn:items-ending-where
The specifications refer to $seq in place of $input.
Pull request #664 created #created-664
663 xsl:original keywords
Fix #663.
A simple fix, just specify that if xsl:original is called with keywords, the keywords used are those of the overridden function.
Issue #663 created #created-663
Calling xsl:original() with keywords
We need to define what happens if xsl:original() is called with keyword arguments.
The answer isn't trivial, because (for 3.0 compatibility reasons) an overriding function isn't required to use the same parameter keywords as the function it overrides.
Perhaps we should recognize that there are functions that do not support argument keywords. This might be the case, for example, with Java or C# extension functions.
Pull request #662 created #created-662
658b: changes to constructor functions
Changes argument name to "value", makes argument default to context item.
Issue #661 closed #closed-661
658: constructor functions
Pull request #661 created #created-661
658: constructor functions
Changes argument name of constructor functions to value, makes the argument optional, revises the way the proto markup is used to indicate emptyOk arguments.
Fix #658
Issue #638 closed #closed-638
Editorial: Avoid e.g.
Issue #660 created #created-660
Static functions, default parameters
In the current XQuery draft in 5.18.3 Function Parameters, it’s stated that:
If a parameter is optional, then all subsequent parameters in the list must also be optional. In other words, the parameter list includes zero or more required parameters followed by zero or more optional parameters.
I would suggest raising XPST0017 if that’s not the case.
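As an illustration of the check being suggested, here is a minimal Python sketch (a neutral illustration language). The function name and the representation of a parameter list as (name, has-default) pairs are assumptions made for this example; only the rule itself and the error code come from the text above.

```python
def check_parameter_order(params):
    """params: list of (name, has_default) pairs in declaration order."""
    first_optional = None
    for name, has_default in params:
        if has_default:
            if first_optional is None:
                first_optional = name
        elif first_optional is not None:
            # A required parameter after an optional one violates the rule.
            raise ValueError(
                f"XPST0017: required parameter ${name} follows "
                f"optional parameter ${first_optional}")
    return True

print(check_parameter_order([("a", False), ("b", True), ("c", True)]))  # True
try:
    check_parameter_order([("a", True), ("b", False)])
except ValueError as err:
    print(err)
```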
Pull request #659 created #created-659
647: schema location hints
Fix #647
Note: Need to check that this builds successfully. In my working copy, the new text for "import schema" found its way into the "xquery-assembled.xml" file but not into the final HTML. I can't see any reason for this.
Issue #658 created #created-658
Constructor Function: Parameter Name, Zero-Arity
The parameter name for constructor functions is $arg: https://qt4cg.org/specifications/xpath-functions-40/Overview.html#constructor-functions
We should change it to $value, in alignment with the XQFO functions:
fn:string(value := 123),
xs:string(value := 123)
Issue #657 created #created-657
User-defined functions in main modules without `local` prefix
I wonder where this has been discussed before (but I can’t find it):
For simple scripts and main modules, the necessity to prefix all functions with a local prefix is cumbersome. Next, it’s counterintuitive as the prefix is not required for variable declarations:
declare function local:f() { 1 };
declare variable $x := 1;
$x + local:f()
It’s possible to use declare default function namespace '...'; but that doesn’t feel much easier:
declare default function namespace 'x';
declare function f() { 1 };
f()
I wonder if we can do one of the following things?
- Allow functions without namespace (declare function x() {}, declare function Q{}x() {}).
- Assign functions without namespace to the default function namespace.
Pull request #656 created #created-656
Better return type for map pair
Uses a record type (record(key, value)) for the return type of map:pair, to give more precision and to align with map:pairs() and map:of-pairs()
Issue #655 created #created-655
fn:sort-with: Comparators
See https://github.com/qt4cg/qtspecs/issues/93#issuecomment-1017937220:
One solution to more powerful sorting would be a variant of fn:sort that uses a comparator function. We've resisted this in the past because we can't trust a user-supplied comparator function to be well behaved (e.g. transitive). I wonder how serious an obstacle this is?
We should try to trust; there are many more hidden pitfalls in the existing language.
fn:sort could be extended with a comparator parameter with two arguments and returning xs:boolean:
(: returns John, Joe, Jim, Jack :)
sort(('Jack', 'Joe', 'Jim', 'John'), comparator := op('>'))
(: returns -1, 2, -3 :)
sort((-3, -1, 2), comparator := fn($a, $b) { abs($a) < abs($b) })
Obviously, if a comparator is supplied, other parameters (collation(s), key(s), ascending, see #623) must not be specified in parallel.
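The two examples can be reproduced with a small Python sketch (a neutral illustration language). It assumes one particular reading of the boolean comparator, namely that comparator(a, b) is true when a should precede b; the issue text does not spell this out, and the name `sort_with` is used here only for illustration.

```python
from functools import cmp_to_key

def sort_with(items, comparator):
    """Sort with a boolean comparator: comparator(a, b) means 'a precedes b'."""
    def three_way(a, b):
        if comparator(a, b):
            return -1
        if comparator(b, a):
            return 1
        return 0   # neither precedes the other: treated as equal
    return sorted(items, key=cmp_to_key(three_way))

print(sort_with(["Jack", "Joe", "Jim", "John"], lambda a, b: a > b))
# ['John', 'Joe', 'Jim', 'Jack']
print(sort_with([-3, -1, 2], lambda a, b: abs(a) < abs(b)))
# [-1, 2, -3]
```

Under this reading both results match the ones given in the comments of the XPath examples.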
Pull request #654 created #created-654
Add covers-40 attribute to generated tests
Add covers-40 attribute to the generated test set for function keywords, to avoid failures in app-CatalogCheck test Catalog014
Issue #653 created #created-653
XQuery - option to suppress entity expansion
As an enabler for #652 (and for other reasons) I propose that XQuery should have an option to suppress recognition of entity references. This is appropriate for any context in which XQuery expressions are embedded in XML (including our own test suite), where entity expansion will have already been done before the XQuery text is parsed. With this change, any XPath expression becomes a valid and equivalent XQuery expression.
This could be done using a new prolog declaration such as
declare entity-expansion on|off;
If set to "off", & is treated as a normal character in contexts such as string literals and direct element constructors, rather than signalling an entity reference. The default remains "on".
Issue #652 created #created-652
Defining a common function library for XPath, XSLT, and XQuery applications
This issue is motivated by Mary Holstege's talk at Balisage 2023: Adventures in Single-Sourcing XSLT and XQuery
https://www.balisage.net/Proceedings/vol28/html/Holstege01/BalisageVol28-Holstege01.html
It is also motivated by other issues that have been raised here proposing improved capabilities for writing applications in pure XPath.
I propose that we define a syntax for creating a module containing a set of function definitions that can be used to define a function library available to both XQuery and XSLT applications, and potentially also by pure XPath applications. A file should contain all the functions in one namespace. The format should be suitable for translation into other formats such as XSLT and XQuery, which means it should be in XML (though we could also consider JSON). We should provide XSLT stylesheets that convert libraries in this format into XSLT stylesheet modules or XQuery library modules.
Function signatures should be expressed in an XML syntax similar to xsl:function in XSLT.
Function bodies should generally be written in the form of a single XPath expression, though there should be fallback mechanisms to allow XSLT and XQuery implementations to be supplied in cases where XPath lacks sufficient expressive power.
There should be mechanisms to define the more important components of the static context, such as namespace bindings and dependencies on other function libraries.
We could consider using this format to publish "polyfill" implementations of many of the new XPath 4.0 functions.
Issue #651 created #created-651
fn:log → fn:message
I think we've made a mistake in the choice of name for this function. Without the namespace prefix, it looks far too much like a function that computes logarithms. Also, it would be nice in the future to find some way of allowing the math functions to be called without a namespace prefix, and this choice scuppers that possibility.
I think my choice would be fn:message.
Pull request #650 created #created-650
649: fix an xsl:fallback problem
Ensures that an xsl:fallback instruction is not processed in forwards compatibility mode, so that errors in the instruction are reported rather than being silently ignored; informally encourages adoption of the same rule in 3.0 and earlier processors where possible.
Fix #649
Issue #649 created #created-649
xsl:fallback
I made the mistake of writing a test that said:
<xsl:array version="4.0">
<xsl:fallback select="array{1 to 10}"/>
<xsl:sequence select="1 to 10"/>
</xsl:array>
Now, the xsl:fallback instruction doesn't allow an @select attribute; and there's not much point in adding one, because xsl:fallback is only there for a processor implementing XSLT 1.0, 2.0, or 3.0, and such processors will ignore anything the XSLT 4.0 specification says.
However, we could supply some clarification of what such a processor is expected to do when it finds an xsl:fallback element with an unexpected @select attribute. At present, it seems that because we don't say anything else, the xsl:fallback element is itself evaluated in forwards compatibility mode, which means that the select attribute is ignored. Since the whole purpose of xsl:fallback is to provide code that an earlier XSLT processor can handle, I think it would make much more sense to say: "the effective version for an xsl:fallback element and its descendants, unless overridden with an explicit [xsl:]version attribute, is the version of the processor in use", so a 3.0 processor executing an xsl:fallback instruction in a 4.0 stylesheet reports a static error if it finds a construct like the above.
Although the 4.0 spec cannot dictate what a 3.0 processor does, we could add a non-normative note encouraging this interpretation.
Issue #648 created #created-648
Schema for FN namespace should block extension and substitution
Weird things can happen if the user defines a schema that imports the schema for the FN namespace and then adds members to its substitution groups or extends its complex types. We can prevent this happening by blocking substitution and extension. We should also specify that when we validate the input to fn:xml-to-json, xsi:schemaLocation must be disabled.
Issue #647 created #created-647
XQuery: import schema with multiple location hints
XQuery 3.1 clarified what is intended when an "import module" declaration provides multiple location hints - there's now a clear indication that the processor should expect to load multiple modules all with the same module namespace.
For "import schema" it's still completely vague what is intended, and there's no analogy in XSD or XSLT which only ever allow a single location URI to be supplied. Currently we say:
The URILiterals that follow the at keyword are optional location hints, and can be interpreted or disregarded in an implementation-dependent way. Multiple location hints might be used to indicate more than one possible place to look for the schema or multiple physical resources to be assembled to form the schema.
I propose changing this to:
The URILiterals that follow the at keyword are optional location hints, intended to allow a processor to locate schema documents containing definitions of the required schema components in the target namespace. Processors may interpret or disregard these hints in an implementation-dependent way. The recommended default strategy is as follows (but this may be varied through user options):
- All the location hints are dereferenced, treating them as relative URIs
- If any location hint cannot be dereferenced, or fails to resolve to a valid schema document with the required target namespace, then that location hint is disregarded (optionally with a warning); but if none of the location hints can be resolved to a valid schema document with the required target namespace, then a static error is reported.
- If multiple location hints are dereferenced, yielding multiple schema documents A, B, and C, then they should be treated as if there were a single schema document containing xs:include declarations referencing A, B, and C. This implies that the schema documents must together comprise a valid schema, for example there cannot be two different type definitions with the same name.
- Notwithstanding the previous rule, if a processor is able to establish that two or more location hints refer to identical or equivalent schema documents, then the duplicates should be ignored.
This text gives users and implementors alike a much better sense of what is intended, while still retaining flexibility for implementations to do something different.
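The recommended default strategy can be sketched as follows in Python (a neutral illustration language). Everything here is hypothetical: `assemble_schema`, the `fetch` callback that returns a (target namespace, document text) pair or raises OSError, and the toy `store` standing in for dereferencing URIs. Validation of the combined schema and the test for "equivalent" documents are deliberately reduced to the simplest possible stand-ins.

```python
def assemble_schema(hints, target_ns, fetch):
    """Returns (documents, disregarded) for the location hints of one import."""
    documents, disregarded = [], []
    for hint in hints:
        try:
            namespace, text = fetch(hint)
        except OSError:
            disregarded.append(hint)      # cannot be dereferenced
            continue
        if namespace != target_ns:
            disregarded.append(hint)      # wrong target namespace
            continue
        if text not in documents:         # identical documents are ignored
            documents.append(text)
    if not documents:
        raise LookupError(f"static error: no schema document found for {target_ns}")
    # The caller would treat `documents` like one schema that xs:includes each.
    return documents, disregarded

store = {
    "a.xsd": ("urn:x", "<schema A/>"),
    "b.xsd": ("urn:x", "<schema B/>"),
    "copy-of-a.xsd": ("urn:x", "<schema A/>"),
    "other.xsd": ("urn:y", "<schema Y/>"),
}

def fetch(hint):
    if hint not in store:
        raise OSError("not found")
    return store[hint]

docs, skipped = assemble_schema(
    ["a.xsd", "missing.xsd", "b.xsd", "copy-of-a.xsd", "other.xsd"], "urn:x", fetch)
print(docs)     # ['<schema A/>', '<schema B/>']
print(skipped)  # ['missing.xsd', 'other.xsd']
```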
Pull request #646 created #created-646
508: Editorial, examples revised (array:split, array:slice, others)
@ndw @michaelhkay I wonder what we should do with minor edits like this (which fixes an example)? Should we merge them by ourselves, or wait for someone to approve and merge it?
Issue #645 created #created-645
Editorial: Use `\n` instead of `\r\n` in XML documents
If we use a consistent newline encoding in all XML documents, it will be easier to perform cleanups and to edit documents in the GitHub frontend. Most documents already use Unix-style newlines. If I am correct, only three documents need to be updated:
specifications/xslt-xquery-serialization-40/src/errors.xml
specifications/xslt-xquery-serialization-40/src/ns-xslt-xquery-serialization.xml
specifications/xslt-xquery-serialization-40/src/xslt-xquery-serialization.xml
Pull request #644 created #created-644
Adjusted CSS to target classes, not element + classes
Pull request #643 created #created-643
414, 546: Adjusted XDM description of xs:string, added coding
Per meeting today, made good on my comment in #546
Pull request #642 created #created-642
561: Editorial (abbreviation fn=function, drop lambda syntax)
Old -> lambda syntax removed from various examples; minor unifications.
I believe it’s ready to merge; please jump in otherwise.
Issue #634 closed #closed-634
471: Quotes (missing cases)
QT4 CG meeting 043 draft minutes #minutes-07-25
Draft minutes published.
Issue #641 created #created-641
Serialization fallback.
I propose that we drop some serialization errors in favour of producing a fallback representation of the supplied value.
The rationale is that (a) serialization is often used in contexts like xsl:message where the primary purpose is diagnostic, and the last thing you want when producing diagnostics is a secondary error; and (b) seeing a fallback representation of an inappropriate value often shows you much more clearly what you have done wrong than any error message can do.
Compare with the .toString() method in Java and similar languages, which always outputs something even if it's not quite what you wanted.
I'm not proposing to change the principle that the output should always be syntactically valid (e.g. well formed XML or JSON).
I think some of the specific error conditions we might drop are:
- In sequence normalization rule 7, instead of raising an error when an attribute, namespace, or function (including a map or array) is encountered, serialize that item using the adaptive output method, treat the result as a text node, and insert the text node into sequence S6.
- In the JSON output method: when a sequence of two or more items is encountered, instead of raising SERE0023, treat it as an array containing those items.
Closely related, and perhaps best considered together: should the fn:string() function accept anything as input, and never raise an error?
Issue #632 closed #closed-632
SENR0001: Error description updated
Issue #630 closed #closed-630
XPath spec ch. 3 minor edits
Issue #574 closed #closed-574
fn:log: Trace and discard results
Issue #629 closed #closed-629
574: fn:log: Trace and discard results
Issue #508 closed #closed-508
New Map & Array Functions: Inconsistencies
Issue #609 closed #closed-609
508: New Map & Array Functions: Inconsistencies
Issue #23 closed #closed-23
Extending element and attribute tests to NameTest unions
Issue #606 closed #closed-606
Allow element(A|B) and attribute(A|B)
Issue #602 closed #closed-602
Semi-strict static typing: reporting implausible expressions
Issue #603 closed #closed-603
602 Implausible Expressions
Issue #561 closed #closed-561
Alias for `function` keyword; drop thin arrow syntax?
Issue #589 closed #closed-589
561: abbreviation fn=function, drop lambda syntax
Issue #575 closed #closed-575
359: fn:void: Absorb result of evaluated argument
Issue #414 closed #closed-414
Lift character set restriction of xs:string
Issue #546 closed #closed-546
414: Attempt to implement expanding the allowed character repertoire
Issue #533 closed #closed-533
413: Spec for CSV parsing with fn:parse-csv()
Pull request #640 created #created-640
601: fn:all → fn:every?
@ndw Should be ready to be merged
Issue #514 closed #closed-514
Lambda expression: Annotations
Issue #639 created #created-639
fn:void: Naming, Arguments
A new function fn:void was added to the spec (see #359 for details).
This issue can be used to discuss alternative names for the function, as was suggested by @dnovatchev.
Issue #638 created #created-638
Editorial: Avoid e.g.
Occurrences of e.g. should be replaced with alternatives, such as (for example).
See https://github.com/qt4cg/qtspecs/pull/629#issuecomment-1649952964
Issue #637 created #created-637
Annotation Values: Booleans
Function annotations in XQuery have become a popular feature to attach vendor-specific information (for unit testing, locking, RESTXQ, etc.) to functions.
Annotation values are limited to literals, though. It would often be helpful to supply boolean values, but we don’t have literals for that in the language.
I suggest enhancing the existing grammar…
Annotation ::= "%" EQName ("(" Literal ("," Literal)* ")")?
…and allowing the strings false() and true() as values:
Annotation ::= "%" EQName ("(" AnnotationValue ("," AnnotationValue)* ")")?
AnnotationValue ::= Literal | "false()" | "true()"
The suggestion is upward compatible if we should decide later on that we want to allow arbitrary expressions for annotation values.
Issue #636 closed #closed-636
Ternary operator
Issue #636 created #created-636
Ternary operator
Per #171 the WG decided to allow the ternary operator. I'm looking at 4.16 of the current version of XPath 4.0 and the ternary operator is presented as illustrative, but the operator does not appear to have been properly introduced and defined in the specs. The terms "ternary", "??", and "!!" appear only twice each, and in contexts that could be confused as being illustrative.
By my reading, the definition of [11] ExprSingle should be expanded to allow a new option, call it TernaryOption, and the grammar should gain a new entry TernaryOption ::= Expr "??" ExprSingle "!!" ExprSingle. Does such a definition allow any ambiguous constructs?
I propose that 4.16 be subdivided into two subsections, the first one briefly introducing the ternary operator and the second handling if/then statements. Slight aside: the latter should also include a note pointing out how to avoid a then branch (and an example), a courtesy to developers who are so accustomed to the unbraced approach that they have not given the braced approach much thought.
Pull request #635 created #created-635
451: Schema compatibility
This PR addresses part (not all) of issue 451.
It recognises that an application may use more than one schema; for example in a pipeline using multiple stylesheets, it must be possible for the first stylesheet to produce valid output that is valid input to the second, without requiring that the two stylesheets have absolutely identical schema imports. It recognises that there are cases (for example involving substitution groups) where two schemas X and Y may both include the same type T, but produce different results when an element is validated against T. So it defines a concept of schema compatibility and defines its limitations, especially on the semantics of item types such as element(*,T) and schema-element(E). The rules for schema compatibility between different modules of a query and between different packages in a stylesheet are tightened up and brought into line with each other.
Pull request #634 created #created-634
471: Quotes (missing cases)
See Matt’s comment in https://github.com/qt4cg/qtspecs/pull/533#issuecomment-1647945368
Issue #389 closed #closed-389
The fn:build-uri function needs to perform URI encoding for path and query segments
Issue #556 closed #closed-556
Serialization phase 5 note unclear
Issue #621 closed #closed-621
Removed chapter 4 from XDM
Issue #626 closed #closed-626
Adjusted serialization step 5 note
Issue #625 closed #closed-625
XPath minor edits, chh. 1-2
Pull request #633 created #created-633
Edits ch. 4.1 through 4.15
Note, this batch of edits includes a shift from "built-in" to "system" when describing functions, per edits made in XDM.
Pull request #632 created #created-632
SENR0001: Error description updated
Observed by @line-o: https://xmlcom.slack.com/archives/C01GVC3JLHE/p1689946671105499
Pull request #631 created #created-631
600: fn:decode-from-uri
I did my best to define rules for a counterpart of the fn:encode-for-uri function, including various edge cases.
I’m convinced that the function has been requested often enough to justify its inclusion in the spec. I’m also aware that users may have different expectations regarding the details of the conversion rules. On the other hand, this discussion can be observed for other languages as well, and that’s mostly due to the… heterogeneous history of URIs, not the actual implementations. For example, URLDecoder.decode in Java converts the plus character to a space, whereas JavaScript’s decodeURI leaves it unchanged. I decided to convert it as well, as fn:encode-for-uri encodes the plus sign to %2B.
@ndw My rules have largely been inspired by your decoding rules for fn:parse-uri. I hope these rules can be dropped and replaced with a reference to this new function (analogous to fn:build-uri, which references fn:encode-for-uri).
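The two treatments of the plus character can be seen side by side in Python's standard library, which offers both. This is only an analogy to the behaviour described above, not the rules of the proposed function: `unquote` leaves `+` alone, `unquote_plus` turns it into a space, and because a literal plus is encoded as `%2B`, decoding with plus-to-space conversion still round-trips.

```python
from urllib.parse import quote, unquote, unquote_plus

original = "a+b c"

# Encode every reserved character, including "+" and the space.
encoded = quote(original, safe="")

# Decoding that converts "+" to a space still recovers the original,
# because the literal plus sign was encoded as %2B.
round_trip = unquote_plus(encoded)

# The two decoders differ only on an unencoded "+".
kept = unquote("x+y")
converted = unquote_plus("x+y")

print(encoded, round_trip, kept, converted)
```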
QT4 CG meeting 043 draft agenda #agenda-07-25
Draft agenda published.
Pull request #630 created #created-630
XPath spec ch. 3 minor edits
Minor edits to ch. 3 of XPath spec. Note, this PR includes an adjustment to the XDM spec, in the form of a cross-reference, because the terms "string value" and "typed value" are heavily used in XDM but defined only in XPath.
Pull request #629 created #created-629
574: fn:log: Trace and discard results
- New function fn:log
- Rules of fn:trace revised
Issue #620 closed #closed-620
[616] Converted X Node to x node
Issue #622 closed #closed-622
XDM minor edits, back material
Issue #628 created #created-628
distinct-values and duplicate-values: order of results
I've noticed that a few tests have appeared in QT4tests distinct-values() that assume the order of results is "order of first appearance" (search for assert-deep-eq). We should either change the tests, or change the spec to require order of first appearance.
Since no implementors have objected to these tests, it seems likely that implementations are delivering results in "order of first appearance", and if that is the case, then I think it would be a convenience to users to guarantee this in the spec.
To allow for parallel implementations, we could say that the order is undefined if ordering mode is unordered.
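To make "order of first appearance" concrete, here is a small Python sketch. It uses Python equality and hashing, which differ from the XPath comparison rules, and the function names merely mirror the XPath ones for readability.

```python
from collections import Counter


def distinct_values(seq):
    """Return each value once, in order of first appearance."""
    # dict preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(seq))


def duplicate_values(seq):
    """Return values occurring more than once, in order of first appearance."""
    counts = Counter(seq)
    return [v for v in dict.fromkeys(seq) if counts[v] > 1]


print(distinct_values([3, 1, 3, 2, 1]))
print(duplicate_values([3, 1, 3, 2, 1]))
```

A parallel implementation that partitions the input would need an extra merge step to reproduce this order, which is the motivation for leaving the order undefined in unordered mode.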
Pull request #627 created #created-627
624: Adjusted function category descriptions
Attempted revision in light of #624. Perhaps not everything is exactly right, but it should be a step in the right direction.
Pull request #626 created #created-626
Adjusted serialization step 5 note
This brief edit addresses issue #556, which I've chosen to handle apophatically.
Pull request #625 created #created-625
XPath minor edits, chh. 1-2
All minor edits. Some edits are motivated by an attempt to address broken parallelism or some other form of localized inconsistency. Does not include questions about function descriptions, raised at #624.
Issue #624 created #created-624
XPath function definition clarification
In the XPath specs, 2.2, function definitions, the reader is informed that every (statically known) function definition takes one of three mutually exclusive categories: application, system, or external.
[Definition: Application functions are function definitions written in a host language such as XQuery or XSLT whose semantics are defined in this family of specifications. Their behavior (including the rules determining the static and dynamic context) follows the rules for such functions in the relevant host language specification.]
The first sentence appears to point to user-written functions in the host languages, and the second sentence appears to point to functions defined by the host language specifications. I assume that this category is meant to include both, and I propose the language be tightened up to make that clear. If my assumption is incorrect, then some other type of revision is needed.
Later on, when the term “built-in function” is introduced, it is not as clearly stated as it should be how that term maps onto the three-way division. It seems that “built-in function” encompasses all system functions and only those application functions that are defined by the specifications, and not user-written functions. Whatever the case, and whatever the best path of revision, this paragraph would be most effective if moved up with the tripartite category discussion.
The term "external function" is introduced here in the static context, with language that is highly suggestive of a definition. But the definition proper is reserved for the dynamic context, and in shorter, different prose that I think loses some of the considerations provided in the static context. I propose that the two passages be consolidated and located in the static context, where the two other types of functions are defined, with an xref in the dynamic context to the meaning of external function.
Down in the dynamic context:
The dynamically known function definitions may include external functions.
This sentence is a bit puzzling, because of course they are allowed to include all three types of functions -- none are forbidden, since, after all, the dynamically known functions are a superset of statically known ones. Perhaps the point is to draw the reader's attention to those functions that are known dynamically but not statically? Perhaps this revision?: "Many of the function definitions known dynamically but not statically will be external functions, but they may include user-written application functions written in a host language, known only dynamically, e.g., through fn:transform."
Cumulatively the above points could go beyond mere minor touch-up, so comments are welcome before I attempt any edits.
Issue #573 closed #closed-573
Node construction functions
Pull request #623 created #created-623
93: sort descending
Enhances fn:sort to allow multiple major-to-minor sort keys each of which can independently specify a collation and an ascending/descending option.
Also includes infrastructure changes to allow occurrence indicators on function arguments or results that reference a named record type.
Similar changes will be needed for array:sort; to reduce the risk of rework I propose to make those changes after this PR has been reviewed and accepted.
Fix #93
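The idea of several major-to-minor sort keys, each with its own direction, can be sketched in Python as follows. This illustrates the general technique only; it does not reflect the signature proposed for fn:sort, it ignores collations, and the helper name `sort_by_keys` is hypothetical.

```python
def sort_by_keys(items, keys):
    """Sort by several keys, major first.

    keys is a list of (key_function, descending) pairs. Because Python's
    sort is stable, applying the keys from minor to major gives the
    combined ordering.
    """
    result = list(items)
    for key_fn, descending in reversed(keys):
        result.sort(key=key_fn, reverse=descending)
    return result


people = [("Ann", 30), ("Bob", 25), ("Cy", 30), ("Di", 25)]

# Age descending (major key), then name ascending (minor key).
ordered = sort_by_keys(people, [(lambda p: p[1], True), (lambda p: p[0], False)])
print(ordered)
```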
Pull request #622 created #created-622
XDM minor edits, back material
Note, the example was invalid because of a failure to access the included xsd file.
Pull request #621 created #created-621
Removed chapter 4 from XDM
As noted in Slack, chapter 4 of the XDM specs is out of place. In this PR I have moved the very general material in chapter 4 to the preamble of chapter 6 (now 5), which provided the opportunity to orient the reader to the structure of that chapter.
Cross-references to chapter 4 have been searched for and dealt with. The CSS deletion comes from my observation that there is no such class infoset-mapping in the resultant HTML file; section 4's infoset-mapping is an id.
Pull request #620 created #created-620
[616] Converted X Node to x node
Per #616 names of nodes have been set lowercase. This PR does not address the good suggestion that xrefs and styling be selectively applied. That is reserved for a future pass.
Pull request #619 created #created-619
XDM ch. 6 minor edits
Issue #615 closed #closed-615
Xdm minor edits, chh. 3-5
Issue #128 closed #closed-128
fn:replace: Tweaks
Issue #612 closed #closed-612
128: fn:replace: Tweaks
Issue #329 closed #closed-329
Keyword parameters: Error codes
Issue #611 closed #closed-611
329: Keyword parameters: Error codes
Issue #506 closed #closed-506
fn:error: parameter names
Issue #610 closed #closed-610
506: fn:error: parameter names
Issue #607 closed #closed-607
XQFO Examples: Fixes, Formatting
Issue #21 closed #closed-21
New reserved function names
Issue #605 closed #closed-605
21: Revise appendix for reserved function names
Issue #39 closed #closed-39
URILiteral is defined in the EBNF grammar but not used
Issue #604 closed #closed-604
[Editorial] Drop the unused symbol URILiteral from the XPath grammar appendix
Issue #123 closed #closed-123
fn:duplicate-values
Issue #614 closed #closed-614
123: fn:duplicate-values
Issue #618 created #created-618
Symmetry: fn:html-doc, fn:csv-doc
If we keep fn:html-parse and if we add fn:csv-parse, we should also add fn:html-doc and fn:csv-doc.
Issue #617 created #created-617
Implicit constructor functions for record types and union types
See also Issue #397 and Issue #322, which this proposal may partially supersede.
I propose that when declaring a named record type in the static context, this should automatically create a constructor function definition for records of that type.
So in XQuery if you write
declare item type my:location as record(longitude: xs:double, latitude: xs:double);
then the static context acquires both a type (let $loc as my:location := ....) and a function which you can call with either positional or keyword arguments (let $loc := my:location(-2.03, 50.95) or := my:location(longitude := -2.03, latitude := 50.95)).
The semantics of the function are roughly map{longitude: -2.03, latitude: 50.95} treat as my:location, except that the function arguments are first coerced to the required types (in this case the decimals are coerced to double).
If the record type is extensible, the constructor function does not provide any capability to set values for extension fields. I'm not sure yet whether it will be possible to distinguish fields set to an empty sequence from fields that are absent.
This is consistent with user-defined atomic types where you automatically get a constructor function.
Similarly for named union types. If you declare
declare item type my:binary as union(xs:hexBinary, xs:base64Binary)
then you should automatically get an arity-1 constructor function my:binary($value) with the same semantics as if my:binary were an XSD-defined union type (that is, the same semantics as cast $value as union(xs:hexBinary, xs:base64Binary)).
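A loose Python analogy for the record case: a factory that, given an ordered list of field names and coercion functions, returns a constructor accepting positional or keyword arguments and coercing each value. The helper `make_record_constructor` is hypothetical, and Python's `float` merely stands in for coercion to xs:double.

```python
def make_record_constructor(fields):
    """fields: ordered list of (name, coerce) pairs for a non-extensible record."""
    names = [name for name, _ in fields]
    coercions = dict(fields)

    def construct(*args, **kwargs):
        if len(args) > len(names):
            raise TypeError("too many positional arguments")
        # Positional arguments bind to fields in declaration order.
        values = dict(zip(names, args))
        for name, value in kwargs.items():
            if name not in coercions:
                raise TypeError(f"unknown field: {name}")
            if name in values:
                raise TypeError(f"field supplied twice: {name}")
            values[name] = value
        missing = [n for n in names if n not in values]
        if missing:
            raise TypeError(f"missing fields: {missing}")
        # Coerce each argument to the declared field type.
        return {name: coercions[name](values[name]) for name in names}

    return construct


location = make_record_constructor([("longitude", float), ("latitude", float)])
print(location(-2.03, 50.95))
print(location(longitude=-2, latitude=51))
```

The sketch requires every field to be supplied, so it sidesteps the open question about distinguishing absent fields from fields set to an empty sequence.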
Issue #616 created #created-616
XDM: X Node vs. x node
At the risk of being branded a pedant....
Currently the XDM specs have a predilection for Attribute Node, Element Node, Comment Node, Document Node, Text Node, Processing Instruction Node, Namespace Node. But in a healthy minority of cases in the XDM specs, they are rendered lowercase.
To complicate matters further, the other specs, which rely upon the XDM specs for the definition of the terms, universally prefer the lowercase form, and they use the terms preponderantly more. A healthy sample of four of the terms across five of the specifications ("elective" excludes instances where that particular case is required, e.g., capitalization in headers):
Term | Spec | Capitalized | Electively Capitalized | Electively Noncapitalized
-- | -- | -- | -- | --
Attribute node | XDM | 71 | 41 | 18
Element node | XDM | 79 | 71 | 23
Comment node | XDM | 44 | 39 | 0
Document node | XDM | 55 | 48 | 10
Attribute node | Serialization | 0 | 0 | 15
Element node | Serialization | 0 | 0 | 39
Comment node | Serialization | 0 | 0 | 2
Document node | Serialization | 0 | 0 | 15
Attribute node | XQuery | 1 | 0 | 82
Element node | XQuery | 0 | 0 | 141
Comment node | XQuery | 0 | 0 | 9
Document node | XQuery | 2 | 0 | 50
Attribute node | XFO | 1 | 0 | 28
Element node | XFO | 1 | 0 | 67
Comment node | XFO | 1 | 0 | 4
Document node | XFO | 1 | 0 | 57
Attribute node | XSLT | 8 | 0 | 100
Element node | XSLT | 16 | 0 | 77
Comment node | XSLT | 1 | 0 | 6
Document node | XSLT | 12 | 0 | 144
Options:
- Do nothing.
- Change all specs to capitalize these terms universally.
- Change only XDM to capitalize these terms universally.
- Determine a principle that would differentiate the contexts in which the term should be capitalized or not within XDM only.
- Change all specs to uncapitalize these terms universally.
My personal preference is no. 5. I do not see any passages in XDM where the reader would be confused by the term being uncapitalized (the minority cases being witnesses). Lowercase would bring XDM into conformity with the other specifications, and with how the term is commonly used outside the specs. Further, the XDM specs do not capitalize other terms they specially define, e.g., accessor. And in all the specs, including XDM, the clear preference is to use the lowercase for the names in the raw (i.e., without "node"): element, attribute, namespace, etc.
I wanted to bring this to the group before doing any edits. I am happy to do the work, but do not want to do it if it is unwelcome, or if there is a clear preference for another option.
Pull request #615 created #created-615
Xdm minor edits, chh. 3-5
QT4 CG meeting 042 draft agenda #agenda-07-18
Draft agenda published.
Issue #491 closed #closed-491
Fix more examples in the FO 4.0 spec
Pull request #614 created #created-614
123: fn:duplicate-values
I decided to create a PR for the initial proposal of this function, as I came across at least two other use cases for it since the issue was created.
I believe that the group by clause is the best choice for more complex operations, such as advanced comparisons or creating histograms.
Issue #613 created #created-613
Allow "union" as synonym for "|" everywhere
It seems silly to allow "union" as a synonym for "|" in some places and not others. For example it's not allowed in a "case" clause of a typeswitch, nor in a catch clause of try/catch.
We've introduced the ability to write child::(a|b) as a synonym for child::a | child::b, but we don't allow child::(a union b) as a synonym for child::a union child::b.
Pull request #612 created #created-612
128: fn:replace: Tweaks
Pull request #611 created #created-611
329: Keyword parameters: Error codes
Pull request #610 created #created-610
506: fn:error: parameter names
Pull request #609 created #created-609
508: New Map & Array Functions: Inconsistencies
- map:of renamed to map:of-pairs (as a hint that the input must match a specific format)
- array:of renamed to array:of-members (as a hint…)
- add map:pair for creating a single pair
- add array:split for decomposing arrays to singleton arrays
Issue #608 created #created-608
Formatting Monospace (II)
Bugging you again, @ndw …
My feedback has been triggered by #607. I’ve moved various code examples into eg blocks. In principle, the result is promising, but it produces cases such as the following one (see https://qt4cg.org/pr/607/xpath-functions-40/Overview.html):
Maybe the result is better to read if single-line and multi-line monospace is formatted identically (i.e., without lines at the top and bottom, and with a transparent background)?
Next, it would be fine if we could get rid of the scrollbars. Often, it’s not apparent from the rendering that the presentation includes results at all:
We could possibly fix it by wrapping even more examples manually. That can be challenging, though, for example if long strings are used (such as "http://www.w3.org/2013/collation/UCA?lang=en;alternate=blanked;strength=primary").
Finally, the formatting on mobile devices (here: an Android tablet with a Chrome renderer) differs from the desktop view: code blocks are rendered much smaller than the rest. – That’s just to mention it; I’d probably be annoyed if I was tasked with fixing that.
Pull request #607 created #created-607
XQFO Examples: Fixes, Formatting
This PR contains lots of fixes in the XQFO examples and improves the formatting of examples with nested arguments and multiline expressions.
It’s probably helpful to merge it as soon as possible to avoid conflicts with other PRs.
Pull request #606 created #created-606
Allow element(A|B) and attribute(A|B)
Fix #23
Pull request #605 created #created-605
21: Revise appendix for reserved function names
Fix #21
Pull request #604 created #created-604
[Editorial] Drop the unused symbol URILiteral from the XPath grammar appendix
Fix #39
Pull request #603 created #created-603
602 Implausible Expressions
Fix #602
Issue #602 created #created-602
Semi-strict static typing: reporting implausible expressions
Strict static typing, as originally defined for XQuery 1.0, was not a success, because it prohibits too many constructs that are perfectly reasonable to write. However, the alternative, pure dynamic typing, prevents a processor reporting many obvious errors at compile time. A compromise is optimistic static typing, where the processor is allowed to report a type error statically in the cases where it can be shown that evaluation of an expression is bound to fail at run-time.
Optimistic static typing has proved a reasonably successful compromise, but there are a number of cases where things that are obviously user mistakes cannot be reported as static errors. I propose that a processor should be allowed (not required) to treat some of these conditions as static errors.
The first of these conditions is exemplified by passing an argument whose static type is xs:integer* to a function where the declared parameter type is xs:string*. Under optimistic static typing this cannot be reported as a static error, because it is not bound to fail; if the actual value at run-time turns out to be an empty sequence, the call will succeed. So the proposal is that where the inferred supplied type and the required sequence types are both emptiable (that is, occurrence indicator is "?" or "*"), but their respective item types are disjoint, the processor should be allowed to report a static error.
The second condition is what I call a void path expression. Specifically, if we know statically that the result of $A/B, or $A!B, or $A?B will be an empty sequence for any possible value of $A (given its inferred type), this almost certainly means the user has made a mistake, and we should be allowed to report a static error. This extends to the unary or implicit forms of these operators, based on the inferred type of the context item. This is most likely to occur with schema-aware code, where it should be possible to report a path such as A/B/C/D as incorrect if the schema does not allow such a path. But it also arises for example for $A?B if the inferred type of $A is a non-extensible record type and B is not one of its known fields; and it arises for inappropriate combinations of axes such as @code/text().
Note that I'm not proposing (as XQuery 1.0 static typing did) that any expression whose result is bound to be empty is a static error; the rule is confined to a few specific operators.
Perhaps, for backwards compatibility and interoperability, we should require processors to provide an option to switch this kind of static error detection off. (For example, XSLT 1.0 code sometimes deliberately uses /.. to represent an empty sequence, and this construct would be flagged under these rules.)
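The first condition can be sketched as a small classification rule in Python. This is a toy model rather than the spec's type system: item types are plain strings, two types count as disjoint simply when their names differ (an assumption that ignores subtyping and coercion), and the names `SequenceType` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceType:
    item_type: str
    occurrence: str = ""  # "", "?", "*", or "+"

    def emptiable(self):
        return self.occurrence in ("?", "*")


def disjoint(a, b):
    # Toy rule (assumption): distinct type names are disjoint.
    return a != b


def classify(supplied, required):
    """Classify an argument's inferred type against the declared type."""
    if not disjoint(supplied.item_type, required.item_type):
        return "no item-type conflict"
    if supplied.emptiable() and required.emptiable():
        # Succeeds only for the empty sequence: not bound to fail, but
        # almost certainly a mistake, so it may be reported under the proposal.
        return "implausible"
    # Bound to fail, so already reportable under optimistic static typing.
    return "bound to fail"


print(classify(SequenceType("xs:integer", "*"), SequenceType("xs:string", "*")))
print(classify(SequenceType("xs:integer", "+"), SequenceType("xs:string", "*")))
```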
Issue #601 created #created-601
fn:all → fn:every?
Feedback we got (translated):
There are the every and some keywords, and there are the new functions fn:some and fn:all. They appear to be more or less similar, so it would be consistent if fn:all was renamed to fn:every.
If fn:all is called this way because there’s also fn:all-different and fn:all-equal, we should also have fn:some-different and fn:some-equal.
Issue #585 closed #closed-585
Editorial: dynamic function calls
Issue #597 closed #closed-597
Editorial fixes from #566
Issue #588 closed #closed-588
Incompleteness of xsl:sort specification
Issue #595 closed #closed-595
588: (Editorial, XSLT) minor clarifications regarding xsl:sort
Issue #592 closed #closed-592
XSLT §5.6 xsl:decimal-format - no explanation of exponent-separator
Issue #594 closed #closed-594
592: (XSLT, Editorial) Add missing description of exponent-separator
Issue #591 closed #closed-591
Show defaults in XSLT element templates
Issue #593 closed #closed-593
591: [XSLT, editorial] Add defaults to XSLT element syntax summaries
Issue #343 closed #closed-343
$collation argument: Unification
Issue #590 closed #closed-590
343: make $collation uniformly optional
Issue #365 closed #closed-365
switch, typeswitch: Optional braces
Issue #587 closed #closed-587
365: Allow braces in switch and typeswitch expressions
Issue #586 closed #closed-586
585: [Editorial] Rearrange text (and grammar) for dynamic function calls
Issue #584 closed #closed-584
Editorial: Correction to map:filter examples
Issue #317 closed #closed-317
fn:format-integer: $lang → $language ?
Issue #578 closed #closed-578
317: fn:format-integer: $lang → $language
Issue #577 closed #closed-577
Editorial: improve generator for keyword tests
Issue #555 closed #closed-555
464: Revised narrative of normalization steps for serialization
Issue #547 closed #closed-547
Action QT4CG-036-02: Further elaboration of the rules for function identity
Issue #539 closed #closed-539
FLOWR where clause with a "do when false" option
Issue #600 created #created-600
fn:decode-from-uri: counterpart of fn-encode-to-uri
Adopted from https://github.com/qt4cg/qtspecs/issues/566#issuecomment-1607397586:
The initial suggestion in #72 was to provide a function for decoding a URI. Maybe we should still think about adding a fn:decode-uri or fn:decode-from-uri function, in which we could tackle the open issues that need to be solved for fn:parse-uri. The current URI decoding rules could then be replaced with a reference to this new function.
Pull request #599 created #created-599
90: Simplified stylesheets with no xsl:version
Fix #90
Issue #598 closed #closed-598
Issue90 XSLT: simplified stylesheets with no xsl:version
Pull request #598 created #created-598
Issue90 XSLT: simplified stylesheets with no xsl:version
Fix #90. Allow simplified stylesheets with no xsl:version attribute (and therefore no XSL namespace declaration)
QT4 CG meeting 041 draft agenda #agenda-07-11
Draft agenda published.
Pull request #597 created #created-597
Editorial fixes from #566
This PR addresses some minor comments from issue #566. There are more substantive comments but I think they warrant discussion first.
Issue #596 created #created-596
Pinned values: Transforming Trees
Pinned Values: Transforming JTrees
This is a continuation of ideas raised in issue #341, issue #350, and elsewhere. It's related to the requirements presented in issue #262 and issue #297.
I have also presented ideas on transforming JSON trees at XML Prague and at Balisage, and I have tried out ideas over the years in Saxon extension functions. The proposal here owes a lot to those ideas, but consolidates them in a slightly different way.
I'll use the term JTree to refer to a tree structure of maps and arrays. The key difference between a JTree and a node tree (let's call it an XTree) is that the nodes in a JTree have no identity and no parent pointers.
As a result, some operations are remarkably difficult. Let's take one example:
Consider the JSON structure
{ "cities": [
{ "name" : "Paris",
"size" : 300
},
{ "name" : "Berlin",
"size" : 300
}
]}
and suppose we want to return a modified version of this in which the size of Berlin is changed to 400. It would be nice to be able to write something like:
modify($input, $input?cities[?name="Berlin"]?size, 400)
But of course, we can't do this. The result of the second argument is simply a number, 300, and we don't want to change all instances of the number 300 to 400, we only want to change one specific instance. Without value identity, the concept of "one specific instance" has no meaning.
I'll leave it as an exercise to the reader to work out how to do this transformation using our current XSLT and XQuery capabilities. It's far more difficult than it should be. In addition, the tree-walking approach in XSLT of applying template rules recursively is inefficient, because its cost typically depends on the size of the tree, not on the size of the modification. With immutable/persistent data structures underpinning XDM maps and arrays, it should be possible to perform this modification in constant time, regardless of the size of the tree.
My solution to this is that the expression in the second argument should return a pinned value. The pinned value behaves just like a plain integer 300 when used in operations such as arithmetic, but being pinned means that its location in the original JTree is retained, meaning that it becomes possible to replace it in the JTree with a different value.
Data Model
We need a change to the data model. Any value (any item or sequence) can have the property of being pinned. If a value is pinned, then it has a property called its locus which identifies its position within a JTree.
With a small number of exceptions, specifically noted below, the fact that a value is pinned and has a locus does not change the effect of any operations on the value. For example, the fact that an integer with value 300 is pinned does not change the result of any arithmetic or comparison operations on the number.
A locus may identify a value as being the root of a JTree, or it may identify its position within a JTree. In the latter case it has two basic properties called its container and its slot. The container is a pinned sequence, array, or map, and the slot identifies the value's position within the container: if the container is a sequence or an array, then the slot is an integer position; if the container is a map, then the slot is a key value.
Operations on Pinned Values
A value (any value) can be pinned as the root of a JTree using the function fn:pin(value). This returns a value that is in every way identical to the original (including node identity, if it is a node or contains nodes) except for being pinned.
Some selected operations have their definition changed so that if the input is a pinned value, then the result is a pinned value. These are of two kinds:
- Operations that return an existing value unchanged generally retain the pinned property and the locus. For example if a value is bound to a variable, then the result of a variable reference will retain these properties.
- Operations that select a value within a sequence, array, or map, when that sequence, array, or map is pinned, return a pinned value whose locus identifies the container and the value's slot within that container.
So in the above example, if $input is pinned, the expression $input?cities[?name="Berlin"]?size returns an integer (300) whose locus is (C, "size"), where C is the map having name="Berlin". C in turn has a locus (A, 2) where A is the array of cities, and A has a locus (R, "cities") where R is the root of the JTree (the original $input value).
It now becomes possible to define the modify function as follows: The first argument is a value which must be pinned. The second argument must return a pinned value which must be within the tree identified by the first argument (that is, recursively finding the container must lead to this root). The result of the modify() function is formed by recursively replacing each container, all the way up to the root, with a new container in which the contents of the relevant slot are replaced.
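The container/slot mechanism can be mimicked in Python with dicts and lists. This is a minimal sketch under stated assumptions, not the Saxon implementation: the class `Pinned` and the functions `pin`, `select`, and `modify` are hypothetical names, list slots are 0-based (XPath arrays are 1-based), and each container on the path is shallow-copied rather than backed by a persistent data structure.

```python
class Pinned:
    """A value together with its locus: (container, slot), or none at the root."""

    def __init__(self, value, container=None, slot=None):
        self.value = value
        self.container = container  # a Pinned dict or list, or None for the root
        self.slot = slot            # dict key or list index

    def select(self, slot):
        """Downward selection: the result is pinned, with self as its container."""
        return Pinned(self.value[slot], self, slot)


def pin(value):
    """Pin a value as the root of a tree."""
    return Pinned(value)


def modify(root, target, new_value):
    """Return a new tree with target replaced; the input tree is not mutated."""
    node, replacement = target, new_value
    while node.container is not None:
        parent = node.container
        # Copy only the containers on the path from the target to the root.
        copy = dict(parent.value) if isinstance(parent.value, dict) else list(parent.value)
        copy[node.slot] = replacement
        node, replacement = parent, copy
    if node is not root:
        raise ValueError("target is not within the tree rooted at the first argument")
    return replacement


data = {"cities": [{"name": "Paris", "size": 300}, {"name": "Berlin", "size": 300}]}
root = pin(data)
cities = root.select("cities")
berlin = next(cities.select(i) for i, c in enumerate(cities.value) if c["name"] == "Berlin")
result = modify(root, berlin.select("size"), 400)
print(result)
```

Only the Berlin map, the cities array, and the root map are rebuilt; the Paris map is shared between the old and new trees, and the two values of 300 are told apart purely by their loci.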
Feasibility
Let's pause to ask ourselves two questions: is this reasonably feasible to implement, and is it realistically possible to expect users to understand what's going on?
Implementation
I've had an implementation of something rather similar in Saxon for years, and I don't think it's especially difficult. In effect we have a Java class PinnedValue which extends Value and which delegates nearly all operations (including "instance of") to a contained Value. The one thing you need to be careful of is assuming (for example) that if a value represents an XDM array, then it will be an instance of a Java class such as XdmArray. (The terminology in the current Saxon implementation is quite different, so don't expect to find this in the current code.)
Usability
I think that basic features like the modify() function won't be too difficult to explain. We just have to explain that (a) you can only modify a JTree if the root is first pinned, and (b) the expression used in the second argument must use a restricted set of operations: basically, those that do downward selection of values within a container.
Further Operations
I've illustrated the benefits with one particular operation, a modify() function, but the feature opens up many other possibilities as well. Here are a few:
- For pinned values we can expose the container and slot properties through functions (or through custom syntax such as axes). For example this means that in XSLT, if you are doing a recursive traversal of a JTree, a template rule for processing a particular value has access to its ancestors in the same way as is possible for XTrees.
- In addition derived properties of a pinned value can be exposed, for example the preceding and following "siblings" within an array (or indeed, within a sequence).
I'll explore some additional use cases in further posts.
Pull request #595 created #created-595
588: (Editorial, XSLT) minor clarifications regarding xsl:sort
The main issue turned out to be spurious; there is in fact an explanation of the order attribute in the right place. But I made a couple of other minor editorial changes, including dropping the default for @data-type in the (non-normative) schema for XSLT 4.0 - the default of data-type="text" is inappropriate because it forces conversion, e.g. of dates to strings, and the default should be no conversion.
Fix #588.
Pull request #594 created #created-594
592: (XSLT, Editorial) Add missing description of exponent-separator
Fix #592
Pull request #593 created #created-593
591: [XSLT, editorial] Add defaults to XSLT element syntax summaries
Fix #591. Add default values to e:attribute entries in the XSLT syntax summaries; change the DTD to allow these; change the stylesheet to render them; add a paragraph to the Notation section to explain the conventions.
Issue #592 created #created-592
XSLT §5.6 xsl:decimal-format - no explanation of exponent-separator
This problem is inherited from XSLT 3.0
Section 5.6 gives a brief description of the purpose of every attribute of xsl:decimal-format, with the exception of exponent-separator.
Issue #591 created #created-591
Show defaults in XSLT element templates
This came out of issue #588, but is separable.
It would be useful in the element templates in the XSLT specification to show default values for attributes.
Pull request #590 created #created-590
343: make $collation uniformly optional
Fix #343
Pull request #589 created #created-589
561: abbreviation fn=function, drop lambda syntax
Allows "fn" as an abbreviation for "function", and drops the "thin-arrow" lambda function syntax. Fix #561.
Issue #588 created #created-588
Incompleteness of xsl:sort specification
These are problems inherited from XSLT 3.0.
The effect of the attribute xsl:sort/@order is never explained in the text; it is assumed that the meaning is obvious.
The only explicit statement that the default is "ascending" is in the schema-for-xslt40, which is non-normative.
The schema also gives a default of "text" for the data-type attribute, which is incorrect: when data-type is set to text, values will be cast to string before comparison, which is not what happens if the actual sort key values are numeric.
We have added prose to define what we mean by "effective value". This clarifies that the effective value of the order attribute, if omitted, is the default value. But we don't have a systematic and formal way of saying what the default value is. This affects the result of test merge-021, which depends on deciding whether the two xsl:merge-source elements have the same effective value for "order", when one is omitted and the other is set to "ascending".
Pull request #587 created #created-587
365: Allow braces in switch and typeswitch expressions
Fix #365
Pull request #586 created #created-586
585: [Editorial] Rearrange text (and grammar) for dynamic function calls
This PR is purely editorial. It addresses the problems described in issue #585, in particular, the syntax of dynamic function calls is now described before the semantics.
Issue #159 closed #closed-159
Support named arguments on static function calls
Issue #585 created #created-585
Editorial: dynamic function calls
Section 4.4.2.1 describes the semantics of a dynamic function call, but there is no explanation of the syntax.
The syntax is defined in section 4.5.2.
It's rather odd for the syntax and semantics to be separated in this way, and in particular for the semantics to be explained first.
It's also rather odd that both sections claim to have definitions of the term "dynamic function call", and the definitions are different.
Furthermore the grammar productions in 4.5.2 reference, but don't include, PositionalArgumentList; instead they include ArgumentList which is not referenced. PositionalArgumentList is included under 4.23 Arrow Expressions.
Pull request #584 created #created-584
Editorial: Correction to map:filter examples
Corrects the syntax of the lambda functions in these two examples. Also improves the formatting.
Issue #583 created #created-583
array:replace(), etc
Some observations:
- array:replace would be more versatile if multiple positions could be specified, rather than just a single position.
- if all positions are selected, the function becomes identical to array:for-each. So perhaps we should scrap array:replace and instead add an optional parameter $positions as xs:integer* to array:for-each. However, that could be confusing: people might imagine that the items at positions not present in the list are discarded, rather than being returned unchanged in the result. So I propose we don't do that.
- if the function is useful on arrays, then it's also useful on sequences. But fn:replace does something completely different.
- We have a similar function on maps called map:substitute (but it's not quite the same, because it processes every entry in the map).
In the interests of alignment, I propose we have three functions:
fn:substitute($input as item()*, $positions as xs:positiveInteger*, $action as function(item()) as item())
equivalent to for $it at $pos in $input return if ($pos = $positions) then $action($it) else $it
array:substitute($array as array(*), $positions as xs:positiveInteger*, $action as function(item()*) as item()*)
equivalent to array{for member $it at $pos in $array return if ($pos = $positions) then $action($it) else $it}
map:substitute($map as map(*), $keys as xs:anyAtomicValue*, $action as function(xs:anyAtomicValue, item()*) as item()*)
For the first two functions, we don't really need to allow the second argument to be omitted, because it would then be equivalent to the corresponding for-each() function. Unfortunately that's not quite true of map:for-each(), because it doesn't return a map. However if you want to do a functional replacement of every entry in a map, it can be done easily enough with
map:build(map:key-value-pairs($map), function{?key}, function{$action(?value)})
so we're not really losing anything.
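As a rough illustration of the intended behaviour, here is a Python model of the sequence and map variants. It is a sketch only: the Python names are made up, sequences are modelled as lists with 1-based positions, and the map variant assumes that entries whose keys are not listed are returned unchanged, which the signature above does not spell out.

```python
def substitute(seq, positions, action):
    """Apply action to the items at the given 1-based positions; keep the rest."""
    wanted = set(positions)
    return [action(item) if pos in wanted else item
            for pos, item in enumerate(seq, start=1)]


def map_substitute(mapping, keys, action):
    """Apply action(key, value) to the listed keys; keep the other entries as they are."""
    wanted = set(keys)
    return {key: action(key, value) if key in wanted else value
            for key, value in mapping.items()}


doubled = substitute([10, 20, 30], [1, 3], lambda x: x * 2)
bumped = map_substitute({"a": 1, "b": 2}, ["b"], lambda k, v: v + 1)
```

Positions or keys that do not occur in the input are simply ignored in this model, mirroring the "for ... at $pos" formulation.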
Issue #582 closed #closed-582
Fix examples to be consistent with spec
Pull request #582 created #created-582
Fix examples to be consistent with spec
I know there are open comments on the actual spec, but in the short term, let's at least make the examples correct.
Issue #581 closed #closed-581
Fix schema error
Pull request #581 created #created-581
Fix schema error
I can't explain why this didn't turn up initially in my local testing. :-(
Issue #580 closed #closed-580
Automatically generate app/fo-spec-examples on build
Pull request #580 created #created-580
Automatically generate app/fo-spec-examples on build
This PR should update the build so that the test-suite app/fo-spec-examples.xml file is automatically built when the spec is published. The PR test won't tell us anything, so we'll have to merge it and see what happens...
Issue #579 closed #closed-579
Support role=wide on fos:examples
Pull request #579 created #created-579
Support role=wide on fos:examples
This is a purely style-related change. It supports role=wide on fos:example elements. If an example is identified as "wide" then the presentation is in sequential rows of the table rather than adjacent columns. See, for example, parse-uri().
Pull request #578 created #created-578
317: fn:format-integer: $lang → $language
Parameter name aligned with other functions (fn:format-dateTime, fn:lang, others).
Closes #317.
Pull request #577 created #created-577
Editorial: improve generator for keyword tests
Improves the stylesheet that generates the BuiltInKeywords.xml test set, using example values for arguments where appropriate, taking these from the function catalog.
Issue #576 created #created-576
JSON serialization: Sequences, INF/NaN, function items
As far as possible, the JSON serialization method should be aligned with the new fn:items-to-json() (aka xdm-to-json) function.
Firstly, the results should be the same when the input consists entirely of atomic values, or sequences, maps, and arrays consisting entirely of atomic values. This only requires one change: serializing a sequence of length >1 should output the sequence as if it were an array, rather than raising an error. This is a compatible change.
Secondly, the new fn:items-to-json() function should have an option to output elements (for example elements appearing within maps or arrays) in the same way as the JSON serialization method does, by serializing the nodes to lexical XML or HTML contained within a JSON character string.
Finally, we should align the rules on how to output a function item (other than a map or array). We could either adopt the serialization approach (raise an error), or the items-to-json() approach (output some kind of placeholder saying "here be dragons") but they should be aligned.
Note that the two operations are still very different. items-to-json() is primarily about converting XML to JSON, while JSON serialization is primarily about turning maps and arrays into their JSON lexical form.
Issue #361 closed #closed-361
Named arguments: $input vs. $value
Issue #562 closed #closed-562
361: Named arguments: $input vs. $value
Issue #567 closed #closed-567
Errors in schema for XSLT
Issue #568 closed #closed-568
567: schema for xslt40
Issue #569 closed #closed-569
Minor editorial corrections, XDM chh. 1, 2
Issue #106 closed #closed-106
Decorators' support
Issue #175 closed #closed-175
In XQuery, allow a semicolon at the end of the module
Issue #457 closed #closed-457
Support parsing numeric, alphabetic, and additive number systems.
Pull request #575 created #created-575
359: fn:void: Absorb result of evaluated argument
#359: fn:void: Absorb result of evaluated argument
Issue #574 created #created-574
fn:log: Trace and discard results
See https://github.com/qt4cg/qtspecs/issues/359#issuecomment-1465971781
fn:dump
Summary
Outputs trace information and discards the result.
Signature
fn:dump(
$value as item()*,
$label as xs:string? := ()
) as empty-sequence()
Properties
This function is ·deterministic·, ·context-independent·, and ·focus-independent·.
Rules
Similar to fn:trace, the values of $value, converted to an xs:string, and $label (if supplied and non-empty) may be directed to a trace data set. The destination of the trace output is ·implementation-defined·. The format of the trace output is ·implementation-dependent·, as is the ordering of the output.
In contrast to fn:trace, the function returns an empty sequence.
Issue #573 created #created-573
Node construction functions
I propose introducing a set of functions that allow node construction in XPath. The basic functions are
new-document($children as node()*)
new-element($name as QName, $content as node()*)
new-attribute($name as QName, $value as xs:string)
new-namespace($prefix as xs:string, $uri as xs:string)
new-comment($content as xs:string)
new-processing-instruction($name as xs:string, $content as xs:string)
new-text($content as xs:string)
The semantics would be essentially identical to the current node constructors in XQuery and/or the corresponding instructions in XSLT (there are very few differences: a few minor ones, like if there are multiple attributes with the same name, XSLT takes the last, while XQuery throws an error).
As always, it's difficult to know where to draw the line in functionality between XPath and XQuery. One of the guidelines I use is that if an addition to XPath is likely to be useful to XSLT users, then it's worth including. Clearly these functions are not strictly necessary in XSLT (they could easily be user-supplied as wrappers around XSLT instructions). But to take advantage of some of the other capabilities we're introducing in XPath, node construction functions are increasingly handy. Consider for example:
<xsl:copy-of select="interleave(para, new-element(xs:QName('xhtml:br')))"/>
That's a one-liner that replaces
<xsl:for-each select="para">
<xsl:if test="position() ne 1"><br/></xsl:if>
<xsl:copy-of select="."/>
</xsl:for-each>
The new functions are also useful in XQuery because although they duplicate existing syntax, the fact that they are functions rather than custom syntax makes them more versatile.
As with existing constructs in XQuery and XSLT, a naive implementation that follows the semantics literally (which involves copying a subtree when adding an element to a new parent) would be rather inefficient. However, I think that the same established optimizations are equally applicable, for example lazy tree construction and/or push-mode evaluation.
Issue #572 created #created-572
fn:evaluate-xpath() function
XSLT 3.0 introduced an instruction, xsl:evaluate, for dynamic evaluation of XPath expressions. But there is no way of doing this in XQuery.
It was done as an instruction in XSLT because functions at the time were not flexible enough to cope with a range of optional parameters. This situation has changed.
An fn:evaluate() function might operate successfully with the following parameters:
- xpath - the XPath expression as a string.
- namespaces - the namespace bindings as a map from prefix to URI, defaulting to the namespace bindings in the static context of the caller
- parameters - parameter values as a map from QName to value, defaulting to an empty map
- context-item - the context item for evaluation, defaulting to the context item of the caller
- options:
  - default collation
  - base URI
  - schema-aware = true/false
  - allow-external-access = true/false
  - cache = true/false (whether to cache compiled expressions for reuse)
Before xsl:evaluate was introduced, there was some concern in the WG about security and the risk of injection attacks. I think that the allow-external-access switch is sufficient for this: if you don't trust the expression, set it to false, and all dangerous things like access to external documents and other resources, use of extension functions, etc., are disabled.
QT4 CG meeting 040 draft agenda #agenda-06-27
Draft agenda published.
Issue #571 created #created-571
XSLT: xsl:for-each-group/@break-when
The new @break-when attribute on xsl:for-each-group
(a) has not been reviewed by the CG
(b) is not mentioned in the changes appendix
(c) is not included in the schema-for-XSLT40
Issue #570 created #created-570
XSLT: Built-in template rules for maps and arrays
The current built-in template rules for maps and arrays do not work well for a recursive-descent traversal of a JSON-like tree of maps and arrays. The effect of on-no-match="shallow-copy" is to do a deep copy, and the effect of on-no-match="shallow-skip" is to do a deep skip. I therefore propose two new values for on-no-match, provisionally "shallow-copy-all" and "shallow-skip-all".
For more details see my Balisage 2022 paper: https://balisage.net/Proceedings/vol27/html/Kay01/BalisageVol27-Kay01.html
We do in fact have some test cases that use this syntax, see attr/mode-4001 et seq, but it has never found its way into the spec.
Pull request #569 created #created-569
Minor editorial corrections, XDM chh. 1, 2
Minor corrections to XDM chapters 1, 2.
- expanded-QName versus expanded QName. The latter outnumbered the former ca. 3:2, and is better, so I went with it.
- some language pointing to tables & lines, language rendered obsolete by @ndw 's nice new graphs, excised
- other minor edits for clarity, consistency
Pull request #568 created #created-568
567: schema for xslt40
This PR applies the errata to the XSLT 3.0 version of the schema, and extends the schema to support new syntax that has been introduced in XSLT 4.0.
Fix #567.
Issue #567 created #created-567
Errors in schema for XSLT
Erratum E10 for XSLT 3.0 notes a number of errors in the schema for XSLT 3.0. The errata can be found in the repository at https://github.com/w3c/qtspecs/blob/master/errata/xslt-30/errata.xml but they are not published or linked from the spec; they were produced by the XSL WG after publishing 3.0 and before the group disbanded.
The version of the schema found in the 3.0 repository does not include these corrections; therefore, neither does the one in the 4.0 repository.
There is another version of the schema in the xslt30test repository, and this one does include the corrections (as far as I can see).
However, new additions for changes needed for 4.0 have been applied inconsistently to the two versions.
There's therefore a pressing need to bring everything back into line. A good step would be to cut out the duplication: once we've got it clean, the build process should copy qtspecs/specifications/xslt-40/src/schema-for-xslt40.xsd to xslt40-test/tests/misc/catalog/schema-for-xslt40.xsd and we should only maintain the former.
Issue #566 created #created-566
fn:parse-uri, fn:build-uri: Feedback
Feedback on fn:parse-uri (thanks, @ndw, for the comprehensible rules in the spec):
- With the current rules, the port is not detected in http://x:80. Maybe x: is misinterpreted as the beginning of a Windows path?
- The URI decoding should be revised. It’s currently possible to create strings with invalid Unicode characters: parse-uri('%FF'). With the current rules, the following expression returns df83 and dc00:
parse-uri('%FF00')
=!> map:get('filepath')
=!> string-to-codepoints()
=!> format-integer('16^XX')
More to come (or not).
Issue #560 closed #closed-560
Formatting Monospace
Issue #565 closed #closed-565
More typographic changes
Pull request #565 created #created-565
More typographic changes
Further to #560, I'll just merge this if it passes.
- Make the <pre> font the same size as the other monospaced environments
- Tighten up the spacing in function signatures
- Make table-formatted examples use valign=top
- Add subtle shading to table-formatted examples so the rows are easier to distinguish.
Issue #564 created #created-564
Sorted maps
Based on a requirement from Michael Müller-Hillebrand on xsl-list.
We could define a variant of map:build() that constructs a sorted map. If a map is sorted, then:
- map:is-sorted() returns true.
- map:keys() returns the keys in sorted order.
- map:range(map, min, max) returns a sorted sub-map whose keys lie within a particular range
- map:get-by-prefix() returns the values whose keys start with a given substring
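To make the proposed operations concrete, here is a small Python model built on binary search over a sorted key list. It is a sketch under several assumptions: the class and method names are invented for illustration, keys are strings compared by codepoint, and the range is taken to be inclusive at both ends, which the list above does not specify.

```python
from bisect import bisect_left, bisect_right


class SortedMap:
    """Toy model of a sorted map: keys are kept in ascending order."""

    def __init__(self, pairs):
        items = sorted(dict(pairs).items())
        self._keys = [key for key, _ in items]
        self._values = [value for _, value in items]

    def keys(self):
        """Keys in sorted order."""
        return list(self._keys)

    def range(self, low, high):
        """Sub-map of the entries with low <= key <= high (inclusive bounds assumed)."""
        start = bisect_left(self._keys, low)
        end = bisect_right(self._keys, high)
        return SortedMap(zip(self._keys[start:end], self._values[start:end]))

    def get_by_prefix(self, prefix):
        """Values whose keys start with prefix; such keys are adjacent in sorted order."""
        index = bisect_left(self._keys, prefix)
        found = []
        while index < len(self._keys) and self._keys[index].startswith(prefix):
            found.append(self._values[index])
            index += 1
        return found


fruit = SortedMap([("pear", 3), ("apple", 1), ("apricot", 2), ("banana", 4)])
```

Both range and prefix lookups locate their starting point by binary search rather than scanning every entry, which is the main practical benefit a sorted map would offer over an ordinary one.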
Issue #563 closed #closed-563
Style and other editorial fixes
Pull request #563 created #created-563
Style and other editorial fixes
This PR mostly attempts to fix #560 by making the formatting of code more consistent.
- I removed the reference to the W3C CSS and made a local copy
- Added CSS variables for code parameters and tried to make use of them everywhere
- I removed the use of "font:small" in index tables, so they're a little easier to read
- I noticed and fixed two places where we had old-style type hierarchy diagrams
These fixes aren't going to be visible in the PR build, so I'm just going to merge it.
Pull request #562 created #created-562
361: Named arguments: $input vs. $value
#361: Just an editorial one to get one more issue closed.
QT4 CG meeting 039 draft minutes #minutes-06-20
Draft minutes published.
Issue #526 closed #closed-526
load-xquery-module() needs changes to account for functions with an arity range
Issue #548 closed #closed-548
Space separation in lambda expressions
Issue #561 created #created-561
Alias for `function` keyword; drop thin arrow syntax?
See https://github.com/qt4cg/qtspecs/issues/548#issuecomment-1591023928 and the subsequent comments:
[…] I would still be in favor to have fn or f as a plain alias for function (and optionally drop the thin arrow syntax), simply because function items and declarations can get so frequent:
fn($a) { $a + 1 }
fn { . + 1 }
Issue #552 closed #closed-552
Editorial: Updates to back matter and status section of F+O spec
Issue #559 closed #closed-559
Minor editorial edits
Issue #558 closed #closed-558
Added fn:items-X cross-references
Issue #316 closed #closed-316
Function fn:differences
Issue #551 closed #closed-551
316: Drop the fn:differences function
Issue #550 closed #closed-550
548: require parens around lambda arguments
Issue #549 closed #closed-549
526 load xquery module
Issue #82 closed #closed-82
Should the mode attribute for apply-templates in templates of enclosed modes default to #current?
Issue #112 closed #closed-112
Abbreviate `map:function($someMap)` to `$someMap?function()`
Issue #331 closed #closed-331
Extend fn:path to support arrays and maps.
Issue #376 closed #closed-376
add documentation prefix attribute to xsl:stylesheet
Issue #399 closed #closed-399
fn:deep-equal: Using Multilevel Hierarchy and Abstraction when designing and specifying complex functions
Issue #425 closed #closed-425
Structural proposal (ThinLayer:tm:) : Add a layer of thin spec between XPath and the XPath Derived Language
Issue #560 created #created-560
Formatting Monospace
It would be great if we could further tweak and unify the representation of monospaced text. On my machine, I see…
- monospace (13px) for the function signature (using <code> instead of <pre>, with increased line heights)
- monospace (11.7px) for <code>
- Menlo, Consolas, "DejaVu Sans Mono", ... (13px) for <pre>
- Menlo, Consolas, "DejaVu Sans Mono", ... (14.4px) for <pre> in examples
…and I guess there are other cases. Would it be possible to use the same font style, size & line height for monospaced text? Maybe a friendlier light grey color for all <pre> blocks, similar to the GitHub rendering.
Pull request #559 created #created-559
Minor editorial edits
The submitted edits are based upon a fresh reading of the serialization specs, and should all be relatively minor. Review would be appreciated, as small errors can have significant consequences.
Pull request #558 created #created-558
Added fn:items-X cross-references
This PR provides cross-references in the form of notes between members of the pairs items-before() & items-ending-where() and items-after() & items-starting-where(). The goal is a light editorial intervention to enhance function discovery. (Anticipating a user who, at any one of these spec descriptions, wonders how they might exclude/include the first matching item.)
QT4 CG meeting 039 draft agenda #agenda-06-20
Draft agenda published.
Issue #512 closed #closed-512
256: Context for default function parameter expressions
Issue #3 closed #closed-3
Allow tokens in xsl:mode/@name
Issue #557 created #created-557
fn:unparsed-binary: accessing and manipulating binary types
Dear All, When working with binary types, one currently has to fall back to string conversions and/or extensions. A few ideas for nice-to-have functions that operate on binary types:
- accessing a single byte at a specific position
- splitting binary data at a byte boundary (aka binary-subsequence)
- converting to/from a sequence of byte(s)
- joining binary data together
- (optionally) loading data directly as (base64)binary (some extensions are using unparsed-text with proprietary decoding 'x-binarytobase64' to retrieve a base64 castable string)
- standard bit-wise operators and, or, xor, not, rshift, lshift
If considered, each needs to be discussed in detail separately (e.g. which types to support: signed/unsigned, 8 bit, etc.). I merely intend this as a conversation starter in case others have encountered similar issues/limitations.
p.s.: Not sure if this is the right channel to raise this issue, so feel free to close/move/split accordingly.
Issue #556 created #created-556
Serialization phase 5 note unclear
Section 4 of the Serialization spec ends with this note:
Serialization is only defined in terms of encoding the result as a stream of octets. However, a serializer MAY provide an option that allows the encoding phase to be skipped, so that the result of serialization is a stream of Unicode characters.
What is a stream of Unicode characters? AFAIK, Unicode characters cannot be streamed as such but require some encoding. And how would such a stream not consist of bytes/octets of one sort or another? A stream of what exactly?
Pull request #555 created #created-555
464: Revised narrative of normalization steps for serialization
This PR acts on #464 by revising the description of steps involved in normalizing a sequence that is input to serialization. I normally would wield a light editorial hand, but the issues raised in #464 as well as closer reading of the prose convinced me that a wholesale revision would be beneficial. For example, new sequences were described as if the reader already knew about them, but they are really only introduced as the last sentence in many steps.
I have capitalized on the original version of step 1's appeal to array:flatten to abbreviate the description for two of the steps.
Issue #518 closed #closed-518
transitive-closure() function
Issue #554 created #created-554
The Transitive Closure function produces an incomplete result, completeness/success and number of actual iterations must also be returned
@michaelhkay and CG members,
I initially reopened the original issue, because there is some useful and needed functionality that the PR and the text of the FO specification do not provide, and we did not get to discuss this at the June 13th meeting.
I believe that typically a developer would need to know if the "whole" TC was completely produced, or not, and maybe how many iterations were needed.
This can be provided if the result of the function was for example:
map{
"TC" : $thefinalTCNodeset,
"WasTCComplete": $boolForTCCompletion,
"NumberOfIterations": $someInteger
}
One could argue that if $result?NumberOfIterations < max, then we know that the complete TC was produced and thus we don't need the member "WasTCComplete".
However, this is not the case when $result?NumberOfIterations eq max. In this case both outcomes are possible: the complete TC was produced exactly in max iterations (so the value of "WasTCComplete" must be true()), or max iterations were performed and the (max + 1)st iteration produced additional nodes -- in this case "WasTCComplete" must be false().
Failure to produce the complete TC in many cases will be regarded as error, and the developer needs to be sure that this was (or this wasn't) the case.
Thus, the current specification of this function needs additional work to accommodate the full needed functionality.
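The distinction drawn above can be sketched in Python. This is an illustration under assumptions rather than the function as specified: step stands for the function that returns the nodes directly reachable from a node, max_iterations for the iteration limit, and an iteration is counted only when it contributes new nodes; the three result keys follow the map shown above.

```python
def transitive_closure(start, step, max_iterations):
    """Bounded transitive closure that also reports completeness and iteration count."""
    seen = set()
    closure = []
    frontier = list(start)
    iterations = 0

    def expand(nodes):
        """Nodes reachable in one step from nodes that have not been collected yet."""
        fresh = []
        for node in nodes:
            for nxt in step(node):
                if nxt not in seen and nxt not in fresh:
                    fresh.append(nxt)
        return fresh

    while iterations < max_iterations:
        fresh = expand(frontier)
        if not fresh:
            break                      # nothing new: the closure is complete
        seen.update(fresh)
        closure.extend(fresh)
        frontier = fresh
        iterations += 1

    # Complete exactly when one further iteration would add nothing.
    complete = not expand(frontier)
    return {"TC": closure, "WasTCComplete": complete, "NumberOfIterations": iterations}


# Hypothetical chain 1 -> 2 -> 3.
graph = {1: [2], 2: [3], 3: []}
reach = lambda node: graph.get(node, [])
exact = transitive_closure([1], reach, 2)
cut_short = transitive_closure([1], reach, 1)
```

With a limit of 2 the chain is exhausted in exactly 2 iterations, so the count equals the limit and the closure is nonetheless complete; with a limit of 1 the count again equals the limit but the closure is incomplete. The iteration count alone cannot tell these two cases apart.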
Issue #553 created #created-553
New function fn:substitute()
The discussion on the parse-csv() use cases suggests there would be value in a function
fn:substitute($value as item()*, $pos as xs:positiveInteger, $mangler as function(item()) as item()) as item()*
whose effect is to return the input sequence $value with the item at position $pos replaced by the result of invoking $mangler on that item.
This should be aligned with similar functions for maps and arrays.
Pull request #552 created #created-552
Editorial: Updates to back matter and status section of F+O spec
Mainly updates to the changes appendix. Also improve the status section, and correct the bibref to the EXSLT specs.
Issue #256 closed #closed-256
Function declarations: static and dynamic context for default parameter values
Issue #275 closed #closed-275
Problems with nt/xnt links to grammar terms
Pull request #551 created #created-551
316: Drop the fn:differences function
Fix #316
The draft specification for fn:differences was added to the spec before we had a review process in place. The spec is complex and incomplete; issue #316 points out some of the difficulties. This PR removes it from the spec.
Note that fn:deep-equal() now has a debug option which outputs diagnostic information in implementation-defined format; although not interoperable, this meets the main use case, which is to discover for diagnostic purposes why and in what way two values are not deep-equal to each other.
Pull request #550 created #created-550
548: require parens around lambda arguments
Require parentheses around the parameter list in a lambda expression, even when there is only one parameter, to avoid the problem of needing whitespace before the arrow. Resolves issue #548.
Issue #327 closed #closed-327
Tokenisation
Issue #333 closed #closed-333
Equality of function items
Issue #382 closed #closed-382
Improve whitespace handling in deep-equal
Issue #536 closed #closed-536
Re: Mathematical Operator Symbols
Issue #513 closed #closed-513
Arrow operator: Inline functions without parens
Pull request #549 created #created-549
526 load xquery module
Updates the spec of fn:load-xquery-module to handle functions with optional parameters. Resolves issue #526.
QT4 CG meeting 038 draft minutes #minutes-06-13
Draft minutes published.
Issue #521 closed #closed-521
518: Add transitive-closure() function
Issue #545 closed #closed-545
513: after arrow operator, inline function no longer needs parens
Issue #544 closed #closed-544
536: disallow mixing of symbols in operator tokens
Issue #543 closed #closed-543
382 simplify rules for whitespace in fn:deep-equal
Issue #542 closed #closed-542
Fixes a simple error in the description of XSLT error XTSE4020
Issue #541 closed #closed-541
Fix typo in XPath §2.4.5 - E1 should be tagged as code not as var.
Issue #548 created #created-548
Space separation in lambda expressions
Currently the expression $a -> {$a+1} requires a space before the arrow, because the hyphen is otherwise tokenized as part of the variable name.
Not only is this a very easy trap to fall into, it also tends to result in poor diagnostics because we're in lookahead territory where we are also considering an alternative parse along the lines $a- > EXPR (where the variable name is a-), and this parse only gets disqualified because an expression can't begin with {.
If we later introduce expressions that start with {, writing $a->{3} with no space would no longer be an error, it would just mean something completely different from what you intended.
One solution, not a very pretty one, is to require parentheses around the argument list even when there is only a single argument. Almost as inconvenient as requiring the space, but with better diagnostics if you forget.
Another solution might be to restrict the parameter name so it can't contain a hyphen. The problem with that is that the tokenization becomes sensitive to the grammatical context, which is not totally unexplored territory, but complicates the fact that we're already in lookahead territory where we are looking at alternative parses. (The Saxon implementation already does the lookahead by retokenizing, so it's not impossible.)
Would it be too horrendous to contemplate a backwards-incompatible change to say variable names cannot end in a hyphen, anywhere? Perhaps with a mode bit for compatibility? After all, all this complication is caused by the need to cater for something which no sane user ever does.
Pull request #547 created #created-547
Action QT4CG-036-02: Further elaboration of the rules for function identity
Following review and acceptance of the proposal introducing the concept of function identity (PR #525, Issue #520) this PR makes some refinements in response to comments raised during the review, especially in the following areas:
- clarification as regards named function references to context-dependent functions
- relationship to (in)determinacy of a function
- avoiding the phrase "new function item"
- stating that the identity of a function item such as fn:count#1 applies even across execution scopes, e.g. calls to fn:transform.
Pull request #546 created #created-546
414: Attempt to implement expanding the allowed character repertoire
Fix #414
This PR addresses ACTION QT4CG-036-01 on me.
QT4 CG meeting 038 draft agenda #agenda-06-13
Draft agenda published.
Pull request #545 created #created-545
513: after arrow operator, inline function no longer needs parens
Resolves issue #513 by removing the requirement for an inline function expression on the RHS of an arrow operator to be enclosed in parentheses.
Pull request #544 created #created-544
536: disallow mixing of symbols in operator tokens
As proposed in issue #536, this change disallows mixing of ordinary and full-width angle brackets in the same token.
Pull request #543 created #created-543
382 simplify rules for whitespace in fn:deep-equal
My previous attempt to make a pull request for this change got lost somewhere in the process; here is a renewed attempt. The changes are in response to comments made during the review of the original deep-equal() proposal, recorded in issue #382.
Issue #497 closed #closed-497
https://qt4cg.org/specifications/xpath-functions-40/Overview-diff.html#func-map-pairs has wrong function syntax order
Pull request #542 created #created-542
Fixes a simple error in the description of XSLT error XTSE4020
A comment to issue #82 identified this typo.
Pull request #541 created #created-541
Fix typo in XPath §2.4.5 - E1 should be tagged as code not as var.
A trivial markup error that leads to the meta-variable E1 being wrongly rendered.
Issue #66 closed #closed-66
ThinArrowTarget should use FunctionBody
Issue #78 closed #closed-78
Specify strict order of evaluation for a subexpression
Issue #98 closed #closed-98
Support ignoring whitespace/indentation differences in fn:deep-equal.
Issue #125 closed #closed-125
array:partition → fn:partition: empty results; examples
Issue #384 closed #closed-384
Definition of "effective value" in XSLT
Issue #418 closed #closed-418
array and map attribute in xsl:iterate and xsl:for-each-group
Issue #503 closed #closed-503
Reinstate focus functions
Issue #381 closed #closed-381
Deep-equal comparisons without errors
Issue #520 closed #closed-520
Function identity
Issue #540 created #created-540
Add fn:system-property() to XQuery
XSLT has specific additions to the XPath function library to facilitate identifying the running implementation:
- https://www.w3.org/TR/xslt-30/#func-system-property
- https://www.w3.org/TR/xslt-30/#func-available-system-properties
These would be useful for XQuery too. There should be something better than the fragile sadness of https://github.com/AndrewSales/XQS/blob/461a90a8e2f49d9ef646ff6940c6962f18c0f43a/port.xqm#L3-L12
Issue #539 created #created-539
FLOWR where clause with a "do when false" option
This is a request for an enhancement.
Fairly often, I'll have a query arranged as
let $step1 := do some processing
where exists($step1)
let $step2 := processing based on step1
where exists($step2)
and so on.
This is a convenient pattern until I want to emit some sort of message about where the process stops.
It would be convenient to have
where expression else expression
with the else as an optional extension of the where clause to allow emitting information about which where clause the FLWOR expression stopped on.
It might be more congruent to the style of the language as
where expression return expression
but then again having multiple return keywords isn't obviously a good thing.
QT4 CG meeting 037 draft minutes #minutes-06-06
Draft minutes published.
Issue #531 closed #closed-531
grammar production LambdaParams has "(" and ")" incorrectly under the choice
Issue #532 closed #closed-532
fix error in LambdaParams rule
Issue #534 closed #closed-534
530: escape solidus in JSON
Issue #535 closed #closed-535
Editorial: add an entry to the changes appendix
Pull request #538 created #created-538
480: Attempt to allow xs:string to be 'promoted to' xs:anyURI
I think it might be a little cheeky to call it "promotion" in both directions, but this really has more to do with a kind of conversion, so I'm willing to let it slide.
If accepted, this PR resolves ACTION QT4CG-035-03.
Issue #537 closed #closed-537
Editorial: present F&O examples as tables.
Pull request #537 created #created-537
Editorial: present F&O examples as tables.
This is a purely editorial change to the stylesheet that formats examples in the F&O spec; it changes the presentation to be a two-column table containing expressions and results. The intention is to reduce clutter and to improve the readability where code samples (either expressions or results) need multi-line rendition.
In the vast majority of cases the change is clearly (IMHO) an improvement, but further tweaking is possible:
- There may be scope for tailoring the CSS (for example, I don't like the fact that table cells are centred vertically).
- Some of the tables (e.g. parse-uri examples) take too much horizontal space; the code should be edited to reduce the line length
- Examples are sometimes introduced with a free-standing paragraph tag rather than with fo:preamble, which separates the introduction from the code into a separate table row.
- Some of the examples that were designed for inline rendition could usefully take advantage of the opportunity to turn them into multi-line code samples.
QT4 CG meeting 037 draft agenda #agenda-06-06
Draft agenda published.
Issue #536 created #created-536
Re: Mathematical Operator Symbols
#460
=!> is not mentioned in A 3.3.
Also, I do not think it makes sense to allow mixing both characters in one operator, like in <＜. It combines the disadvantages of << and ＜＜ without any advantages.
Pull request #535 created #created-535
Editorial: add an entry to the changes appendix
Pull request #534 created #created-534
530: escape solidus in JSON
Add escape-solidus serialization parameter for the JSON output method.
Pull request #533 created #created-533
413: Spec for CSV parsing with fn:parse-csv()
This is a spec proposal for fn:parse-csv() from #413.
I've tried to cover off most of what was discussed in that issue, but I have avoided dealing with backslash escapes (per @ChristianGruen's early comment), sticking with the RFC 4180 quoting approach.
There are some issues with the structure where I tried to follow the existing structure of chapter 15, but that leaves the function definition in 15.4 separated by a lot of text before the wider format discussion in 15.7. The split between function def and context affects the JSON and HTML parsing functions too, so I have avoided trying to fix that as well in this PR.
If this meets with approval, I'll squash commits and rebase before merging.
Pull request #532 created #created-532
fix error in LambdaParams rule
fixes https://github.com/qt4cg/qtspecs/issues/531#issue-1733539612
LambdaParam | "(" | (LambdaParam ("," LambdaParam)*)? | ")"
should be:
LambdaParam | ( "(" (LambdaParam ("," LambdaParam)*)? ")" )
Issue #531 created #created-531
grammar production LambdaParams has "(" and ")" incorrectly under the choice
The grammar production
LambdaParams ::= LambdaParam | "(" | (LambdaParam ("," LambdaParam)*)? | ")"
should be
LambdaParams ::= LambdaParam | ("(" (LambdaParam ("," LambdaParam)*)? ")")
Issue #530 created #created-530
Escaping of forward slash in JSON output method
The fact that we escape forward slash in the JSON output method has proved unpopular with quite a few users.
The rationale for doing it is discussed at https://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped
The short summary is that it's only needed when JSON is inserted into an HTML script element, and then only when immediately followed by >.
There's a workaround using character maps but it's really clumsy.
I don't want to escape forward slashes by default in the xdm-to-json() function because they appear so often in namespaces and it adds an awful lot of visual clutter. That means that our JSON formatter needs an option to suppress this escaping, which means we might as well provide user control over it as an output property...
Adding a new output property just for this purpose is rather heavyweight both in the specs and in our implementation, but I can't think of a better solution. So I propose adding escape-solidus, values yes/no, default (for compatibility) yes.
Yes, someone will ask about the name. Surely no-one calls it a solidus? Well, Unicode does, and I think we should use the official name. And it avoids arguing about whether it should be slash or forwards-slash or forward-slash. At least I'm not proposing virgule.
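As a rough illustration of the proposed switch (not the serializer defined by the spec), the following Python sketch wraps the standard json module. The helper name serialize_json and its escape_solidus flag are hypothetical; the flag simply mirrors the proposed parameter and its default.

```python
import json


def serialize_json(value, escape_solidus=True):
    # Hypothetical helper: escape_solidus mirrors the proposed parameter,
    # defaulting to True ("yes") for compatibility.
    text = json.dumps(value)
    if escape_solidus:
        # json.dumps only ever emits "/" inside string literals, and "\/" is
        # a valid JSON escape, so a textual replacement keeps the output valid.
        text = text.replace("/", "\\/")
    return text
```

Both forms parse back to the same string value; the flag only changes how the solidus is written.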
Issue #367 closed #closed-367
Focus for RHS of thin arrow expressions
QT4 CG meeting 036 draft minutes #minutes-05-30
Draft minutes published.
Issue #524 closed #closed-524
503: reinstate focus functions
Issue #525 closed #closed-525
520: add function identity and use it in deep-equal
Issue #519 closed #closed-519
237: Revise tokenisation appendix
Issue #527 closed #closed-527
Editorial: more corrections to F+O examples
Pull request #529 created #created-529
528 fn:elements-to-maps
Revises the detail of the proposed json() function, including a name change to xdm-to-json().
The proposed changes give us a starting point for implementation, but I would expect there might be further tweaks to the spec once we try applying the function to real examples.
QT4 CG meeting 036 draft agenda #agenda-05-30
Draft agenda published.
Issue #528 created #created-528
fn:elements-to-maps (before: Review of the fn:json() function)
I've been writing tests for the fn:json function, whose spec I haven't read for quite a while, so it's an opportunity (a) to request WG review of the spec, and (b) for some minor comments.
- I think a better name for the function might be fn:to-json. Any other suggestions?
- Where we specify that a JSON object should be output with particular properties, I think we should be consistent about whether or not we prescribe the order. Writing tests is a lot easier if the order is always prescribed!
- Document nodes: it would be better to output both the document URI and the base URI where available.
- Under "The children of the element are processed as follows" there are four rules. In the case where an element has just one element node child, I think rule 4 should apply rather than rule 3.
- Under Processing-Instruction nodes: typo "A JSON object with a two properties".
- The section starting "Strings are escaped as follows" should be promoted up a level.
- Representing functions: I propose a different set of rules. (a) for a function that is a reference to a built-in or user-defined function definition, output "Q{uri}local#arity". (b) for an anonymous function, output "#anonymous-function". The rationale is that the JSON output here isn't going to be useful except as a placeholder to indicate that a function item is present.
- We might want to be more prescriptive about how numbers are formatted (or to provide user options)
- Like many XML-to-JSON libraries, there's the problem that two instances of the same element type might be output very differently depending on which children are present. For example the representation of a book with two authors might be very different from a book with one author. I would suggest that rather than the boolean element-map option, we allow the options to include a list of element names for which object representation rather than array representation is to be used.
Pull request #527 created #created-527
Editorial: more corrections to F+O examples
Fixes errors in the fn:replace examples, updates elsewhere to reflect changes to the syntax of lambda expressions and focus functions.
Issue #526 created #created-526
load-xquery-module() needs changes to account for functions with an arity range
The spec of load-xquery-module() assumes that each function declaration in a query module has a single integer arity; this doesn't allow for default parameters which mean it now has an arity range.
Because the returned map contains function items, which always have a fixed arity, I think it needs to contain one entry for each arity in the arity range. This involves evaluating the defaults for any parameters that have a default value defined; if the default value is context dependent, this is going to have to use the context of the load-xquery-module() function call, which isn't very meaningful, but I can't see what else to do.
An alternative is to only include one function item in the result, corresponding to the maximum arity.
If we introduce sequence-variadic functions, the arity range becomes infinite, which makes both of these ideas problematic. But presumably sequence-variadic functions will be callable with all the values supplied in a single array?
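The first option (one entry per arity in the range) can be sketched as follows. This is only an illustration in Python, not the specified behaviour: expand_arity_range, join and by_arity are hypothetical names, and the defaults are assumed to have been evaluated already, once, at load time.

```python
def expand_arity_range(func, required, defaults):
    """Return a dict mapping each arity in the range to a fixed-arity callable.

    func     -- underlying function taking required + len(defaults) arguments
    required -- number of parameters without a default
    defaults -- default values for the optional parameters, already evaluated
    """
    entries = {}
    for supplied_optional in range(len(defaults) + 1):
        arity = required + supplied_optional
        missing = tuple(defaults[supplied_optional:])

        def fixed(*args, _arity=arity, _missing=missing):
            # Each entry accepts exactly one arity and fills in the rest.
            if len(args) != _arity:
                raise TypeError(f"expected {_arity} arguments, got {len(args)}")
            return func(*args, *_missing)

        entries[arity] = fixed
    return entries


def join(items, separator, prefix):
    # Hypothetical three-parameter function with two optional parameters.
    return prefix + separator.join(items)


by_arity = expand_arity_range(join, required=1, defaults=[",", ""])
```

Here by_arity has entries for arities 1, 2 and 3. A sequence-variadic function would make the loop unbounded, which is the difficulty the issue points out.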
Pull request #525 created #created-525
520: add function identity and use it in deep-equal
I believe this PR resolves issues:
- issue #520 - function identity
- issue #333 - equality of function items
- issue #381 - deep-equal comparison without errors
The PR introduces a concept of function identity in the data model, and for all expressions that create functions, explains what the identity of the returned function is.
The concept of function identity is used initially in two places: in fn:deep-equal(), when the operands include function items; and in the F+O prose defining the concept of determinism, which in turn is relied on by the definition of memo functions in XSLT.
I had hoped to go further and address issue #119, generalising what kinds of values are allowed as keys in maps, but as explained in a comment on that issue, I hit obstacles.
Pull request #524 created #created-524
503: reinstate focus functions
This PR reinstates "focus functions", using the syntax function{EXPR} rather than ->{EXPR}. If accepted, this resolves issue #503.
Issue #523 created #created-523
Dealing with component name conflicts with library packages
Override with visibility='hidden'
<override>
<template name='foo' visibility='hidden'/>
</override>
This change allows the using package to override a component without running into a potential naming conflict with another component in the using package or in another used package. Because the visibility is hidden, the component is not invokable from the using package.
Accept with alias
<accept component='template' names='foo' aliases='fu'/>
This change allows the using package to accept components but with a different name.
Issue #522 closed #closed-522
function-catalog.xml: Original line endings reverted (modified by GitHub’s 'direct edit' feature)
Pull request #522 created #created-522
function-catalog.xml: Original line endings reverted (modified by GitHub’s 'direct edit' feature)
I learned it’s NOT advisable to use GitHub’s features to directly edit files in the browser. The newlines of the original function-catalog.xml file seem to have been changed.
This PR restores the original newlines. Sorry for that.
Pull request #521 created #created-521
518: Add transitive-closure() function
QT4 CG meeting 035 draft minutes #minutes-05-23
Draft minutes published.
Issue #504 closed #closed-504
Merge map:keys and map:keys-where
Issue #515 closed #closed-515
504: Merge map:keys and map:keys-where
Issue #396 closed #closed-396
333: Deep-equal, no failure when comparing functions
Issue #520 created #created-520
Function identity
To make deep-equal error-free for all arguments (issue #333), and to support the introduction of sets (issue #34), we need to be able to test whether two functions are "the same function". This is a proposed pragmatic solution.
We change the data model for functions so that functions, like nodes, have an identity that is acquired when the function is created; two functions are identical if and only if they have the same identity.
In general any expression that returns a new function allocates it an identity that is different from all other existing functions (as with nodes). However:
- Repeated evaluation of a function reference such as count#1 returns the same function each time, provided that the target function is context-free.
- Optimizers are allowed to rewrite expressions (for example by loop-lifting, etc) so that expressions that would in principle return distinct functions actually return the same function, provided the optimizer can determine that the two functions are equivalent in all respects other than their identity. For example if the expression contains(?, 'xxx') appears in a loop, the expression can be lifted out of the loop so there is no requirement that it returns different functions each time (as there is with nodes).
Benefits of this approach:
- identical($x, $x) is always true (function identity survives binding to variables)
- functions obtained by repeated evaluation of the same expression in the same context are likely to return identical results in cases that are simple enough for an optimizer to analyse
- the results are likely to be reasonably intuitive
- optimisers aren't constrained by rules on identity to restrict the rewrites they can attempt
This does mean that expressions that return functions become a little impure - but only in the same way that expressions that create nodes are a little impure. The impurity is well understood and tolerated.
Maps and arrays do not have identity as a property separate from their content.
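By way of analogy only, Python function objects already behave much like the proposal describes: identity is acquired on creation and survives binding to variables. The sketch below is not XPath semantics; make_matcher and the variable names are hypothetical, and unlike the proposal Python never merges equivalent functions created by separate evaluations.

```python
def make_matcher(needle):
    # Each call creates a new function object with its own identity.
    return lambda text: needle in text


first = make_matcher("xxx")
second = make_matcher("xxx")

same_binding = first          # identity survives binding to another name
named_a, named_b = len, len   # repeated references to a named function

distinct = first is not second        # separate evaluations, separate identities
stable = same_binding is first        # analogous to identical($x, $x)
named_same = named_a is named_b       # analogous to count#1 evaluated twice
```

The two matchers behave identically but are not identical, which is the case the proposal leaves open for an optimizer to collapse.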
Pull request #519 created #created-519
237: Revise tokenisation appendix
This PR is an extensive revision to the rules for tokenisation that corrects a number of errors:
- The problem mentioned in issue 237, namely the lack of clarity in the "longest token rule". This PR fixes this by clarifying what this rule means and where it applies. In particular it tackles the issue of "complex terminals" such as element constructor expressions and string templates where a symbol that is a single token at one level (in the sense that whitespace is constrained) also contains enclosed expressions.
- Some productions/tokens were misclassified or omitted from the relevant lists of tokens in the appendix. This has been fixed in part by using general rules in the grammar2spec stylesheet to generate lists of tokens, rather than relying on annotations in the grammar file.
The PR includes changes to the grammar2spec stylesheet.
Issue #518 created #created-518
transitive-closure() function
I've just found myself writing, yet again, a transitive closure function, and I feel we could add this to the spec.
I'm afraid it's another case where we really need set operations and therefore a universal equality operator. For the moment I'll just define it over nodes, which shelves the problem.
fn:transitive-closure($start as node()*, $step as function($node as node()) as node()*) as node()*
returns the set of all nodes reachable from a node in $start by zero or more applications of the $step function, in document order with duplicates removed.
Can probably define it formally something like
let $next-iteration := $start =!> $step()
return if (empty($next-iteration except $start))
then $start
else transitive-closure($start | $next-iteration, $step)
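The same idea can be sketched in Python over hashable items instead of nodes. This is an illustration under assumptions, not the proposed definition: it uses an iterative worklist rather than recursion, returns an unordered set rather than nodes in document order, and the edges data is made up for the example.

```python
def transitive_closure(start, step):
    """Return everything reachable from start by zero or more applications of step.

    start -- iterable of hashable items
    step  -- function mapping one item to an iterable of items
    """
    reached = set(start)
    frontier = list(reached)
    while frontier:
        item = frontier.pop()
        for neighbour in step(item):
            if neighbour not in reached:
                # Only newly discovered items are explored further,
                # so cycles terminate.
                reached.add(neighbour)
                frontier.append(neighbour)
    return reached


# Hypothetical example data: a small directed graph as an adjacency map.
edges = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
closure = transitive_closure(["a"], lambda node: edges.get(node, []))
```

The membership test on reached is exactly where the issue's remark about needing set operations and a universal equality operator comes in: Python gets it from hashing, whereas the proposal restricts itself to nodes and node identity.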
Issue #517 created #created-517
fn:chain (before: fn:multi-compose)
FO: fn:multi-compose : Evaluate a chain of functions
As per Wikipedia:
" In mathematics, function composition is an operation ∘ that takes two functions f and g, and produces a function
h = g ∘ f such that h(x) = g(f(x)).
In this operation, the function g is applied to the result of applying the function f to x. That is, the functions
f : X → Y and g : Y → Z
are composed to yield a function that maps x in domain X to g(f(x)) in codomain Z.
Intuitively, if z is a function of y, and y is a function of x, then z is a function of x.
The resulting composite function is denoted g ∘ f : X → Z, defined by:
(g ∘ f )(x) = g(f(x)) for all x in X "
In this Proposal we generalize function composition to the case when a sequence of functions are composed together, so that the last one is applied on an argument $x, then the last-but-one is applied on the result of this application, and so on … until finally the first function in the sequence is applied on the result produced so far.
This is an effective way of chaining a sequence of functions together, and we don’t need to invent or use any special operators or syntax, but we just pass this sequence of functions as argument to fn:multi-compose.
fn:multi-compose := function($funs as function(*)*, $x)
Here is an XPath 3.0 implementation of fn:multi-compose:
let $apply := function($f, $x) {fn:apply($f, [$x])},
$multi-compose := function($funs as function(*)*, $x)
{
fold-right($funs, $x, $apply)
},
(: The functions $incr and $times are needed just to show this example :)
$incr := function($x) {op("+")(?, $x)},
$times := function($y) {op("*")(?, $y)}
return
$multi-compose(($times(5), $incr(1)), 2)
As wanted, the result of evaluating this is
15: (2 + 1) * 5
Remarks
- In this implementation the type of the 2nd (last) argument of $multi-compose and $apply is item()* (any) and as such it is omitted. If the function to be applied first needs more than one argument, all of its arguments must be presented in the function call as a single sequence, and are passed (in order) as the members of a single array, as already implemented by the standard fn:apply.
- It is a dynamic error if any of the function applications produces a result which does not belong to the Domain of the function immediately preceding it in the function sequence.
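For readers more familiar with other languages, the right fold used above can be sketched in Python. This is an illustrative equivalent rather than the proposed function: multi_compose, times and incr are stand-ins for the names in the XPath example, and only single-argument functions are handled.

```python
from functools import reduce


def multi_compose(funs, x):
    """Apply the functions in funs to x, last function first (a right fold)."""
    return reduce(lambda acc, f: f(acc), reversed(funs), x)


def times(y):
    return lambda x: x * y


def incr(y):
    return lambda x: x + y


result = multi_compose([times(5), incr(1)], 2)  # (2 + 1) * 5
```

As in the XPath version, the last function in the sequence (incr(1)) is applied first, and an empty sequence of functions returns the argument unchanged.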
Issue #181 closed #closed-181
HOF Sequence Functions with Positional Arguments
Issue #516 created #created-516
Add position argument to HOF callbacks
The coercion rules now allow a supplied function item to have lower arity than the signature of the declared type; the effect is that the information supplied in the additional arguments is ignored.
One of the intended use cases for this was to allow existing higher-order functions to be extended while retaining backwards compatibility. For example, in fn:filter, we can change the required type of the predicate function from function(item()) as xs:boolean to function(item(), xs:positiveInteger) as xs:boolean, with the second argument supplying the position of the item being tested. A function that isn't interested in the position can just ignore it, so existing calls will continue to work.
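The effect can be imitated in Python, purely as a sketch. The helper filter_with_position is hypothetical, and the use of inspect.signature to discover the callback's arity is a stand-in for the coercion rule, not a description of how a processor would implement it.

```python
import inspect


def filter_with_position(items, predicate):
    """Keep the items for which predicate is true.

    The predicate may accept (item) or (item, position); positions start at 1.
    Arguments beyond the callback's arity are dropped, imitating the coercion
    rule for lower-arity function items.
    """
    arity = len(inspect.signature(predicate).parameters)
    kept = []
    for position, item in enumerate(items, start=1):
        args = (item, position)[:arity]
        if predicate(*args):
            kept.append(item)
    return kept


# An existing one-argument callback keeps working unchanged.
evens = filter_with_position([3, 4, 5, 6], lambda n: n % 2 == 0)
# A two-argument callback can also look at the position.
odd_positions = filter_with_position(["a", "b", "c", "d"], lambda item, pos: pos % 2 == 1)
```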
I propose that we add a position argument to the callbacks for:
fn:filter
fn:for-each
fn:for-each-pair
fn:partition
fn:items-after
fn:items-before
fn:items-starting-where
fn:items-ending-where
array:filter
array:for-each
array:for-each-pair
Other candidates include
fn:all
fn:some
fn:index-where
array:index-where
but I suggest we leave these unless someone can think of a use case.
QT4 CG meeting 035 draft agenda #agenda-05-23
Draft agenda published.
Issue #471 closed #closed-471
Unify formatting (function calls, code blocks, quotes) in the specification
Issue #511 closed #closed-511
471: <code> elements, simple/typographic quotes
Pull request #515 created #created-515
504: Merge map:keys and map:keys-where
Issue #514 created #created-514
Lambda expression: Annotations
Edit 2023-05-21: Feedback was incorporated.
In the current grammar rules, there are subtle differences between the InlineFunctionExpr and LambdaExpr rules that we should remove.
Annotations are not supported in lambda expressions, which I believe is an unnecessary restriction:
(: currently legal :)
let $delete-texts := %updating function($nodes) { delete nodes $nodes//text() }
return $delete-texts(//city)
(: currently illegal :)
let $delete-texts := %updating ($nodes) -> { delete nodes $nodes//text() }
return $delete-texts(//city)
It should suffice to extend one rule in the grammar:
(: old :)
LambdaExpr ::= LambdaParams "->" EnclosedExpr
(: new :)
LambdaExpr ::= Annotation* LambdaParams "->" EnclosedExpr
We could also allow type declarations (as @michaelhkay has indicated below, though, this might not be as simple to realize as I hoped):
(: currently legal :)
let $find-john := function($node as node()) as xs:boolean { contains($node, 'john') }
return $find-john($node)
(: currently illegal :)
let $find-john := ($node as node()) as xs:boolean -> { contains($node, 'john') }
return $find-john($node)
The type declarations cannot be allowed if parentheses are omitted (unless we make them mandatory):
(: without parens :)
$i -> { ... }
(: parens :)
($i as xs:int) as xs:int -> { ... }
These are the current grammar rules:
FunctionItemExpr ::= NamedFunctionRef | InlineFunctionExpr | LambdaExpr
InlineFunctionExpr ::= Annotation* "function" FunctionSignature FunctionBody
FunctionSignature ::= "(" ParamList? ")" TypeDeclaration?
ParamList ::= Param ("," Param)*
Param ::= "$" EQName TypeDeclaration?
FunctionBody ::= EnclosedExpr
LambdaExpr ::= LambdaParams "->" EnclosedExpr
LambdaParams ::= LambdaParam | "(" | (LambdaParam ("," LambdaParam)*)? | ")"
LambdaParam ::= "$" VarName
As the InlineFunctionExpr and LambdaExpr both generate anonymous functions, we shouldn’t make a difference, and this is what I would recommend:
FunctionItemExpr ::= NamedFunctionRef | InlineFunctionExpr
InlineFunctionExpr ::= Annotation* (InlineFunction | LambdaFunction) FunctionBody
InlineFunction ::= "function" FunctionSignature
LambdaFunction ::= (Param | FunctionSignature) "->"
FunctionSignature ::= "(" ParamList? ")" TypeDeclaration?
ParamList ::= TypedParam ("," TypedParam)*
TypedParam ::= Param TypeDeclaration?
Param ::= "$" VarName
Disclaimer: I could have raised this earlier, but I didn’t want to prolong the ongoing discussion on the open pull requests.
Issue #513 created #created-513
Arrow operator: Inline functions without parens
See also https://github.com/qt4cg/qtspecs/issues/435#issuecomment-1508228624: If an inline function expression is used as the right-hand operand of the arrow operators, parentheses must be used:
(: now :)
$seq => (function($x) { ... })()
(: desirable :)
$seq => function($x) { ... }()
This could be changed by adding the InlineFunctionExpr to the ArrowDynamicFunction rule:
[115] SequenceArrowTarget ::= "=>" ((ArrowStaticFunction ArgumentList) | (ArrowDynamicFunction PositionalArgumentList))
[151] ArrowStaticFunction ::= EQName
[152] ArrowDynamicFunction ::= VarRef | ParenthesizedExpr | InlineFunctionExpr
[142] ArgumentList ::= "(" ((PositionalArguments ("," KeywordArguments)?) | KeywordArguments)? ")"
[143] PositionalArgumentList ::= "(" PositionalArguments? ")"
It will be best to tackle this after we’ve resolved #503, and we’ll have to check that the simplification doesn’t cause ambiguities.
Issue #53 closed #closed-53
Allow function keyword inline functions without parameters
Issue #436 closed #closed-436
Allow inline function expressions in arrow operator call chains
Issue #435 closed #closed-435
Remove the inlined function expression variant of the thin arrow operator
Pull request #512 created #created-512
256: Context for default function parameter expressions
This is a renewed attempt to tackle issue 256, which concerns how to define the static and dynamic context for default value expressions for optional function parameters in XQuery and XSLT. The resolution is to define the static and dynamic context for these expressions in detail.
To make this work, some refinement of the static and dynamic context definitions is needed:
- default collation is moved from the static context to the dynamic context, with a note that it is always known statically except in the case when defining the default for a function parameter.
- static base URI (in the static context) and executable base URI (in the dynamic context) are now formally separated; previously we fudged this by saying they could be different, but without recognizing separate context components
- the base URI for resolving relative collation URIs is now implementation defined. This allows implementors to use either the compile-time or run-time base URI, or some other URI defined using a processor API.
Pull request #511 created #created-511
471: <code> elements, simple/typographic quotes
That was a work-intensive one, as expected, but I’m optimistic that the PR improves the overall situation.
I’ll be happy to see subsequent PRs if I missed something (e.g., I didn’t touch ebnf.xml).
Closes #471
Issue #509 closed #closed-509
471 (2): Remove more fn: prefixes
Issue #510 closed #closed-510
471 (3): Render false/true/NaN/INF/-INF/+INF as code
Issue #375 closed #closed-375
256: Context for default parameter values
Issue #507 closed #closed-507
125: Rename array:partition as fn:partition
Issue #505 closed #closed-505
418: Correct and expand an XSLT example
Issue #447 closed #closed-447
435, 53, 436: lambda expressions, thin arrows
Issue #410 closed #closed-410
Converting doubles to decimals, fractional digits
Issue #455 closed #closed-455
410: Converting doubles to decimals, fractional digits
Issue #483 closed #closed-483
452: window: make 'start' and 'when' optional
Pull request #510 created #created-510
471 (3): Render false/true/NaN/INF/-INF/+INF as code
NaN, INF, -INF and +INF were easy; boolean values were trickier:
- I used <code> for “The result/value/option/property is true/false”, “is set to true/false” and similar.
- I didn’t tag “This is true/false”, “The condition is true/false” and similar.
I hope there won’t be too many conflicts if this is directly merged after #509.
Pull request #509 created #created-509
471 (2): Remove more fn: prefixes
I’ve removed additional fn: prefixes from examples and eg code blocks. I have kept prefixes in the rules and formal code snippets untouched.
In the initial comment of #471, I have listed the remaining cleanups for which I want to prepare PRs. I’ll wait until this and possibly some other PRs have been merged.
Issue #508 created #created-508
New Map & Array Functions: Inconsistencies
XQFO 3.1
…provides the following functions/constructs for creating and accessing maps & arrays:
Maps | Singleton Maps
:--- | :---
Decompose | –
Compose | map:merge($maps)
Create single | map:entry($key, $value); map { $key: $value }
Extract keys | map:keys($map)
Extract values (flat) | $map?*
Arrays | Singleton Arrays
:--- | :---
Decompose | –
Compose | array:join($arrays)
Create single | [ $value ]
Extract values (flat) | –
XQFO 4.0 Draft
…provides new functions for singletons and map representations:
Maps | Singleton Maps | Pairs (Key-Value Pair Maps)
:--- | :--- | :---
Decompose | map:entries($map) | map:pairs($map)
Compose | map:merge($maps) | map:of($pairs)
Create single | map:entry($key, $value); map { $key: $value } | –; map { 'key': $key, 'value': $value }
Extract keys | map:keys($map) | $pairs?key
Extract values (flat) | map:values($map); $map?* | $pairs?value
Arrays | Singleton Arrays | Members (Value Maps)
:--- | :--- | :---
Decompose | – | array:members($array)
Compose | array:join($arrays) | array:of($members)
Create single | [ $value ] | array { 'value': $value }
Extract values (flat) | array:values($array) | $members?value
The following terminology can be derived from the function names:
- A key-value pair map with a single map pair is called a Pair.
- A value map with a single array member is called a Member.
- A singleton map is called an Entry (due to map:entry).
- We have no name for a singleton array.
Complete the Picture
I believe we should:
- rename map:of to map:of-pairs or map:merge-pairs (as a hint that singletons are not the expected input)
- rename array:of to array:of-members or array:join-members (as a hint…)
- add map:pair for creating a single pair
- add array:split (array:tokenize, …?) for decomposing arrays to singleton arrays
I’m not sure if we should add array functions for creating singletons or value maps; we also have array:build.
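To make the terminology above concrete, here is a rough Python model of the representations, with a dict standing in for a map and a list for an array. The four helper names mirror the draft functions map:pairs, map:of, array:members and array:of from the tables; the Python functions themselves are hypothetical and ignore the fact that XDM values are sequences.

```python
def map_pairs(m):
    """Model of map:pairs: one key-value pair map per entry."""
    return [{"key": k, "value": v} for k, v in m.items()]


def map_of(pairs):
    """Model of map:of: rebuild a map from key-value pair maps."""
    return {p["key"]: p["value"] for p in pairs}


def array_members(a):
    """Model of array:members: one value map per array member."""
    return [{"value": v} for v in a]


def array_of(members):
    """Model of array:of: rebuild an array from value maps."""
    return [m["value"] for m in members]


if __name__ == "__main__":
    print(map_pairs({"a": 1, "b": 2}))
    print(array_members(["x", "y"]))
```

In this model, decomposing and composing are inverses of each other, which is the symmetry the tables are trying to expose.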
Pull request #507 created #created-507
125: Rename array:partition as fn:partition
Reworked this PR to deal with merge conflicts. Technical change was already accepted. Made a correction to the "equivalent expression" published in the spec, which has now been tested (and was found wanting...)
Issue #454 closed #closed-454
125: array:partition
Issue #506 created #created-506
fn:error: parameter names
We should rename the $error-object parameter to $value, as it will be bound to $err:value later on:
try {
error(value := 123)
} catch * {
$err:value
}
Next, “object” is rarely used in the specs.
Issue #467 closed #closed-467
map:keys-where: Return Keys That Match a Predicate
Pull request #505 created #created-505
418: Correct and expand an XSLT example
Makes a further correction to an example identified in issue #418, and adds to the example giving an alternative solution
Issue #504 created #created-504
Merge map:keys and map:keys-where
I propose that we merge map:keys and map:keys-where into a single function, with map:keys#1 behaving like it does now, and map:keys#2 taking over from map:keys-where#2. Effectively the default for the second argument becomes true#0.
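As a minimal sketch of the merged behaviour, the following Python model uses a dict for the map and an optional predicate that defaults to "always true". The names map_keys and always_true are hypothetical; unlike true#0, the Python default has to accept (and ignore) the entry value.

```python
def always_true(value):
    """Stand-in for the proposed default predicate: accepts every value."""
    return True


def map_keys(m, predicate=always_true):
    """Return the keys of entries whose value satisfies the predicate.

    With one argument this behaves like map:keys; with two it behaves
    like map:keys-where.
    """
    return [key for key, value in m.items() if predicate(value)]


if __name__ == "__main__":
    numbers = {0: "zero", 1: "one", 2: "two", 3: "three"}
    print(map_keys(numbers))
    print(map_keys(numbers, lambda s: s == "two"))
```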
Issue #30 closed #closed-30
Improve the discoverability and parseability of the mathematical operator symbols
Issue #204 closed #closed-204
Non-ascii alternative operator symbols
Issue #460 closed #closed-460
Mathematical Operator Symbols
Issue #443 closed #closed-443
@select on xsl:matching-substring and xsl:non-matching-substring
Issue #32 closed #closed-32
try/catch: New variable for all error information
Issue #452 closed #closed-452
window: make 'start' and 'when' optional
Issue #503 created #created-503
Reinstate focus functions
As a result of accepting PR #447, we have lost the ability to write simple "focus functions" that take the context item as an implicit argument, for example sort(//emp, (), ->{@salary}).
The new status quo is that people have to write sort(//emp, (), $e->{$e/@salary}) which feels clumsy in comparison.
This issue examines options for reinstating such a capability, and perhaps making it more powerful.
A reason for dropping the syntax was that it didn't play well with the "thin arrow" operator in pipelines, but we have now changed the symbol for that to =!> so the objection no longer applies so strongly.
Ideally we want something that not only replaces focus functions (arity-one functions accepting an argument of type item()), but also meets some or all of the following additional requirements:
- Works well on the RHS of the => and =!> operators, in a construct that we might write as $list =!> {.+1}().
- Also allows arity-one functions whose argument is a sequence (item()*)
This becomes a lot easier if we can solve issue #129 which generalises the context item to a context value. Let's assume we do that, and keep an open mind for the moment as to whether the generalized context value is referenced as . or ~. I'll use ~ for now. So we want a compact notation for functions of arity one in which the function body refers to the argument value as ~. For aesthetic reasons, because it's going to be used on the RHS of an arrow operator, we really don't want to introduce it with a leading arrow like the previous syntax ->{.+1}. Use of "bare braces" (simply {~+1}) is very tempting, but I think there is a good argument for leaving that part of the syntactic space unused, for extensibility and for diagnostics. I think my preference is for fn{~+1}. Using a keyword (such as map, array, validate) before a braced expression is a uniform device and keeps the grammar coherent.
So in a callback such as fn:sort, we can write sort(//emp, (), fn{@salary}), and in a pipeline we can write $list =!> fn{.+1}(). (To allow this, all we need to do is to generalise what's allowed as an ArrowDynamicFunction).
A separate question is whether we can (and should) allow the empty argument list to be omitted. I think I'm persuaded by the arguments that it's better to keep it, as a visual signal that the function is being applied, not just returned.
QT4 CG meeting 034 draft minutes #minutes-05-16
Draft minutes published.
Issue #478 closed #closed-478
467: map:keys-where
Issue #481 closed #closed-481
When we have array:build and map:build, then why do we also need array:of and map:of ?
Issue #466 closed #closed-466
460: Fix math symbols
Issue #487 closed #closed-487
485: Predeclare the prefixes math, map, array, and err
Issue #489 closed #closed-489
443: Allow select attribute on xsl:[non-]matching-substring
Issue #492 closed #closed-492
Fix examples, change filepath definition slightly
Issue #493 closed #closed-493
32: try/catch: New variable for all error information
Issue #500 closed #closed-500
Fix errant typographic quotes in XPath Data Model
Issue #502 closed #closed-502
Fix typographic quotes in XPath Data Model
Pull request #502 created #created-502
Fix typographic quotes in XPath Data Model
Close #500
On closer inspection, there were only a few places where typographic quotes were not used in prose. I've fixed those. I think the DM spec could probably use an editorial pass to add code around some literals, but I'm not doing that in this PR.
I've left typographic quotes around code and literals because I don't think straight quotes would be an improvement: the literal “<code>3</code>” doesn't need straight quotes because the quotes are not part of the literal.
Issue #501 created #created-501
Error handling: Rethrow errors; finally block
Re-throw errors
In https://github.com/qt4cg/qtspecs/pull/493, a function/expression was suggested to re-throw errors:
try {
(: wild stuff :)
} catch * {
module:log($err:description),
rethrow($err:map)
}
Alternatives
- Use and extend the existing error function:
fn:error(rethrow := $err:map)
- Use an expression:
throw $err:map
In principle, the error information can also be constructed by the user. If we extend fn:error to also accept a map, it could be used to both throw and re-throw errors:
try {
1 + <empty/>
} catch * {
(: ... :)
fn:error($err:map)
}
Missing information in the map could be added in the same way as when fn:error raises an error.
let $map := map { 'column-number': 12, 'line-number': 3 }
return fn:error(xs:QName('oob'), 'Out of bounds', map := $map)
Finally clause
It can be helpful to have a code block that is always executed, even if errors occur:
let $tmp := file:create-temp-file()
return try {
(: I/O stuff :)
} finally {
file:delete($tmp)
}
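For comparison, Python already has both constructs discussed here: a bare raise inside an except block re-throws the caught error unchanged, and a finally block runs whether or not an error occurred. The sketch below mirrors the two examples above; log_and_rethrow and process_with_cleanup are hypothetical helper names, and nothing here is a statement about how the XQuery proposal would be specified.

```python
import os
import tempfile


def log_and_rethrow(work, log):
    """Run work(); on failure, log the message and re-throw the same error."""
    try:
        return work()
    except Exception as err:
        log(str(err))
        raise  # re-throws the original exception with its details intact


def process_with_cleanup(work):
    """Create a temporary file, run work(path), and always delete the file."""
    fd, tmp = tempfile.mkstemp()
    os.close(fd)
    try:
        return work(tmp)
    finally:
        os.remove(tmp)  # executed on success and on error


if __name__ == "__main__":
    print(process_with_cleanup(lambda path: os.path.exists(path)))
```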
Issue #500 created #created-500
Fix errant typographic quotes in XPath Data Model
In a comment on #471, @ChristianGruen observes that there are some errant typographic quotes in code examples in the XPath Data Model specification. I assume these are errors, most likely on my part, and should be corrected.
Issue #499 closed #closed-499
Use natural language sort order for glossary
Pull request #499 created #created-499
Use natural language sort order for glossary
This is a small stylesheet change which has the effect that the glossary in the XQuery specification (and elsewhere) now uses natural language sort order, so upper-case terms like Gregorian and NaN now appear in their proper alphabetic sequence.
QT4 CG meeting 034 draft agenda #agenda-05-12
Draft agenda published.
Issue #48 closed #closed-48
Create a schema-for-xslt40.xsd file for the current draft spec.
Issue #494 closed #closed-494
Remove legacy materials from the working master branch
Issue #495 closed #closed-495
separator example in https://qt4cg.org/specifications/xslt-40/Overview-diff.html#for-each-separator has xsl:sequence-of instead of xsl:sequence element
Issue #498 closed #closed-498
Fix typo, replace sequence-of with sequence
Pull request #498 created #created-498
Fix typo, replace sequence-of with sequence
Fix #495
Issue #497 created #created-497
https://qt4cg.org/specifications/xpath-functions-40/Overview-diff.html#func-map-pairs has wrong function syntax order
In https://qt4cg.org/specifications/xpath-functions-40/Overview-diff.html#func-map-pairs the explanation of the new function map:pairs is given as:
map:for-each($map, ($k, $v) -> {map{"key":$k, "value":$v}})
I think the right syntax would be map:for-each($map, -> ($k, $v) {map{"key":$k, "value":$v}}).
Issue #496 closed #closed-496
Ignore legacy directories entirely
Pull request #496 created #created-496
Ignore legacy directories entirely
This PR is supposed to fix the action that builds PRs so that it ignores directories we never edit.
Issue #495 created #created-495
separator example in https://qt4cg.org/specifications/xslt-40/Overview-diff.html#for-each-separator has xsl:sequence-of instead of xsl:sequence element
While looking through the XSLT 4 draft spec, I have found the following example in https://qt4cg.org/specifications/xslt-40/Overview-diff.html#for-each-separator:
<xsl:for-each select="6, 3, 9" separator="|">
<xsl:sort select="."/>
<xsl:sequence-of select="., .+1"/>
</xsl:for-each>
xsl:sequence-of should be xsl:sequence.
Pull request #494 created #created-494
Remove legacy materials from the working master branch
This is intended to be an entirely uninteresting change. This PR removes a whole bunch of historical artifacts from the master branch, things like the requirements and use-cases documents that we aren't maintaining for QT4, the errata which don't apply to QT4, etc.
I will push a separate branch, legacy-documentation, to the repository that contains all of the files removed by this PR so that they aren't lost and can easily be recovered. (I won't do that as a PR, I'll just push it to the repository.)
In the meantime, I think this trimmed down master branch works just fine and it's a lot simpler and easier to explain.
This PR does remove support for the legacy ant builds, but I doubt they've worked for a while now.
Pull request #493 created #created-493
32: try/catch: New variable for all error information
Pull request #492 created #created-492
Fix examples, change filepath definition slightly
This PR fixes the examples in the parse-uri() function. It also makes a small change to the filepath property, eliding it when the scheme is known not to be file.
Pull request #491 created #created-491
Fix more examples in the FO 4.0 spec
Further corrections to example code in the F+O specification, found by testing (app-spec-examples in the test suite).
Issue #490 created #created-490
Control over schema validation in parse-xml(), doc(), etc.
I'm struggling with a problem with the stylesheet that generates QT4 tests from the examples in the function catalog, and I think it's an example of a more general problem in schema-aware processing.
The spec gives this example (for json-to-xml):
The expression json-to-xml('{"x": "\\", "y": "\u0025"}', map{'escape': true()}) returns
(with whitespace added for legibility):
<map xmlns="http://www.w3.org/2005/xpath-functions">
<string escaped="true" key="x">\\</string>
<string key="y">%</string>
</map>
But the test we actually generate expects the result:
<map xmlns="http://www.w3.org/2005/xpath-functions">
<string escaped="true" key="x" escaped-key="false">\\</string>
<string key="y" escaped="false" escaped-key="false">%</string>
</map>
and the test is failing because the result produced by Saxon correctly excludes the escaped-key="false" attributes which the test is expecting. How did the attributes get there?
The answer is that the stylesheet is doing parse-xml() followed by some transformation to normalise whitespace, followed by serialize(). The parse-xml() call is invoking schema validation, which adds default attributes.
We probably don't want schema validation here; if we do want it, we probably don't want default attribute values to be expanded. But parse-xml() doesn't give us the choice. It says it's implementation-defined and it gives no options for the user to control it. Saxon provides configuration-level options but they aren't fine-grained enough to use here.
Without being able to control this, the only option seems to be for the stylesheet to transform the result to take out the defaulted attributes that the schema processor has added.
We need options on functions like doc() and parse-xml() to control whether and how schema validation is performed.
One of the options we need whenever we do validation is probably "validate+strip" - validate the input, report errors if it's invalid, but return the untyped data that was supplied to the validator, not the type-annotated data with expanded defaults.
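The workaround mentioned above (post-processing the result to take out defaulted attributes) can be sketched in Python with the standard ElementTree module. The DEFAULTS table and the function name strip_defaulted are assumptions made for this illustration, based only on the two attributes visible in the example. Note the limitation: an attribute that was written explicitly with its default value is removed as well, which is one reason a proper "validate+strip" option would be preferable.

```python
import xml.etree.ElementTree as ET

# Assumed schema defaults, taken from the attributes shown in the example.
DEFAULTS = {"escaped": "false", "escaped-key": "false"}

# The schema-validated form from the failing test.
VALIDATED = r"""<map xmlns="http://www.w3.org/2005/xpath-functions">
  <string escaped="true" key="x" escaped-key="false">\\</string>
  <string key="y" escaped="false" escaped-key="false">%</string>
</map>"""


def strip_defaulted(root):
    """Remove attributes whose value equals the assumed schema default."""
    for element in root.iter():
        for name, default in DEFAULTS.items():
            if element.attrib.get(name) == default:
                del element.attrib[name]
    return root


if __name__ == "__main__":
    tree = strip_defaulted(ET.fromstring(VALIDATED))
    for child in tree:
        print(child.attrib)
```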
Pull request #489 created #created-489
443: Allow select attribute on xsl:[non-]matching-substring
Issue #488 closed #closed-488
433: Allow select attribute on xsl:[non-]matching-substring
Pull request #488 created #created-488
433: Allow select attribute on xsl:[non-]matching-substring
Allow a select attribute on xsl:[non-]matching-substring in place of the contained sequence constructor.
Issue #484 closed #closed-484
Update FO test generation stylesheet
Issue #486 closed #closed-486
Fix some errors in examples, as revealed by testing
Pull request #487 created #created-487
485: Predeclare the prefixes math, map, array, and err
In 3.1 XQuery processors were allowed to predeclare these prefixes; in 4.0 they are now required to do so.
Pull request #486 created #created-486
Fix some errors in examples, as revealed by testing
Corrects errors in examples; changes other examples to make them testable. Further test failures remain to be investigated (some may be bugs in the Saxon implementation; others require improvements to the test generation mechanism).
Issue #485 created #created-485
Predeclared namespaces in XQuery
XQuery defines that the prefixes xml, xs, xsi, fn, and local are predeclared, and states:
Additional predeclared namespace prefixes may be added to the [statically known namespaces] by an implementation.
I propose that we add map, array, and math to this list, so that these can be used interoperably without pre-declaring them. It is already permitted for an implementation to do this, but it is not required. The change is backwards compatible, because user-defined namespace declarations override predeclared declarations.
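The backwards-compatibility argument can be illustrated with a small Python model in which the statically known namespaces are a dict and user declarations are applied on top of the predeclared bindings. The function name is hypothetical, and the model ignores special cases such as the xml prefix, which cannot be redeclared.

```python
# Predeclared bindings, including the three proposed additions.
PREDECLARED = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "xs": "http://www.w3.org/2001/XMLSchema",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "fn": "http://www.w3.org/2005/xpath-functions",
    "local": "http://www.w3.org/2005/xquery-local-functions",
    "map": "http://www.w3.org/2005/xpath-functions/map",
    "array": "http://www.w3.org/2005/xpath-functions/array",
    "math": "http://www.w3.org/2005/xpath-functions/math",
}


def statically_known_namespaces(user_declared):
    """User declarations override predeclared ones with the same prefix."""
    return {**PREDECLARED, **user_declared}


if __name__ == "__main__":
    # An existing query that binds "map" to its own namespace keeps working.
    print(statically_known_namespaces({"map": "http://example.com/my-map"})["map"])
```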
Pull request #484 created #created-484
Update FO test generation stylesheet
Updates the stylesheet that generates tests from examples in the FO spec, and supplies a missing record definition in the function catalog so that it becomes ID/IDREF valid.
Issue #63 closed #closed-63
fn:slice, array:slice: Signatures, Examples
Issue #477 closed #closed-477
63: array:slice (editorial)
Issue #29 closed #closed-29
array:values (resolved: map:values, map:entries)
Issue #473 closed #closed-473
NaN ne NaN
Issue #321 closed #closed-321
relax $input in fn:serialize
Issue #325 closed #closed-325
Operator precedence table needs updating
Issue #482 closed #closed-482
473: NaN Comparisons (bug fix)
Issue #476 closed #closed-476
29: array:values
Issue #475 closed #closed-475
471: fn: prefix removed from function calls in the examples
Issue #472 closed #closed-472
321: Add new note and examples demonstrating adaptive serialization method
Issue #468 closed #closed-468
325 Update operator precedence table
Issue #462 closed #closed-462
434: Added examples for parse-integer()
Pull request #483 created #created-483
452: window: make 'start' and 'when' optional
Pull request #482 created #created-482
473: NaN Comparisons (bug fix)
Drops the incorrect statement suggesting that NaN xx NaN is always false, for all six operators xx. In fact NaN ne NaN is true, as statements elsewhere in the spec make clear. Specifically, the operator mapping appendix of the XPath/XQuery language spec makes clear that X ne Y maps to not(op:numeric-equal(X, Y)).
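Python floats follow the same IEEE 754 comparison rules, so the behaviour described here can be checked directly. This is only an analogy: the dictionary keys borrow the XPath operator names, and the helper ne is a hypothetical model of the operator mapping quoted above.

```python
import math

nan = math.nan

# Results of the six comparisons with NaN on both sides.
results = {
    "eq": nan == nan,
    "ne": nan != nan,
    "lt": nan < nan,
    "le": nan <= nan,
    "gt": nan > nan,
    "ge": nan >= nan,
}


def ne(x, y):
    """Model of the mapping: X ne Y is not(numeric-equal(X, Y))."""
    return not (x == y)


if __name__ == "__main__":
    print(results)
    print(ne(nan, nan))
```

Only the ne comparison yields true; the other five yield false.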
Issue #481 created #created-481
When we have array:build and map:build, then why do we also need array:of and map:of ?
Looking at the current specification of the pairs of functions: (array:build, array:of) and (map:build, map:of), it is impossible not to notice that the second function in each of these pairs is a weak duplicate of the first.
Also, the examples provided for array:build and array:of seem to have a good deal of common content / duplication / overlap.
Another issue is that array:of requires as input a sequence of value records, whose definition is challenging to understand (and whose meaning seems to be solely to represent a sequence of sequences), and what is also really challenging is how to construct this argument to array:of. If this is unnatural and challenging, one would probably prefer to use just array:build.
Is there an example where it is possible to construct an array (or a map) with array:of (or with map:of) but it is impossible (or significantly more difficult) to construct the same array/map with the function array:build (or with map:build)?
If there are no such significant and convincing examples, then why do we need the xxx:of functions?
Thus the question naturally arises: "Why is the function xxx:of necessary at all?"
Issue #480 created #created-480
Allow type promotion of xs:string to xs:anyURI
If it hasn't already been considered and ruled out, I'd like to propose adding a type promotion rule to XPath 4 that would allow xs:string to be type-promoted to xs:anyURI, so that functions with parameters whose types are declared as xs:anyURI can directly take xs:string values, without having to first cast these to xs:anyURI.
This would empower function authors to select xs:anyURI as a type - signaling that they’re expecting a URI - without forcing users of the function into explicitly casting their string-typed URIs.
The motivation behind this proposal is that many eXist users are frustrated when using the eXist extension functions that properly declare parameters as xs:anyURI. If this proposal isn’t adopted, that’s ok; we’ll just eliminate the use of xs:anyURI in our functions, as proposed in https://github.com/eXist-db/exist/issues/4632. But this would be a bit unfortunate for authors of functions who see the use of xs:anyURI as a proper expression of intent in their functions.
The change would be to https://www.w3.org/TR/xpath-31/#promotion - and I guess would be a 3rd item, called "String type promotion", saying:
A value of type xs:string can be promoted to the type xs:anyURI. The result of this promotion is created by casting the original value to the type xs:anyURI.
Issue #479 created #created-479
fn:deep-equal: Input order
#383 is about the specific order of children of element nodes.
I think we should also provide an option to ignore the top-level order of the input items:
(: returns true: both input arguments contain the same items, but in a different order :)
deep-equal(
(1 to 10),
reverse(1 to 10),
map { 'unordered': true() }
)
(: returns false: the compared elements are different :)
deep-equal(
<a><b/><c/></a>,
<a><c/><b/></a>,
map { 'unordered': true() }
)
(: returns false: the second sequence contains duplicates :)
deep-equal(
(1, 2),
(2, 1, 1),
map { 'unordered': true() }
)
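One way to read the proposed option is as a multiset comparison of the top-level items, with each item still compared in its usual ordered way. The Python sketch below models that reading with collections.Counter; nested tuples stand in for elements with ordered children. The function name is hypothetical, and relying on Python hashing and equality is an assumption that does not reproduce the full deep-equal rules.

```python
from collections import Counter


def deep_equal_unordered(left, right):
    """Compare two sequences as multisets: order is ignored, duplicates count."""
    return Counter(left) == Counter(right)


if __name__ == "__main__":
    # Same items in a different order.
    print(deep_equal_unordered(range(1, 11), reversed(range(1, 11))))
    # Children of the single top-level item differ in order.
    print(deep_equal_unordered([("a", ("b", "c"))], [("a", ("c", "b"))]))
    # The second sequence contains a duplicate.
    print(deep_equal_unordered([1, 2], [2, 1, 1]))
```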
Pull request #478 created #created-478
467: map:keys-where
Pull request #477 created #created-477
63: array:slice (editorial)
Pull request #476 created #created-476
29: array:values
Issue #423 closed #closed-423
[XSLT 4.0] 2.2 Notation is incomplete
Pull request #475 created #created-475
471: fn: prefix removed from function calls in the examples
#471: I’ve removed the fn: prefixes from the function calls in the examples.
I have left pretty much all true/false strings untouched, since I’m not sure what would be the most consistent approach to clean them up. It will be better anyway to create a separate PR for that.
Issue #474 closed #closed-474
Per comments on #465, improve presentation of multi-line expressions
Issue #465 closed #closed-465
80: fn:iterate-while: Examples revised
Pull request #474 created #created-474
Per comments on #465, improve presentation of multi-line expressions
This PR addresses points raised in the comments on #465.
If an fos:expression element is a code block, nest an eg inside it:
<fos:expression><eg><![CDATA[let $input := 3936256
return fn:iterate-while(
$input,
function($result) { abs($result * $result - $input) >= 0.0000000001 },
function($guess) { ($guess + $input div $guess) div 2 }
)]]></eg></fos:expression>
I found four examples where an fos:expression contained more than one newline and I added eg wrappers in those cases.
There are many more fos:expression elements that contain a single newline, but automatically formatting them as code blocks was often unsuccessful. Many of those cases seem to be just newlines entered for authoring convenience.
I also fixed the CSS for code blocks and attempted to remove trailing newlines from code blocks.
Issue #473 created #created-473
NaN ne NaN
It seems that ever since 2.0, the section in Functions and Operators "Comparison Operators on Numeric Values" (currently §4.3) has stated "If either, or both, operands are NaN, false is returned."
This is incorrect. If the operator is ne, then the correct result is true.
(And editorially, the first two commas in this sentence should be dropped.)
Pull request #472 created #created-472
321: Add new note and examples demonstrating adaptive serialization method
Per Issue 321, I've added a new note and two additional simple examples noting the adaptive serialization method to draw attention to this feature in the existing specs.
Issue #471 created #created-471
Unify formatting (function calls, code blocks, quotes) in the specification
Todos (2023-05-18):
- [x] Initial cleanup of fn: prefixes → #475
- [x] Remove more fn: prefixes → #509
  - fn: prefixes in examples that raise an error.
  - fn: prefixes in eg code blocks
  - other documents: expressions.xml, query-examples.xml, …
- [x] Render false, true, NaN, INF, +INF as code → #510
- [x] Render string values as code and use quotes: "yes", "true", "0" → #511
- [x] Omit quotes for single characters: \b, \f, … → #511
- [x] Rewrite simple to typographic quotes → #511
Inspired by https://github.com/qt4cg/qtspecs/pull/454#issuecomment-1534633089 ff.
The syntax of the examples in the XQFO specification is inconsistent. Sometimes, functions in the standard function namespace have an fn prefix…
fn:fold-right(1 to 5, "", fn:concat(?, ".", ?))
fn:substring("motor car", 6)
…sometimes they don’t…
data(123)
concat("http://www.example.com/", encode-for-uri("~bébé"))
…sometimes it’s both:
fn:fold-right(1 to 5, "$zero", concat("$f(", ?, ", ", ?, ")"))
fn:concat(01, 02, 03, 04, true())
fn:tokenize(fn:unparsed-text($href), '\r\n|\r|\n')[not(position()=last() and .='')]
Should we drop or keep the prefix – or doesn’t it really matter? If there’s interest, I can create a PR (I’d tend to drop the prefixes).
In addition, there doesn’t seem to be a consistent rule for representing booleans. We have:
Syntax | Comment
--- | ---
…returns false | mostly used in the rules (seems appropriate to me)
…returns `false` | used in the rules; maybe we should replace it with the first syntax?
…the result is `fn:false()` | used in the rules; maybe we should replace it with the first syntax?
…returns `false()` | mostly used in the examples (seems appropriate to me)
Pull request #470 created #created-470
369: add fixed-prefixes attribute in XSLT
A solution to some of the problems identified in issue #369. This proposal affects XSLT only.
Issue #469 created #created-469
array:of-members, map:of-pairs: Signatures, Examples
Just trivia:
a) The parameter name of array:of-members is $input. $members may be a better choice (or we should change map:of-pairs($pairs) to map:of-pairs($input)).
b) The type of $pairs is record(key as xs:anyAtomicType, value as item()*, *)*. Shouldn’t it be record(key as xs:anyAtomicType, value as item()*)* (without the trailing , *)?
If the current syntax is correct, an explanatory comment could be helpful.
c) ~One map:of example needs to be fixed: map:of((map:entry(0, "no"), map:entry(1, "yes"))).~ See #607
See #508 for the proposal to rename map:of to map:of-pairs.
Pull request #468 created #created-468
325 Update operator precedence table
Add "otherwise" and thin arrow to the table. Editorial.
Issue #467 created #created-467
map:keys-where: Return Keys That Match a Predicate
Edit, 23/05/17: Reopened to discuss map:keys($map, $predicate) as an alternative.
Motivation
We have fn:index-where and array:index-where to locate items/members in a sequence/an array that match a specific predicate, and we could introduce an equivalent function for maps. A recent use case can be found in https://github.com/qt4cg/qtspecs/issues/413#issuecomment-1531288167d.
Proposal
Summary
Returns keys of map entries for which the value matches a supplied predicate.
Signature
map:keys-where(
$map as map(*),
$predicate as function(item()*) as xs:boolean
) as xs:anyAtomicType*
Properties
This function is ·deterministic·, ·context-independent·, and ·focus-independent·.
Rules
The function takes any ·map· as its $map argument and applies the supplied function to the value of each map entry. The function supplied as $predicate takes the value of the corresponding map entry as an argument, and the result is a sequence containing the keys of those entries for which the function returns true.
More formally, the function returns the result of the expression:
map:for-each(
$map,
function($key, $value) {
if($predicate($value)) then $key else ()
}
)
Examples
let $numbers := map { 0: 'zero', 1: 'one', 2: 'two', 3: 'three' }
return map:keys-where($numbers, function($string) { $string = 'two' })
Comments
- Edit (2023-05-04): Renamed from map:key-where to map:keys-where.
- Similar functions (index-of, index-where) use the singular form.
- An alternative would be to add an optional $predicate function argument to map:keys.
- If we decide to introduce a shorter syntax (see #129 and #436), we could have:
map:keys-where($numbers, { . = 'two ' })
Pull request #466 created #created-466
460: Fix math symbols
(1) drops the mathematical operator symbols appendix, which allowed an extensive range of non-ASCII characters as synonyms for language keywords, (2) retains × and ÷ as synonyms for multiplication and division, (3) allows full-width < and > in operator symbols in place of the usual ASCII characters, to avoid the need for XML escaping.
QT4 CG meeting 033 draft minutes #minutes-05-02
Draft minutes published.
Issue #449 closed #closed-449
Actions from review of PR #420
Issue #456 closed #closed-456
Revises numeric literal syntax
Issue #458 closed #closed-458
Update parse-integer and format-integer following review
Issue #224 closed #closed-224
Infrastructure changes/improvements
Issue #461 closed #closed-461
Make code more visually distinct
Pull request #465 created #created-465
80: fn:iterate-while: Examples revised
This PR is editorial. I’ve reformatted the examples for fn:iterate-while
…
…to make them more readable.
QT4 CG meeting 033 draft agenda #agenda-05-02
Draft agenda published.
Issue #464 created #created-464
Serialization sequence normalization step 3 needs clarification
The specifications currently read:
If the item-separator serialization parameter is absent, then for each subsequence of adjacent strings in S2, copy a single string to the new sequence equal to the values of the strings in the subsequence concatenated in order, each separated by a single space. Copy all other items to the new sequence. Otherwise, copy each item in S2 to the new sequence, inserting between each pair of items a string whose value is equal to the value of the item-separator parameter. The new sequence is S3.
As written ("If...then.... Otherwise...."), this implies that the process whereby a block of adjacent strings are joined into a single string is performed only when the item separator parameter is absent. I.e., if the item-separator parameter is not absent, it will not be used to string-join adjacent groups of strings. Perhaps that is as intended, but I want to make sure.
Also, as written, this implies that when the parameter is absent, the sequence begins by finding all subsequences of adjacent strings, performing concatenation. Then all non-strings are added to the sequence. That seems wrong, because it appears to advise the processor to rearrange the sequence of input items.
I propose a revision along these lines:
Copy each item in S2 to a new sequence. If a given pair of adjacent items are both strings, then separate them with a string whose value is equal to the value of the item-separator parameter or is a single space if the item-separator parameter is absent. If a given pair of adjacent items are not both strings, insert between each pair of items a string whose value is equal to the value of the item-separator parameter. Once this is finished, take each adjacent group of strings and concatenate them into a single string. The new sequence is S3.
My revision is based upon what I imagine happens, but implementers will know better than I.
I am working on some editorial touch-ups of the Serialization specifications, and can incorporate comments/suggestions made in this thread in that larger enterprise.
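To make the two branches of the currently quoted rule easier to compare, here is a small Python model of one reading of it, in which item order is preserved. Python str values stand in for strings and any other object for a non-string item; the function name normalize_step3 and the use of None for an absent item-separator are assumptions of this sketch.

```python
from itertools import groupby


def normalize_step3(s2, item_separator=None):
    """Model of sequence normalization step 3 as currently worded.

    Absent separator: each run of adjacent strings is joined with single
    spaces; all other items are copied unchanged, in their original order.
    Separator present: it is inserted between every pair of items, and
    adjacent strings are not merged.
    """
    s3 = []
    if item_separator is None:
        for is_string, run in groupby(s2, key=lambda item: isinstance(item, str)):
            if is_string:
                s3.append(" ".join(run))
            else:
                s3.extend(run)
        return s3
    for index, item in enumerate(s2):
        if index > 0:
            s3.append(item_separator)
        s3.append(item)
    return s3


if __name__ == "__main__":
    print(normalize_step3(["a", "b", 1, "c"]))
    print(normalize_step3(["a", "b", 1, "c"], "|"))
```

Under this reading the answer to the first question is yes: with a separator present, adjacent strings stay separate items in S3.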
Issue #463 created #created-463
fn:parts() - extract the parts of a (not-really) atomic value
We have a whole raft of functions to extract the parts of date, time, and duration values: month-from-dateTime(), etc etc.
These aren't particularly convenient to use, for example getting multiple components of a duration is clumsy; and there are gaps, for example there are no functions to extract the parts of a gMonthDay.
I propose a general-purpose function fn:parts() which turns any of these composite atomic values into a map, enabling you to replace a call on month-from-dateTime($value) with parts($value)?month.
So far this duplicates existing functionality perhaps with a bit of added convenience. However, the mechanism is much more extensible and flexible than what we have now:
- we can apply it to atomic types that currently have no decomposition operators, such as gMonthDay
- we can easily add additional components such as day-of-week or quarter or day-of-year or julian-day that are currently not available, or only available clumsily using format-dateTime().
- the parts() function is polymorphic, so the same code can be used to get the year (say) from a date, a dateTime, or a gYearMonth.
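A rough Python analogue shows the shape of the idea: one polymorphic function that returns a dictionary of components for either a date or a datetime. The key names follow the components mentioned above, but the exact set of keys, the Monday-is-1 numbering for day-of-week, and the function itself are assumptions of this sketch, not part of the proposal.

```python
from datetime import date, datetime


def parts(value):
    """Return the components of a date or datetime as a dictionary."""
    result = {}
    if isinstance(value, date):  # datetime is a subclass of date
        result.update({
            "year": value.year,
            "month": value.month,
            "day": value.day,
            "day-of-week": value.isoweekday(),  # Monday is 1 (assumed convention)
            "quarter": (value.month - 1) // 3 + 1,
            "day-of-year": value.timetuple().tm_yday,
        })
    if isinstance(value, datetime):
        result.update({
            "hours": value.hour,
            "minutes": value.minute,
            "seconds": value.second,
        })
    return result


if __name__ == "__main__":
    print(parts(date(2023, 5, 2))["month"])
    print(parts(datetime(2023, 5, 2, 16, 30))["hours"])
```

The same lookup (for example the year or month entry) works for both input types, which is the polymorphism argument made in the last bullet.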
Pull request #462 created #created-462
434: Added examples for parse-integer()
Supplemental examples related to PR #434.
Pull request #461 created #created-461
Make code more visually distinct
Close #224
Thanks @ChristianGruen for the reminder that this was still open!
Issue #460 created #created-460
Mathematical Operator Symbols
Appendix B.3 of the specification proposes a set of non-ASCII symbols that can be used in place of language keywords, for example "∃" for "some" and "∀" for "every".
I haven't detected a great deal of enthusiasm for this idea, and I can see it causing some confusion, partly because Unicode offers such a wide choice of symbols some of which are visually very similar.
I propose retaining a much smaller set of these symbols:
- "÷" (xF7) for "div" because the symbol is widely recognised and "div" here is pretty ugly
- "≺" (x227A) and "≻" (x227B) as alternatives to "<" and ">" in all operator symbols (other than XML markup contexts) that use these characters: for readability in contexts, especially XSLT, where the "<" and ">" characters need to be escaped
Issue #4 closed #closed-4
[XPath] [XQuery] Better names for ThinArrowTarget and FatArrowTarget
Issue #59 closed #closed-59
[FO] fn:replace no longer has the 3 an 4 argument variants
Issue #459 created #created-459
Eager and lazy evaluation
In #359, different approaches were discussed for eager and lazy evaluation. This issue could be used to
- clarify if we have the same notion of eagerness and laziness, and
- define language constructs for eager/lazy evaluation.
Pull request #458 created #created-458
Update parse-integer and format-integer following review
Following review and acceptance of the parse-integer and format-integer functions, make changes suggested during the review. See actions QT4G-032-03 to -06.
Issue #457 created #created-457
Support parsing numeric, alphabetic, and additive number systems.
This proposal is based on the work done in https://www.w3.org/TR/css-counter-styles-3/ when defining CSS rules for formatting the numbers in list items.
The idea is to define 3 parsing strategies:
- numeric -- number-like systems such as decimal;
- alphabetic -- alphabetical-like systems such as spreadsheet columns (A, B, ..., Z, AA, AB, ...);
- additive -- systems like roman and hebrew where the symbol represents a fixed value that is added together
Parsing these, we have 3 properties:
- system as enum("numeric", "alphabetic", "additive") := "numeric" -- which of the parsing strategies (number systems) to use;
- symbols as xs:string := "0123456789" -- the list of characters used to represent a digit;
- additive-symbols as map(xs:integer, xs:string) := map {} -- a map of the symbols in an additive system with the corresponding value of that symbol.
Consideration 1 -- Should these also allow any whitespace and optional "+"/"-" symbols like the radix-based parse-integer?
Consideration 2 -- Should we define decimal format options for these, so the decimal format name can format/represent other number systems (binary, hex, hebrew, tamil, roman numerals, etc.)? -- Note: this would make system, symbols, and additive-symbols properties of the decimal format object with the above defaults. The formatting would work in the same way as it is defined in the CSS Counter Styles specification.
Design 1 -- Separate functions
fn:parse-numeric-integer($value as xs:string,
$symbols as xs:string := "0123456789") as xs:integer
fn:parse-alphabetic-integer($value as xs:string,
$symbols as xs:string := "ABCDEFGHIJKLMNOPQRSTUVWXYZ") as xs:integer
fn:parse-additive-integer($value as xs:string,
$additive-symbols as map(xs:integer, xs:string)) as xs:integer
Design 2 -- Combined functions
fn:parse-integer($value as xs:string,
$system as xs:string := "numeric",
$symbols as xs:string := "0123456789",
$additive-symbols as map(xs:integer, xs:string) := map {}) as xs:integer
fn:parse-integer($value as xs:string,
$radix as xs:integer) as xs:integer
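To make the three strategies concrete, here is a minimal Python sketch under stated assumptions. The function names mirror Design 1 but are hypothetical Python helpers, not the proposed XPath functions. Whitespace and sign handling (Consideration 1) is left out, and the greedy matching used for additive systems is one plausible reading, not a rule taken from the proposal.

```python
def parse_numeric(value, symbols="0123456789"):
    """Positional system: each symbol is a digit whose value is its index."""
    base = len(symbols)
    total = 0
    for ch in value:
        total = total * base + symbols.index(ch)   # ValueError on unknown symbols
    return total


def parse_alphabetic(value, symbols="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Alphabetic system: like positional, but there is no zero digit."""
    base = len(symbols)
    total = 0
    for ch in value:
        total = total * base + symbols.index(ch) + 1
    return total


def parse_additive(value, additive_symbols):
    """Additive system: consume symbols from the largest weight down, summing weights."""
    total = 0
    rest = value
    for weight in sorted(additive_symbols, reverse=True):
        symbol = additive_symbols[weight]
        while symbol and rest.startswith(symbol):
            total += weight
            rest = rest[len(symbol):]
    if rest:
        raise ValueError(f"unparsed input: {rest!r}")
    return total


# Roman numerals expressed as weight -> symbol, as in the additive-symbols map
ROMAN = {1000: "M", 900: "CM", 500: "D", 400: "CD", 100: "C", 90: "XC",
         50: "L", 40: "XL", 10: "X", 9: "IX", 5: "V", 4: "IV", 1: "I"}
```

With these definitions, spreadsheet column "AA" parses as 27 and the roman numeral "XIV" as 14.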
Pull request #456 created #created-456
Revises numeric literal syntax
Following actions from review on 25 April 2023 (QT4CG-032-02), revises the new syntax of numeric literals to disallow trailing underscores. Also adds more notes and examples.
QT4 CG meeting 032 draft minutes #minutes-04-25
Draft minutes published.
Issue #429 closed #closed-429
Hexadecimal and binary literals
Issue #241 closed #closed-241
Functions integer-to-string and string-to-integer with radix
Issue #434 closed #closed-434
Functions to parse and format hex integers
Issue #433 closed #closed-433
429 Add hex and binary literals and allow underscores
Pull request #455 created #created-455
410: Converting doubles to decimals, fractional digits
@michaelhkay In this PR, I tried to undo the changes that were introduced to make comparisons transitive. I haven’t made any changes to distinct-values and group by, because I am uncertain if I have spotted all the relevant parts of the specification. Maybe/hopefully we can address them in a next step.
Any feedback is welcome.
Issue #293 closed #closed-293
Error in fn:doc-available specification
Issue #430 closed #closed-430
fn:doc et al, error handling: inconsistencies. Closes #293
Pull request #454 created #created-454
125: array:partition
This PR revisits array:partition, with extra editorial clarification of the spec; including but not confined to fixing issue #125.
I suggest we schedule this PR for discussion since we have not previously discussed it.
One question for the group is what the name of the function should be (including the choice of namespace).
Another is whether the polarity of the callback function should be changed (from break-when to continue-when or similar).
We could also consider returning an array of sequences rather than a sequence of arrays. (But in my view sequences of arrays are rather easier to manage at the moment.)
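For readers unfamiliar with the function under discussion, the following Python sketch shows one way a break-when callback could drive partitioning. The callback signature (current partition, next item) is an assumption for this example; the PR text above does not spell it out.

```python
def partition(items, break_when):
    """Split items into consecutive groups.

    A new group starts whenever break_when(current_group, next_item) is true.
    The callback signature is an assumption for this sketch.
    """
    partitions = []
    current = []
    for item in items:
        if current and break_when(current, item):
            partitions.append(current)
            current = []
        current.append(item)
    if current:
        partitions.append(current)
    return partitions
```

Reversing the polarity to continue-when would simply negate the callback's result. Returning an array of sequences instead of a sequence of arrays would change only the container types, not the grouping logic.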
Issue #453 closed #closed-453
Fix issue #86 (incorrect default timezone format)
Pull request #453 created #created-453
Fix issue #86 (incorrect default timezone format)
Trivial bug fix.
Issue #89 closed #closed-89
[XQuery] DirPIConstructor permits ':' in the PI name.
QT4 CG meeting 032 draft agenda #agenda-04-25
Draft agenda published.
Issue #452 created #created-452
window: make 'start' and 'when' optional
Every time I use tumbling window, I write start when true(). If start were optional, defaulting to true, that would be easier.
Issue #450 closed #closed-450
Fix issue #418 (editorial corrections)
Issue #438 closed #closed-438
What are the "non-whitespace control characters"?
Issue #442 closed #closed-442
Attempt to clearify XML serialization of control characters
Issue #451 created #created-451
Multiple Schemas
There are many situations in which a single transformation wants to deal with multiple schemas: for example when transforming from v1 of some industry standard to v2 of the same standard, or when processing a collection of input documents each of which references its own schema using xsi:schemaLocation.
This is currently possible only if the schemas are compatible (that is, if the union of the schemas is itself a valid schema). And even where it is possible, validation against the union of S1 and S2 may produce a different outcome from validation against S2, for example because a strict wildcard allows content that S2 would not allow. Substitution groups are a particular problem: if v1 and v2 have elements with different substitution group membership, then validating against the union of v1 and v2 allows the union of the substitution groups, which means that you haven't actually verified that the result document is valid against v2.
The problem is confounded by considerations that are outside the scope of the spec. What happens when you run two different stylesheets against the same source document? If the source document has been validated against S1, this means that both stylesheets must use schemas that are supersets of S1. The way this requirement is managed in Saxon is to introduce the concept of a Configuration in which transformations run; a Configuration has a single schema, and all source documents and stylesheets within the Configuration must use compatible subsets of this schema. A source document validated using one Configuration cannot be used in a different Configuration, because the type annotations would be meaningless against a different schema.
My proposal is to introduce the idea of a named schema (that is, a named collection of schema components). When we do xsl:import-schema, we can give the imported schema a name, and there is no requirement that the components in this schema should be compatible with the components in any other schema. When we refer to a schema type (for example in $s cast as QName) we should be able to qualify the type name with a schema name (we can postpone discussions of syntax, let's say cast as my:part-number§v1 for now). When we request validation, we should be able to nominate the schema to be used for validation, for example <xsl:element name="e" validation="strict" schema="v2">.
The trickiest part is handling source documents, mainly because validation of source documents (especially those read using doc() or collection()) is at present almost entirely implementation-defined. I believe that we need explicit options to request validation of source documents against a specific schema. There should also be an option to validate a document against the schema identified in its own xsi:schemaLocation, in which case there should be no requirement that that schema is compatible with any schema known statically to the stylesheet.
Issue #49 closed #closed-49
[XQuery] The 'member' keyword is still present on ForMemberBinding
Issue #74 closed #closed-74
[FO] Support parsing HTML
Issue #87 closed #closed-87
[XSL] Support for "master files"
Issue #109 closed #closed-109
[xslt4] xsl:note for structured documentation
Issue #113 closed #closed-113
[xslt] Constructing arrays
Issue #239 closed #closed-239
Terminology concerning function items and their access to static and dynamic context
Issue #373 closed #closed-373
apparent copy/paste error in annotation documentation of simple type yes-or-no-or-maybe
Pull request #450 created #created-450
Fix issue #418 (editorial corrections)
Pull request #449 created #created-449
Actions from review of PR #420
Actions from review of PR #420 (QT4CG-031-01, -02); new functions map:entries() and map:values() from issue #29
Issue #445 closed #closed-445
Editorial updates to XSLT spec
Issue #448 created #created-448
Support extended dateTime formats of ISO-8601:2019?
The ISO 8601:2019 standard supports extended dateTime formats including support for uncertain or approximate times and new quantifiers. Apparently, the extensions are documented in the Extended Date/Time Format (EDTF) Specification from the US LoC.
Pull request #447 created #created-447
435, 53, 436: lambda expressions, thin arrows
Addresses issue 436 by introducing syntax similar to Java, C#, and JS for anonymous inline functions (lambda expressions). This involves finding a new symbol for the existing "thin arrow" operator; it also gives an opportunity to show how lambda expressions can be used in pipelines.
Some points for WG consideration:
(a) Do we really want the curly braces around the function body to be mandatory?
(b) What symbol should we use for the mapping arrow? I've used =!> as it suggests to me the combination of function application and sequence mapping.
(c) Should we reinstate the special syntax for arity-one "focus functions" (->{@salary}), which is dropped in this proposal?
(d) I haven't necessarily worked through all the changes to examples needed, e.g. in the XSLT and F+O specs.
Issue #437 closed #closed-437
xsl:where-populated and table with header
QT4 CG meeting 031 draft minutes #minutes-04-18
Draft minutes published.
Issue #357 closed #closed-357
Representing key-value pairs
Issue #420 closed #closed-420
Issue 357 Map composition and decomposition
Issue #446 closed #closed-446
Fix merge conflicts in PR #420
Pull request #446 created #created-446
Fix merge conflicts in PR #420
Close #420 Close #357
Pull request #445 created #created-445
Editorial updates to XSLT spec
This PR fixes editorial issues in the XSLT 4.0 spec: issue #373, issue #384, issue #423. It also updates the XSD schema for XSLT 4.0 to incorporate most of the syntax changes that have been made to date (though further checking is needed), and updates some 3.0/3.1 references to 4.0 references.
Issue #444 closed #closed-444
Resolve merge conflict for PR 420
Pull request #444 created #created-444
Resolve merge conflict for PR 420
Close #420 Close #357
Issue #443 created #created-443
@select on xsl:matching-substring and xsl:non-matching-substring
In the spirit of making @select or <sequence constructor> the norm, I think the children of xsl:analyze-string have perhaps been overlooked.
Pull request #442 created #created-442
Attempt to clearify XML serialization of control characters
Fix #438
Clarify that the control characters #x1 through #x1F and #x7F through #x9F must be output as character references except for the whitespace characters #x9, #xA, #xD, and #x85.
Issue #439 closed #closed-439
ExprSingle no longer allows OrExpr
Issue #440 closed #closed-440
Fix bug #439 - grammar for ExprSingle
Issue #441 closed #closed-441
Make XSLT function formatting consistent with F&O formatting
Pull request #441 created #created-441
Make XSLT function formatting consistent with F&O formatting
Resolves action QT4CG-023-01, I believe. This is a minimal sort of fix, not an attempt to refactor everything.
Pull request #440 created #created-440
Fix bug #439 - grammar for ExprSingle
Simple bug fix, shouldn't need any meeting time.
Issue #439 created #created-439
ExprSingle no longer allows OrExpr
The grammar for ExprSingle seems to have been accidentally changed so it no longer allows an OrExpr as one of the alternatives.
Issue #438 created #created-438
What are the "non-whitespace control characters"?
In Section 5, XML Serialization, we find:
A consequence of this rule is that certain characters MUST be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. In addition, the non-whitespace control characters #x1 through #x1F and #x7F through #x9F in text nodes and attribute nodes MUST be output as character references.
(The reference to "non-whitespace control characters" appears in a few other places as well, but for basically the same purpose.)
But what are the "non-whitespace control characters"? The spec doesn't say. I think it means all of the C0 and C1 control characters except CR, NL, TAB, and NEL. The fact that vertical tab and line feed might be considered "white space" doesn't really matter anyway since none of the other C0 control characters are allowed in XML 1.0 anyway (encoded or otherwise).
XML 1.0 doesn't actually care about the C1 control characters. There's no reason to encode them, but it does no harm, I suppose. You'd have to encode the C0 and C1 control characters for an XML 1.1 parser, but none of those exist.
I wonder if it might be a little clearer to say
A consequence of this rule is that certain characters MUST be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. In addition, the other control characters #x1 through #x1F (except #x9, #xA, and #xD) and #x7F through #x9F (except #x85) in text nodes and attribute nodes MUST be output as character references.
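The proposed wording for text nodes can be read as a simple per-character rule. The Python sketch below is one interpretation, not a serializer: the function name is hypothetical, it covers text nodes only, and it deliberately ignores the escaping of "&" and "<" that a real serializer also needs.

```python
def escape_text_node_controls(text):
    """Replace characters that must become character references in a text node."""
    out = []
    for ch in text:
        cp = ord(ch)
        # CR, NEL and LINE SEPARATOR must survive the round trip through parsing
        line_end = cp in (0xD, 0x85, 0x2028)
        # C0 controls other than TAB, NL, CR, and the C1 range starting at #x7F
        c0 = 0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD)
        c1 = 0x7F <= cp <= 0x9F
        if line_end or c0 or c1:
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)
```

For attribute nodes the rule would additionally escape NL and TAB, per the quoted paragraph.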
QT4 CG meeting 031 draft agenda #agenda-04-18
Draft agenda published.
Issue #437 created #created-437
xsl:where-populated and table with header
Dear all,
I just discovered xsl:where-populated, but was surprised that it covers only the narrow use case of a single level of wrapping.
My use case is a table: I have to wrap things in a table with a header.
<table ... bunch of attributes'
<thead ....>
<xsl:for-each ... >
<tr>
<xsl:for-each ... >
<td>...</td>
</xsl:for-each>
</tr>
</xsl:for-each>
</table>
How can I do that with xsl:where-populated ?
Issue #436 created #created-436
Allow inline function expressions in arrow operator call chains
It can be useful to create inline functions for simple operations (e.g. adding 1 to a number) to be used in arrow operator call chains.
The current proposal uses -> { ... } for just the thin arrow operator.
This proposal is split into two parts:
- restructure the grammar to make the function call usage simpler to follow;
- introduce the ability to use inline functions in thin/fat arrow contexts.
Part 1 -- Simplify the Grammar
I suggest changing the grammar to:
FatArrowTarget ::= "=>" ( ArrowFunctionCall | ArrowDynamicFunctionCall )
ThinArrowTarget ::= "->" ( ArrowFunctionCall | ArrowDynamicFunctionCall )
ArrowFunctionCall ::= EQName ArgumentList
ArrowDynamicFunctionCall ::= ( VarRef | ParenthesizedExpr ) PositionalArgumentList
That is, I've grouped the function name/reference with the argument list.
Part 2 -- Allow Inline Functions
FatArrowTarget ::= "=>" ( ArrowFunctionCall | ArrowDynamicFunctionCall | ArrowInlineFunctionCall )
ThinArrowTarget ::= "->" ( ArrowFunctionCall | ArrowDynamicFunctionCall | ArrowInlineFunctionCall )
ArrowInlineFunctionCall ::= ( "function" | "fun" ) EnclosedExpr
Note: here, "fun" is a placeholder for whichever name/symbol we choose in https://github.com/qt4cg/qtspecs/issues/53.
This then allows expressions like (1, 2, 3) -> function { . + 1 } and (1, 2, 3) => fun { ~ = 1 } (see also https://github.com/qt4cg/qtspecs/issues/129) without overloading the meaning of ->.
QT4 CG meeting 030 draft minutes #minutes-04-11
Draft minutes published.
Issue #435 created #created-435
Remove the inlined function expression variant of the thin arrow operator
This proposal is to remove the third bullet/variant from the thin arrow operator so that the new inline function syntax (-> { ... }) cannot be used within the arrow expressions.
This makes the thin/fat arrows consistent in behaviour with each other, with the exception of how they pass the value to the expressions:
- thin arrow operators pass the values in the sequence one at a time to the associated function;
- fat arrow operators pass all the values in the sequence to the associated function in a single call.
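The difference between the two operators can be pictured with a small Python analogy. This is only a sketch of the calling convention described in the two bullets; the helper names are hypothetical, and Python lists stand in for XPath sequences, which unlike lists are always flat.

```python
def thin_arrow(values, func):
    """Thin arrow analogue: call func once per item in the sequence."""
    return [func(value) for value in values]


def fat_arrow(values, func):
    """Fat arrow analogue: call func once, passing the whole sequence."""
    return func(values)
```

Here `thin_arrow([1, 2, 3], lambda x: x + 1)` applies the function three times, while `fat_arrow([1, 2, 3], sum)` applies it once to all three values.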
Changes
- Update the syntax:
[46] FatArrowTarget ::= "=>" ((ArrowStaticFunction ArgumentList) | (ArrowDynamicFunction PositionalArgumentList))
[47] ThinArrowTarget ::= "->" ((ArrowStaticFunction ArgumentList) | (ArrowDynamicFunction PositionalArgumentList))
- Remove the text for the inline function variant:
If the arrow is followed by an EnclosedExpr:
Given a UnaryExpr U, and an EnclosedExpr {E}, the expression U -> {E} is equivalent to the expression (U) ! (E).
For example, the expression $x -> {.+1} is equivalent to ($x)!(.+1).
- Remove/update the associated examples, e.g. to use let $f := function ($x) { $x + 1 } return $x -> f() -> $f().
Issue #390 closed #closed-390
Should parsing and building URIs attempt to special case Windows URIs for UNC names?
Issue #415 closed #closed-415
Revise parse/build URI functions for UNC names
QT4 CG meeting 030 draft agenda #agenda-04-11
Draft agenda published.
Pull request #434 created #created-434
Functions to parse and format hex integers
Addresses issue #241 by providing functions to parse and format integers in any number base from 2 to 36.
Pull request #433 created #created-433
429 Add hex and binary literals and allow underscores
Addresses issue #429. The grammar is extended to allow hex and binary integer literals, and all numeric literals may contain underscores for readability.
Issue #432 closed #closed-432
fix attribute name : it is diff instead of role
Pull request #432 created #created-432
fix attribute name : it is diff instead of role
Issue #417 closed #closed-417
Fix residual reference to op:A2S which is no longer defined
Issue #315 closed #closed-315
fn:transform inconsistency: initial-mode
Issue #427 closed #closed-427
Change fn:transform to use the stylesheet's default mode
Issue #280 closed #closed-280
Why is resolve-uri forbidden from resolving against a URI that contains a fragment identifier?
Issue #426 closed #closed-426
Resolve #280 by allowing a fragid
Issue #428 closed #closed-428
Fix problem in rendering empty <xnt> elements
Issue #431 closed #closed-431
Fix problem rendering xnt elements
Pull request #431 created #created-431
Fix problem rendering xnt elements
Close #428
Hi @michaelhkay . I took a slightly different approach. For unknown reasons long since lost in the mists of time the 'etc' files that act as databases for cross-spec references stored nt elements. That's weird because the nt elements are supposed to point to prod elements. I expect someone (let's be candid, probably me) got confused by the fact that nt elements have a def attribute and thought they were definitions. They're not. I've changed things so that the prod elements are now stored in the database. There's only going to ever be one of those.
I had to tidy up a few things to make that work, and we can't abandon support for nt files in 'etc' documents because we have existing files that don't get regenerated.
I also cleaned up the cross-reference error to StringLiteral in the XSLT spec and patched over a problem with a few links in the XQuery specifications.
There's no useful information from PR formatting of PRs that change the stylesheets, so I'm just going to cross my fingers and merge this. Please pull the latest and let me know if you see any problems!
Pull request #430 created #created-430
fn:doc et al, error handling: inconsistencies. Closes #293
Resolves action item QT4CG-029-04.
Issue #429 created #created-429
Hexadecimal and binary literals
Without wanting to challenge our weekly burn-down chart too much, I wonder whether it would be a big deal to support hexadecimal and binary literals in XPath? Examples:
(: decimal :) 1, 255,
(: hexadecimal :) 0x1, 0X00Ff,
(: binary :) 0b1, 0B11111111,
The main question is probably if it conflicts with the existing grammar?
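Python happens to accept the same four example literals, which gives a quick way to check what values they denote. The sketch below relies on Python's own literal rules via `int(text, 0)`; those rules are Python's, not the proposed XPath grammar, and the helper name is hypothetical.

```python
def literal_value(text):
    """Evaluate a decimal, hexadecimal (0x/0X) or binary (0b/0B) integer literal.

    Base 0 tells int() to infer the base from the literal's prefix,
    following Python's literal syntax (an assumption for this sketch).
    """
    return int(text, 0)


# The literals from the examples above, paired with their values
EXAMPLES = {text: literal_value(text)
            for text in ("1", "255", "0x1", "0X00Ff", "0b1", "0B11111111")}
```

Both `0X00Ff` and `0B11111111` denote 255, so the prefix and digit case do not affect the value.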
Pull request #428 created #created-428
Fix problem in rendering empty <xnt> elements
In the F&O spec, function parse-QName, there is a link to BracedURILiteral repeated 6 times. This happens when an <xnt> element is written with empty content. There are 6 entries in the /etc/ XP40 file for the relevant grammar symbol, and each of them is output. I haven't tried to eliminate the redundancy in the /etc/XP40 file, I have simply changed the code for processing <xnt> so it only considers the first one.
Issue #411 closed #closed-411
Remove the note from the parse-html unparsed-entity sections.
Pull request #427 created #created-427
Change fn:transform to use the stylesheet's default mode
Close #315
Pull request #426 created #created-426
Resolve #280 by allowing a fragid
Fix #280
(Based off the right branch this time. I hope.)
Issue #424 closed #closed-424
Allow fn:resolve-uri to resolve against a base URI that includes a fragment identifier
Issue #425 created #created-425
Structural proposal (ThinLayer:tm:) : Add a layer of thin spec between XPath and the XPath Derived Language
XPath is ubiquitous and is used even in places we have no idea about. On the other hand, XPath is very useful to us as the centerpiece of XSLT and XQuery.
To allow people to make more extensive use of XPath without having to take on the whole XSLT/XQuery story, it is perhaps time to consider adding a thin layer of spec in order to have
- XPath
- Some typing definition mechanism
- Some function definition mechanism
The idea is also to integrate this better with all validation technologies (XSD, Relax NG, NVDL, JSONSchema, etc.) and to give EXPath, EXQuery and others a standard way to use all of this.
We also want people who use only XPath (for example in LinQ or inside SQL) to have a broader capacity to interact with XML (instead of being limited to XPath 1.0 with namespaces).
I will try to add more precision to this proposal as we go along, but I feel it is good enough as a starting point that lets people help drive this initiative.
For the moment the name of this new beast is XPathWithCustomizableTypesAndFunctions; our proposal is called XPath Next: https://github.com/XPath-Next/XPath-Next/blob/first-draft/spec.md
QT4 CG meeting 029 draft minutes #minutes-04-04
Draft minutes published.
Pull request #424 created #created-424
Allow fn:resolve-uri to resolve against a base URI that includes a fragment identifier
Fix #280
Issue #423 created #created-423
[XSLT 4.0] 2.2 Notation is incomplete
"language" and "prefixes" are used in the definition of element but not defined here
On the other hand, "nmtokens" is defined but not used
Issue #22 closed #closed-22
[XPath] Allowing multiple let clauses in LetExpr and for clauses in ForExpr
Issue #416 closed #closed-416
NCName is usually lowercase in attribute type for the rest of the spec
Issue #422 closed #closed-422
Fix syntax in examples
Pull request #422 created #created-422
Fix syntax in examples
attribute namespace-uri corrected to namespace; xsl:sequence-of corrected to xsl:sequence
Issue #419 closed #closed-419
fix few syntax issues in the XSLT 4.0 examples
Issue #421 created #created-421
Make sure the build system syntax checks the syntax of examples
Apparently the code is lying around somewhere...
Pull request #420 created #created-420
Issue 357 Map composition and decomposition
This PR addresses the issues concerned with map composition and decomposition in issue #357. It adds a function to decompose a map into key-value pairs (map:key-value-pairs), and its inverse (map:of). It adds explanatory material to F+O to explain how these functions relate to each other, and it adds examples to the XSLT spec to show how they interwork with the xsl:array, xsl:map, and xsl:for-each instructions. Note: the map functions have been sorted alphabetically, so the changes will appear more extensive than they are.
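The relationship between the two functions is that of a round trip. The Python sketch below illustrates the idea with dictionaries. The representation of a pair as a dict with "key" and "value" entries is an assumption for this example, and the helper names are hypothetical stand-ins for map:key-value-pairs and map:of.

```python
def key_value_pairs(mapping):
    """Decompose a mapping into a list of single key-value records."""
    return [{"key": key, "value": value} for key, value in mapping.items()]


def map_of(pairs):
    """Inverse operation: rebuild a mapping from key-value records."""
    return {pair["key"]: pair["value"] for pair in pairs}
```

Composing the two returns the original mapping, which is what makes it practical to filter or transform the pairs in between.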
Pull request #419 created #created-419
fix few syntax issues in the XSLT 4.0 examples
Issue #418 created #created-418
array and map attribute in xsl:iterate and xsl:for-each-group
It seems there are still some places in the spec where we can spot remnants of the array and map attributes.
You can find this sentence in two places:
, or constructed from the expressions in the array or map attributes.
There are also some examples:
- Example: Grouping entries in a Map
- Example: Processing an array using xsl:iterate
Pull request #417 created #created-417
Fix residual reference to op:A2S which is no longer defined
The changes made to redefine array functions in terms of array:members and array:of rather than op:A2S and op:S2A weren't applied to array:get because that was the subject of a separate PR to add the fallback option. This PR corrects the omission.
Issue #403 closed #closed-403
Michaelhkay actions 2023 02 01
Pull request #416 created #created-416
NCName is usually lowercase in attribute type for the rest of the spec
Pull request #415 created #created-415
Revise parse/build URI functions for UNC names
Fix #398 Fix #390
This PR attempts to address the questions raised in issues 389 and 390:
- It adds a unc-path option that is used to guide the parsing and construction of URIs that represent Windows UNC paths
- It adds a filepath property to the result of parse-uri. This property represents the local path part of the URI. For file: URIs, this is the local path.
- It addresses the use of "|" in URIs to represent the ":" in Windows filenames
- It clarifies that percent-decoding a path also involves interpreting the result as a UTF-8 sequence
- It clarifies that percent-decoding and encoding apply to the query parts of a URI as well.
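The point about UTF-8 is that percent-decoding is a two-step process: the escapes yield octets, and the octets are then read as UTF-8. A minimal Python sketch of those two steps follows, using the standard library's `urllib.parse.unquote_to_bytes`. The helper name is hypothetical, and this is not the parse-uri algorithm itself.

```python
from urllib.parse import unquote_to_bytes


def percent_decode(component):
    """Percent-decode a URI component and interpret the octets as UTF-8."""
    raw = unquote_to_bytes(component)   # step 1: %XX escapes become raw octets
    return raw.decode("utf-8")          # step 2: octets are read as a UTF-8 sequence
```

The same helper applies equally to a path or a query component, in line with the last bullet.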
Issue #300 closed #closed-300
[F+O] Ambiguity regarding Unicode normalization (editorial)
QT4 CG meeting 029 draft agenda #agenda-04-04
Draft agenda published.
Issue #414 created #created-414
Lift character set restriction of xs:string
Adopted from https://github.com/qt4cg/qtspecs/issues/413#issuecomment-1491469514
I guess that raises the question of whether it is still appropriate to restrict the character set of xs:string to that of XML 1.0. Are there any benefits in doing so?
I believe that would simplify things a lot, in particular when working with input/output functions.
QT4 CG meeting 028 draft minutes #minutes-03-28
Draft minutes published.
Issue #404 closed #closed-404
Rework changes from action-qt4cg-019-01 to resolve persistent conflicts.
Issue #398 closed #closed-398
User-defined functions clashing with constructor functions
Issue #406 closed #closed-406
Revise xsl:array instruction and examples
Issue #408 closed #closed-408
Fix issue #398 (clash with constructor functions)
Issue #413 created #created-413
New function: parse-csv()
I propose a new function parse-csv() that accepts a CSV string (such as might be read from a CSV file using unparsed-text()). CSV is as defined in RFC 4180; implementations may be liberal in what they accept, and may define additional options.
An options parameter includes the option header=true|false to indicate whether the first line should be taken as containing column headings.
The result of the function is a sequence of maps, one map per row of the CSV file (excluding the header). Each map contains one entry per column, the key being taken from the column header if present, or an integer 1...n if not.
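The described result shape can be sketched in Python with the standard csv module. This is an illustration of the proposal's data model only: the function name and the boolean `header` parameter are stand-ins for the proposed options, and Python's csv dialect is assumed to be close enough to RFC 4180 for the purpose.

```python
import csv
import io


def parse_csv(text, header=False):
    """Return one dict per CSV row, keyed by column header or by 1-based position."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if header and rows:
        keys, data = rows[0], rows[1:]
        # Header row supplies the keys and is excluded from the result
        return [dict(zip(keys, row)) for row in data]
    # Without a header, columns are keyed by integers 1..n
    return [{index: field for index, field in enumerate(row, start=1)} for row in rows]
```

All field values stay strings here; whether the real function should infer types is not addressed by the proposal text above.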
Pull request #412 created #created-412
409, QT4CG-027-01: xsl:next-match
Clarifies the rules for xsl:next-match, especially for 4.0 type patterns, but also clarifying the exposition of rules unchanged since 3.0 or 2.0 (see issue 409)
Pull request #411 created #created-411
Remove the note from the parse-html unparsed-entity sections.
This applies the review action:
RD to remove the note in 15.5.15 of functions and operators.
QT4 CG meeting 028 draft agenda #agenda-03-28
Draft agenda published.
Issue #410 created #created-410
Converting doubles to decimals, fractional digits
Adopted from a previous discussion on Slack: The result of the following computation…
<x>2</x> + .1
…is serialized as 2.1. If the result is cast to a decimal via xs:decimal(<x>2</x> + .1), 2.100000000000000088817841970012523233890533447265625 is returned, which feels counterintuitive.
Can we possibly change the conversion rules without compromising backward-compatibility?
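The effect is not specific to XPath: it comes from converting the exact binary value of an IEEE 754 double to decimal. The Python sketch below reproduces it with the standard decimal module. The second conversion, via the shortest round-tripping string, is shown only as one conceivable alternative, not as the rule any specification adopts.

```python
from decimal import Decimal

# 2 + .1 evaluated in double arithmetic, as in the example above
double_result = 2 + 0.1

# Exact conversion: every binary digit of the double is preserved
exact = Decimal(double_result)

# Alternative: convert via the shortest string that round-trips to the same double
shortest = Decimal(repr(double_result))
```

Here `exact` carries the long digit string quoted in the issue, while `shortest` is simply 2.1.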
Issue #409 created #created-409
XSLT: xsl:next-match and xsl:apply-imports interaction with on-multiple-match
(This is an oversight in the XSLT 3.0 specification.)
It is possible for xsl:next-match or xsl:apply-imports to encounter a conflict - two template rules with the same precedence and priority. In this situation it should do exactly what xsl:apply-templates does when it encounters a conflict, for example it should follow the rules of xsl:mode/@on-multiple-match.
This is all fairly obvious, but it should be stated explicitly (and tested). The spec is written as if conflicts can only occur when finding the first matching rule, and not when finding next-match rules.
Issue #392 closed #closed-392
Partial function application: Placeholders with keywords
Pull request #408 created #created-408
Fix issue #398 (clash with constructor functions)
Add clarification to XSLT and XQuery specs to say that a user-defined function must not clash with a constructor function for an imported atomic type.
Issue #407 created #created-407
XSLT-specific context properties used in function items
I just stumbled across the fact that current-group#0 doesn't work: see the note in 14.2.1 that says:
Like other XSLT extensions to the dynamic evaluation context, the [current group] is not retained as part of the closure of a function value. This means that the expression current-group#0 is valid and returns a function value, but any invocation of this function will fail with a dynamic error [see ERR XTDE1061].
This restriction is unnecessary and we should remove it. As people become more accustomed to using function items, they don't want to hit restrictions like this, and there's really no good implementation reason for it.
Pull request #406 created #created-406
Revise xsl:array instruction and examples
This PR revises the design of the xsl:array instruction to align it with the recently agreed specs for the functions array:members and array:of. The composite attribute and the xsl:array-member instruction are dropped; instead the instruction takes a use attribute which is an expression used to compute each array member value from the corresponding item in the value of the select expression or sequence constructor.
Issue #360 closed #closed-360
Issue 314 array composition and decomposition
Issue #405 closed #closed-405
MK PR #360 with merge conflicts resolved (array composition and decomposition)
Pull request #405 created #created-405
MK PR #360 with merge conflicts resolved (array composition and decomposition)
Close #360
This PR is the same as 360 but fixes merge conflicts.
Accepted at meeting 027
Close https://github.com/qt4cg/qtspecs/issues/314
Issue #400 closed #closed-400
Priorities for type-based patterns
Issue #401 closed #closed-401
Issue 400: ranking of type patterns
Issue #395 closed #closed-395
Make the (non-)hierarchical nature of URIs explicit
Issue #394 closed #closed-394
Minor correction to fn:parse-uri
Issue #393 closed #closed-393
Clarify explanations of functions/function items
Issue #391 closed #closed-391
addressed typographical errors; adjusted Unicode character discussion…
QT4 CG meeting 027 draft agenda #agenda-03-21
Draft agenda published.
Issue #336 closed #closed-336
Action QT4CG-019-01 (type of $pattern in fn:tokenize())
Pull request #404 created #created-404
Rework changes from action-qt4cg-019-01 to resolve persistent conflicts.
I reworked these (purely editorial) changes based on the current master to try and resolve the conflicts once and for all.
Pull request #403 created #created-403
Michaelhkay actions 2023 02 01
Issue #402 created #created-402
XSLT patterns: intersect and except
I would like to propose making an incompatible change to the semantics of XSLT patterns using the "except" and "intersect" operators, so that they have their intuitive meaning.
Consider the pattern p except appendix//p
. Anyone writing this probably imagines that this will match any p
element that does not have an appendix
as an ancestor. The intuitive meaning of A except B
is to match anything that matches A
unless it also matches B
.
The actual meaning in the XSLT 3.1 specification is that it matches any node $N
that has an ancestor $A
such that the result of the XPath expression $A//(p except appendix//p)
includes $N
.
Consider the XML
<appendix>
<div>
<p>...</p>
</div>
</appendix>
The <p>
element here has an ancestor (the <div>
element) where the result of $A//(p except appendix//p)
includes the <p>
element. So despite having an ancestor appendix
this element matches the pattern p except appendix//p
. This is not only a counter-intuitive result, it also makes such patterns useless in practice.
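The gap between the two readings can be checked mechanically. The Python below is only an illustrative sketch, not the specification's matching algorithm: it hard-codes the single pattern p except appendix//p, follows the description above (some ancestor $A for which $A//(p except appendix//p) includes the node), and all helper names are invented for this example.

```python
import xml.etree.ElementTree as ET

# The sample document from the text above.
doc = ET.fromstring("<appendix><div><p>...</p></div></appendix>")

# ElementTree has no parent pointers, so build a child -> parent lookup.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(node):
    """Yield the ancestors of node, nearest first."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def eval_from(anchor):
    """Evaluate $A//(p except appendix//p) with $A bound to anchor."""
    selected = []
    for context in anchor.iter():  # descendant-or-self of the anchor
        # appendix//p relative to this context node
        excluded = [p for app in context.findall("appendix") for p in app.iter("p")]
        for p in context.findall("p"):
            if not any(p is e for e in excluded):
                selected.append(p)
    return selected


def matches_3_1(node):
    """Reading described above: some ancestor's evaluation selects the node."""
    return any(any(node is hit for hit in eval_from(a)) for a in ancestors(node))


def matches_intuitive(node):
    """Proposed reading: a p element with no appendix ancestor."""
    return node.tag == "p" and all(a.tag != "appendix" for a in ancestors(node))


p = doc.find("div/p")
print(matches_3_1(p))        # True
print(matches_intuitive(p))  # False
```

With the div element as $A, the relative path appendix//p selects nothing, so the p element survives the except and the pattern matches even though the element sits inside an appendix.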
Patterns using intersect
suffer the same problem, though it is much harder to construct a plausible example.
Patterns that only use the child or attribute axis, for example @* except @code
, or * except note
, don't suffer from this problem and will retain the same meaning as in 3.1.
The required effect can be achieved by writing p except p[ancestor::appendix]
. Because the pattern p[ancestor::appendix]
is equivalent to appendix//p
, people are very likely to imagine that p except p[ancestor::appendix]
is equivalent to p except appendix//p
.
Making any incompatible change to the language semantics should be done only with a very strong justification, but I believe that it is justified in this instance. The existing semantics are not only counter-intuitive, they are also sufficiently useless that it is extremely unlikely anyone has existing working code, other than artificial test cases, that relies on the current semantics.
Issue #387 closed #closed-387
Add compatibility notes for fn:namespace-uri-for-prefix
Issue #385 closed #closed-385
Actions QT4CG-025-07 / -08
Issue #378 closed #closed-378
Update the localName and unparsed entity reference notes for parse-html
Pull request #401 created #created-401
Issue 400: ranking of type patterns
Proposes how to handle type-based match patterns (record tests, in particular) in the absence of explicit priorities, basing the decision on the type hierarchy. Note: user-defined priorities are always considered before any inferred selectivity rules. Also fixes some grammar problems with type patterns. See Issue #400.
Issue #400 created #created-400
Priorities for type-based patterns
XSLT §6.6 currently has a big TODO:
TODO: define default priorities for type patterns, as suggested in https://www.saxonica.com/papers/xmlprague-2020mhk.pdf section 6.5.1
We need to plug this gap. [Note: it's worth reading that cited section as it points out some of the difficulties].
I'm going to suggest an alternative approach. Rather than allocating a numeric priority to patterns such as record(lat, long)
, we allocate them a relative priority -- called their selectivity
-- based on the subtype relationship among types. This is a partial ordering. So we extend the rule that currently orders patterns by (1) import precedence, (2) priority, (3) declaration order, to become instead (1) import precedence, (2) selectivity, (3) priority, (4) declaration order.
Type-based patterns (such as type(xs:integer)
, record(lat, long)
) are defined to have higher selectivity than any non-type-based pattern; all the latter (that is, all XSLT 3.1 patterns) are defined to have equal selectivity, which means the rules for discriminating among 3.1 patterns are unchanged.
For type-based patterns, we define that a pattern based on type T has higher selectivity than a pattern based on type U if T is a subtype of U. If neither is a subtype of the other, then they have equal selectivity.
The type pattern type(T)
followed by one or more predicates is deemed to have higher selectivity than type(T) with no predicates, but apart from this, the predicates are ignored. Explicit numeric priorities can be used to define an ordering among type patterns that have the same selectivity.
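As a rough illustration of the selectivity comparison proposed here, the following Python sketch models a type pattern as a (type name, has predicates) pair. The tiny type table and the function names are assumptions made for the example; the real subtype relation is the one defined by the type system, not this lookup.

```python
# Hypothetical fragment of the type hierarchy: each type maps to its parent.
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:string": "xs:anyAtomicType",
}


def is_subtype(t, u):
    """True if t is u or is derived from u in the PARENT table."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compare_selectivity(a, b):
    """Compare two type patterns given as (type_name, has_predicates).

    Returns 1 if a is more selective, -1 if b is, 0 if they are equal.
    """
    (t, t_pred), (u, u_pred) = a, b
    if t == u:
        # Same type: a pattern with predicates outranks one without.
        return (t_pred > u_pred) - (t_pred < u_pred)
    if is_subtype(t, u):
        return 1
    if is_subtype(u, t):
        return -1
    return 0  # unrelated types: equal selectivity


print(compare_selectivity(("xs:integer", False), ("xs:decimal", False)))  # 1
print(compare_selectivity(("xs:integer", False), ("xs:string", False)))   # 0
print(compare_selectivity(("xs:decimal", True), ("xs:decimal", False)))   # 1
```

A result of 0 is where the later tie-breakers in the proposed ordering (priority, then declaration order) would come into play.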
Issue #399 created #created-399
fn:deep-equal: Using Multilevel Hierarchy and Abstraction when designing and specifying complex functions
Whenever a function is too complex, specifying it precisely and clearly becomes problematic: the complexity produces a huge volume of text that is difficult to understand, and whose correctness becomes less and less obvious.
Solving this problem would benefit all groups of readers, be they future implementors or just curious XPath enthusiasts.
Here I present one well-known solution, proven in practice, which the Romans summarized in the phrase: "Divide et Impera" ("Divide and conquer").
Below is one possible splitting of the functionality of fn:deep-equal into different smaller and simpler functions on 5 levels, each level possibly dispatching to a function on another level. The intermediate-level functions each have their own value and could be used independently of fn:deep-equal and of each other. Even though there is the possibility of recursion, we can still get a simple picture and immediate understanding of this functionality, just playing with the following collapsible/expandable representation (click on the corresponding arrow), which fits on a single screen:
(writing the full specification from this animated picture is left as an exercise for the reader 😂)
- deep-equal-sequence
  - deep-equal-item
    - deep-equal-atomic
    - deep-equal-map
      - deep-equal-atomic
      - deep-equal-sequence
    - deep-equal-array
      - deep-equal-sequence
    - deep-equal-node
      - deep-equal-document
      - deep-equal-element
        - deep-equal-attribute
        - deep-equal-NS
        - deep-equal-PI
        - deep-equal-comment
        - deep-equal-text-node
      - deep-equal-attribute
      - deep-equal-NS
      - deep-equal-PI
      - deep-equal-comment
      - deep-equal-text-node
Issue #398 created #created-398
User-defined functions clashing with constructor functions
There is no explicit rule in either XQuery or XSLT that a user-defined function must not clash (in name and arity) with a constructor function for an imported atomic type. It's implicit in the rule that you can't have two conflicting functions in the static context, but it would be helpful to say so explicitly and define an error code.
I have added tests to the XSLT3 and XQuery3 test suites.
See also Saxon bug 5921 - https://saxonica.plan.io/issues/5921
Issue #397 created #created-397
Type names
The draft specifications propose the introduction of item type declarations that can associate a name with an item type. The feature probably still needs some work, which this issue aims to explore.
The main purpose of introducing named item types is that the ItemType for a record structure or a function signature can become quite complex and lengthy, and you don't want to have to repeat them every time they are used because it means you have to make the same change everywhere when a change occurs. Another motivation is to allow type definitions (for example, of records or functions) to be recursive.
I considered allowing named sequence types rather than just item types, but the rules for where you can and can't have an occurrence indicator get complicated, so I pulled back from that.
It seems natural to say:
- Item type names are QNames
- In XPath, type names (and their mapping to item types) appear in the static context
- In XQuery, type names follow the conventions for global variables and function declarations. That suggests they can appear either in the main module or a library module; in a library module they must be in the namespace of the module; they can be annotated as %public or %private; an import module declaration makes the name visible in the importing module.
- In XSLT, a name declared in a module is automatically available throughout the stylesheet package, and can be exposed to other packages using the same visibility mechanisms as other stylesheet components. However, I don't think it makes sense to allow a type name to be overridden, either using import precedence or using xsl:override.
The question then arises, should item type names be in the same "symbol space" as named atomic and union types? There seem to be several options here:
(a) Item type names are in a different symbol space from atomic types; there are no rules barring the same name being used for a named item type and an atomic type, and they are disambiguated by requiring item type names to be distinguished using some kind of marker syntax such as type(name)
, rather than just a bare name.
(b) Item type names are in the same symbol space as atomic types, which means there must be a rule that an item type name must not be the same as an atomic type name that is visible in the same place. We could try and define this rule for individual names, or at the level of namespaces (if there are any atomic/union types in a particular namespace in the static context of any module, then there must be no declared type names in that namespace in that module, either declared in that module or imported from another module).
(c) Atomic type names "shadow" item type names, or vice versa: if the same name is used for both, then one of them takes precedence. Probably not a good idea.
I'm inclined to go for (b). Note that a simple rule that item type names can't be in a reserved namespace will prevent conflict for all non-schema-aware applications, since those applications only access atomic types in the xs
namespace.
Now, what about circular definitions?
There are legitimate circular definitions, like declare item type LIST = record(payload as item()*, next? as LIST)
, and there are "impossible" definitions, like declare item type THING = THING
. Do we have to define the rules needed to ban "impossible" definitions, or can we just leave it that the determination of whether something is an instance of THING is non-terminating? I think we probably need to define the rules, which will require careful thought.
Where can item type names be used? The simple answer is: anywhere an ItemType is allowed. But what about contexts that only allow some ItemTypes and not others? For example, (a) "cast as", (b) as arguments of a LocalUnionType, (c) as the key type in a map type. (The solution in the current draft is that the syntax allows any ItemType to be used in these contexts, and there are semantic rules to constrain what kind of item types are allowed).
If we allow $v cast as my:X
where my:X
is a declared item type name, should we also allow the constructor function my:X($v)
? That would presumably also mean that item type names and function names cannot overlap.
Should we define any "built-in" item type names? We've been defining built-in functions (such as build-uri and parse-uri) whose signatures use record type definitions. Should we define built-in names for these record definitions?
An editorial issue: I think it's becoming increasingly difficult to get away with overloading the word ItemType
to mean both the abstract concept of an item type, and the specific BNF construct used to define it. Same for SequenceType
. I think we should probably move to having a defined term "item type" and a BNF construct such as ItemTypeDesignator
to represent the two separate meanings.
QT4 CG meeting 026 draft minutes #minutes-03-14
Draft minutes published.
Pull request #396 created #created-396
333: Deep-equal, no failure when comparing functions
Refines the spec of fn:deep-equal so it no longer fails when comparing function items, rather it returns a result which in general is implementation-dependent, though it must be false unless the functions are provably equivalent.
Pull request #395 created #created-395
Make the (non-)hierarchical nature of URIs explicit
The fn:parse-uri
function will parse hierarchical or non-hierarchical URIs. However, the parse cannot be reversed if the fn:build-uri
function doesn't know whether the scheme is hierarchical. Consider fn:parse-uri("querty:abc")
:
map {
"path":"abc",
"scheme":"querty",
"path-segments":["abc"],
"uri":"querty:abc"
}
When fn:build-uri
parses that map, it produces: querty://abc
because the scheme is not known to be non-hierarchical.
This PR changes fn:parse-uri
so that it records whether or not the URI was hierarchical and fn:build-uri
to use that information.
It's possible that we could finesse this by setting the authority
to the empty string for hierarchical URIs, but it seems clearer to be explicit.
This PR also fixes a bug. Previously, if the scheme
was not present when building a URI, the URI began with //
. That's an error. If the scheme isn't present, there should be no scheme separator.
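The round-trip problem and the fix can be sketched in a few lines. This Python is a deliberately reduced model, not the fn:build-uri algorithm: it only looks at scheme and path, and the "hierarchical" key is a placeholder name chosen for this example, since the text above does not say what the recorded property is called.

```python
def build_uri(parts):
    """Rebuild a URI string from a parsed map (scheme and path only)."""
    scheme = parts.get("scheme")
    path = parts.get("path", "")
    if scheme is None:
        # No scheme, so no scheme separator either.
        return path
    # "hierarchical" is a hypothetical key. When it is absent, the builder
    # falls back to treating the scheme as hierarchical, as described above.
    if parts.get("hierarchical", True):
        return scheme + "://" + path
    return scheme + ":" + path


print(build_uri({"scheme": "querty", "path": "abc"}))
# querty://abc
print(build_uri({"scheme": "querty", "path": "abc", "hierarchical": False}))
# querty:abc
```

Recording the flag at parse time is what lets the second call reproduce the original querty:abc.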
Issue #320 closed #closed-320
Issue 98 - add options parameter to fn:deep-equal
Pull request #394 created #created-394
Minor correction to fn:parse-uri
The fn:parse-uri()
function recognizes "URIs" of the form c:/path/to/thing
as implicitly being file:
URIs. This small change adds a leading "/" to make the fact that it is a path explicit.
Issue #25 closed #closed-25
[XPath] `%variadic("sequence")` does not allow specifying some argument values in the variadic sequence, and in one case even not the variadic sequence itself
Issue #26 closed #closed-26
[XPath]A value in the last row (for "sequence-variadic" functions) of the table "Number of Arguments allowed in a Function Call" is incorrect
Issue #54 closed #closed-54
[XPath] [XQuery] Keyword arguments don't work with all parameters/keys in static functions.
Issue #47 closed #closed-47
[XPath] [XQuery] Allow argument placeholders on keyword arguments
QT4 CG meeting 026 draft agenda #agenda-03-14
Draft agenda published.
Issue #386 closed #closed-386
Action QT4CG-025-05 (markup typo)
Pull request #393 created #created-393
Clarify explanations of functions/function items
This PR is purely editorial in the sense that it does not attempt to make any changes that would affect an implementation. It's intended to clear up ambiguity and lack of clarity in the description of operations on functions, in particular the way that a function item captures static and dynamic context. It addresses issues #239 and issue #392.
Issue #392 created #created-392
Partial function application: Placeholders with keywords
It's clear that the following is allowed:
format-date(current-date(), '[Y]-[M]-[D]', place:=?, language:=?, calendar:="AD")
The resulting function item takes two arguments (place and language) but in what order? Is it the order of parameters in the original function definition, or the order in which they appear in the partial function application?
I think it should be the latter, but this needs to be made explicit in the spec.
Note that this doesn't only apply to optional parameters as in the above example, it applies equally, for example to
starts-with(substring := ?, value := ?)
While we're on the subject, we should also ask whether
concat(value83 := ?)
is legal, and if so, what it means.
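To make the ordering question concrete, here is a small Python analogue of the second option, where the parameters of the resulting function follow the order in which the placeholders were written. It is a sketch of the idea only: PLACEHOLDER, partial_apply and starts_with are names invented for this example, and it relies on Python preserving keyword-argument order.

```python
PLACEHOLDER = object()  # stands in for the "?" argument placeholder


def partial_apply(fn, **keyword_args):
    """Fix some keyword arguments; placeholders become positional
    parameters in the order they were written in the call."""
    slots = [name for name, value in keyword_args.items() if value is PLACEHOLDER]
    fixed = {name: value for name, value in keyword_args.items()
             if value is not PLACEHOLDER}

    def applied(*args):
        if len(args) != len(slots):
            raise TypeError(f"expected {len(slots)} arguments, got {len(args)}")
        return fn(**fixed, **dict(zip(slots, args)))

    return applied


def starts_with(value, substring):
    return value.startswith(substring)


# Placeholders written as (substring, value), the reverse of the declaration.
f = partial_apply(starts_with, substring=PLACEHOLDER, value=PLACEHOLDER)
print(f("ab", "abc"))  # True: "ab" is bound to substring, "abc" to value
```

Under the other option (declaration order), the same call would bind "ab" to value and "abc" to substring, and the result would be False.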
Pull request #391 created #created-391
addressed typographical errors; adjusted Unicode character discussion…
… for internal local consistency, clarity
This being my first PR, I opted to include beyond the typos I noted in #289 another small block of hopefully uncontroversial edits as a test balloon.
Issue #278 closed #closed-278
array bound checking
Issue #289 closed #closed-289
Proposal to add fallback behaviour to map:get and array:get
Issue #390 created #created-390
Should parsing and building URIs attempt to special case Windows URIs for UNC names?
Depending on the platform and language APIs involved, we see file:
URIs encoded in a variety of different ways. It doesn't help that there's no official RFC for file:
URIs.
- file:/path/part is a file: URI with no host and a path of /path/part.
- file:///path/part is a file: URI with an explicitly empty host and a path of /path/part.
- file://path/part is a file: URI with an authority of path and a path of /part. I think one common way to interpret this is as if it was file:/part. That is, in file: URIs, although a different host is possible, it's often just ignored.
- c:\path\part is most usefully interpreted as file:/c:/path/part, a file: URI with no host and a path of /c:/path/part. These are only going to be useful on a Windows system, so it isn't a problem to treat them the same way on all platforms. (Aside: I don't actually know if the path part should be c:/path/part instead, but it's currently got the leading slash in fn:parse-uri().)
And then there's this: file:////name/path/part
.
One interpretation is, "look, we accept file:/
and file:///
so let's just accept file://
and file://///////
, etc. as the same." And I think that's generally right, with the single special exception of file:////
. The problem is that on Windows, this is a very common way to encode the URI for a UNC path, that is: \\name\path\part
which is a Windows UNC path for \path\part
on a host named name
(via whatever networking protocol backs UNC).
You'd think that this should be file://name/path/part
, but I think because browsers and maybe other tools just discard the authority part of a file:
URI (or maybe because these are paths in some Windows sense?), that's not how they're encoded.
Aside: Yes, I'm sure you also see file:\\\\name\path\part and file:c:\path\part and other forms as well. Those are out of scope; they're simply, flatly, completely wrong. You can't use \ as a delimiter in a URI. RFC 3986 is authoritative on this point. Step one of dealing with random strings we think should be URIs is replacing all \ with / because RFC 3986.
It's problematic to deal with file:////
as a special case, but it's also problematic to leave out support for a common pattern on a widely deployed operating system.
Recognizing four slashes after file:
and treating that specially isn't hard. The hard part is how do we encode this in the map that fn:parse-uri
produces bearing in mind that the result should round-trip if you push it back through fn:build-uri
.
Consider file:////uncname/path/part
Today, that is parsed as:
map {
"uri": "file:////uncname/path/part",
"scheme": "file",
"authority": "uncname",
"host": "uncname",
"path": "/path/part",
"path-segments": array { "", "path", "part" }
}
and that doesn’t round trip. If you feed that to fn:build-uri
, you get file://uncname/path/part
and that absolutely doesn’t mean the same thing on a Windows machine.
We could encode the slashes in the authority
in which case we also have to encode them in the host
because in the presence of host
, the authority
isn’t used by fn:build-uri()
:
map {
"uri": "file:////uncname/path/part",
"scheme": "file",
"authority": "////uncname",
"host": "////uncname",
"path": "/path/part",
"path-segments": array { "", "path", "part" }
}
It kind of works, but it’s really ugly and it means we have a host value that is a complete kludge. It doesn’t match the RFC rules for hostnames at all.
The other option that occurs to me is to add a “unc-path” property to the map:
map {
"unc-path": true(),
"uri": "file:////uncname/path/part",
"scheme": "file",
"authority": "uncname",
"host": "uncname",
"path": "/path/part",
"path-segments": array { "", "path", "part" }
}
That works but it introduces all sorts of possibilities for incoherent data, such as an https:
URI with a unc-path
flag set to true()
.
What’s the right answer?
- Ignore the UNC path special case; it’s the user’s problem to deal with them.
- Recognize them, encode the details in the authority and host.
- Recognize them, use a special property like unc-path.
- Recognize them, and do this other much better idea I have: ________________
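For comparison, the unc-path option can be sketched as follows. This Python is an assumption-laden illustration rather than the behaviour of fn:parse-uri or fn:build-uri: it handles only the four-slash form, uses plain string operations, and the function names are made up for the example.

```python
UNC_PREFIX = "file:////"


def parse_file_uri(uri):
    """Parse a file: URI, flagging the four-slash UNC form."""
    if uri.startswith(UNC_PREFIX) and not uri.startswith(UNC_PREFIX + "/"):
        host, _, rest = uri[len(UNC_PREFIX):].partition("/")
        return {
            "unc-path": True,  # hypothetical property from the option above
            "uri": uri,
            "scheme": "file",
            "authority": host,
            "host": host,
            "path": "/" + rest,
        }
    raise ValueError("only the four-slash UNC form is handled in this sketch")


def build_file_uri(parts):
    """Rebuild the URI, restoring the four slashes when unc-path is set."""
    slashes = "////" if parts.get("unc-path") else "//"
    return parts["scheme"] + ":" + slashes + parts["host"] + parts["path"]


parsed = parse_file_uri("file:////uncname/path/part")
print(build_file_uri(parsed))  # file:////uncname/path/part
```

Dropping the flag from the map reproduces the non-round-tripping result file://uncname/path/part described earlier, which is the trade-off the flag is meant to address.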
Issue #389 created #created-389
The fn:build-uri function needs to perform URI encoding for path and query segments
The fn:parse-uri
function describes decoding, but the fn:build-uri
function fails to encode.
Issue #388 closed #closed-388
Update the example background color in serialization
Pull request #388 created #created-388
Update the example background color in serialization
This PR completes my action to fix the dark blue background in examples in the serialization spec. I've made them the same as the examples in the XSLT spec which seem to have been satisfactory.
Issue #328 closed #closed-328
Switch Cases: Lift single-item restriction on operands
Issue #28 closed #closed-28
[XPath] Support multiple clauses in ForExpr and LetExpr.
Pull request #387 created #created-387
Add compatibility notes for fn:namespace-uri-for-prefix
Action QT4CG-024-01
Pull request #386 created #created-386
Action QT4CG-025-05 (markup typo)
Pull request #385 created #created-385
Actions QT4CG-025-07 / -08
Improves termdef markup; adds error code; updates change history appendix.
Issue #344 closed #closed-344
Issue 22: allow "for"/"let" keyword to be repeated in XPath
Issue #307 closed #closed-307
Parsing and building URIs comments and queries
Issue #347 closed #closed-347
Attempt to clarify fn:parse-uri and fn:build-uri
Issue #355 closed #closed-355
Action QT4CG-022-02 - add to imp-def-feature appendix
Issue #370 closed #closed-370
Bump XSLT version
Issue #345 closed #closed-345
Missing rule for matching atomic values against atomic types
Issue #363 closed #closed-363
Fix issue #345 - missing rules for type matching
Issue #364 closed #closed-364
Generalize switch expressions in XQuery (issue #328)
Issue #371 closed #closed-371
Issue 370: forwards and backwards compatibility for 4.0
QT4 CG meeting 025 draft minutes #minutes-03-07
Draft minutes published.
Issue #147 closed #closed-147
Terse syntax for map entries
Issue #60 closed #closed-60
[FO] fn:namespace-uri-for-prefix no longer supports passing a prefix by string
Issue #45 closed #closed-45
Second parameter of fn:sum must be neutral element for +
Issue #384 created #created-384
Definition of "effective value" in XSLT
The term "effective value" is defined in XSLT with a rather narrow definition in the context of attribute value templates. The term is used throughout the spec (sometimes hyperlinked, sometimes not) in a much more general sense, for example the "effective value" of an attribute is the explicit value given to the attribute, or the value after basic normalization such as whitespace stripping, or the default value if the attribute is not present.
This affects the determination of the correct result for test merge-021, where it is a little ambiguous whether two xsl:merge-source/@order
attributes have the same "effective value" given that one is defaulted.
QT4 CG meeting 025 draft agenda #agenda-03-07
Draft agenda published.
Issue #383 created #created-383
fn:deep-equal: Order of child elements (unordered-elements)
At meeting 024 where PR https://github.com/qt4cg/qtspecs/pull/320 was accepted, there remained an open question of how best to specify that in some circumstances the comparisons should be made without regard to the order of (some) children.
Can the name of the option be improved?
Should the option support wildcard names?
Issue #382 created #created-382
Improve whitespace handling in deep-equal
At meeting 024 where PR https://github.com/qt4cg/qtspecs/pull/320 was accepted, there remained an open question of how to deal with whitespace.
The current options can be seen as having somewhat overlapping domains. Can this be improved?
Issue #381 created #created-381
Deep-equal comparisons without errors
At meeting 024 where PR #320 was accepted, there remained an open question of how to deal with errors.
On the one hand, in order for fn:deep-equal
to be most easily used as a comparison function in the many contexts where a comparison function is required, it would be best if it simply returned false()
rather than raising an error when incomparable items are encountered.
On the other hand, making "return false()" the default will mean that it is possible to construct items that are not equal to themselves, which will certainly violate the expectations of some users.
This conflict needs to be resolved somehow.
QT4 CG meeting 024 draft minutes #minutes-02-28
Draft minutes published.
Issue #377 closed #closed-377
Published XQuery 4.0 spec renders XML predefined entities instead of literal characters
Issue #380 closed #closed-380
Removed CDATA sections around markup
Pull request #380 created #created-380
Removed CDATA sections around markup
Fix #377
I took a minimal approach here. I've removed CDATA sections where the section contained an &
but not a <
.
- If the section contains <, then it's (presumably) necessary to escape the markup.
- If the section does not contain an &, then it's irrelevant. But not removing it limits the number of places changed by the script.
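The selection rule can be expressed compactly. The script actually used for the PR is not shown here, so the Python below is only a guess at the logic: unwrap a CDATA section when its content contains & but no <, and leave every other section alone. The regular expression and function names are assumptions for the sketch.

```python
import re

# Matches one CDATA section and captures its content.
CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def unwrap_if_safe(match):
    """Drop the CDATA wrapper only when the content has '&' and no '<'."""
    body = match.group(1)
    if "&" in body and "<" not in body:
        return body
    return match.group(0)  # leave the section untouched


def strip_cdata(text):
    return CDATA.sub(unwrap_if_safe, text)


print(strip_cdata("<eg><![CDATA[a &lt; b]]></eg>"))
# <eg>a &lt; b</eg>
print(strip_cdata("<eg><![CDATA[<p/>]]></eg>"))
# <eg><![CDATA[<p/>]]></eg>
```

Once the wrapper is gone, the entity reference is parsed as markup again, so it renders as the literal character instead of the escaped text.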
Issue #379 created #created-379
Namespace handling in parse-html
The HTML5/"Living Standard" specification has two modes when it comes to handling namespaces:
- For XHTML content the document is parsed as XML with full namespace support.
- For HTML content, it has pseudo-namespace support.
For example, the HTML parsing algorithm:
- places html, svg, and mathml elements in their corresponding namespaces.
- allows certain element/attribute tag names (e.g.
xlink:href
) to be parsed as QNames.
From the XSLT/XQuery perspective, this affects the data model. Specifically, how to model and specify the node-names and the set of namespaces associated with a given element node.
Pull request #378 created #created-378
Update the localName and unparsed entity reference notes for parse-html
This PR applies the following changes:
- [x] QT4CG-021-03: RD to change must to will in DOM notes about lowercase
- [x] QT4CG-021-04: RD to revise and move the note about unrecognized entities
Issue #377 created #created-377
Published XQuery 4.0 spec renders XML predefined entities instead of literal characters
When rendered in the browser, XML examples in the XQuery 4.0 specification show, for example, '&lt;' instead of '<'.
Issue #376 created #created-376
add documentation prefix attribute to xsl:stylesheet
Although the addition of xsl:note is very welcome, I had been hoping for something like the xsl:stylesheet attribute extension-element-prefixes, e.g. ignored-element-prefixes.
The specification would be something like: Elements and attributes associated with an ignored element prefix are not treated as direct constructors, and are removed when the stylesheet is compiled. For such an element, this is equivalent to having an xsl:use-when attribute with value false on the element; for attributes, they are simply discarded along with their value.
It is not an error for a prefix to be listed both as an ignored element prefix and as an extension element prefix; the result is implementation dependent in this case, but MUST be either that the extension is invoked or that the element or attribute is ignored.
Ignored elements may appear anywhere in the input tree, and ignored attributes may appear on any element.
Example:
<xsl:template match="city/park" css:module="main">
<css:rule>
color: green;
trees: tall;
</css:rule>
<div class="park">
<xsl:apply-templates />
</div>
</xsl:template>
Pull request #375 created #created-375
256: Context for default parameter values
This is an attempt to resolve issue #256 by providing details of the static and dynamic context for evaluating default parameter values, including providing a mechanism for accessing parts of the static and dynamic context of the caller.
If this PR is accepted we will need to follow up with (a) similar changes to XSLT, and (b) use of the new notation in the signatures of standard functions and operators that have context-dependent default values for parameters.
Note that the PR also breaks up the rather unwieldy sections for Function Declarations and Variable Declarations into more manageable subsections, which has involved some re-ordering; some of the change marking may therefore be spurious.
Issue #374 created #created-374
Can't view the XSD for XSLT in the browser
If you attempt to open https://qt4cg.org/specifications/xslt-40/schema-for-xslt40.xsd
in the browser (in Firefox), you'll get:
Error loading stylesheet: An unknown error has occurred (805303f4)
http://www.w3.org/2008/09/xsd.xsl
In a Chrome-derived browser I get a blank screen on which even the context menu doesn't work. Digging about in the inspect window leads me to
Unsafe attempt to load URL http://www.w3.org/2008/09/xsd.xsl from frame with URL
https://qt4cg.org/specifications/xslt-40/schema-for-xslt40.xsd. Domains, protocols and ports must match.
I conclude that the problem is trying to load the XSL for XSD from a different domain. Boo. I guess we should copy those stylesheets to qt4cg.org
, or remove the stylesheet PI, or ignore the whole thing on the assumption that we'll eventually publish these specifications in some W3C location and the problem will go away. Maybe.
Issue #373 created #created-373
apparent copy/paste error in annotation documentation of simple type yes-or-no-or-maybe
The XSD 1.1 schema for XSLT 3 and the one for XSLT 4 (at https://qt4cg.org/specifications/xslt-40/schema-for-xslt40.xsd) have an error in the annotation/documentation section of the simple type yes-or-no-or-maybe
, where it says One of the values "yes" or "no" or "omit".
. I think that should be One of the values "yes" or "no" or "maybe"
. The error probably arose because someone copied the text from the yes-or-no-or-omit
type declaration and forgot to adapt the description.
Issue #372 created #created-372
Separate default namespace for elements from the default namespace for types
Currently the static context provides a "default namespace for elements and types". It's not at all clear why these should be the same. For types, the vast majority of QNames representing types are in the XML Schema namespace, which is never used for elements.
In the current 4.0 drafts the two default namespaces are separated; but this has not been reviewed or agreed by the CG. This issue is raised for discussion of the change, and I will also review the design to see whether it still makes sense.
Some observations on the current text for XQuery:
- In section 2.2.1 (static context) it would be good to give a bit more detail (if only as a forwards reference) about the circumstances in which the default element namespace and the default type namespace are used.
- In 3.4 Sequence Types the sentence "[Lexical QNames] appearing in a [sequence type] have their prefixes expanded to namespace URIs by means of the [statically known namespaces] and (where applicable) the [default element namespace] or [default type namespace]" is rather inelegantly worded. If there is a prefix, then the statically known namespaces are used; if there is none, then the relevant default namespace is used, and it would be nice to explain more clearly which one applies.
- In 3.6 Item Types, we need to be clearer about references to named/declared item types, and about how the names are resolved. Do we really want these names to be in the same symbol space as atomic types? Perhaps we should have a rule that Item Types (like functions) must be in a namespace and this must not be the same as an imported schema namespace.
- In 5.14, Default namespace declaration, there seems to be duplication between the two paragraphs starting "for backwards compatibility reasons"
- Appendix C.1 (much though I dislike it) should say something about the initialisation of the default namespaces for elements and for types.
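The resolution rule discussed above for names in sequence types (prefix present: use the statically known namespaces; no prefix: use the relevant default) can be sketched in Python. This is only an illustrative model, not spec text: the function name `expand_qname`, the `kind` parameter, and the example document namespace are assumptions made for this sketch.

```python
from typing import Dict, Optional, Tuple

XS = "http://www.w3.org/2001/XMLSchema"
DOC_NS = "http://example.com/doc"  # hypothetical default element namespace


def expand_qname(
    lexical: str,
    namespaces: Dict[str, str],
    default_element_ns: Optional[str],
    default_type_ns: Optional[str],
    kind: str,
) -> Tuple[Optional[str], str]:
    """Expand a lexical QName to (namespace URI, local name).

    `kind` is "element" or "type"; it selects which default namespace
    applies when the name has no prefix.
    """
    if ":" in lexical:
        prefix, local = lexical.split(":", 1)
        # Prefixed names only consult the statically known namespaces.
        return namespaces[prefix], local
    if kind == "element":
        return default_element_ns, lexical
    if kind == "type":
        return default_type_ns, lexical
    raise ValueError(f"unknown name kind: {kind}")


bindings = {"xs": XS}
element_name = expand_qname("para", bindings, DOC_NS, XS, "element")
type_name = expand_qname("integer", bindings, DOC_NS, XS, "type")
prefixed_name = expand_qname("xs:string", bindings, DOC_NS, XS, "type")
```

With separate defaults, an unprefixed type name such as integer can resolve to the XML Schema namespace while unprefixed element names resolve to the document vocabulary.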
Note that issue #65 talks of the need for different default namespaces for input and output elements. I think that's a separate issue.
Observations on the current text for XSLT:
- In 5.1.2.1 the paragraph "The [xsl:]xpath-default-namespace attribute must be in the [XSLT namespace] if and only if its parent element is not in the XSLT namespace" needs to be generalised to [xsl:]default-element-namespace. In fact, this rule should move to the parent section 5.1.2, which needs an introduction.
Pull request #371 created #created-371
Issue 370: forwards and backwards compatibility for 4.0
This is essentially editorial; it updates the XSLT rules for forwards and backwards compatible processing to acknowledge the fact that the current version is now 4.0.
Issue #370 created #created-370
Bump XSLT version
There are various places where the XSLT spec refers to XSLT 3.0 where it should now refer to 4.0.
QT4 CG meeting 024 draft agenda #agenda-02-28
Draft agenda published.
Issue #19 closed #closed-19
[xslt] annotation-prefixes
Issue #84 closed #closed-84
Proposal : allow ignorable <xsl:div> wrapper for documentation or organize the code
Issue #189 closed #closed-189
Adopt the coercion rules for variables in XQuery
Issue #352 closed #closed-352
The @array attribute of xsl:for-each-group is no more
Issue #354 closed #closed-354
Combine multiple signatures of XSLT functions to use defaults
Issue #353 closed #closed-353
Issue109 xsl note
Issue #362 closed #closed-362
Drop obsolete note in XSLT regarding for-each-group/@array
QT4 CG meeting 023 draft minutes #minutes-02-21
Draft minutes published.
Issue #369 created #created-369
Namespaces for Functions
What problem are we trying to solve? Essentially, I think "namespace clutter".
Namespace clutter manifests itself in several different ways.
- Firstly, declaration clutter in source code. Here's the start of a module in an XSLT Stylesheet of medium complexity:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="#all"
version="3.0"
xmlns="http://ns.saxonica.com/xslt/export"
xmlns:doc="http://www.saxonica.com/ns/documentation"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
xmlns:ex="http://ns.saxonica.com/xslt/export"
xmlns:f="MyFunctions"
xmlns:t="MyTypes"
expand-text="true">
Eight namespace declarations here, of which 3 are concerned with functions; and it can get a lot worse than that.
- Secondly, namespace clutter in the static and dynamic context. The namespace bindings shown above don't disappear when the code is compiled; even with exclude-result-prefixes="yes", they have to hang around at run-time just in case someone tries to resolve a QName dynamically. Preserving the namespace context in the expression tree through optimization rewrites is a significant cost that has no user benefit; very rarely are they actually going to use the namespace context at run time.
- Thirdly, prefix clutter in the executable code. Writing
math:cos(math:cos($x))
is just so clumsy compared with cos(cos($x))
.
I think there are a number of things we can do to reduce this.
First, separate out the namespace context for static resolution of function names as a separate part of the static context, used only for this purpose. Ensure that there is no functionality that depends on knowing this part of the static context at run time, so it can be discarded by the compiler as soon as function names are resolved. Then provide source syntax for binding function prefixes to function namespaces in XSLT and XQuery to populate this part of the static context; there is no reason this has to be done using XML namespace declarations. There is also no reason for having different bindings in force in different parts of a single module. And once we've separated these declarations from XML namespace declarations, there's no reason why we can't provide default bindings. We could also allow bindings to have cross-module scope to reduce duplicated code. Note: the xsl:function-library proposal in the current XSLT 4.0 draft tries to achieve some of these things.
Second, allow functions to be referenced by local name alone where the reference is unambiguous; and perhaps provide some aliasing mechanisms to make more existing names unambiguous.
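A minimal Python sketch of the second idea, resolving a function by local name alone when that is unambiguous. The function table is a tiny illustrative subset, and `build_index` and `resolve_local_name` are hypothetical helper names, not part of any proposal.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (namespace URI, local name) pairs; a small illustrative subset only.
KNOWN_FUNCTIONS: List[Tuple[str, str]] = [
    ("http://www.w3.org/2005/xpath-functions/math", "cos"),
    ("http://www.w3.org/2005/xpath-functions/map", "size"),
    ("http://www.w3.org/2005/xpath-functions/array", "size"),
    ("http://www.w3.org/2005/xpath-functions", "count"),
]


def build_index(functions: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the namespaces that declare each local name."""
    index: Dict[str, List[str]] = defaultdict(list)
    for namespace, local in functions:
        index[local].append(namespace)
    return index


def resolve_local_name(local: str, index: Dict[str, List[str]]) -> Tuple[str, str]:
    """Return the (namespace, local) pair if the local name is unambiguous."""
    candidates = index.get(local, [])
    if not candidates:
        raise LookupError(f"no function named {local}")
    if len(candidates) > 1:
        raise LookupError(f"{local} is ambiguous: {sorted(candidates)}")
    return candidates[0], local


INDEX = build_index(KNOWN_FUNCTIONS)
cos_name = resolve_local_name("cos", INDEX)
```

In this model cos resolves without a prefix, while size is rejected because both the map and array namespaces declare it; that is the case an aliasing mechanism would need to cover.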
We've explored a third idea, which is to introduce some kind of polymorphism where function names have local scope and are distinguished by the types of objects to which they are applied. I think that given our type system, this is very hard to achieve and I haven't seen any very satisfactory proposals. We also need to remember that there are considerable costs if we start resolving function names dynamically at run time. I wouldn't rule out making progress in this direction, but I'm not optimistic of coming up with a workable solution. There might be some simple things we could do, like having a single function size()
that performs the work of both map:size()
and array:size()
depending on the argument.
Issue #318 closed #closed-318
Serialization HTML/XHTML output methods: meta elements and the charset attribute
Pull request #368 created #created-368
129: Context item generalized to context value
This is a first cut proposal to generalize the context item to a context value, allowing (for example) array predicates.
The proposal covers XPath and XQuery only at this stage; it doesn't address the consequences for XSLT.
Careful review requested!
Addresses issue #129 and issue #367.
Issue #367 created #created-367
Focus for RHS of thin arrow expressions
We define A -> F(B, C)
as being equivalent to A ! F(., B, C)
which means that B and C are evaluated with a focus based on the current item in A, not with the outer focus. This is different from the =>
operator. For example if the $E is an element E, with several children called F, then
namespace-uri(.) -> fn:QName(name())
has a different effect from
namespace-uri(.) => fn:QName(name())
whereas it might reasonably be expected that in the case where the LHS produces a single value, the two operators are equivalent. We can't change the meaning of =>
because it's defined in 3.1. So should we change the meaning of ->
to fall into line?
We could do this easily enough by defining A -> F(B, C)
as equivalent to for $a in A return F($a, B, C)
. I think that as well as being more consistent with =>
, the result is probably more intuitive. (We could also define it as equivalent to let $f := F(?, B, C) return A ! $f(.)
)
For the expression A -> {B}
, and for the proposed A => {B}
, I don't think we have any choice other than evaluating B with an inner focus based on A. But at least we can do it consistently for both operators.
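The difference between the two readings of A -> F(B) can be modelled in Python. This is a toy model only: expressions are Python callables that take a context item, the element is a plain dict, and a `name()` call on a non-element focus returns None here, whereas real XPath would raise a type error.

```python
from typing import Any, Callable, List

Expr = Callable[[Any], Any]  # an expression evaluated against a context item


def arrow_inner_focus(lhs: List[Any], f: Callable[[Any, Any], Any], arg: Expr) -> List[Any]:
    """A -> F(B) read as A ! F(., B): B sees each item of A as its focus."""
    return [f(a, arg(a)) for a in lhs]


def arrow_outer_focus(lhs: List[Any], f: Callable[[Any, Any], Any], arg: Expr, outer: Any) -> List[Any]:
    """A -> F(B) read as `for $a in A return F($a, B)`: B keeps the outer focus."""
    return [f(a, arg(outer)) for a in lhs]


def name_of(ctx: Any) -> Any:
    """Stand-in for name(): only meaningful when the focus is an element."""
    return ctx["name"] if isinstance(ctx, dict) else None


def make_pair(first: Any, second: Any) -> tuple:
    """Stand-in for fn:QName($uri, $name); just pairs its arguments."""
    return (first, second)


outer_element = {"name": "E", "ns": "http://example.com/ns"}
lhs = [outer_element["ns"]]  # stands in for namespace-uri(.)

inner_result = arrow_inner_focus(lhs, make_pair, name_of)
outer_result = arrow_outer_focus(lhs, make_pair, name_of, outer_element)
```

Under the inner-focus reading the argument is evaluated against the namespace URI string, so the element name is lost; under the for-expression reading it is evaluated against the outer element, which matches what => would do.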
Issue #366 created #created-366
Support xsl:use-package with xsl:package-location
Unless I am misreading the specs (which I do commonly enough), there is currently no way for an XSLT writer using xsl:use-package
to indicate where the package is to be found, except outside the XSLT environment. I propose to allow xsl:use-package
to contain zero or more xsl:package-location
children. I propose the addition of an element and not an attribute, because a package may be in multiple locations, and need nuance, as noted below.
Attributes:
- @href, on the model of xsl:import and xsl:include, would specify by relative or absolute URI where the package is.
- @priority (default 0) would provide a mechanism to indicate whether the specified xsl:package-location should override (value greater than 0), or simply provide a fallback for (less than or equal to 0), the preconfigured place the package should be retrieved from.
- @use-when would allow a developer to manage different versions of a package for different cases.
Other attributes given to xsl:package-location
would need discussion.
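One possible reading of the @priority and @use-when attributes can be sketched in Python. The ordering among several locations (highest priority first, document order for ties) is an assumption of this sketch; the proposal itself only distinguishes "override" from "fallback". The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PackageLocation:
    """Models one xsl:package-location child (attributes as proposed above)."""
    href: str
    priority: int = 0      # default 0, as in the proposal
    use_when: bool = True  # the already-evaluated @use-when condition


def candidate_locations(
    locations: List[PackageLocation], preconfigured: Optional[str]
) -> List[str]:
    """Return the places to try, in order, for one xsl:use-package."""
    active = [loc for loc in locations if loc.use_when]
    # sorted() is stable, so equal priorities keep document order (assumption).
    ranked = sorted(active, key=lambda loc: -loc.priority)
    overrides = [loc.href for loc in ranked if loc.priority > 0]
    fallbacks = [loc.href for loc in ranked if loc.priority <= 0]
    configured = [preconfigured] if preconfigured is not None else []
    return overrides + configured + fallbacks


example_locations = [
    PackageLocation("local/pkg.xsl", priority=0),
    PackageLocation("https://example.com/pkg.xsl", priority=5),
    PackageLocation("old/pkg.xsl", priority=-1, use_when=False),
]
search_order = candidate_locations(example_locations, "configured:pkg")
```

Locations with a positive priority come before the preconfigured place, the rest come after it, and any location whose @use-when is false is ignored.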
Issue #365 created #created-365
switch, typeswitch: Optional braces
The indentation of switch expressions is often a mess. Now that we allow curly braces for if
, it would be nice to also allow optional braces for switch
and typeswitch
:
typeswitch($item) {
case xs:numeric return 'number'
default return '...'
},
switch($item) {
case 0 to 9 return 'single digit'
default return '...'
}
The current syntax is:
typeswitch($item)
case xs:numeric return 'number'
default return '...',
switch($item)
case 0 to 9 return 'single digit'
default return '...'
Pull request #364 created #created-364
Generalize switch expressions in XQuery (issue #328)
Issue #337 closed #closed-337
Local union and enum types: and the definition of generalised atomic types
Pull request #363 created #created-363
Fix issue #345 - missing rules for type matching
Pull request #362 created #created-362
Drop obsolete note in XSLT regarding for-each-group/@array
Fixes issue #352
Issue #361 created #created-361
Named arguments: $input vs. $value
Great effort has been made in unifying the parameter names of the XQFO standard; thanks for that!
If I remember correctly:
- $value, $values, $value1, etc. are used for atomic/atomized arguments.
- $input, $input1, etc. are used for input, mostly of type item(), that is processed unchanged.
- $uri is used for arguments that could have been defined as items of type xs:anyURI.
I believe the following argument names need to be double-checked (if not, it may be that I haven’t fully grasped how the naming rules are supposed to work):
Function | Currently | Presumably | Justification
--- | --- | --- | ---
array:slice | $input | $array | Alignment with array:size et al.
trace | $value | $input | Argument is not atomized
json | $input | $value | Argument is atomized
string | $item | $value | $item is used nowhere else
expanded-QName | $qname | $value | Alignment with prefix-from-QName et al.
resolve-QName | $qname | $value | Alignment with prefix-from-QName et al.
parse-QName | $eqname | $value | Alignment with parse-xml et al.
parse-json | $json | $value | Alignment with parse-xml et al.
json-to-xml | $json | $value | Alignment with parse-xml et al.
char | $name | $value | Input may also be codepoint values, etc.
namespace-uri-for-prefix | $prefix | $value | $prefix is used nowhere else
resolve-uri | $relative | $uri | Absolute URIs are legal as well
array:append | $add | $member | Alignment with array:put
And we should probably pay particular attention to the naming conventions when adding new functions.
Pull request #360 created #created-360
Issue 314 array composition and decomposition
This PR addresses parts of issue 29, issue 113, and issue 314 relating to the composition and decomposition of arrays.
It introduces two functions array:of
for array composition, and array:members
for decomposition, and defines all other array functions in terms of these two primitives (replacing the internal functions op:A2S
and op:S2A
). The items in the decomposed form of an array are called "value records", singleton maps of the form map{'value': $value}
The function array:from-sequence
is renamed array:build
to reflect its symmetry with map:build
.
Question for the group: should we have a new function for constructing a "value record", or is the syntax map{'value': $value}
adequate for the purpose?
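To illustrate how other array functions can be defined in terms of the two primitives, here is a small Python model. It assumes arrays are Python lists, sequences are tuples, and value records are dicts with a single 'value' entry; the function names mirror array:of and array:members but the code is only a sketch, not the specified semantics.

```python
from typing import Any, Dict, Iterable, List


def array_members(array: List[Any]) -> List[Dict[str, Any]]:
    """Decompose an array into value records (models array:members)."""
    return [{"value": member} for member in array]


def array_of(records: Iterable[Dict[str, Any]]) -> List[Any]:
    """Compose an array from value records (models array:of)."""
    return [record["value"] for record in records]


def array_reverse(array: List[Any]) -> List[Any]:
    """array:reverse expressed through the two primitives."""
    return array_of(reversed(array_members(array)))


def array_append(array: List[Any], member: Any) -> List[Any]:
    """array:append expressed through the two primitives."""
    return array_of(array_members(array) + [{"value": member}])


sample = [1, (2, 3), ()]  # members: an item, a two-item sequence, the empty sequence
round_trip = array_of(array_members(sample))
reversed_sample = array_reverse(sample)
appended = array_append([1], ())
```

Wrapping each member in a record is what keeps an empty-sequence member from disappearing when the array is decomposed into a flat sequence.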
Issue #359 created #created-359
fn:void: Absorb result of evaluated argument
Summary
Absorb the result of the evaluated argument.
Signature
fn:void(
$input as item()*
) as empty-sequence()
Motivation
Developers tend to get creative if they want to suppress the result of an expression. The reason is that there is no simple way to do this properly. Some constructs I have seen in practice:
let $unused := EXPRESSION
return 'ok'
EXPRESSION[position() = 10000], 'ok'
let $result := 'ok'
return if(exists(EXPRESSION)) then $result else $result
Cases like this are frequent in nondeterministic code. Think e.g. of side-effecting functions of the EXPath HTTP-Client and File Modules: The function results are not always relevant for the invoking application, or already known.
The function is also helpful during development and for testing code. fn:void#1
and fn:identity#1
can both be passed on to functions to either return or ignore the result of their arguments. The function can potentially be used to measure the runtime performance of an expression (but an implementation should not be prevented from discarding the function call if the argument expression is deterministic).
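The pairing of fn:void#1 and fn:identity#1 as interchangeable callbacks can be sketched in Python. Note that Python evaluates arguments eagerly, so this models only the "return or ignore the result" aspect, not the optimisation question; `run` and `write_file` are hypothetical names for this sketch.

```python
from typing import Any, Callable, List


def void(*_values: Any) -> tuple:
    """Accept any input and return the empty sequence (an empty tuple here)."""
    return ()


def identity(value: Any) -> Any:
    """Return the input unchanged."""
    return value


def run(action: Callable[[], Any], on_result: Callable[[Any], Any]) -> Any:
    """Perform an action and let the callback decide what to do with its result."""
    return on_result(action())


log: List[str] = []


def write_file() -> str:
    """Stand-in for a side-effecting function whose result is rarely needed."""
    log.append("written")
    return "file:///tmp/out.txt"


discarded = run(write_file, void)
kept = run(write_file, identity)
```

In both calls the side effect happens; only the visibility of the result differs.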
Issue #358 created #created-358
serialization indent whitespace
There could be an option to control whether the serialization indents with spaces or tabs, and how many of them (e.g. 2 or 4 spaces).
Related: https://github.com/qt4cg/qtspecs/issues/101
A user request: https://github.com/benibela/xidel/issues/100
Issue #357 created #created-357
Representing key-value pairs
A map can be decomposed into, or composed from, a sequence of key-value pairs (KVPs).
There are two natural representations of a key-value pair (K, V): it can be represented as a singleton map (map{ K: V }
) or as a "doubleton" map (map{ 'key': K, 'value': V}
).
This issue examines how well either of these representations is currently supported, which of them is preferable, and how this support should be improved.
I'll consider the following basic operations: constructing a KVP from a key and a value, assembling a map from a set of KVPs, decomposing a map into a sequence of KVPs, extracting the key from a KVP, and extracting the value from a KVP.
Singleton Representation
Constructing a KVP from a key and a value:
map{ $key : $value }
map:entry($key, $value)
<xsl:map-entry key="$key" select="$value"/>
Assembling a map from a set of KVPs
map:merge($kvps)
<xsl:map>
Decomposing a map into a sequence of KVPs:
map:for-each($map, map:entry#2)
Extracting the key from a KVP:
map:keys($kvp)
Extracting the value from a KVP:
$kvp?*
Doubleton Representation
Constructing a KVP from a key and a value:
map{ 'key': $key, 'value': $value }
Assembling a map from a set of KVPs
map:build($kvps, ->{?key}, ->{?value})
Decomposing a map into a sequence of KVPs:
map:for-each($map, ->($key, $value){map{ 'key': $key, 'value': $value }})
Extracting the key from a KVP:
$kvp?key
Extracting the value from a KVP:
$kvp?value
Analysis
The singleton representation is better supported at present, and it makes sense therefore to fill in the gaps that currently make it awkward. The main attraction of the doubleton representation is the ease of extracting the key and the value using $kvp?key
and $kvp?value
. The equivalents for the singleton representation (map:keys($kvp)
and $kvp?*
) feel clumsy and unintuitive; however, it's not at all obvious what would be better, short of introducing new custom syntax, which seems over-the-top. The best idea I can come up with is to have two functions map:key($kvp)
and map:value($kvp)
which require $kvp to be a singleton map. But I hate the namespace prefixes...
The other thing needed to "fill the gaps" is a function map:entries($map)
equivalent to map:for-each($map, map:entry#2)
.
What if we chose to go the other way, and improve support for the doubleton representation?
We could add map:key-value-pair($key, $value)
to create a KVP, and map:of($kvps)
to build a map from a set of KVPs, and map:key-value-pairs($map)
to decompose a map. The trickiest problem is what to do about XSLT, where the 3.0 instructions <xsl:map>
and <xsl:map-entry>
use the singleton representation.
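The two representations and the gap-filling functions discussed above can be compared side by side in a Python model. Maps are Python dicts here; the helper names follow the functions floated in this issue (map:entries, map:key, map:value, map:key-value-pair, map:of, map:key-value-pairs) but are only a sketch, and duplicate-key policies are not modelled beyond "first occurrence wins".

```python
from typing import Any, Dict, Iterable, List, Tuple


# Singleton representation: each KVP is a one-entry map {K: V}.
def map_entry(key: Any, value: Any) -> Dict[Any, Any]:
    return {key: value}


def map_merge(kvps: Iterable[Dict[Any, Any]]) -> Dict[Any, Any]:
    result: Dict[Any, Any] = {}
    for kvp in kvps:
        for key, value in kvp.items():
            result.setdefault(key, value)  # first occurrence wins in this sketch
    return result


def map_entries(mapping: Dict[Any, Any]) -> List[Dict[Any, Any]]:
    return [map_entry(key, value) for key, value in mapping.items()]


def _only_item(kvp: Dict[Any, Any]) -> Tuple[Any, Any]:
    if len(kvp) != 1:
        raise ValueError("expected a singleton map")
    return next(iter(kvp.items()))


def map_key(kvp: Dict[Any, Any]) -> Any:
    return _only_item(kvp)[0]


def map_value(kvp: Dict[Any, Any]) -> Any:
    return _only_item(kvp)[1]


# Doubleton representation: each KVP is {'key': K, 'value': V}.
def map_key_value_pair(key: Any, value: Any) -> Dict[str, Any]:
    return {"key": key, "value": value}


def map_of(kvps: Iterable[Dict[str, Any]]) -> Dict[Any, Any]:
    return {pair["key"]: pair["value"] for pair in kvps}


def map_key_value_pairs(mapping: Dict[Any, Any]) -> List[Dict[str, Any]]:
    return [map_key_value_pair(key, value) for key, value in mapping.items()]


sample = {"a": 1, "b": 2}
singleton_round_trip = map_merge(map_entries(sample))
doubleton_round_trip = map_of(map_key_value_pairs(sample))
```

The singleton accessors need a size check before they can return anything, while the doubleton accessors are plain lookups; that is the usability difference the analysis describes.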
Issue #356 created #created-356
array:leaves
1. Issues
There are at least two issues with the definition of the function array:flatten
:
- Unlike most other functions on arrays (such as array:put, array:replace, array:append, array:slice, array:subarray, array:remove, array:insert-before, array:tail, array:trunk, array:reverse, array:join, array:for-each, array:filter, array:for-each-pair, array:sort, array:partition), which produce an array as their result, this function produces only a sequence.
- This function is not lossless -- any members that are the empty sequence or the empty array are not represented in the returned result.
2. Suggested solution(s)
We want to have a function that is similar to the wrongly defined one, but produces its contents as an array, and is lossless. There are two obvious ways to do this:
- Correct the specification of array:flatten so that its result is an array and it represents the empty sequences and empty arrays as the same members of its result.
- Add to the Specification a new function, array:leaves, that produces an array as its result and that is lossless. array:leaves returns an array whose members are exactly all the leaves of the input array, in order of appearance. By definition, leaves are all members, at any depth, that are not arrays, except that the empty array counts as a leaf. Thus () (the empty sequence) and [] (the empty array) are leaves by definition.
Solution 2. will not cause any compatibility issues.
3. Examples
The expression array:leaves([1, (), [4, 6], 5, 3])
returns [1, (), 4, 6, 5, 3]
.
The expression array:leaves([[1, 2, 5], [[10, 11], 12], [], 13])
returns [1, 2, 5, 10, 11, 12, [], 13]
.
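The definition of leaves given above can be checked with a short Python model. It assumes arrays are Python lists and sequences are tuples, and it reads the second example as a single array argument; this is a sketch of the proposed behaviour, not a specification.

```python
from typing import Any, List


def array_leaves(array: List[Any]) -> List[Any]:
    """Return the leaves of a nested array, in order of appearance.

    A leaf is any member that is not an array, plus the empty array itself.
    Empty sequences (empty tuples here) are therefore preserved as members.
    """
    result: List[Any] = []
    for member in array:
        if isinstance(member, list) and member:
            # A non-empty array is not a leaf: descend into it.
            result.extend(array_leaves(member))
        else:
            # Items, sequences (including the empty one), and [] are leaves.
            result.append(member)
    return result


first = array_leaves([1, (), [4, 6], 5, 3])
second = array_leaves([[1, 2, 5], [[10, 11], 12], [], 13])
```

Unlike a flattening that returns a sequence, the empty sequence and the empty array both survive as members of the result.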
QT4 CG meeting 023 draft agenda #agenda-02-21
Draft agenda published.
Pull request #355 created #created-355
Action QT4CG-022-02 - add to imp-def-feature appendix
Adds entries to the implementation-defined-features appendix of the serialization spec, corresponding to the option to generate <meta charset="XXX">
for HTML5.
Pull request #354 created #created-354
Combine multiple signatures of XSLT functions to use defaults
This PR addresses issue 69, by modifying those XSLT built-in functions that currently have multiple signatures, to use a single signature with parameter defaults instead.
The changes however don't currently render correctly. The XSLT processing pipeline needs to be changed to pick up the changes that were made to the F+O stylesheets to render parameter defaults correctly. I haven't yet managed to work out where this is done.
Pull request #353 created #created-353
Issue109 xsl note
Addresses issue #109 and issue #87. Unfortunately the PR also includes the unrelated commits for issue 22.
Issue #352 created #created-352
The @array attribute of xsl:for-each-group is no more
There is a note in XSLT §14.2 concerning the @array attribute of xsl:for-each-group, but this attribute has been dropped.
Issue #351 closed #closed-351
Another attempt to build off the merge-base branch
Pull request #351 created #created-351
Another attempt to build off the merge-base branch
Issue #341 closed #closed-341
[XPath] Error-free selection operator for maps or arrays, or finite-domain functions
Issue #350 created #created-350
CompPath (Composite-objects path) Expressions
CompPath (Composite-objects path) Expressions
As initially discussed in issue #341, we were exploring different ways to provide an XPath-like language to traverse in depth composite objects such as maps and arrays and select their members at any depth. While working on this, the idea of an XPath-like language for composite items started to emerge and here we present this idea in a more or less crystalized form.
1. Root Component
Any CompPath expression must start from a composite item (of type map or array, or of some other future composite item type, maybe set?). This can be a literal composite item or a reference to a variable whose value is a composite item.
Examples:
(: Literal composite items: :)
[1, 2, 3]
[1, [2, 3]]?2
{"x":1, "y" : map{ "z": 2}}
{"x":1, "y" : map{ "z": 2}} ?y
(: Variables containing composite items: :)
let $comp1 := [1, [2, 3]],
$comp2 :=$comp1 ?2,
$comp3 := {"x":1, "y" : map{ "z": 2}},
$comp4 := $comp3 ?y
In the above examples all literal expressions and all variables ($comp1, $comp2, $comp3, $comp4
) may serve as the root component for a CompPath expression.
2. The component-path operator (\)
The component-path operator "\" is used to build expressions for locating members at any depth within component trees. Its left-hand side expression must return a result that is a composite item or else this result is represented as such by wrapping it into an array.
The operator returns an array whose member values are themselves composite items (or any such value may be a non-composite "leaf" in the root-component tree).
Each operation E1\E2 is evaluated as follows: Expression E1 is evaluated, and the result is wrapped in an array A1. If any member of A1 is not a composite item, a type error is raised.
Each member of A1 serves in turn to provide an inner "composite-focus" (the member as the "composite-context-item" or .
, its index in A1 as the "composite-context-position" or index()
, the set of keys of the composite-context-item as the "composite-keyset" or keys()
and the size of this member as the "composite-context-size" (specified as one of: size()
, or array-size()
or key-size()
) ) for the evaluation of E2. The result of each evaluation of E2, if it isn't a single composite item, is wrapped in a single array. The arrays resulting from all the evaluations of E2 are wrapped in a single array and this single array is the result of the evaluation.
E2 is typically a function over the context-focus and its results will be the set of the next step composite-context-items (used as the left-hand-side of the next in chain composite-step-expression (see below)), or these results would be the final results of evaluation if this is the last-in chain composite-step-expression.
3. Composite-Steps
A composite-step is a part of a composite-path-expression that generates an array and filters its members by zero or more predicates. A composite-step-expression is either a CompositeAxisStep or a CompositePostfixExpression.
4. Composite-Axes
The following axes are defined for traversing a composite-item tree:
- The child-member:: axis contains the members of the composite-context-item.
- The value-member:: axis contains the members of the composite-context-item that are not composite themselves.
- The node-member:: axis contains the members of the composite-context-item that are nodes.
- The descendant-member:: axis is defined as the transitive closure of the child-member:: axis; it contains the descendant-members of the composite-context-item (the child members of the composite-context-item, and their child-members, ... and so on).
- The self:: axis contains just the composite-context-item.
- The descendant-member-or-self:: axis contains the composite-context-item and all of its descendant-members.
- The following-sibling-member:: axis contains the members of the immediate container of the composite-context-item that follow it. For any two members mem1 and mem2 of a composite item Comp, by definition mem2 follows mem1 if and only if Comp is an array and the index of mem2 in Comp is greater than that of mem1, or if Comp is a map, then the key of mem2 is greater than that of mem1.
- The preceding-sibling-member:: axis contains the members of the immediate container of the composite-context-item that precede it. For any two members mem1 and mem2 of a composite item Comp, by definition mem1 precedes mem2 if and only if Comp is an array and the index of mem2 in Comp is greater than that of mem1, or if Comp is a map, then the key of mem2 is greater than that of mem1.
For example, following-sibling-member::5
means all members of the composite-context-item with index > 5,
and preceding-sibling-member::5
means all members of the composite-context-item with index < 5
Note: If the immediate container of the composite-context-item is a map whose key-values cannot be ordered, then specifying either of the following-sibling-member::
or preceding-sibling-member::
axes on this composite-context-item must raise a type error. (Obviously, these two axes are meaningful only for composite items, whose members are ordered, such as the array).
If the composite-axis name is omitted from a composite-axis step, the default axis is child-member::
5. Composite Axis Steps
A composite axis step completely resembles the ordinary axis step in XPath. It consists of three parts:
- The composite axis (child-member::, descendant-member::, value-member::, node-member::, following-sibling-member::, preceding-sibling-member::, self::, or the descendant-member-or-self:: axis)
- The member test
- The composite-predicates
6. Member Tests
A member test is a condition on the key-name, index, or kind (composite, map, array or value, node, or (any) member). A member test determines which members contained by a composite-axis are selected by a composite-step.
As such, a member test is either an identifier-test (key-name or index) or a kind-test (composite, map, array, value, or member).
Examples of member identifiers:
- A string specifies the name of a key whose value will be selected. For example, \child-member::X selects from the composite-context-item the value corresponding to its key which has the name "X".
- \child-member::3 selects from the composite-context-item the value of its 3rd member, if it is an array, or the value corresponding to its key 3, if it is a map.
- following-sibling-member::3 selects from the composite-context-item (which is most likely an array) all of its members having index greater than 3.
- preceding-sibling-member::3 selects from the composite-context-item (which is most likely an array) all of its members having index less than 3.
- \descendant-member-or-self::X selects from the composite-context-item (that must be a map) and from all its descendant-members, the values corresponding to their key named "X", if these descendants have a key named "X".
- Similarly, \5 is equivalent to \child-member::5 and selects from the composite-context-item that is an array the value of its 5th member. This will also select the value corresponding to the key 5 from the composite-context-item if it is a map, because on the child-member:: axis both maps and arrays may be selected.
- \X is equivalent to \child-member::X and selects from the composite-context-item (that must be a map) the value corresponding to its key which has the name "X".
There is also the pseudo-operator \\. This is an abbreviation for: \descendant-member-or-self::member()\
Thus, \\X means: "(Deep) Select all members of the root-component that are the corresponding values of keys equal to 'X'"
- We may use a kind test as part of the previous example, if we want to select only a specific kind of members of the composite-context-item. \array() In this example, although we are on the child-member:: axis, we want to select only members of the composite-context-item that are arrays.
- \map() In this example, although we are on the child-member:: axis, we want to select only members of the composite-context-item that are maps.
- \value() In this example we want to select only members of the composite-context-item that are not composite items themselves.
- \node() In this example we want to select only members of the composite-context-item that are nodes.
- \member() In this example we want to select all members of the composite-context-item, regardless of whether they are maps, arrays, or values.
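A rough Python model of the identifier tests \X, \3 and the deep selection \\X may help in checking the intended results. It assumes maps are dicts and arrays are lists, and it returns a flat Python list rather than applying the array-wrapping rules of the proposal; the function names are hypothetical.

```python
from typing import Any, Iterator, List


def child_members(item: Any) -> List[Any]:
    """Members on the child-member:: axis: map values or array members."""
    if isinstance(item, dict):
        return list(item.values())
    if isinstance(item, list):
        return list(item)
    return []  # non-composite values have no members


def descendant_members_or_self(item: Any) -> Iterator[Any]:
    """The item itself, then all of its descendant members, depth first."""
    yield item
    for child in child_members(item):
        yield from descendant_members_or_self(child)


def select_child(item: Any, identifier: Any) -> List[Any]:
    """Model of \\X or \\3: look up a key in a map or a 1-based index in an array."""
    if isinstance(item, dict) and identifier in item:
        return [item[identifier]]
    if isinstance(item, list) and isinstance(identifier, int):
        if 1 <= identifier <= len(item):
            return [item[identifier - 1]]
    return []


def deep_select(root: Any, identifier: Any) -> List[Any]:
    """Model of \\\\X: apply the identifier test at every depth."""
    selected: List[Any] = []
    for member in descendant_members_or_self(root):
        selected.extend(select_child(member, identifier))
    return selected


data = {"x": 1, "y": {"z": 2, "x": [10, 20]}}
shallow = select_child([1, [2, 3]], 2)
deep_x = deep_select(data, "x")
deep_2 = deep_select(data, 2)
```

Because the same identifier test applies to both maps and arrays, an integer identifier such as 2 can match an array index at one depth and a map key at another.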
6.1 Wildcards
The * wildcard can be used instead of a member identifier. It selects all existing members of the composite-context-item, possibly restricted by a specific axis and by a specific member kind-test.
Examples:
\* (: (Shallow) Selects all members of the composite-context-item :)
\map()\* (: Selects from the composite-context-item all values that correspond to a key of any map-member of the composite-context item :)
\array()\* (: Selects from the composite-context-item all members of all its members that are arrays :)
\\* (: (Deep) Select all members of the composite tree rooted by the root-component :)
7. Predicates
As defined above, a composite-step has three parts: composite-axis (can be omitted and then a default axis is used), member test, and an optional list of composite-predicates.
A composite-predicate in a composite-step is an expression used as a filter applied on the members of the composite-context-item that are already selected by the axis and member tests of the axis step, and not filtered out by any preceding composite-predicates in the composite-predicates-list. The composite-predicate may be any XPath expression and is written within double square brackets.
Examples:
\*[[3]] (: Selects any member of the composite-context-item, that is an array and has a 3rd member or any member of the composite-context-item, that is a map and has a key 3 :) This is a shorthand for: \*[[array-size() ge 3 or 3 = keys()]]
\array()[[3]] (: Selects those array members of the composite-context-item that have a 3rd member :) This is a shorthand for: \*[[size() ge 3]]
\*[[size() eq 7]] (: Selects those members whose array-size() or key-size() is exactly 7 :) This is a shorthand for: \composite::*[[self::map() and key-size() eq 7 or self::array() and array-size() eq 7]]
\*[[X]] (: Selects any member of the composite-context-item, that is a map and has a key X :)
\map()[[X]] (: Selects any map member of the composite-context-item, that has a key X :) The above two expressions are a shorthand for: \*[['X' = keys()]]
\value()[[. gt 0]] (: Selects any value (non-composite member) of the composite-context-item, that is a positive number :)
8. Mixing CompPath and XPath expressions
CompPath and XPath expressions can be used as parts of a single expression:
- A CompPath expression may be appended at the end of any XPath expression that produces a composite-object.
- An XPath expression may be appended at the end of any CompPath expression. When doing this,
CompPathExpr / XPathExpr
is equivalent to:
CompPathExpr\node::* / XPathExpr
And this:
CompPathExpr ! XPathExpr
(: Note: also causes ordering and deduplication of the nodes! :)
is equivalent to:
CompPathExpr\value::* ! XPathExpr
(: Note: No ordering or deduplication, can be applied on any item, not just on nodes :)
- A CompPath expression may be substituted for the expected argument of any XPath expression, for example:
count(MyCompPathExpr)
- Any XPath expression that produces a composite item can be used as the composite-root for any CompPath expression.
Example:
let $myBooks :=
<books>
<book name="Tom Sawyer">
<author>Mark Twain</author>
</book>
<book name="Wuthering Heights">
<author>Emily Brontë</author>
</book>
<book name="Jane Eyre">
<author>Charlotte Brontë</author>
</book>
<book name="Adventures of Huckleberry Finn">
<author>Mark Twain</author>
</book>
</books>,
$map1 := map {"science-works": map{"Einstein": "Special Theory of relativity",
"Darwin" : "On the Origin of Species"
},
"literature" : map{"19the Century": $myBooks}
}
return
$map1\literature\\*/book[author eq 'Mark Twain']
Evaluating this mixed CompPath and XPath expression produces the correct result:
<book name="Tom Sawyer">
<author>Mark Twain</author>
</book>
<book name="Adventures of Huckleberry Finn">
<author>Mark Twain</author>
</book>
Issue #349 closed #closed-349
Revert PR change; it doesn't work in this context
Pull request #349 created #created-349
Revert PR change; it doesn't work in this context
Issue #348 closed #closed-348
Attempt to build PR with merge-base version of master
Pull request #348 created #created-348
Attempt to build PR with merge-base version of master
This PR changes the CI build-pr.yml
script so that it checks out the version of master that the branch started from, rather than the current version of master, for building the specifications.
- Pro: we won't get build failures when the current master can't build the old version (for example, when images have been removed)
- Con: we won't get any features from the current master, such as stylesheet updates
Since failing builds are more troublesome than formatting issues, I'm going to say the pros outweigh the cons.
Pull request #347 created #created-347
Attempt to clarify fn:parse-uri and fn:build-uri
Fix #307
Issue #346 closed #closed-346
Remove dagger from record cross-references
Pull request #346 created #created-346
Remove dagger from record cross-references
Record types are better supported by the stylesheets so the dagger is simply a distraction.
Issue #345 created #created-345
Missing rule for matching atomic values against atomic types
In XPath §3.6.2 we have forgotten to state the obvious rule:
"An Atomic Value AV matches a generalized atomic type GAT if the type annotation of AV (call it T) satisfies the condition derives-from(T, GAT)."
At the same time it would be a good idea to clarify whether locally-declared union and enum types fall within the definition of "schema types" (I think they should do so.)
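As a rough illustration of the missing rule, the sketch below models derives-from over a tiny, hand-picked fragment of the atomic type hierarchy. The table BASE_TYPE and the function names are hypothetical, and union and enum types are deliberately left out.

```python
# Hypothetical, heavily trimmed fragment of the atomic type hierarchy:
# each type maps to its base type (None marks the top of this fragment).
BASE_TYPE = {
    "xs:anyAtomicType": None,
    "xs:decimal": "xs:anyAtomicType",
    "xs:integer": "xs:decimal",
    "xs:string": "xs:anyAtomicType",
}


def derives_from(t, target):
    """True if type t is target itself or reaches it by walking base types."""
    while t is not None:
        if t == target:
            return True
        t = BASE_TYPE[t]
    return False


def matches(type_annotation, gat):
    """An atomic value matches GAT if its type annotation derives from GAT."""
    return derives_from(type_annotation, gat)


print(matches("xs:integer", "xs:decimal"))  # True
print(matches("xs:string", "xs:decimal"))   # False
```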
Issue #342 closed #closed-342
Issue318 meta elements
QT4 CG meeting 022 draft minutes #minutes-02-14
Draft minutes published.
Issue #338 closed #closed-338
Add ednote per action QT4CG-016-02
Pull request #344 created #created-344
Issue 22: allow "for"/"let" keyword to be repeated in XPath
Addresses the proposal in issue 22 to allow repetition of the "let" or "for" keyword in a ForExpr or LetExpr. (It does not, however, allow "for" and "let" to be mixed).
Issue #343 created #created-343
$collation argument: Unification
In the function set of the current XQFO specification, the type of the $collation
parameter is sometimes xs:string
and sometimes xs:string?
, depending on the position of the parameter. Examples:
Mandatory
fn:distinct-values($values as xs:anyAtomicType*, $collation as xs:string := fn:default-collation()) as xs:anyAtomicType*
fn:index-of($input as xs:anyAtomicType*, $search as xs:anyAtomicType, $collation as xs:string) as xs:integer*
Optional
fn:sort($input as item()*, $collation as xs:string?, $key as function(item()) as xs:anyAtomicType*) as item()*
fn:lowest($input as item()*, $collation as xs:string?, $key as function(item()) as xs:anyAtomicType*) as item()*
I think we should always allow an empty sequence.
Pull request #342 created #created-342
Issue318 meta elements
Revises the rules for serializing meta elements to take account of new HTML5 syntax.
Resolves issue #318
Issue #18 closed #closed-18
[DM31] Function types do not form a hierarchy
Issue #58 closed #closed-58
[XQuery] String Value Templates
Issue #107 closed #closed-107
Allow self::(a|b|c)
Issue #234 closed #closed-234
If Without Else
Issue #330 closed #closed-330
Update fn:parse-html to apply review feedback.
Issue #341 created #created-341
[XPath] Error-free selection operator for maps or arrays, or finite-domain functions
In March 2021 Jarno Elovirta raised on the #general channel of the XML.com Slack the problem that the existing map or array lookup operator "?" prevents free traversal of a nested map/array object. For example, this expression results in an error:
[
map {"k0": 1},
map{"k0": [1, 2, 3]}
] ?* ?("k0") ?*
[XPTY0004] Input of lookup operator must be map or array: 1.
There are three possible types of reaction to this problem:
-
Do nothing
-
Relax the semantics of the map/array lookup operator "?" so that it can be applied to items that are neither maps nor arrays, in which case it produces the empty sequence.
-
Introduce a new operator that behaves like "?" but, when applied to an item that is neither a map nor an array, produces the empty sequence instead of raising an error.
Obviously, we are not advocating the 1st choice above, or otherwise we wouldn't be raising any issue 😄
Choice 2 could be implemented, but this would have a few drawbacks:
- it would bring a certain degree of backwards incompatibility
- "silently returning nothing" makes unexpected results really difficult to debug or even notice, as pointed out by @michaelhkay
This proposal is to choose alternative 3 above.
Why is it better than the 2nd one?
- No incompatibility can be introduced, as this is a new operator.
- The user has intentionally chosen this operator over the "?" operator, and this means that the user is well aware of the new, sometimes tricky to observe/explain/debug behavior, but the user doesn't mind these effects and is ready to deal with them.
Definition
By definition the operator "->" with left-hand-side any expression E and right-hand-side a literal string X:
E -> X
is lexically expanded to:
E[. instance of map(*) or . instance of array(*)]?X
Example
With the original expression provided by Jarno Elovirta, but now using the "->" operator:
[
map {"k0": 1},
map{"k0": [1, 2, 3]}
] ->* ->("k0") ->*
its evaluation produces the expected result (all the values within just one of the leaves of the tree), and no error:
1, 2, 3
That is, 1 ->*
produces the empty sequence and no error.
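The difference between "?" and the proposed "->" can be modelled roughly in Python. This is only a sketch under simplifying assumptions: sequences are plain lists, maps are dicts, arrays are instances of a hypothetical Array class, and every map value or array member is a single item. The names WILDCARD, strict_lookup and safe_lookup are invented for the illustration.

```python
class Array(list):
    """Stands in for an XDM array, so that plain lists can model sequences."""


WILDCARD = object()  # models the "*" key specifier


def lookup_item(item, key):
    """Look up key in one map or array; reject anything else, as "?" does."""
    if isinstance(item, dict):
        if key is WILDCARD:
            return list(item.values())
        return [item[key]] if key in item else []
    if isinstance(item, Array):
        if key is WILDCARD:
            return list(item)
        return [item[key - 1]]  # integer keys only; array positions are 1-based
    raise TypeError(f"Input of lookup operator must be map or array: {item!r}")


def strict_lookup(seq, key):
    """Models seq ? key."""
    return [result for item in seq for result in lookup_item(item, key)]


def safe_lookup(seq, key):
    """Models seq -> key: filter to maps and arrays first, then look up."""
    return strict_lookup([i for i in seq if isinstance(i, (dict, Array))], key)


data = [Array([{"k0": 1}, {"k0": Array([1, 2, 3])}])]

step = safe_lookup(safe_lookup(data, WILDCARD), "k0")
print(safe_lookup(step, WILDCARD))  # [1, 2, 3]

try:
    strict_lookup(step, WILDCARD)
except TypeError as err:
    print(err)  # the atomic value 1 is rejected by the strict operator
```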
Note:
Of course, the above example can be rewritten to this equivalent XPath 3.1 expression and will get the wanted result, but literally no one, myself included, will ever write this:
[
map {"k0": 1},
map{"k0": [1, 2, 3]}
] [. instance of map(*) or . instance of array(*)] ?*
[. instance of map(*) or . instance of array(*)] ?k0
[. instance of map(*) or . instance of array(*)] ?*
Thus this is all about making it possible/feasible and empowering our users!
Issue #340 created #created-340
fn:format-number: Specifying decimal format
It would be nice if the decimal format for fn:format-number
could also be supplied via an additional argument. The current syntax is:
(: result: 12.345,67 :)
declare decimal-format de decimal-separator = ',' grouping-separator = '.';
format-number(
value := 12345.67,
picture := '#.##0,00',
decimal-format-name := 'de'
)
The syntax could be enhanced as follows:
format-number(
value := 12345.67,
picture := '#.##0,00',
format := map { 'decimal-separator': ',', 'grouping-separator': '.' }
)
If both decimal-format-name
and format
are supplied, an error should be raised.
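The rule about mutually exclusive arguments can be sketched in Python. This is a toy model rather than an implementation of fn:format-number: it ignores the picture string entirely and always produces two decimal places with groups of three digits. The function name, the NAMED_FORMATS table and the error type are assumptions made for the example.

```python
# Hypothetical stand-in for decimal formats declared in the prolog.
NAMED_FORMATS = {
    "de": {"decimal-separator": ",", "grouping-separator": "."},
}


def format_number(value, decimal_format_name=None, fmt=None):
    """Toy model: fixed two decimals, grouping by thousands, no picture."""
    if decimal_format_name is not None and fmt is not None:
        raise ValueError("supply either decimal-format-name or format, not both")
    props = {"decimal-separator": ".", "grouping-separator": ","}
    if decimal_format_name is not None:
        props.update(NAMED_FORMATS[decimal_format_name])
    if fmt is not None:
        props.update(fmt)
    text = f"{value:,.2f}"  # e.g. '12,345.67'
    # Swap separators via a placeholder so the two replacements do not collide.
    return (text.replace(",", "\0")
                .replace(".", props["decimal-separator"])
                .replace("\0", props["grouping-separator"]))


print(format_number(12345.67, decimal_format_name="de"))
print(format_number(12345.67, fmt={"decimal-separator": ",",
                                   "grouping-separator": "."}))
```

Both calls print 12.345,67; supplying both arguments raises the error.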
Edit 2023-05-02, adopted from a comment further below:
Next, language-specific default settings would be sensible. The existing syntax could be used:
format-number(12345.67, '#.##0,00', 'de')
As known from the other functions for formatting numbers and dates, it could be up to the implementation to decide which languages are supported. The defaults could be overwritten by custom decimal-format declarations in the prolog to ensure that a setting is applied, even if an implementation does not support it.
QT4 CG meeting 021 draft minutes #minutes-02-07
Draft minutes published.
Issue #339 created #created-339
The constraints on document-uri are too...constraining
The XPath data model imposes the following constraints on the document-uri property:
If the document-uri is not the empty sequence, then the following constraint must hold: the node returned by evaluating fn:doc() with the document-uri as its argument must return the document node that provided the value of the document-uri property.
In other words, for any Document Node $arg, either fn:document-uri($arg) must return the empty sequence or fn:doc(fn:document-uri($arg)) must return $arg.
This constraint turns out to be inconvenient whenever the larger environment doesn’t enforce a 1:1 mapping between URIs and documents.
For example, in a browser context, a JavaScript function that returns different versions of the same document over time cannot identify those documents with the same document-uri.
In XProc, a p:add-attribute
step that returns a copy of its input document with one additional attribute, cannot identify the output document with the same document-uri as the input document.
Given that the document URI is often necessary to evaluate relative URI references within a document, the constraints imposed in the data model are too strict.
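A minimal Python model shows why a modified copy cannot keep the URI of its source under the current rule. The Document class, the REGISTRY dictionary and the example URI are all hypothetical; None stands in for the empty sequence.

```python
class Document:
    """Toy document node: a URI (None models the empty sequence) and content."""

    def __init__(self, uri, content):
        self.document_uri = uri
        self.content = content


REGISTRY = {}  # what this model's doc() returns for each URI


def doc(uri):
    return REGISTRY[uri]


def satisfies_constraint(d):
    """Either no document-uri, or doc(document-uri(d)) is d itself."""
    return d.document_uri is None or doc(d.document_uri) is d


original = Document("http://example.com/in.xml", "<a/>")
REGISTRY[original.document_uri] = original

# A step such as p:add-attribute returns a modified copy that would
# naturally like to keep the same URI for resolving relative references.
copy = Document(original.document_uri, '<a id="1"/>')

print(satisfies_constraint(original))  # True
print(satisfies_constraint(copy))      # False
```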
Pull request #338 created #created-338
Add ednote per action QT4CG-016-02
This is a purely editorial change. Unless someone objects over the next few days, I'm just going to merge it in.
QT4 CG meeting 021 draft agenda #agenda-02-07
Draft agenda published.
Issue #337 created #created-337
Local union and enum types: and the definition of generalised atomic types
We need to review the proposed specs for local union and enum types, and decide whether or not to proceed with them.
I note that the definitions of generalized atomic type and pure union type say they must be "schema-defined", which appears to exclude locally-defined union and enum types.
I wonder if the definition of local enum types should be aligned more closely with an XSD type derived from xs:string by restricting with an enum facet. Now that we allow down-casting in the coercion rules, the objections to this seem to disappear.
cast
and castable
should also probably pay more attention to these types.
Pull request #336 created #created-336
Action QT4CG-019-01 (type of $pattern in fn:tokenize())
Also, update the fos:history record for a number of functions.
Issue #308 closed #closed-308
Improve the legends in the diagrams
Issue #335 closed #closed-335
Rework type hierarchy diagrams as styled lists
Pull request #335 created #created-335
Rework type hierarchy diagrams as styled lists
Close #308
This proposal was accepted at meeting 020 on 31 January 2023.
The PR won't format correctly because there are style changes, so I'm just going to merge this. I have fixed the diagrams in both the data model specification and f&o.
Issue #205 closed #closed-205
Make higher-order-function support mandatory
Issue #221 closed #closed-221
Expose op:same-key() as a user-visible function
Issue #324 closed #closed-324
Proposed syntax and semantics for string templates
Issue #326 closed #closed-326
Issue 205: make support for higher-order functions mandatory
Issue #319 closed #closed-319
Issue 221: op:same-key becomes fn:atomic-equal
Issue #334 created #created-334
Transient properties: a new approach to deep selection and update in maps and arrays
After exploring many alternatives, I have come to the conclusion that we can't solve the problem of deep navigation and transformation of JSON structures without a data model change.
Most of the problems boil down to this: JSON trees do not have parent pointers, therefore after navigating down to a leaf node of the tree, we cannot get any information from higher up the tree. The solution to this (the "zipper" model) is to retain transient information about how a particular node in the tree was reached, so that we can retrace our steps and revisit nodes that were passed en route.
The change I propose is quite minor, but powerful: Any XDM value can be augmented with a set of transient properties represented as a set of key-value pairs. These properties are ignored (and typically dropped) by all operations on a value, except where otherwise specified. For the purpose of exposition, I'll use the syntax $value¶name
to refer to the transient name
property of $value
.
We'll change the semantics of map:get()
and array:get()
, and the associated lookup operators, so that the resulting values have transient properties indicating how they were selected. For example, given
let $name := $person?firstName
the resulting value (perhaps the string "Michael") will be augmented with transient properties
- ¶parent - the map from which the value was selected (retaining its own transient properties if any)
- ¶key - the key used to make the selection, here "firstName"
and derived properties:
- ¶ancestors - the transitive closure of ¶parent
- ¶root - the last ¶ancestor
- ¶path - a string representation of the path used to select the value
We can also define other "downward selection" operations, such as map:find and array:foot, to retain these transient properties. So for example map:find($json, 'firstname')[.='Michael']¶parent?surname now finds the surnames of anyone named 'Michael', at any depth of the tree.
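The idea can be approximated in Python by wrapping each selected value together with a record of how it was reached. This is a sketch only: the class Selected, the function find and the sample data are invented for the illustration, and list positions are Python's 0-based indexes rather than XPath's 1-based ones.

```python
class Selected:
    """A value plus the transient record of how it was reached."""

    def __init__(self, value, parent=None, key=None):
        self.value = value
        self.parent = parent  # models the ¶parent property
        self.key = key        # models the ¶key property

    def get(self, key):
        """Downward selection that remembers where the result came from."""
        return Selected(self.value[key], parent=self, key=key)

    def path(self):
        """Keys used from the root down to this value (models ¶path)."""
        steps, node = [], self
        while node.parent is not None:
            steps.append(node.key)
            node = node.parent
        return steps[::-1]


def find(node, key):
    """Yield every map entry with the given key, at any depth."""
    value = node.value
    if isinstance(value, dict):
        children = [node.get(k) for k in value]
    elif isinstance(value, list):
        children = [node.get(i) for i in range(len(value))]
    else:
        children = []
    for child in children:
        if isinstance(value, dict) and child.key == key:
            yield child
        yield from find(child, key)


org = {
    "staff": [
        {"firstName": "Michael", "surname": "Archer"},
        {"firstName": "Norma", "surname": "Baker"},
    ],
    "lead": {"firstName": "Michael", "surname": "Cole"},
}

hits = [m for m in find(Selected(org), "firstName") if m.value == "Michael"]
print([m.parent.value["surname"] for m in hits])  # ['Archer', 'Cole']
print(hits[0].path())                             # ['staff', 0, 'firstName']
```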
If we turn back to the use cases in my 2016 paper on transforming JSON
https://www.saxonica.com/papers/xmlprague-2016mhk.pdf
The first use case (bulk update) relied on matching items expressed in XML as
match="map[array[@key='tags']/string='ice']/number[@key='price']/text()"
which couldn't be done in JSON because of the inability to match based on ancestor context. With the new transient properties we can match this as
match="type(xs:integer)[¶key = 'price'][¶parent?tags?* = 'ice']"
In the second use case (hierarchic inversion), we can again get properties of parent or ancestor maps
$students ! map:put("course", ¶parent?name)
I think we can also use this to define deep update operations. But I'll leave that investigation until later.
Note: transient properties potentially have many other applications, for example we might use them to solve our problems with document-uri()
. But exploring that would be a distraction here. The nice thing about transient properties is that they give a lot of potential for augmenting existing functionality with full backwards compatibility, because we can define existing operations to return results with additional transient properties that all existing operations will ignore. If we were so minded, for example, we could have different functions/operators return "quiet NaN" and "signalling NaN" by adding a transient property to the NaN value returned.
Issue #333 created #created-333
Equality of function items
The question of equality of function items arises in the discussion of determinism of functions and memo functions in XSLT - see F&O 3.1 section 1.7.4, and came up again today in the context of fn:deep-equal.
1.7.4 makes a brave attempt to describe situations under which two functions are "identical", though leaving implementations room for flexibility. I think we can build on this and improve it, by describing more situations in which the result is predictable.
The data model describes the properties of a function item, and we can say that two function items are equivalent if all their properties are the same.
The properties that cause problems are the "implementation" and the "closure", and in both cases I think we can find ways of doing a comparison.
For the implementation, we can define this by reference to the way in which the implementation property is set. For function items constructed by reference to static functions (e.g. my:func#3
or function-lookup(my:func, 3)
) then they have the same implementation if and only if they are constructed by reference to the same static function. Similarly for function items constructed by evaluating an inline function expression. Other ways of constructing a function item, such as partial application, essentially create a new function with the same implementation as an existing function and a different closure.
For the closure (ignoring for the moment functions that include parts of the dynamic context in their closure), this is essentially just a set of variable bindings and it's not too difficult to say that functions are identical if these sets of variable bindings are identical.
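One way to picture this comparison is a small Python model in which a function item is just its implementation identity, its arity and its closure bindings. The FunctionItem class and partial_apply helper are hypothetical, and the dynamic context is ignored, as in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionItem:
    implementation: str  # identity of the static function or inline expression
    arity: int
    closure: tuple = ()  # sorted (variable name, value) pairs


def partial_apply(f, name, value):
    """Same implementation, one fewer parameter, one more closure binding."""
    bindings = tuple(sorted(f.closure + ((name, value),)))
    return FunctionItem(f.implementation, f.arity - 1, bindings)


a = FunctionItem("my:func#3", 3)
b = FunctionItem("my:func#3", 3)

print(a == b)                                                # True
print(partial_apply(a, "x", 1) == partial_apply(b, "x", 1))  # True
print(partial_apply(a, "x", 1) == partial_apply(a, "x", 2))  # False
```

The dataclass-generated equality compares all three fields, which mirrors the suggestion that two function items are equivalent when all their properties are the same.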
Issue #332 created #created-332
Add a namespace uris option to fn:path
The output of fn:path
using namespaces is very verbose as it is specified to use the Q{uri}name
syntax. It would be useful if it was extended to take a namespace prefix to uri map.
- Add a second $namespaces parameter that has the type map(union(xs:NCName, enum('')), xs:anyURI) (the same as fn:in-scope-namespaces) -- this will have a default value of map{} to preserve the existing behaviour.
- If the namespace uri is in the map, use the given prefix. If that prefix is "" then just use the local name.
- If the namespace uri is not in the map, use the Q{uri}name syntax.
This allows for things like fn:path($e, namespaces := fn:in-scope-namespaces())
.
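The three rules for rendering a name could look roughly like this in Python. The sketch covers only the name part of a path step (no positional predicates), and the function name step_name is hypothetical.

```python
def step_name(uri, local, namespaces=None):
    """Render an element name using a prefix-to-URI map where possible."""
    namespaces = namespaces or {}  # default: the existing Q{uri}name output
    for prefix, ns_uri in namespaces.items():
        if ns_uri == uri:
            # An empty prefix means the local name is used on its own.
            return local if prefix == "" else f"{prefix}:{local}"
    return f"Q{{{uri}}}{local}"


XHTML = "http://www.w3.org/1999/xhtml"

print(step_name(XHTML, "p"))               # Q{http://www.w3.org/1999/xhtml}p
print(step_name(XHTML, "p", {"h": XHTML}))  # h:p
print(step_name(XHTML, "p", {"": XHTML}))   # p
```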
Issue #331 created #created-331
Extend fn:path to support arrays and maps.
Currently, fn:path
is defined for nodes. This means it is not possible to use it with arrays or maps (e.g. to determine the path to a JSON item when a comparison fails).
As such, I recommend:
- changing the type to item()
- If the value is a node use the current logic.
- If the value is in an array, use ?n where n is the nth item of the array where the item is located.
- If the value is in a map, use ?name or ?"name" where name is the key name of the map where the item is located.
- If the value is an atomic item, or the root of a map/array structure, use a single dot (.).
Example: .?4?user?name
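The rules for maps and arrays can be sketched in Python over JSON-like data. Because a plain value does not know where it sits, this model computes the path of every value while walking down from the root. The function name paths is hypothetical, keys are assumed to be strings, and the NCName test is an ASCII-only approximation.

```python
import re

# ASCII-only approximation of an NCName, used to decide whether to quote.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def paths(value, prefix="."):
    """Yield (path, value) pairs for a value and everything nested in it."""
    yield prefix, value
    if isinstance(value, dict):
        for key, member in value.items():
            step = f"?{key}" if NCNAME.fullmatch(key) else f'?"{key}"'
            yield from paths(member, prefix + step)
    elif isinstance(value, list):
        for position, member in enumerate(value, start=1):
            yield from paths(member, f"{prefix}?{position}")


data = [{}, {}, {}, {"user": {"name": "Ann", "first name": "A"}}]
index = dict(paths(data))

print(index[".?4?user?name"])          # Ann
print(index['.?4?user?"first name"'])  # A
```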
QT4 CG meeting 020 draft minutes #minutes-01-31
Draft minutes published.
Pull request #330 created #created-330
Update fn:parse-html to apply review feedback.
This PR applies the following review comments:
- [x] QT4CG-016-03: RD to add a note clarifying “known character encoding”
- [x] QT4CG-016-04: RD to add a note clarifying the “”/”” html/version combination
- [x] QT4CG-016-05: RD to add a “todo” noting the dependency on keyword arguments
- [x] QT4CG-016-06: RD to reword the introduction to mapping to clarify who’s doing the mapping
- [ ] QT4CG-016-08: RD to clarify how namespace comparisons are performed.
- [x] QT4CG-016-09: RD to add a note stating that the local name should always be lowercase
- [x] QT4CG-016-10: RD to consider how to clarify parsed entity parsing.
Issue #329 created #created-329
Keyword parameters: Error codes
I’ve read the current specification twice, and I have checked the existing qt4 tests, but I’m still confused by the exact meaning of the new error codes for keyword arguments, XPST0141
and XPST0142
. Things are getting particularly tricky if we consider partial function applications.
My proposal would be to stick with the existing error code XPST0017
for functions that cannot be matched.
Initial suggestion (obsolete):
- use the existing error code XPST0017 for all cases in which a function cannot be chosen as the available arguments (both positional and keyword-based) don’t match the function definition, and
- only raise a new error code (XPST0141, possibly) if a keyword argument has been specified more than once (as this can be done without checking the function definitions).
Issue #328 created #created-328
Switch Cases: Lift single-item restriction on operands
Motivation
XQuery switch cases have a peculiar restriction: The operand of a single case must yield an empty sequence or a single item. There seem to be no (obvious) reasons why this restriction exists, so I believe we should lift it and allow arbitrary sequences.
A similar extension is planned for Java 12 (JEP 325: Switch Expressions). The required changes in XQuery are simpler, though, as the 3.1 grammar already supports arbitrary expressions as operands.
Examples
switch($value)
case 1
case 2
case 3
case 4
case 5
return 'small'
default
return 'big'
Proposed syntax:
switch($value)
case 1 to 5
return 'small'
default
return 'big'
Required Changes
The current matching rules could be rephrased as follows:
- The SwitchCaseOperand is evaluated.
- The resulting value is atomized.
- The case matches if the value is empty and if the value of the switch expression is empty as well.
- Otherwise, the atomized value of the switch operand expression is compared with each item of the atomized value of the SwitchCaseOperand using fn:deep-equal, with the default collation from the static context.
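The rephrased matching rules can be modelled in a few lines of Python. In this sketch both values are already-atomized sequences represented as lists, Python's == stands in for fn:deep-equal, and collations are ignored. The function names are hypothetical.

```python
def case_matches(switch_value, case_operand):
    """Both arguments are atomized sequences, modelled as Python lists."""
    if not case_operand:
        # An empty case operand matches only an empty switch value.
        return not switch_value
    # Otherwise compare the switch value with each item of the operand.
    return any(switch_value == [item] for item in case_operand)


def classify(value):
    """Models: switch($value) case 1 to 5 return 'small' default return 'big'."""
    cases = [(list(range(1, 6)), "small")]
    for operand, result in cases:
        if case_matches(value, operand):
            return result
    return "big"


print(classify([3]))  # small
print(classify([7]))  # big
print(classify([]))   # big
```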
References
- Original Proposal: https://github.com/expath/xpath-ng/pull/12
- Discussion on Slack: https://xmlcom.slack.com/archives/C011NLXE4DU/p1675006336963479
QT4 CG meeting 020 draft agenda #agenda-01-31
Draft agenda published.
Issue #327 created #created-327
Tokenisation
The rule in A.2
When tokenizing, the longest possible match that is consistent with the EBNF is used.
needs clarifying. It could be read as suggesting that if taking the longest match turns out to lead to a syntax error, the tokenisation should be re-attempted using a shorter match. I don't think that has ever been intended. So what exactly does the qualifier "that is consistent with the EBNF" actually mean?
Possibly related, A.2.2 Terminal Delimitation states:
Terminal symbols that are not used exclusively in [/* ws: explicit */] productions are of two kinds: delimiting and non-delimiting.
But (at least in the XQuery version) the list of delimiting tokens includes a number that are indeed used exclusively in ws:explicit productions, for example a number of tokens containing back-ticks, and ]]>
.
I think we need to be clearer that tokens used in ws:explicit productions are recognised only when parsing the production that uses them. For example given the expression A[B[C]]>3
, we should not recognise ]]>
as a token under the longest-token rule. I think that's probably what the "consistent with the EBNF" rule is intended to convey.
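The hazard is easy to reproduce with a naive longest-match tokenizer. The Python sketch below is not how any real XPath parser works: it knows only a handful of delimiters and treats everything else as one opaque token. It just shows that a global list containing ]]> changes the result for this expression.

```python
def tokenize(text, delimiters):
    """Greedy tokenizer: at each position take the longest delimiter."""
    ordered = sorted(delimiters, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for d in ordered:
            if text.startswith(d, i):
                tokens.append(d)
                i += len(d)
                break
        else:
            # No delimiter here: consume up to the next delimiter.
            j = i
            while j < len(text) and not any(text.startswith(d, j) for d in ordered):
                j += 1
            tokens.append(text[i:j])
            i = j
    return tokens


expr = "A[B[C]]>3"

# With "]]>" in the global delimiter list, the predicates never close.
print(tokenize(expr, ["]]>", "[", "]", ">"]))
# ['A', '[', 'B', '[', 'C', ']]>', '3']

# Recognising "]]>" only inside the production that uses it avoids this.
print(tokenize(expr, ["[", "]", ">"]))
# ['A', '[', 'B', '[', 'C', ']', ']', '>', '3']
```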
Pull request #326 created #created-326
Issue 205: make support for higher-order functions mandatory
Issue #325 created #created-325
Operator precedence table needs updating
The otherwise
and ->
operators (and maybe others) are missing from the non-normative precedence table in Appendix A.4.
Pull request #324 created #created-324
Proposed syntax and semantics for string templates
See issue #58.
I would recommend reviewing the XQuery version of the spec first, since it contains additional notes contrasting string templates and the existing string constructors. The section on string constructors has moved, but is unchanged except for the addition of this note.
Issue #323 created #created-323
add select attribute to xsl:text
Although xsl:text select="socks" would be the same as xsl:value-of select="socks" in implementation terms, users of XSLT 2 and later, even people who have been using XSLT 2 or 3 for some time, are often surprised to learn that xsl:value-of makes a text node, and that they need to use xsl:sequence to return something else.
So it'd be great to have them use xsl:text instead of xsl:value-of, where text nodes are wanted, because then introducing xsl:sequence is a small step.
Of course, beginners also often use value-of where they should be using apply-templates, e.g. to handle mixed content! But again, using xsl:text reduces that temptation.
We do have value templates now, <xsl:text>{ .... }</xsl:text>, which mitigates the need slightly, but I think only slightly, because the select= analogy is very compelling.
Issue #322 created #created-322
Map construction in XSLT: xsl:record instruction
Constructing maps in XSLT often involves code rather like this:
<xsl:map>
<xsl:map-entry key="'author'" select="string(AUTHOR)"/>
<xsl:map-entry key="'title'" select="string(TITLE)"/>
<xsl:map-entry key="'price'" select="xs:decimal(PRICE)"/>
<xsl:map-entry key="'publisher'" select="string(../@name)"/>
</xsl:map>
The alternative using XPath is also rather ugly:
<xsl:sequence select="map{'author': string(AUTHOR),
'title':string(TITLE),
'price': xs:decimal(PRICE),
'publisher':string(../@name)}"/>
(the fact that it is creating a map doesn't stand out; the xsl:sequence
is a distraction because there's no sequence involved; and many users dislike long multi-line XPath expressions because of formatting problems in their editing tools)
I propose a new instruction xsl:record which allows:
<xsl:record author="string(AUTHOR)"
title="string(TITLE)"
price="xs:decimal(PRICE)"
publisher="string(../@name)"/>
This is rather like literal result elements in that the attributes are user-defined rather than system-defined. Unlike LREs, the values are general expressions rather than AVTs, because the values are not necessarily strings. The instruction can only be used where the keys (field names) take the form of NCNames.
If variable entries are required, or entries whose keys are not NCNames, they can appear as child instructions:
<xsl:record author="string(AUTHOR)"
title="string(TITLE)"
price="xs:decimal(PRICE)"
publisher="string(../@name)">
<xsl:if test="@private">
<xsl:map-entry key="'private entry'" select="true()"/>
</xsl:if>
</xsl:record>
Following the tradition of LREs, duplicates are resolved as "last one wins".
If "standard attributes" such as [xsl:]version
are required, they must be in the XSLT namespace, as with LREs.
Issue #321 created #created-321
relax $input in fn:serialize
Relevant specifications: https://qt4cg.org/specifications/xpath-functions-40/Overview-diff.html#func-serialize
Would it be possible to relax the strictures on $input
(first parameter) of fn:serialize()
?
- The specifications do not explicitly forbid map(*) or array(*) as input, but in practice, when these are supplied, Saxon rejects them. Developers (or at least this one) who work with arrays and maps often need to render them in string output or messages, if only for diagnostics. If there is something really prohibitively wrong with those two items as input to fn:serialize(), then the specifications should say so.
- Attributes are forbidden, but it is unclear why. They get serialized fine in the context of a parent, why not alone?
- Namespace nodes are forbidden; see previous point.
(No doubt there must have been discussion on points 2-3, but the rationale is not clear from the specs.)
Perhaps the problem is that the details of what the serialization should look like are contestable. I think the answer there is simply, pick one. I think we'll happily live with whatever is chosen.
For the serialization of maps and arrays, I'll point to my tan:map-to-xml() and tan:array-to-xml() as one possible model; they have been indispensable for daily troubleshooting.
Pull request #320 created #created-320
Issue 98 - add options parameter to fn:deep-equal
This proposal adds an options parameter to fn:deep-equal, giving much more detailed control over how the comparison is performed (while remaining backwards compatible by default).
This proposal is a first draft and I would request careful review, it's not one to pass through "on the nod".
Pull request #319 created #created-319
Issue 221: op:same-key becomes fn:atomic-equal
The proposal renames op:same-key as fn:atomic-equal, thus making it directly available to applications.
Issue #294 closed #closed-294
fn:remove removing multiple items
Issue #318 created #created-318
Serialization HTML/XHTML output methods: meta elements and the charset attribute
HTML5 introduces the ability to write
<meta charset="utf-8"/>
in place of
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
The serialization spec (for HTML and XHTML output methods) ignores this.
(a) it requires the serializer to add a meta
element in the second form rather than the first.
(b) when removing existing meta
elements, it requires the second form to be deleted, but not the first. This may result in invalid (X)HTML in which both elements are present.
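One possible repair is to treat both forms as charset declarations when removing existing elements, and to add a single declaration afterwards. The Python sketch below uses the standard library's ElementTree on a namespace-free head element. The function names are hypothetical, and the choice to emit the short HTML5 form is an assumption for the example, not something the issue decides.

```python
import xml.etree.ElementTree as ET


def is_charset_meta(el):
    """True for <meta charset=...> and for <meta http-equiv="Content-Type">."""
    if el.tag != "meta":
        return False
    if "charset" in el.attrib:
        return True
    return el.get("http-equiv", "").lower() == "content-type"


def normalize_meta(head, encoding="utf-8"):
    """Remove every existing charset declaration, then add exactly one."""
    for el in [e for e in head if is_charset_meta(e)]:
        head.remove(el)
    head.insert(0, ET.Element("meta", {"charset": encoding}))


head = ET.fromstring(
    '<head><meta charset="utf-8"/>'
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>'
    "<title>t</title></head>"
)
normalize_meta(head)
print([child.tag for child in head])  # ['meta', 'title']
```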
Issue #309 closed #closed-309
Drop ternary conditionals, as agreed on 2023-01-17
Issue #310 closed #closed-310
Fix outstanding issues from PR 304
Issue #312 closed #closed-312
Minor editorial improvements
Issue #313 closed #closed-313
Issue 294: fn:remove()
Issue #317 created #created-317
fn:format-integer: $lang → $language ?
A minor inconsistency in the XQFO specification: The third parameter of fn:format-integer
is named $lang
…
https://qt4cg.org/specifications/xpath-functions-40/Overview-diff.html#func-format-integer
…whereas all other language parameters are named $language
.
QT4 CG meeting 019 draft minutes #minutes-01-24
Draft minutes published.
Issue #316 created #created-316
Function fn:differences
I didn't see any issue thread devoted to fn:differences()
, so am opening this one. Please respond with xrefs to anything relevant.
Draft here: https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-differences
IMO, this function seems overly complicated for both users and implementors. The specs provide difficult reading. But it is the first function to try to address the desideratum for differencing. Something like it is needed methinks.
My suggestion would be to simplify the function as a straightforward string comparison, i.e., change the signature to something like fn:differences($input1 as xs:string, $input2 as xs:string) as OUTPUT
where OUTPUT
is either a tree structure (like the output of fn:analyze-string()
) or a sequence of records (e.g., (is-in-1 as xs:boolean, is-in-2 as xs:boolean, fragment as xs:string)
).
Such a change would make the function more tractable for both users and implementers. The user would need to cast each sequence to a string, and in so doing will be able to (be compelled to) make fine-grained decisions on things such as normalization. Processor implementers have far simpler input, and they can choose the difference algorithm that makes best sense at the moment.
One counterargument might be that the resultant output would be difficult to correlate to the original sequences. Ostensibly, one wants to do things such as decide whether to drop certain items in sequence 1 or sequence 2. My response is that the current draft results in output that suffers from the same problem. Navigating the map to correlate it to the original sequence sounds daunting. With my suggestion, there are ways around this, through auxiliary functions or arity expansions that normalize the output.
But I don't want to get into postprocessing output here, which would be tangential to the main question, i.e., how fn:differences()
should be constructed in a way conducive to both users and implementers.
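To make the suggested string-based signature concrete, here is a rough Python sketch that returns a sequence of (is-in-1, is-in-2, fragment) records. It leans on the standard library's difflib as one arbitrary choice of difference algorithm. The function name and record layout follow the suggestion above but are otherwise assumptions.

```python
from difflib import SequenceMatcher


def differences(input1, input2):
    """Return (is_in_1, is_in_2, fragment) records covering both strings."""
    records = []
    matcher = SequenceMatcher(None, input1, input2, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            records.append((True, True, input1[i1:i2]))
        else:
            # A replace, delete or insert contributes up to two fragments.
            if i2 > i1:
                records.append((True, False, input1[i1:i2]))
            if j2 > j1:
                records.append((False, True, input2[j1:j2]))
    return records


for record in differences("the quick fox", "the slow fox"):
    print(record)
```

Concatenating the fragments flagged as being in the first input reproduces that input, and likewise for the second, so the records can be correlated back to the original strings.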
Issue #311 closed #closed-311
Stylesheet fix to mark optional fields in record definitions
Issue #62 closed #closed-62
[FO] The parameter types for fn:unique and array:partition are incorrectly specified.
Issue #71 closed #closed-71
[XSLT] Use of multiple predicates: order of evaluation
Issue #171 closed #closed-171
XPath ternary conditional operator
Issue #315 created #created-315
fn:transform inconsistency: initial-mode
The fn:transform
specification in F+O says that if no initial-mode is supplied, the unnamed mode is used.
The XSLT 3.0 specification says that if no initial mode is supplied, then the default mode is used if one has been specified, or the unnamed mode is used if not.
I think the XSLT 3.0 spec should win here: it makes more sense if a default has been declared that it should actually be used.
(Thanks to Amanda Galtman for pointing this out.)
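The XSLT 3.0 behaviour preferred here amounts to a simple fallback chain, sketched below in Python. The function and constant names are hypothetical; None stands for "not supplied" or "not declared".

```python
UNNAMED = "#unnamed"  # stands for the unnamed mode


def initial_mode(supplied=None, default_mode=None):
    """Pick the initial mode: supplied, else declared default, else unnamed."""
    if supplied is not None:
        return supplied
    if default_mode is not None:
        return default_mode
    return UNNAMED


print(initial_mode(default_mode="main"))  # main
print(initial_mode())                     # #unnamed
```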
Issue #314 created #created-314
Basic Operations on Maps and Arrays
Manipulating Arrays and Maps
This is an outline of proposed new facilities designed to make processing of maps and arrays easier. The basic facilities needed for transformation of maps and arrays are the ability to decompose them into their parts, manipulate the parts, and then compose new arrays and maps from these parts.
Further background is in my 2022 Balisage paper, https://balisage.net/Proceedings/vol27/html/Kay01/BalisageVol27-Kay01.html
This proposal considers only the "shallow" operations on maps and arrays. Further proposals for deep search and update of nested structures are to be expected.
Basics
A map entry is an item used to represent a key-value pair in a map; it is an item of type record(key as xs:anyAtomicType, value as item()*)
, aliased in this proposal as type(map-entry)
.
Note: an alternative representation for key-value pairs is as a singleton map, and that's the representation used by the existing map:entry() function and by the xsl:map-entry instruction. A representation as record(key, value)
is rather more convenient to enable extraction of the key and value, but does create some compatibility issues...
An array entry is an item used to represent a member of an array; it is an item of type record(value as item()*)
, aliased in this proposal as type(array-entry)
.
Decomposing Maps and Arrays
The function map:entries($map)
returns a sequence of map entries, in unpredictable order, representing the contents of the supplied map. It is equivalent to map:for-each($map, ->($k, $v){map:entry($k, $v)})
.
The function array:entries($array)
returns a sequence of array entries, in array order, representing the members of the supplied array. It is equivalent to array:for-each($array, ->($v){array:entry($v)})
.
Constructing Maps and Arrays
The function map:of($entries as type(map-entry)*) as map(*)
constructs a map from a sequence of map entries. A second parameter, $options
, is available to control handling of duplicates, as with map:merge()
. The function is equivalent to map:merge($entries!map{?key : ?value})
.
The function array:of($entries as type(array-entry)*) as array(*)
constructs an array from a sequence of array entries. It is equivalent to fold-left($entries, [], ->($a, $e){array:append($a, $e?value)})
.
The function map:entry($key, $value) as type(map-entry)
is equivalent to map{'key':$key, 'value':$value}
. Problem: we already have a function map:entry() in 3.1 that does something different. Need to change the terminology...
The function array:entry($value) as type(array-entry)
is equivalent to map{'value':$value}
.
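The decompose/compose round trip described above can be sketched in Python. This is only a rough model, not part of the proposal: a dict stands in for an XDM map, a list for an array, and the function names map_entries, map_of, array_entries and array_of are illustrative stand-ins for map:entries, map:of, array:entries and array:of. Python dicts keep insertion order, whereas the proposal leaves the order of map:entries unpredictable, and duplicate handling (the $options parameter) is reduced here to "last one wins".

```python
def map_entries(m):
    """Model of map:entries: one {'key', 'value'} record per map entry."""
    return [{"key": k, "value": v} for k, v in m.items()]


def map_of(entries):
    """Model of map:of. Assumption: on duplicate keys the last entry wins."""
    return {e["key"]: e["value"] for e in entries}


def array_entries(a):
    """Model of array:entries: one {'value'} record per array member."""
    return [{"value": v} for v in a]


def array_of(entries):
    """Model of array:of: rebuild the array from the 'value' fields."""
    return [e["value"] for e in entries]


prices = {"apple": 3, "pear": 5}
members = [1, [2, 3], ()]

# Decomposing and recomposing gives back the original structure.
assert map_of(map_entries(prices)) == prices
assert array_of(array_entries(members)) == members
```

Wrapping each array member in a record is what lets a member that is an empty sequence (modelled as () above) survive the round trip instead of disappearing in a flattened sequence.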
Filtering Maps and Arrays
The construct $map?[PREDICATE]
is equivalent to map:of(map:entries($map)[PREDICATE])
. For example, given a map in which the keys are dates, $map?[year-from-date(?key)=2023]
returns a map containing those entries in which the key is a date in 2023.
The construct $array?[PREDICATE]
is equivalent to array:of(array:entries($array)[PREDICATE])
. For example, $array?[1]
selects the first item in the array (as a single-member array), while $array?[exists(?value)]
returns an array containing all those entries in the input array that are not empty. If $array
is an array of maps, then $array?[?value?name='John']
selects those members of the array that are maps having ?name='John'
.
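To make the filtering equivalences concrete, here is a small Python model, again only a sketch. The predicate receives an entry record, just as PREDICATE sees ?key and ?value; filter_map and filter_array are hypothetical names, and numeric predicates such as ?[1] are not modelled.

```python
from datetime import date


def filter_map(m, predicate):
    """Model of $map?[PREDICATE]: keep entries whose record satisfies predicate."""
    entries = [{"key": k, "value": v} for k, v in m.items()]
    return {e["key"]: e["value"] for e in entries if predicate(e)}


def filter_array(a, predicate):
    """Model of $array?[PREDICATE] for boolean predicates only."""
    return [v for v in a if predicate({"value": v})]


events = {
    date(2022, 12, 31): "old",
    date(2023, 1, 10): "meeting 017",
    date(2023, 1, 17): "meeting 018",
}
# Counterpart of $map?[year-from-date(?key)=2023]
in_2023 = filter_map(events, lambda e: e["key"].year == 2023)

people = [{"name": "John"}, {"name": "Mary"}]
# Counterpart of $array?[?value?name='John']
johns = filter_array(people, lambda e: e["value"].get("name") == "John")
```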
Mapping Maps and Arrays
The construct $map!!EXPR
evaluates EXPR
once for each entry in $map
and returns the result as a flattened sequence. For example map:of($map!!map:entry(?key, ?value+1))
returns a map in which each value has been incremented by one.
The construct $array!!EXPR
evaluates EXPR
once for each entry in $array
, and returns the result as a flattened sequence. For example, array:of($array!!array:entry(?value+1))
returns an array in which every value has been incremented by one.
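The point worth illustrating is that the result is a flattened sequence, so any map or array has to be rebuilt explicitly with map:of or array:of. A rough Python model, where map_bang and array_bang are hypothetical names and EXPR is modelled as a function returning a list:

```python
from itertools import chain


def map_bang(m, expr):
    """Model of $map!!EXPR: apply expr to each entry record, flatten the results."""
    return list(chain.from_iterable(expr({"key": k, "value": v}) for k, v in m.items()))


def array_bang(a, expr):
    """Model of $array!!EXPR: apply expr to each member record, flatten the results."""
    return list(chain.from_iterable(expr({"value": v}) for v in a))


stock = {"apple": 3, "pear": 5}

# Counterpart of map:of($map!!map:entry(?key, ?value+1))
bumped_entries = map_bang(stock, lambda e: [{"key": e["key"], "value": e["value"] + 1}])
incremented = {e["key"]: e["value"] for e in bumped_entries}

# An EXPR that yields two items per member shows the flattening.
doubled = array_bang([1, 2], lambda e: [e["value"], e["value"]])
```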
FLWOR Expressions
The for-member clause for member $m in $array
is equivalent to for $sys:var in array:entries($array) let $m := $sys:var?value
.
The for-entry clause for entry ($k, $v) in $map
is equivalent to for $sys:var in map:entries($map) let $k := $sys:var?key, $v := $sys:var?value
.
XSLT
Iteration over maps and arrays is achieved using <xsl:for-each select="array:entries()">
and <xsl:for-each select="map:entries()">
directly.
Construction of maps uses the existing instructions <xsl:map>
and <xsl:map-entry>
. There is an inconvenience here in that the <xsl:map-entry>
instruction returns a singleton map (map{key:value}
) rather than a map entry as defined in this proposal (map{'key':key, 'value':value}
).
Construction of arrays uses the new instructions <xsl:array>
and <xsl:array-entry>
. The xsl:array-entry
instruction is defined to construct an array entry as defined in this proposal.
Use Cases
To be supplied.
Pull request #313 created #created-313
Issue 294: fn:remove()
Allow remove() to remove several items, aligning it with array:remove() and map:remove()
Pull request #312 created #created-312
Minor editorial improvements
- Issue 300 (clarification about results being normalized)
- Action QT4CG-018-02 (explaining signature notation)
- Action QT4CG-018-04 (explaining numeric predicates on ancestor unions)
Pull request #311 created #created-311
Stylesheet fix to mark optional fields in record definitions
Pull request #310 created #created-310
Fix outstanding issues from PR 304
See https://github.com/qt4cg/qtsp…ecs/pull/304#issuecomment-1378532583 - but excluding item 3 because that's a stylesheet change.
Pull request #309 created #created-309
Drop ternary conditionals, as agreed on 2023-01-17
We agreed today to drop ternary conditional expressions from the proposal; this PR implements that change.
QT4 CG meeting 018 draft minutes #minutes-01-17
Draft minutes published.
Issue #286 closed #closed-286
Spec changes to allow child::(a|b|c) - Issue 107
Issue #290 closed #closed-290
Fix issue #18 (function type hierarchy)
Issue #35 closed #closed-35
[FO]The `union ( | )`, `itersect`, `except` and `combine (,)` operators are not mentioned in the F & O. Have not the best categorization in the XPath spec.
Issue #288 closed #closed-288
Error in fn:path specification
Issue #257 closed #closed-257
Improving the styling/presentation/prepresentation of the record types in the F&O spec
Issue #70 closed #closed-70
[FO] Built-in function changes to support default values
Issue #291 closed #closed-291
DTD validity of F&O spec
Issue #304 closed #closed-304
Mike's content changes from PR 292
Issue #284 closed #closed-284
Add grammar for "if (test) then {expr}" with no else
Pull request #308 created #created-308
Improve the legends in the diagrams
This PR completes my action QT4CG-015-03: NW to make sure the direction of the arrow is in the legends
I also made sure the legends aren't too wide. I still have more work to do for the other actions.
Issue #259 closed #closed-259
Issue #74 - add the fn:parse-html function
QT4 CG meeting 018 draft agenda #agenda-01-17
Draft agenda published.
Issue #306 closed #closed-306
fn:char - editors actions from 2023-01-10
Issue #307 created #created-307
Parsing and building URIs comments and queries
- fn:build-uri states:
If the scheme key is present in the map, the URI begins with the value of that key concatenated with //, otherwise it begins //.
a. Shouldn't the concatenation be :// so that e.g. http becomes http://?
b. How are non-hierarchical schemes such as urn and mailto handled?
- RFC 3986 allows IPv6 and IPvFuture addresses that contain : characters, e.g. http://[::1]:80.
My understanding of fn:parse-uri is that this will fail to parse.
- RFC 3986 states that for userinfo, the user:password form is deprecated.
Browsers will reject this due to the security risk, and the RFC suggests that applications should not render the password (the part after the :) in clear text. Should fn:build-uri follow suit, or (along with fn:parse-uri) have an option to control the behaviour (keep, remove, invalid), where if the option is invalid, it will throw an fn:error?
- RFC 3986 suggests that the port should be omitted if it matches the default for the scheme.
Should fn:build-uri have this behaviour?
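For comparison only, this is how Python's standard urllib.parse handles the cases raised above. It says nothing about what fn:parse-uri or fn:build-uri do or should do. The DEFAULT_PORTS table and the authority_without_default_port helper are assumptions made for this sketch.

```python
from urllib.parse import urlsplit

# Assumption: a tiny illustrative table, not a complete registry of default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def authority_without_default_port(uri):
    """Rebuild host[:port], dropping the port when it is the scheme default."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets
        host = "[" + host + "]"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        return host + ":" + str(parts.port)
    return host


ipv6 = urlsplit("http://[::1]:80")            # bracketed IPv6 host plus port
mailto = urlsplit("mailto:someone@example.com")  # non-hierarchical: no authority

assert (ipv6.hostname, ipv6.port) == ("::1", 80)
assert (mailto.scheme, mailto.netloc, mailto.path) == ("mailto", "", "someone@example.com")
assert authority_without_default_port("http://[::1]:80") == "[::1]"
assert authority_without_default_port("http://[::1]:8080") == "[::1]:8080"
```

The mailto case shows why a builder cannot unconditionally emit // after the scheme: there is no authority component to introduce.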
Pull request #306 created #created-306
fn:char - editors actions from 2023-01-10
Changes to the new fn:char function (issue #121) as follows:
- Action QT4CG-017-01 clarifies the definition of formats #nnn and #xnnn.
- Action QT4CG-017-02 changes the order of the rules
- In discussion it was asked whether any HTML5 entity names refer to strings comprising more than one character. On investigation it appears that they do, and the spec has been revised to allow for this.
- added history/status information
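The observation about multi-character entity names can be checked against any HTML5 entity table. As a sketch, Python's standard library ships one in html.entities.html5; that table is of course not the fn:char specification, just a convenient copy of the HTML5 list.

```python
from html import unescape
from html.entities import html5

# Entity names (with trailing ';') whose replacement text is more than one character.
multi_char = {
    name: text
    for name, text in html5.items()
    if name.endswith(";") and len(text) > 1
}

# There is at least one such entity, so a character-by-name function
# cannot assume that every name yields exactly one codepoint.
assert len(multi_char) > 0

# Expanding the reference gives the same multi-character string.
sample_name = sorted(multi_char)[0]
assert unescape("&" + sample_name) == multi_char[sample_name]
```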
Issue #121 closed #closed-121
[FO] fn:nl, fn:tab, fn:cr
QT4 CG meeting 017 draft minutes #minutes-01-10
Draft minutes published.
Issue #261 closed #closed-261
Proposed fn:char function - see issue 121
Issue #305 created #created-305
parse-xml() and whitespace stripping
There seems to be nothing in either the XSLT spec or in F+O that says explicitly whether stylesheet-defined space stripping rules (xsl:strip-space and xsl:preserve-space) apply to documents loaded using fn:parse-xml
(or, by extension, parse-html
).
The spec says that these rules apply to "source trees" defined as "any tree provided as input to the transformation. This includes the document containing the [global context item] if any, documents containing nodes present in the [initial match selection], documents containing nodes supplied as the values of [stylesheet parameters], documents obtained from the results of functions such as [document], [doc], and [collection]...".
I guess one reasonable interpretation is that the "such as" includes parse-xml()
. But it goes rather against the grain that the behaviour of parse-xml() should be affected by the containing stylesheet declarations, when there is no mention of such a context-dependency in the function specification; in this, parse-xml() is rather different from doc() which deliberately says very little about how the XDM instance returned relates to the URI supplied as input.
Issue #292 closed #closed-292
Merge signatures with optional params
Pull request #304 created #created-304
Mike's content changes from PR 292
I teased apart some of the omnibus PR #292. I've committed the schema and stylesheet changes. This PR covers the remaining prose changes.
Mike writes:
I regret that this has turned into a bit of an omnibus PR. The main changes are:
- Fix validity issues with the function catalog and its schema (Issue 291)
- Convert all functions to use a single signature with optional parameters (Issue 70)
- Extend the function catalog to handle record definitions (Issue 257)
- Fix the (trivial) bug with properties of fn:path (Issue 288)
- Add introductory text concerning the handling of operators (Issue 35)
Fix #291 Fix #70 Fix #257 Fix #288 Fix #35
Issue #303 closed #closed-303
Mike's proposed schema and stylesheet changes
Pull request #303 created #created-303
Mike's proposed schema and stylesheet changes
These are the schema and stylesheet changes from PR #292. They don't break the build and on casual inspection they seem fine, so I'm just going to accept them.
QT4 CG meeting 017 draft agenda #agenda-01-10
Draft agenda published.
Issue #300 created #created-300
[F+O] Ambiguity regarding Unicode normalization (editorial)
In §1.7.1 the paragraph
Unless explicitly stated, the xs:string values returned by the functions in this document are not normalized in the sense of [Character Model for the World Wide Web 1.0: Fundamentals].
is a little bit ambiguous for my taste. By "are not normalized" it means "no action is taken to normalize the strings", it doesn't mean "the strings will not be in normalized form".
I suggest: "Unless explicitly stated, the functions in this document operate on strings as sequences of codepoints and do not attempt to convert input strings, or produce output strings, in Unicode normalized form. Unicode normalization occurs only when explicitly requested, for example by use of the fn:normalize-unicode
function."
At the same time we might update the reference to point to "Character Model for the World Wide Web: String Matching", revised in 2021, though it is still only a Working Group Note. See https://www.w3.org/TR/charmod-norm/#unicodeNormalization
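The distinction between "no action is taken to normalize" and "the result is not in normalized form" is easy to see with any string library that works on codepoints. A minimal Python illustration, using unicodedata.normalize as a stand-in for fn:normalize-unicode:

```python
import unicodedata

composed = "\u00e9"      # é as a single codepoint (already in NFC)
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT (NFD form)

# Codepoint-wise the two strings differ, although they render identically.
assert composed != decomposed
assert len(composed) == 1 and len(decomposed) == 2

# Ordinary operations neither normalize nor denormalize their input:
# an NFC input stays NFC, an NFD input stays NFD.
assert (composed + "t") == "\u00e9t"
assert (decomposed + "t") == "e\u0301t"

# Normalization happens only when explicitly requested.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
```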
Issue #281 closed #closed-281
XPath: Short-circuiting Functions and Lazy Evaluation Hints
Issue #299 created #created-299
Short-circuiting functions, function-arity guards and lazy hints
I. Shortcutting and lazy hints
Let us have this expression:
let $f := function($arg1 as item()*, $arg2 as item()*) as function(item()*) as item()*
{ (: Some code here :) }
return
$f($x) ($y)
Evaluating $f($x)
produces a function. The actual arity of this resulting function can be any number N >= 0
:
- If N > 1 there would be an arity mismatch error, as only one argument $y is provided in the expression.
- If N = 1 the final function call can be evaluated, and the argument $y must be evaluated, or
- If N = 0, then $y is unneeded and can safely be ignored according to the updated “Coercion Rules / Function Coercion” in XPath 4.0.
Because a possibility exists to be able to ignore the evaluation of $y
, it is logical to delay the evaluation of $y
until the actual arity of $f($x)
is known.
The current XPath 4.0 evaluation rules do not require an implementation to base its decision whether or not to evaluate $y
on the actual arity of the function produced by $f($x)
, thus at present an implementation could decide to evaluate $y
regardless of the actual arity of the function produced by $f($x)
.
This is where a lazy hint comes: it indicates to the XPath processor that it is logical to make the decision about evaluation of $y
based on the actual arity of the function returned by $f($x)
.
A rewrite of the above expression using a lazy hint looks like this:
let $f := function($arg1 as item()*, $arg2 as item()*) as function(item()*) as item()*
{ (: Some code here :) }
return
$f($x) (lazy $y)
Here is one example of a function with short-cutting and calling it with a lazy hint:
let $fAnd := function($x as xs:boolean, $y as xs:boolean) as xs:boolean
{
let $partial := function($x as xs:boolean) as function(xs:boolean) as xs:boolean
{
if(not($x)) then ->(){false()}
else ->($t) {$t}
}
return $partial($x)($y)
}
return
$fAnd($x (: possibly false() :), lazy $SomeVeryComplexAndSlowComputedExpression)
Without the lazy hint in the above example, it is perfectly possible that an XPath implementation, unrestricted by the current rules, would evaluate $SomeVeryComplexAndSlowComputedExpression
- something that is unneeded and could be avoided completely.
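The intended effect of the hint can be modelled in a language without it by passing the argument as a thunk. The following Python sketch is only an analogy: lazy here is a hypothetical helper, not the proposed keyword, and f_and mirrors $fAnd with the decision written out explicitly.

```python
def lazy(compute):
    """Wrap a zero-argument callable so it runs at most once, on first use."""
    cache = []

    def force():
        if not cache:
            cache.append(compute())
        return cache[0]

    return force


def f_and(x, lazy_y):
    """Model of $fAnd: the second argument is forced only when x is true."""
    if not x:
        return False  # lazy_y is never forced on this branch
    return bool(lazy_y())


log = []


def slow_check():
    """Stand-in for $SomeVeryComplexAndSlowComputedExpression."""
    log.append("ran")
    return True


assert f_and(False, lazy(slow_check)) is False
assert log == []          # the slow expression was skipped entirely
assert f_and(True, lazy(slow_check)) is True
assert log == ["ran"]     # evaluated only when actually needed
```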
Formal syntax and semantics
- The lazy keyword should immediately precede any argument in a function call. If specified, it means that it is logical to make the decision about evaluation of this argument based on the actual arity of the function in this function call.
Based on this definition, it follows that lazy $argK implies lazy for all arguments following $argK in the function call. Thus specifying more than one lazy hint within a given function call is redundant and an implementation may report this redundancy to the user. The scope of a lazy keyword specified on an argument is this and all following arguments of (only) the current function call.
- It is possible to specify a lazy keyword that is in force for the respective argument(s) of all function calls of the given function. To do this, the lazy keyword must be specified immediately preceding a parameter name in the function definition of that function.
For example, if the function $f is specified as:
let $f := function($arg1 as item()*, lazy $arg2 as item()*, $arg3 as item()*, $arg4 as item()* )
{ (: some code here :) }
return $someExpression
Then any call of $f in its definition scope that has the form:
$f($x, $y, $z, $t)
is equivalent to:
$f($x, lazy $y, $z, $t)
- It is possible to specify the lazy keyword immediately preceding a function definition. This instructs the XPath processor that a call of this function needs to be evaluated only if the function is actually called during the evaluation of the expression that contains this function call.
For example:
let $complexComputation := lazy function($x, $y) {$x + $y}, (: Make it as complex as you want ... :)
$someCondition := function()
{
let $date := current-date()
return
month-from-date($date) eq 2
and
day-from-date($date) eq 29
}
return
if($someCondition()) then $complexComputation(2, 3)
else 0
Specifying the lazy keyword in the function definition for $complexComputation can save significant computing resources, because the programmer knows that $someCondition() is true during only a single day in any four-year period.
II. fn:lazy
Summary
Applied on a single argument that can be any expression. Lazily returns its argument expression.
Signature
lazy fn:lazy(
$expression as item()*
) as item()*
Properties
This function is deterministic, context-independent, and focus-independent.
Rules
The semantics of the function is strictly defined below:
let $lazyFunction := lazy fn:identity#1
return
(: AnyExpression here :)
Any expression Q
of the form:
Q(E1, lazy(E2))
where E1
and E2
are subexpressions of Q
, must be evaluated by the Processor in two steps:
-
Substitute the expression
Q(E1, lazy(E2))
with:
Q(E1, ?) (lazy E2)
-
Evaluate the latter according to the rules for a lazy argument
Example
We can use almost the same example as above, but here $complexComputation
is defined without the lazy keyword and thus is not a lazy function. To have $complexComputation
evaluated lazily, we call the lazy()
function, passing $complexComputation
to it:
let $complexComputation := (: no lazy here :) function($x, $y) {$x + $y}, (: Make it as complex as you want ... :)
$someCondition := function()
{
let $date := current-date()
return
month-from-date($date) eq 2
and
day-from-date($date) eq 29
}
return
$someCondition() and lazy( $complexComputation(2, 3))
Here the expression Q
is:
$someCondition() and lazy( $complexComputation(2, 3))
This is the same as:
fn:op("and")($someCondition(), lazy( $complexComputation(2, 3)))
According to the Rules above, the processor converts this to:
fn:op("and")($someCondition(), ?) (lazy( $complexComputation(2, 3)) )
$someCondition()
is evaluated and if its value is false()
, then the expression to be evaluated is:
fn:op("and")(false(), ?) (lazy( $complexComputation(2, 3)) )
As fn:op("and")(false(), ?) is by definition function() {false()}, the final result false() is produced and the unnecessary argument $complexComputation(2, 3) is not evaluated at all.
III. A function's arity is a guard for its arguments
Let us have a function $f
defined as below:
let $f := function($arg1 as item()*, $arg2 as item()*, …, $argN as item()*)
as function(item()*, item()*, …, item()*) as item()*
{
if($cond0($arg1)) then -> () { 123 }
else if($cond1($arg1)) then -> ($Z1 as item()*) {$Z1}
else if($cond2($arg1)) then -> ($Z1 as item()*, $Z2 as item()*) {$Z1 + $Z2}
(: . . . . . . . . :)
else if($condK($arg1)) then -> ($Z1 as item()*, $Z2 as item()*, …, $Zk as item()*)
{$Z1 + $Z2 + … + $Zk}
else ()
}
return
$f($y1, $y2, …, $yN) ($z1, $z2, …, $zk)
A call to $f
returns a function whose arity may be any of the numbers: 0, 1, …, K.
Depending on the arity of the returned function (0, 1, …, K), the last (K, K-1, K-2, …, 2, 1, 0) arguments of the function call:
$f($y1, $y2, . . . , $yN) ($z1, $z2, . . . , $zk)
are unneeded and it is logical that they would not need to be evaluated.
So, the actual arity of the result of calling $f
is a guard for the arguments of a call to this function-result.
Thus, one more bullet needs to be added to 2.4.5 Guarded Expressions (https://qt4cg.org/specifications/xquery-40/xpath-40.html#id-guarded-expressions), specifying an additional guard type:
- In an expression of the type E(A1, A2, ..., AN), any of the arguments AK is guarded by the condition actual-arity(E) ge K. This rule has the consequence that if the actual arity of E() is less than K, then if any argument Am (where m >= K) is evaluated, this must not raise a dynamic error. An implementation may base its decision for the evaluation of the arguments on the actual arity of E().
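The guard can be pictured as follows. This Python sketch is an analogy under stated assumptions, not a description of any XPath processor: arguments are passed as thunks, actual_arity and guarded_call are hypothetical helpers, and make_result plays the role of $f returning functions of different arities.

```python
import inspect


def actual_arity(fn):
    """Number of parameters the function value actually declares."""
    return len(inspect.signature(fn).parameters)


def guarded_call(fn, *arg_thunks):
    """Evaluate only the first actual_arity(fn) arguments; the rest stay untouched."""
    k = actual_arity(fn)
    return fn(*(thunk() for thunk in arg_thunks[:k]))


def make_result(kind):
    """Model of $f: returns a function whose arity depends on its input."""
    if kind == 0:
        return lambda: 123
    if kind == 1:
        return lambda z1: z1
    return lambda z1, z2: z1 + z2


def boom():
    """An argument whose evaluation would raise a dynamic error."""
    raise ZeroDivisionError("argument should not have been evaluated")


# Arguments beyond the actual arity are guarded, so their errors never surface.
assert guarded_call(make_result(0), boom, boom) == 123
assert guarded_call(make_result(1), lambda: 7, boom) == 7
assert guarded_call(make_result(2), lambda: 7, lambda: 5) == 12
```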
Issue #298 created #created-298
Abstract supertype for map and array
I've been wondering whether there would be any mileage in introducing an abstract super type for map() and array(), perhaps called lookup(). This would basically treat an array as a map with integer keys.
This would allow a cleaner type signature for map:find() and any future functions such as xx:search() that work both on maps and arrays. It might simplify the description of the lookup operator "?". For functions that already exist in both the map and array namespaces, such as get(), we could introduce a unified function in the fn namespace with the cosmetic benefit of reducing the need for namespace prefixes and namespace declarations.
I'm still keen to find a better way of doing iteration, filtering, mapping, and construction of maps and arrays, and I think this might be a useful stepping stone.