@qt4cg statuses in 2021
This page displays status updates about the QT4 CG project from 2021.
See also recent statuses.
Issue #105 created #created-105
Maps with Infinite Number of Keys: Total Maps and Decorated maps
1. Total Maps
Maps have become one of the most useful tools for creating readable, short and efficient XPath code. However, a significant limitation of this datatype is that a map
can have only a finite number of keys. In many cases we might want to implement a map that can have more than a fixed, finite number of arguments.
Here is a typical example (Example 1):
A hotel charges per night differently, depending on how long the customer has been staying. For the first night the price is $100, for the second $90, for the third $80 and for every night after the third $75. We can immediately try to express this pricing data as a map, like this:
map {
1 : 100,
2 : 90,
3 : 80
(: ??? How to express the price for all eventual next nights? :)
}
We could, if we had a special key, something like "TheRest", which means any other key-value, which is not one of the already specified key-values.
Here comes the first part of this proposal:
- We introduce a special key value, which, when specified in a map means: any possible key, different from the other keys, specified for the map. For this purpose we use the string:
"\"
Adding such a "discard symbol" makes the map a total function on the set of any possible XPath atomic items.
Now we can easily express the hotel-price data as a map:
map {
1 : 100,
2 : 90,
3 : 80,
'\' : 75
}
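The behaviour of such a total map can be modelled outside XPath. The following Python sketch is only an illustration and not part of the proposal: the names TotalMap and DISCARD are hypothetical, and the discard key is emulated with the __missing__ hook of dict.

```python
DISCARD = "\\"  # the proposed discard key: a single backslash


class TotalMap(dict):
    """A dict that falls back to the value stored under DISCARD for unknown keys."""

    def __missing__(self, key):
        # dict.__getitem__ calls this only when `key` is not a regular key.
        if DISCARD in self:
            return dict.__getitem__(self, DISCARD)
        raise KeyError(key)


hotel_price = TotalMap({1: 100, 2: 90, 3: 80, DISCARD: 75})

# Nights 1 to 3 use the regular keys; every other key falls through to DISCARD.
print([hotel_price[night] for night in (1, 2, 3, 4, 10)])
```

Note that in this model, as in the proposal, a key such as "blah" also yields 75; nothing validates the key.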
Another useful Example (2) is that now we can express any XPath item, or sequence of items as a map. Let's do this for a simple constant, like π:
let $π := map {
'\' : math:pi()
}
return $π?* (: produces 3.141592653589793 :)
The map above is empty (has no regular keys) and specifies that for any other key-value $k
it holds that $π($k) eq math:pi()
Going further, we can express even the empty sequence (Example 3) as the following map:
let $Φ := map {
'\' : ()
}
return $Φ?* (: produces the empty sequence :)
Using this representation of the empty sequence, we can provide a solution for the "Forgiveness problem" raised by Jarno Jelovirta in the XML.Com #general channel in March 2021:
This expression will raise an error:
[map {"k0": 1}, map{"k0": [1, 2, 3]}]?*?("k0")?*
[XPTY0004] Input of lookup operator must be map or array: 1.
To prevent ("forgive", thus "Forgiveness Problem") the raising of such errors we could accept the rule that in XPath 4.0 any expression that evaluates to something different than a map or an array, could be coerced to the following map, which returns the empty sequence as the corresponding value for any key requested in a lookup:
map {
'\' : ()
} (: produces the empty sequence for any lookup:)
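As a rough model of this rule, the Python sketch below treats dicts as maps and lists as arrays, and lets every other item contribute the empty sequence to a lookup. The function names lookup_all and lookup_key are hypothetical, and the model is simplified: array members are single items, and keyed lookup is modelled for maps only.

```python
def lookup_all(items):
    """Model of ?* applied to a sequence of items."""
    result = []
    for item in items:
        if isinstance(item, dict):
            result.extend(item.values())
        elif isinstance(item, list):
            result.extend(item)
        # Anything else is "forgiven": it behaves like a map whose
        # discard key maps to the empty sequence, so it adds nothing.
    return result


def lookup_key(items, key):
    """Model of ?(key) applied to a sequence of items (maps only)."""
    result = []
    for item in items:
        if isinstance(item, dict) and key in item:
            result.append(item[key])
        # Non-maps and missing keys contribute the empty sequence.
    return result


# [map {"k0": 1}, map {"k0": [1, 2, 3]}]?*?("k0")?*
data = [{"k0": 1}, {"k0": [1, 2, 3]}]
step1 = lookup_all([data])         # the two maps
step2 = lookup_key(step1, "k0")    # 1 and the array [1, 2, 3]
step3 = lookup_all(step2)          # 1 is forgiven, the array is unpacked
print(step3)
```

Under this rule the atomic value 1 simply disappears from the result instead of raising XPTY0004.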
To summarize, what we have achieved so far:
- The map constructed in Example 1 is now a total function over the domain ℕ of all natural numbers. Any map with a "\" (discard key) is a total function over the value-space of all xs:anyAtomicType values.
- We can represent any XPath 4.0 item or sequence in an easy and intuitive way as a map.
- It is now straightforward to solve the "Forgiveness Problem" by introducing the natural and intuitive rule for coercing any non-map value to the empty map, which allows the lookup operator ? to be used anywhere without raising an error.
2. Decorated Maps
Although we already achieved a lot in the first part, there are still use-cases for which we don't have an adequate map solution:
- In the example (1) of expressing the hotel prices, we probably shouldn't get $75 for a key such as -1 or even "blah-blah-blah". But the XPath 4.0 language specification allows any atomic values to be possible keys and thus to be the argument to the map:get() function. If we want validation for the actually-allowed key-values for a specific given map, we need to have additional processing/functionality.
- With a discard symbol we can express only one infinite set of possible keys and group them under the same corresponding value. However, there are problems, the data for which needs several infinite sets of key-values to be projected onto different values. Here is one such problem:
Imagine we are the organizers of a very simple lottery, selling many millions of tickets, identified by their number, which is a unique natural number.
We want to grant prizes with this simple strategy.
- Any ticket number multiple of 10 wins $10.
- Any ticket number multiple of 100 wins $20
- Any ticket number multiple of 1000 wins $100
- Any ticket number multiple of 5000 wins $1000
- Any ticket number which is a prime number wins $25000
- Any other ticket number doesn't win a prize (wins $0)
None of the sets of key-values for each of the 6 categories above can be conveniently expressed with the map
that we have so far, although we have merely 6 different cases!
How can we solve this kind of problem still using maps?
Decorators to the rescue...
What is a decorator, what is the decorator pattern, and when is it good to use one? According to Wikipedia:
What solution does it describe? Define Decorator objects that
- implement the interface of the extended (decorated) object (Component) transparently by forwarding all requests to it
- perform additional functionality before/after forwarding a request.
This allows working with different Decorator objects to extend the functionality of an object dynamically at run-time.
The idea is to couple a map with a function (the decorator) which can perform any needed preprocessing, such as validation or projection of a supplied value onto one of a predefined small set of values (that are the actual keys of the map). For simplicity, we are not discussing post-processing here, though this can also be part of a decorator, if needed.
Let us see what a decorated-map solution to the lottery problem looks like:
let $prize-table := map {
"ten" : 10,
"hundred" : 20,
"thousand" : 100,
"five-thousand" : 1000,
"prime" : 25000,
"\" : 0
},
$isPrime := function($input as xs:integer) as xs:boolean
{
exists(index-of((2, 3, 5, 7, 11, 13, 17, 19, 23), $input)) (: simplified primality checker :)
},
$decorated-map := function($base-map as map(*), $input as xs:anyAtomicType) as item()*
{
let $raw-result :=
(
let $key :=
if(not($input castable as xs:positiveInteger)) then '\' (: we can call the error() function here :)
else if($input mod 5000 eq 0) then 'five-thousand'
else if($input mod 1000 eq 0) then 'thousand'
else if($input mod 100 eq 0) then 'hundred'
else if($input mod 10 eq 0) then 'ten'
else if($isPrime($input)) then 'prime'
else "\"
return $base-map($key)
),
$post-process := function($x) {$x}, (: using identity here for simplicity :)
$post-processed := $post-process($raw-result)
return $post-processed
},
$prizeForTicket := $decorated-map($prize-table, ?), (: Note: this is exactly the lookup operator ? :)
$ticketNumbers := (1, 10, 100, 1000, 5000, 19, -3, "blah-blah-blah")
return $ticketNumbers ! $prizeForTicket(.) (: produces 0, 10, 20, 100, 1000, 25000, 0, 0 :)
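For comparison, the same decorator idea can be sketched in Python. This is an illustrative model, not the proposed function: the names are hypothetical, partial application stands in for the ? placeholder, and is_prime uses trial division instead of the simplified lookup list above (an assumption that does not change the result for these tickets).

```python
from functools import partial

DISCARD = "\\"  # the proposed discard key

prize_table = {
    "ten": 10,
    "hundred": 20,
    "thousand": 100,
    "five-thousand": 1000,
    "prime": 25000,
    DISCARD: 0,
}


def is_prime(n):
    """Primality test by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def decorated_map(base_map, ticket):
    """Project any input onto one of the few real keys, then look it up."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(ticket, bool) or not isinstance(ticket, int) or ticket <= 0:
        key = DISCARD
    elif ticket % 5000 == 0:
        key = "five-thousand"
    elif ticket % 1000 == 0:
        key = "thousand"
    elif ticket % 100 == 0:
        key = "hundred"
    elif ticket % 10 == 0:
        key = "ten"
    elif is_prime(ticket):
        key = "prime"
    else:
        key = DISCARD
    return base_map[key]


prize_for_ticket = partial(decorated_map, prize_table)
tickets = (1, 10, 100, 1000, 5000, 19, -3, "blah-blah-blah")
print([prize_for_ticket(t) for t in tickets])
```

The order of the tests matters: the most specific multiple is checked first, so 5000 wins $1000 rather than $100 or $10.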
Conclusion
- In the 2nd part of this proposal, a new type/function, the decorated-map, was described.
- We defined the signature of a decorated-map and gave an example of how to construct and use one in solving a specific problem. In particular, the proposal is to have a standard function: decorated-map($base-map as map(*), $input as xs:anyAtomicType) as item()*
- Finally, we showed that the lookup operator ? on a decorated map $dm is identical to and should be defined as: $dm($base-map, ?)
What remains to be done?
The topic of decorators is extremely important, as it should be possible to define a decorator on any function, not just on maps. This will be addressed in one or more new proposals. Stay tuned 😊
Issue #104 created #created-104
name of map:replace/array:replace
The name of map:replace/array:replace is easily confused with fn:replace. One might think that map:replace applies a regular expression to all the keys of a map (which might be a quite useful replacement for JSON object).
It might also be confused with the replace function of Java's hashmap, which only inserts a new value if the key already exists in the map.
One could name it map:put-with-function or map:put-f or map:putf or map:puf for short
Or something else: map:change, map:modify, map:alter
Issue #103 created #created-103
fn:all, fn:some
a) the text says the function returns boolean, but the signature says integer*
b) the text considers a case where the second argument is omitted, but there is no one argument function signature
c) fn:some is a wrapper around XQuery's some expression, and fn:all is a wrapper around XQuery's every expression. Is this not confusing, and would people not expect it to be a wrapper around some kind of all expression? Or to be called fn:every?
d) I think it is pointless to have such functions when there are already the some/every XQuery expressions
Issue #102 created #created-102
[xslt30] Meaning of the term "lexical space"
The XSLT 3.0 specification uses the term "lexical space" rather freely, without definition.
In XSD, the "lexical space" for a data type is the set of lexical representations AFTER any whitespace removal. For example, the lexical space for xs:integer does not allow leading or trailing whitespace. Such whitespace is valid in an instance document, but it is stripped prior to validation by a "pre-lexical" transformation.
In XSLT, as far as I can see, the intended reading of a phrase such as "a string in the lexical space of xs:integer" is "a string that is castable to xs:integer", which includes strings with leading and trailing whitespace. In some cases the text can only be read this way.
The F&O spec gets this right. Section 19.2, relating to casting from xs:string, says "The supplied string is mapped to a typed value of the target type as defined in [XML Schema Part 2: Datatypes Second Edition]. Whitespace normalization is applied as indicated by the whiteSpace facet for the datatype. The resulting whitespace-normalized string must be a valid lexical form for the datatype. The semantics of casting follow the rules of XML Schema validation."
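The difference between the two readings can be illustrated for xs:integer with a small Python sketch. The function names are hypothetical; the regular expression reflects the xs:integer lexical form (optional sign followed by decimal digits), and the whitespace set is the one XSD uses (space, tab, LF, CR).

```python
import re

# Lexical form of xs:integer: optional sign followed by one or more digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

XSD_WHITESPACE = " \t\n\r"


def in_lexical_space(s):
    """Strict XSD reading: no whitespace allowed around the digits."""
    return INTEGER_LEXICAL.fullmatch(s) is not None


def castable_as_integer(s):
    """Casting reading: apply whitespace normalization first, then check the lexical form.

    For xs:integer the whiteSpace facet is "collapse"; stripping leading and
    trailing whitespace is sufficient here because any internal whitespace
    makes the value invalid anyway.
    """
    return in_lexical_space(s.strip(XSD_WHITESPACE))


for value in ("42", " 42 ", "-7\n", "4 2"):
    print(repr(value), in_lexical_space(value), castable_as_integer(value))
```

The string " 42 " is the interesting case: it is castable, but it is not in the lexical space in the XSD sense.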
Issue #101 created #created-101
fn:serialize line breaks
Normally fn:serialize uses LF for line breaks
But on Windows you want to have CR LF
There could be an option for that
Issue #100 created #created-100
[FO] Typo in §17.5.3
implementation-dependant => implementation-dependent
Issue #99 created #created-99
Functions that determine equality of two sequences or equality of two arrays
The only standard XPath 3.1 function that compares two arrays or two sequences for equality is the deep-equal() function. It implements "value-based equality", which may not always be the equality one needs to check for. For example, the standard XPath 3.1 operator "is" implements a check for "identity-based equality" on nodes.
Thus for two nodes $n1 and $n2 it is possible that:
deep-equal($n1, $n2) ne ($n1 is $n2)
The functions defined below can be used to verify a more generic kind of equality between two sequences or between two arrays. These functions accept as a parameter a user-provided function $compare(), which is used to decide whether or not two corresponding items of the two sequences, or two constituents of the two arrays are "equal".
fn:sequence-equal($seq1 as item()*, $seq2 as item()*,
$compare as function(item(), item()) as xs:boolean := deep-equal#2) as xs:boolean
fn:array-equal($ar1 as array(*), $ar2 as array(*),
$compare as function(item()*, item()*) as xs:boolean := deep-equal#2) as xs:boolean
Examples:
fn:sequence-equal((1, 2, 3), (1, 2, 3)) (: returns true() :)
fn:sequence-equal((1, 2, 3), (1, 2, 5)) (: returns false() :)
fn:sequence-equal((1), (1, 2)) (: returns false() :)
fn:sequence-equal((), ()) (: returns true() :)
let $compare := function($arg1 as xs:integer, $arg2 as xs:integer) {$arg1 mod 2 eq $arg2 mod 2}
return fn:sequence-equal((1, 2, 3), (5, 6, 7), $compare) (: returns true() :)
let $compare := function($arg1 as xs:integer, $arg2 as xs:integer) {$arg1 mod 2 eq $arg2 mod 2}
return fn:sequence-equal((1, 2, 3), (5, 6, 8), $compare) (: returns false() :)
fn:array-equal([1, 2, 3], [1, 2, 3]) (: returns true() :)
fn:array-equal([1, 2, 3], [1, 2, 5]) (: returns false() :)
fn:array-equal([1], [1, 2]) (: returns false() :)
fn:array-equal([], []) (: returns true() :)
fn:array-equal([], [()]) (: returns false() :)
Possible implementations:
- Here is a pure XPath implementation of fn:sequence-equal:
let $compare := function($it1 as item(), $it2 as item()) as xs:boolean
{deep-equal($it1, $it2)},
$sequence-equal := function($seq1 as item()*, $seq2 as item()*,
$compare as function(item(), item()) as xs:boolean,
$self as function(*)) as xs:boolean
{
let $size1 := count($seq1), $size2 := count($seq2)
return
if($size1 ne $size2) then false()
else
$size1 eq 0
or
$compare(head($seq1), head($seq2)) and $self(tail($seq1), tail($seq2), $compare, $self)
}
return
$sequence-equal((1, 2, 3), (1, 2, 3), $compare, $sequence-equal)
- Below is a pure XPath implementation of fn:array-equal:
let $compare := function($val1 as item()*, $val2 as item()*) as xs:boolean
{deep-equal($val1, $val2)},
$array-equal := function($ar1 as array(*), $ar2 as array(*),
$compare as function(item()*, item()*) as xs:boolean,
$self as function(*)) as xs:boolean
{
let $size1 := array:size($ar1), $size2 := array:size($ar2)
return
if($size1 ne $size2) then false()
else
$size1 eq 0
or
$compare(array:head($ar1), array:head($ar2)) and $self(array:tail($ar1), array:tail($ar2), $compare, $self)
}
return
$array-equal([], [()], $compare, $array-equal)
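The same pairwise comparison can be modelled non-recursively. The Python sketch below is illustrative only: sequence_equal is a hypothetical name, Python's == stands in for deep-equal#2 as the default comparison, and an array is modelled as a list whose members are tuples (sequences).

```python
from operator import eq


def sequence_equal(seq1, seq2, compare=eq):
    """True if both inputs have the same length and compare() holds pairwise."""
    return len(seq1) == len(seq2) and all(
        compare(a, b) for a, b in zip(seq1, seq2)
    )


def same_parity(a, b):
    """Example of a user-supplied comparison."""
    return a % 2 == b % 2


print(sequence_equal((1, 2, 3), (1, 2, 3)))
print(sequence_equal((1, 2, 3), (5, 6, 7), same_parity))
print(sequence_equal((1, 2, 3), (5, 6, 8), same_parity))

# Arrays: the empty array [] versus the array [()] with one empty member.
print(sequence_equal([], [()]))
```

Checking the lengths first means compare() is never called on a missing item, which the recursive XPath version relies on the processor to avoid.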
Issue #98 created #created-98
Support ignoring whitespace/indentation differences in fn:deep-equal.
Signatures
fn:deep-equal( $input1 as item()*,
$input2 as item()*,
$collation as xs:string? := (),
$boundary-space as enum("preserve", "strip") := "preserve") as xs:boolean
Notes
If $boundary-space is "preserve", then any whitespace differences are checked and would result in the function returning false().
If $boundary-space is "strip", then any whitespace differences are ignored and would result in the function returning true() if the inputs are otherwise identical.
Use Case
Comparing two XML fragments in a unit test assertion where you don't care about indentation differences.
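A rough model of the intended behaviour, using Python's xml.etree.ElementTree. The proposal does not say exactly which whitespace is ignored; this sketch assumes that "strip" means whitespace-only text nodes are treated as absent, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _text(value, strip):
    """Normalize a text/tail value; whitespace-only text vanishes when stripping."""
    value = value or ""
    if strip and value.strip() == "":
        return ""
    return value


def deep_equal(e1, e2, boundary_space="preserve"):
    """Compare two elements recursively, optionally ignoring whitespace-only text."""
    strip = boundary_space == "strip"
    if e1.tag != e2.tag or e1.attrib != e2.attrib:
        return False
    if _text(e1.text, strip) != _text(e2.text, strip):
        return False
    if _text(e1.tail, strip) != _text(e2.tail, strip):
        return False
    if len(e1) != len(e2):
        return False
    return all(deep_equal(c1, c2, boundary_space) for c1, c2 in zip(e1, e2))


compact = ET.fromstring("<a><b>x</b><c/></a>")
indented = ET.fromstring("<a>\n  <b>x</b>\n  <c/>\n</a>")

print(deep_equal(compact, indented))                          # indentation counts
print(deep_equal(compact, indented, boundary_space="strip"))  # indentation ignored
```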
Issue #97 created #created-97
[XPath] Functions symmetric to `head()` and `tail()` for sequences and arrays
In XPath 3.1 we already have head(), tail(), and last().
But there is no function that produces the subsequence of all items of a sequence except the last one. There exists such a function in other programming languages. For example, in Haskell this is the init function.
And the last() function isn't the symmetric opposite of head(): it doesn't give us the last item in a sequence, just its position. So we need another function, fn:heel(), for this.
fn:init($sequence as item()*) as item()*
fn:heel($sequence as item()*) as item()?
init($seq) is a convenient shorthand for subsequence($seq, 1, count($seq) - 1)
heel($seq) is a convenient shorthand for slice($seq, -1)
Examples:
fn:init(('a', 'b', 'c')) returns 'a', 'b'
fn:init(('a', 'b')) returns 'a'
fn:init('a') returns ()
fn:init(()) returns ()
fn:heel(('a', 'b', 'c')) returns 'c'
('a', 'b', 'c') => init() => heel() returns 'b'
It makes sense to have fn:init() and fn:heel() defined on arrays, too.
array:init($array as array(*)) as array(*)
array:heel($array as array(*)) as item()*
Examples:
array:init([1, 2, 3, 4, 5]) returns [1, 2, 3, 4]
array:init([1]) returns []
array:heel([1, 2, 3, (4, 5)]) returns (4, 5)
array:heel([()]) returns () (the empty sequence)
array:init([]) produces error
array:heel([]) produces error
[1, 2, 3, (4, 5)] => array:heel() => heel() returns 5
I would challenge anyone to rewrite the last example in an understandable way using fn:slice() 💯
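The four proposed functions are easy to model with slicing. In this illustrative Python sketch (all names hypothetical) a sequence is a tuple, and an array is a list whose members are tuples, so that a member can itself be a sequence such as (4, 5).

```python
def init(seq):
    """All items except the last; the empty sequence stays empty."""
    return seq[:-1]


def heel(seq):
    """The last item as a sequence of length 0 or 1."""
    return seq[-1:]


def array_init(arr):
    """All members except the last; an empty array is an error."""
    if not arr:
        raise IndexError("array_init: empty array")
    return arr[:-1]


def array_heel(arr):
    """The last member (a sequence); an empty array is an error."""
    if not arr:
        raise IndexError("array_heel: empty array")
    return arr[-1]


print(init(("a", "b", "c")))
print(heel(init(("a", "b", "c"))))

# [1, 2, 3, (4, 5)] => array:heel() => heel()
arr = [(1,), (2,), (3,), (4, 5)]
print(heel(array_heel(arr)))
```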
Issue #96 created #created-96
[XPath] Functions that determine if a given sequence starts with another sequence or ends with another sequence
It is surprising that we are at version 4 and still are missing:
(1) fn:starts-with-sequence($container as item()*, $maybe-start as item()*,
$compare as function(item(), item()) as xs:boolean := deep-equal#2
) as xs:boolean
and
(2) fn:ends-with-sequence($container as item()*, $maybe-end as item()*,
$compare as function(item(), item()) as xs:boolean := deep-equal#2
) as xs:boolean
(2) above is a shorthand for:
fn:starts-with-sequence(reverse($container), reverse($maybe-end))
Examples:
fn:starts-with-sequence(('a', 'b', 'c', 'd'), ('a', 'b')) returns true()
fn:starts-with-sequence(('a', 'b', 'c', 'd'), ('a', 'c')) returns false()
fn:ends-with-sequence(('a', 'b', 'c', 'd'), ('c', 'd')) returns true()
fn:ends-with-sequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns false()
('a', 'b', 'c', 'd') => starts-with-sequence(('a', 'b')) returns true()
('a', 'b', 'c', 'd') => starts-with-sequence(('a', 'c')) returns false()
('a', 'b', 'c', 'd') => ends-with-sequence(('c', 'd')) returns true()
('a', 'b', 'c', 'd') => ends-with-sequence(('b', 'd')) returns false()
One possible implementation:
let $starts-with-sequence := function($seq1 as item()*, $seq2 as item()*, $self as function(*))
{
empty($seq2)
or
head($seq1) eq head($seq2) and $self(subsequence($seq1, 2), subsequence($seq2, 2), $self)
}
return
$starts-with-sequence(('a', 'b', 'c', 'd'), ('a', 'b', 'c'), $starts-with-sequence)
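A non-recursive model that also honours the $compare parameter might look as follows. This Python sketch is illustrative only: the function names are hypothetical, Python's == stands in for the default comparison, and ends_with_sequence is defined through reversal exactly as in the shorthand above.

```python
from operator import eq


def starts_with_sequence(container, maybe_start, compare=eq):
    """True if maybe_start matches the leading items of container pairwise."""
    if len(maybe_start) > len(container):
        return False
    return all(compare(a, b) for a, b in zip(container, maybe_start))


def ends_with_sequence(container, maybe_end, compare=eq):
    """Defined through reversal of both inputs."""
    return starts_with_sequence(container[::-1], maybe_end[::-1], compare)


letters = ("a", "b", "c", "d")
print(starts_with_sequence(letters, ("a", "b")))
print(starts_with_sequence(letters, ("a", "c")))
print(ends_with_sequence(letters, ("c", "d")))
print(ends_with_sequence(letters, ("b", "d")))
```

The empty sequence is a prefix and a suffix of every sequence in this model, which matches the empty($seq2) base case of the XPath implementation.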
Issue #95 closed #closed-95
[XPath] URI validation function
Issue #95 created #created-95
[XPath] URI validation function
Apparently there is no function to validate URI syntax against the relevant RFC. I expected that casting to xs:anyURI validates but that seems not to be the case:
Because it is impractical for processors to check that a value is a context-appropriate URI reference, this specification follows the lead of [RFC 2396] (as amended by [RFC 2732]) in this matter: such rules and restrictions are not part of type validity and are not checked by ·minimally conforming· processors. Thus in practice the above definition imposes only very modest obligations on ·minimally conforming· processors.
https://www.w3.org/TR/xmlschema-2/#anyURI
Valid URIs are quite critical in strict RDF output formats such as RDF/XML.
Issue #94 created #created-94
Functions that determine if a given sequence is a subsequence of another sequence
It is surprising that we are at version 4 and still are missing:
(1) fn:has-subsequence($container as item()*, $maybe-subsequence as item()*,
$compare as function(item(), item()) as xs:boolean := deep-equal#2
) as xs:boolean
and
(2) fn:has-subsequence($container as item()*, $maybe-subsequence as item()*, $contiguous-subsequence := true(),
$compare as function(item(), item()) as xs:boolean := deep-equal#2
) as xs:boolean
and
(3) fn:has-non-contigous-subsequence($container as item()*, $maybe-subsequence as item()*,
$compare as function(item(), item()) as xs:boolean := deep-equal#2
) as xs:boolean
(3) above is a shorthand for:
fn:has-subsequence(?, ?, false())
Examples:
fn:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'c')) returns true()
fn:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns false()
fn:has-non-contigous-subsequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns true()
fn:has-non-contigous-subsequence(('a', 'b', 'c', 'd'), ('d', 'b')) returns false()
('a', 'b', 'c', 'd') => has-subsequence(('b', 'c')) returns true()
('a', 'b', 'c', 'd') => has-subsequence(('b', 'd')) returns false()
('a', 'b', 'c', 'd') => has-non-contigous-subsequence(('b', 'd')) returns true()
('a', 'b', 'c', 'd') => has-non-contigous-subsequence(('d', 'b')) returns false()
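No implementation is given for these functions, so here is one possible model in Python. It is a sketch under assumptions: the name has_subsequence and its parameter names are hypothetical, Python's == is the default comparison, the contiguous case uses a naive window scan, and the non-contiguous case uses a greedy left-to-right match.

```python
from operator import eq


def has_subsequence(container, candidate, contiguous=True, compare=eq):
    """True if candidate occurs in container, in order."""
    n, m = len(container), len(candidate)
    if contiguous:
        # Try every window of length m; range() is empty when m > n.
        return any(
            all(compare(container[i + j], candidate[j]) for j in range(m))
            for i in range(n - m + 1)
        )
    # Non-contiguous: consume container left to right, matching greedily.
    pos = 0
    for wanted in candidate:
        while pos < n and not compare(container[pos], wanted):
            pos += 1
        if pos == n:
            return False
        pos += 1
    return True


letters = ("a", "b", "c", "d")
print(has_subsequence(letters, ("b", "c")))
print(has_subsequence(letters, ("b", "d")))
print(has_subsequence(letters, ("b", "d"), contiguous=False))
print(has_subsequence(letters, ("d", "b"), contiguous=False))
```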
Issue #93 created #created-93
Support order by ascending/descending from a string value.
Use Case
It is a common pattern to have a sort key and direction on requests that support listing or searching an object (e.g. authors).
Currently, in order to switch between ascending/descending in a FLWOR expression two separate expressions need to be written. For example:
if ($sort-order eq "ascending") then
for $name in $authors order by $name ascending return $name
else
for $name in $authors order by $name descending return $name
It would be cleaner if this could be rewritten as:
for $name in $authors
order by $name in $sort-order order
return $name
Syntax
OrderModifier ::= ("ascending" | "descending" | OrderDirection)?
("empty" ("greatest" | "least"))?
("collation" URILiteral)?
OrderDirection ::= "in" ExprSingle "order"
Semantics
If OrderDirection is used, the expression is evaluated.
- If the expression is not a single atomic value, an XQST#### error is raised.
- If the expression evaluates to the string "ascending", this is the same as using the ascending keyword.
- If the expression evaluates to the string "descending", this is the same as using the descending keyword.
- Otherwise, an XQST#### error is raised.
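The rules above can be modelled in Python, where the sort direction is already an ordinary run-time value. This is an illustrative sketch: order_by is a hypothetical name, and Python exceptions stand in for the unassigned XQST#### error codes.

```python
def order_by(items, key, direction):
    """Sort items by key in a direction chosen at run time."""
    if not isinstance(direction, str):
        # Stands in for: "not a single atomic value" raises an error.
        raise TypeError("direction must be a single string value")
    if direction == "ascending":
        return sorted(items, key=key)
    if direction == "descending":
        return sorted(items, key=key, reverse=True)
    # Stands in for: any other value raises an error.
    raise ValueError("direction must be 'ascending' or 'descending'")


authors = ["Kay", "Adams", "Tolkien"]
print(order_by(authors, key=str.lower, direction="ascending"))
print(order_by(authors, key=str.lower, direction="descending"))
```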
Issue #92 created #created-92
Simplify rule for attribute values on Extension Instructions used to invoke named templates
Regarding the rule in the current proposal for Invoking Named Templates with Extension Instructions:
The way in which attribute values are handled depends on the type declaration of the template parameter...
I have some problems with this dependency on parameter type (to control whether value is an AVT or XPath expression):
- In many cases, an xs:string or xs:boolean type passed as a param will be a variable reference, so a coder needs to enter name="{$myName}" instead of name="$myName" in their XSLT editor.
- If passing a literal xs:string type, the syntax: name="first" would be easy for a human reader to misinterpret as a NameTest instead of a StringLiteral.
- The dependency on param type means more effort (and thus poorer performance) for a tokenizer or syntax-highlighter as it may need to get type information from included/imported XSLT stylesheet modules or from extension elements declared later in the same XSLT module.
The third point above is most important from my viewpoint as maintainer of an XSLT editor, but I believe the first two points are also valid.
For these reasons, I propose that: all attribute-values on extension instructions used to invoke named templates are treated as XPath expressions.
Issue #91 created #created-91
name of map:substitute
map:substitute is a weird name for the function. It sounds as if it would change just one value with a new value, like map:put.
Actually it is mapping all values. map:map or map:map-values would be more fitting.
Or map:for-each would have been logical. Unfortunately it is already taken. fn:for-each takes a sequence and returns a sequence, array:for-each takes an array and returns an array. map:for-each takes a map and returns a ~map~ sequence. Makes no sense. Anyway, map:for-each-value would also be a good name.
Other languages have other names. It could also be called map:transform like C++, or map:apply like pari/gp.
Issue #90 created #created-90
Simplified simplified stylesheets
A couple of suggestions for making "simplified stylesheets" more useful:
(a) Allow the xsl:version (and therefore the XSLT namespace declaration) to be omitted; the default is supplied by the processor. So this becomes a valid stylesheet:
<out id="{/*/@id}">
<x>{/thing/foo[1]/x}</x>
<y>{/thing/foo[2]/x}</y>
</out>
(b) Allow "single-template" stylesheets as an intermediate form between simplified stylesheets and full stylesheets:
<xsl:xslt xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:param name="id"/>
<out id="{$id}">
<x>{/thing/foo[1]/x}</x>
<y>{/thing/foo[2]/x}</y>
</out>
</xsl:xslt>
The last child element of xsl:xslt, if not in the XSLT namespace, is implicitly wrapped in <xsl:template match="/">
Issue #89 created #created-89
[XQuery] DirPIConstructor permits ':' in the PI name.
Overview
The PITarget symbol allows a colon in the grammar, but the rest of the spec and XQuery implementations (tested on BaseX, Saxon, and MarkLogic) disallow : in DirPIConstructor productions.
Details
The DirPIConstructor construct is defined as:
[151] DirPIConstructor ::= "<?" PITarget (S DirPIContents)? "?>" /* ws: explicit */
[232] PITarget ::= [http://www.w3.org/TR/REC-xml#NT-PITarget]XML /* xgc: xml-version */
with XML defining PITarget as:
[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
The "excluding 'xml' in any case insensitive form" part is covered by the 3.9.2 Other Direct Constructors section.
While the XML specification states:
The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.
various XQuery processors disallow a colon here in line with the rest of the XQuery specification.
Proposal
Change PITarget to:
[232] PITarget ::= NCName
to reflect actual usage and align it with the rest of the XQuery specification.
Issue #88 created #created-88
[XPATH] breaking ancestor or descendant axes
A common issue that I often have to deal with in XPath (within XSLT most of the time) is the need to break the descendant or ancestor axes. To do that I have to use predicates which are sometimes quite complicated, because they have to apply to every encountered node, which forces one to think really globally. Maybe this is inherent to functional languages, but maybe it would be possible to add a simple feature for this common use case.
A really simple example (not difficult to solve here, but it's sometimes much more complex):
Let's say I want to get all "doc" elements that have a table as content:
<doc id="doc1">
<header/>
<table/>
<doc id="doc2">
<p/>
<doc id="doc3">
<table/>
</doc>
</doc>
<footer>
<doc id="doc4">
<table/>
</doc>
</footer>
</doc>
In this example doc1, doc3 and doc4 all have a table "as content", but doc2 doesn't, though it has a table as descendant.
The XPath to get all doc elements that have a table as content would be something like:
//doc[let $self := . return exists (descendant::table[ancestor::doc[1] is $self])]
I don't really have any idea of how to express a new way to break this axis. Let me suggest a predicate on the axis itself, something like:
//doc[descendant[break-axis-on-matching='self::doc']::table]
This syntax is really not nice, but I guess you see the idea? Maybe I missed a way to achieve this in XPath 3.1?
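To make the intended semantics concrete, here is a procedural model in Python using xml.etree.ElementTree. It is only a sketch of "descend, but stop at a barrier element": the function name has_own and its parameters are hypothetical, and the sample document is the one shown above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<doc id="doc1">
  <header/>
  <table/>
  <doc id="doc2">
    <p/>
    <doc id="doc3">
      <table/>
    </doc>
  </doc>
  <footer>
    <doc id="doc4">
      <table/>
    </doc>
  </footer>
</doc>
"""


def has_own(elem, target, barrier):
    """True if elem has a `target` descendant reachable without crossing a `barrier`."""
    for child in elem:
        if child.tag == target:
            return True
        # Do not descend into barrier elements: this "breaks" the axis.
        if child.tag != barrier and has_own(child, target, barrier):
            return True
    return False


root = ET.fromstring(SAMPLE)
docs_with_table = [
    d.get("id") for d in root.iter("doc") if has_own(d, "table", "doc")
]
print(docs_with_table)
```

The footer element is traversed because it is not a barrier, but the table under doc4 is not credited to doc1, since reaching it would mean crossing doc4.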
Issue #87 created #created-87
[XSL] Support for "master files"
Oxygen allows setting one or more "master files" on a project. This is quite useful when validating or searching for references while editing XSLT "modules" that depend on a main XSLT.
The main use case is when I have a big XSLT that I want to split into modules (typically one module per mode). I could of course gather all global variables / parameters / functions into the same module, but then I have to import it from each module if I want every XSLT to be valid. It would make more sense to import it once from the main XSLT, but then none of the other modules are valid anymore.
Being able to set master files as a common XSLT feature would help with validation. It would have no effect on compilation; it would only be a validation feature.
What do you think?
Issue #86 created #created-86
Fallback for named timezones
§9.8.4.6 says "If no timezone name can be identified, the timezone offset is output using the fallback format +01:01." But "+01:01" is not a valid format. It should say either "01:01" or (preferably, I think) "00:00t".
Issue #85 closed #closed-85
New separators (apply-templates, for-each) vs attribute, value-of, serialization's item-separator
Issue #85 created #created-85
New separators (apply-templates, for-each) vs attribute, value-of, serialization's item-separator
Don't know if the new separator attributes for apply-templates and for-each are designed to work differently than the existing separator functionality in XSLT 3. According to one of the examples, if the instruction produces sibling text nodes then separators are included between the text nodes. In XSLT 3, sibling text nodes are always merged and separators ignored. Perhaps this inconsistency should be reconsidered, or at least a note added for clarification.
Issue #84 created #created-84
Proposal : allow ignorable <xsl:div> wrapper for documentation or organize the code
Hi,
For a long time I have been missing a way to organize XSLT code. Using enclosed modes will help a lot, but that is not their main purpose, and I guess it is not enough to be able to:
- group templates or functions that go together (according to the author)
- easily comment out blocks of code for debugging purposes
- add documentation on any XSLT element: not only top-level elements as with oxygen "xd" elements
- add foreign XML structures that can help with static analysis of the code (e.g. information to help with XSLT Schematron validation, which needs autocompletion with a specific XML schema, meaning processing instructions are not enough here)
What about an xsl:div element (for division)? This is a well-known element name, used in HTML but also in Relax NG.
That element might have a process-content attribute with 2 possible values:
- true (default): to say the content should be "applied", as xsl:div might be nested with different process-content attribute values
- false: to say the content has to be completely skipped at compilation
Example:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
xmlns:local="local"
xmlns:xslq="https://github.com/mricaud/xsl-quality"
xmlns:a="http://my-annotations.org"
version="4.0">
<xsl:div>
<xsl:div process-content="false">
<p>This block is about "foo"</p>
</xsl:div>
<xsl:function name="local:has-foo-child" as="xs:boolean">
<xsl:param name="e">
<xsl:div process-content="false"><xd:doc>Any elements</xd:doc></xsl:div>
</xsl:param>
<xsl:sequence select="exists($e/foo)"/>
</xsl:function>
<xsl:template match="foo">
<xsl:div process-content="false">
<xslq:schematron ignore-rule="mode-name-must-be-namepace-prefixed"/>
</xsl:div>
<xsl:value-of select="normalize-space(.)"><xsl:div process-content="false"><xd:doc>Normalization is needed here</xd:doc></xsl:div></xsl:value-of>
</xsl:template>
</xsl:div>
</xsl:stylesheet>
Writing this example made me see that it's a bit verbose. Another proposal would be to declare a set of namespaces that are to be ignored at compilation time, whether by skipping the elements in them or by applying what's inside of them. The same example would give something like this:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
xmlns:local="local"
xmlns:xslq="https://github.com/mricaud/xsl-quality"
xmlns:a="http://my-annotations.org"
version="4.0">
<xsl:ignore-namespaces select="map{
'http://www.oxygenxml.com/ns/doc/xsl' : 'skip',
'http://my-annotations.org' : 'apply',
'https://github.com/mricaud/xsl-quality' : 'skip'
}"/>
<a:div label="This block is about foo">
<xsl:function name="local:has-foo-child" as="xs:boolean">
<xsl:param name="e"><xd:doc>Any elements</xd:doc></xsl:param>
<xsl:sequence select="exists($e/foo)"/>
</xsl:function>
<xsl:template match="foo" mode="bar">
<xslq:schematron ignore-rule="mode-name-must-be-namepace-prefixed"/>
<xsl:value-of select="normalize-space(.)" xd:doc="normalization is needed here"/>
</xsl:template>
</a:div>
</xsl:stylesheet>
Well, there are probably multiple ways to meet this need.
I would be really happy to have such possibilities, but I don't know if some of you have the same need.
Thanks for any comments / replies / ideas / feedback
Issue #83 created #created-83
[XPath]Proposal: Notation for using an operator as a function
In this thread of (XML slack)#general: https://app.slack.com/client/T011VK9115Z/C011NLXE4DU/thread/C011NLXE4DU-1627497085.455200?cdn_fallback=2 there is this expression:
for-each-pair($aa, $bb, function($x, $y) {$x ne $y})
=> index-of(true())
Notice the long and unreadable: function($x, $y) {$x ne $y}
Writing, understanding and maintaining XPath code would be easier if we had a better way of expressing the use of an operator as a function. In Haskell one simply writes:
(/) 4 2        -- evaluates to 2.0
half = (/2)
(-) 4 2        -- evaluates to 2
negate = (0-)
ne = (/=)
We could accept a similar convention, so the original expression above is written simply as:
for-each-pair($aa, $bb, (ne))
=> index-of(true())
Or we could use something less overloaded than parentheses, for example:
`ne`
Then the original expression looks like this:
for-each-pair($aa, $bb, `ne`)
=> index-of(true())
Regardless of which lexical representation is chosen, being able to represent an operator as a function leads to significant code simplification and improves readability.
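As a point of comparison from outside XPath, Python exposes its operators as ordinary functions through the standard `operator` module. The sketch below is only an analogy: `for_each_pair` and `index_of` are hypothetical helpers written here to mimic the XPath functions of the same name, and are not part of any library.

```python
import operator

def for_each_pair(xs, ys, f):
    # Apply f to corresponding items of the two sequences.
    return [f(x, y) for x, y in zip(xs, ys)]

def index_of(seq, target):
    # 1-based positions of the items equal to target, as in XPath.
    return [i for i, v in enumerate(seq, start=1) if v == target]

aa = [1, 2, 3, 4]
bb = [1, 5, 3, 7]

# operator.ne plays the role of the proposed (ne) notation.
mismatches = index_of(for_each_pair(aa, bb, operator.ne), True)
```

Passing `operator.ne` directly avoids writing a two-argument wrapper function, which is the readability gain the proposal is after.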
Please, share your thoughts/questions on this proposal.
Issue #82 created #created-82
Should the mode attribute for apply-templates in templates of enclosed modes default to #current?
XSLT 4 with enclosed modes allows nesting xsl:template declarations inside an xsl:mode declaration, in effect wrapping all templates that belong to a certain mode.
On XmlSlack, it was suggested that for such templates, if they have an xsl:apply-templates instruction without a mode attribute, the mode should implicitly default to #current, meaning the enclosed mode, and not to the default mode of the stylesheet.
So the section in https://qt4cg.org/branch/master/xslt-40/Overview-diff.html#using-modes that says of the optional mode attribute of xsl:apply-templates that "If the attribute is omitted, the default mode for the stylesheet module is used." needs to be adjusted to say that the enclosed mode is used if the template containing the xsl:apply-templates is declared inside such a mode.
Issue #81 created #created-81
[xslt30] Typo in §4.4
The text of the first Note in §4.4 reads
This list excludes documents passed as the values of stylesheet parameters or parameters of the initial named template or initial function, trees created by functions such as parse-xml, parse-xml-fragment, analyze-string, or json-to-xml, nor values returned from extension functions.
"nor" should be "and".
Issue #80 created #created-80
[FO] fn:while (before: fn:until)
Motivation
Similar to fold-left, the function allows for an alternative writing of code that would otherwise be solved recursively, and that could cause stack overflows without tail call optimization.
In contrast to sequence-processing functions (fold functions, for-each, filter, others), the initial input of fn:while can be arbitrary and does not determine the maximum number of iterations.
Summary
Applies the predicate function $test to $input. If the result is true, $action is invoked with the start value (or, subsequently, with the result of the previous invocation) and the test is applied again, until the predicate function returns false. The value for which the test fails is returned.
Signature
Edit: The $input argument (before: $zero) is now defined as first parameter.
fn:while(
$input as item()*,
$test as function(item()*) as xs:boolean,
$action as function(item()*) as item()*
) as item()*
Examples / Use Cases
Calculate the square root of a number by iteratively improving an initial guess:
let $input := 3936256
return fn:while(
$input,
function($result) { abs($result * $result - $input) >= 0.0000000001 },
function($guess) { ($guess + $input div $guess) div 2 }
)
Find the first number that does not occur in a sequence:
let $values := (1 to 999, 1001 to 2000)
return while(1, -> { . = $values }, -> { . + 1 })
Equivalent Expression
declare function local:while(
$input as item()*,
$test as function(item()*) as xs:boolean,
$action as function(item()*) as item()*
) {
if($test($input)) then (
local:while($action($input), $test, $action)
) else (
$input
)
};
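The recursive definition above can also be read as a plain loop, which is how an implementation would avoid deep recursion. The following Python sketch is an illustration only; the name `while_` is a hypothetical stand-in for the proposed function.

```python
def while_(input_value, test, action):
    # Keep applying `action` for as long as `test` holds for the current value.
    result = input_value
    while test(result):
        result = action(result)
    return result

# Mirrors the second use case: the first number that does not occur in a sequence.
values = set(range(1, 1000)) | set(range(1001, 2001))
first_missing = while_(1, lambda n: n in values, lambda n: n + 1)
```

Note that the initial value does not bound the number of iterations; only the predicate decides when the loop stops.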
Issue #79 created #created-79
fn:deep-normalize-space($e as node())
Summary: removes redundant whitespace within the content of a given node, leaving the element structure intact.
Example:
<p> My <i>crazy</i>
<b> content</b>.
</p>
becomes
<p>My <i>crazy</i> <b>content</b>.</p>
Rules (expressed informally, and may need refining):
- The string value of the result is the normalize-space() of the string-value of the input.
- Every non-whitespace character in the result has the same ancestor path as the corresponding character in the input (for example if it was in an i element in the input, then it will be in an i element in the output).
- When several adjacent whitespace characters from different elements in the input are combined into a single space in the output, the resulting space will be in a text node whose parent is the result node corresponding to the common ancestor of those different elements.
For example <i>easy </i><b> peasy</b> becomes <i>easy</i> <b>peasy</b>
Could perhaps also add an option to word-wrap to a given line length.
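One possible reading of the informal rules, sketched in Python with the standard `xml.etree.ElementTree` module. This is an assumption-laden illustration, not the proposed definition: it treats every `text` and `tail` position as a slot, and places the single space that replaces a whitespace run in the shallowest slot the run touches (the first one when there is a tie). The in-place update and the name `deep_normalize_space` are choices made for this sketch.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"

def deep_normalize_space(root):
    # Collect every text position in document order as (element, attribute, depth).
    # An element's tail belongs to its parent, so it gets the parent's depth.
    slots = []
    def walk(el, depth):
        slots.append((el, "text", depth))
        for child in el:
            walk(child, depth + 1)
            slots.append((child, "tail", depth))
    walk(root, 0)

    out = [[] for _ in slots]
    run = None          # (first_slot, last_slot) of the pending whitespace run
    seen_text = False   # becomes True once a non-whitespace character was copied
    for idx, (el, attr, _depth) in enumerate(slots):
        for ch in getattr(el, attr) or "":
            if ch in XML_WHITESPACE:
                run = (run[0], idx) if run else (idx, idx)
                continue
            if run and seen_text:
                # Put one space in the shallowest slot touched by the run.
                target = min(range(run[0], run[1] + 1), key=lambda i: slots[i][2])
                out[target].append(" ")
            run = None  # leading whitespace is dropped
            seen_text = True
            out[idx].append(ch)
    # Whitespace still pending here is trailing whitespace and is dropped.
    for (el, attr, _depth), buf in zip(slots, out):
        setattr(el, attr, "".join(buf) or None)
    return root

doc = ET.fromstring("<p> My <i>crazy</i>\n<b> content</b>.\n</p>")
result = ET.tostring(deep_normalize_space(doc), encoding="unicode")
```

Whitespace that lies entirely inside one element stays in that element; only runs that cross element boundaries move up to the common ancestor.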
Issue #78 created #created-78
Specify strict order of evaluation for a subexpression
As discussed in a related issue #71, given an XPath expression such as (1):
for $d in ( 10, 2, 3 , current-date())
return
$d[. castable as xs:date][xs:date(.) le current-date()]
anyone who expects this expression to be evaluated without errors and to produce as result a sequence of one item, will be disappointed to get an error (as per BaseX 9.5.2):
Error:
Stopped at C:/W3C-XPath/DupsSolutions/file, 3/39:
[XPTY0004] Cannot convert xs:integer to xs:date
: 10.
At present the recommended solution to the problem is to write an expression of this kind instead (2):
for $d in ( 10, 2, 3, current-date() )
return
if($d[. castable as xs:date] eq $d)
then $d[xs:date(.) le current-date()]
else()
Evaluating this produces the expected result (a sequence of one item, which is the current-date()).
There are many challenges with such a recommendation:
- The expression above is unreadable.
- It is very difficult and error-prone to convert manually (1) to (2)
- It would be nearly impossible to transform a more complex expression and it would be tremendously difficult to read, understand and maintain such code.
Proposed solution:
Introduce the strict-order evaluation operator ~
Then achieving a strict-order evaluation for the subexpression [xs:date(.) le current-date()] of (1) above would be simply (3):
for $d in ( 10, 2, 3 , current-date())
return
$d[. castable as xs:date]~[xs:date(.) le current-date()]
In this particular case the XPath processor will rewrite the above expression into:
for $d in ( 10, 2, 3, current-date() )
return
$d[. castable as xs:date] =>
(function($x) {
$x[xs:date($x) le current-date()]
}
) ()
Issue #77 created #created-77
Allow manipulation of maps and arrays
As discussed in the xml.com Slack workspace's xpath-ng channel, there is interest in extending the XQuery Update Facility to allow manipulation of maps and arrays—in effect, to facilitate the editing of large, deep JSON documents.
For example, @DrRataplan provided this use case (the first code snippet can be viewed at fontoxml's playground):
I think XQUF for JSON may have its merit. Editing larger JSON documents using XQuery is not the most elegant. I mean, in JavaScript, changing a value in a deep map is theMap['key']['deeperKey'].push(42). In XPath, it is more like: $theMap => map:put('key', $theMap?key) => map:put('deeperKey', array:append($theMap?key?deeperKey, 42)))
In XQUF terms, I think this would look a bit like:
insert 42 as last into $theMap?key?deeperKey
... which is at least a lot shorter.
At some point when working on a project that tried to edit some JSON metadata objects in XQuery I implemented a function that accepted a map, a path of keys, a value and some semantics, such as inserting at the start vs. at the end. It did not work too great in the end and we went for JavaScript functions instead. Just too explicit and hard to debug.
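To make the difficulty concrete, here is a minimal Python sketch of a non-mutating deep update, where every level on the path has to be rebuilt, much as it would be with XQuery's immutable maps and arrays. The function `deep_append` and its path-of-keys parameter are hypothetical and only illustrate the kind of helper described above.

```python
def deep_append(data, path, value):
    """Return a copy of `data` with `value` appended to the list found at `path`."""
    if not path:
        # End of the path: `data` is the list to extend.
        return [*data, value]
    key, *rest = path
    # Rebuild this level, replacing only the entry on the path.
    return {**data, key: deep_append(data[key], rest, value)}

the_map = {"key": {"deeperKey": [1, 2]}, "other": "unchanged"}
updated = deep_append(the_map, ["key", "deeperKey"], 42)
```

The original `the_map` is left untouched, which is the functional behaviour that makes a one-line update expression hard to specify.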
See also this discussion at StackOverflow, where a user was struggling to use map:put or map:remove on deeper entries in a map; asked, "Is XQuery 3.1 designed for advanced JSON editing?"; and worried that XQuery "might not be the right choice" for his use case. Highlights from the responses:
@michaelhkay wrote:
You're correct that doing what I call deep update of a map is quite difficult with XQuery 3.1 (and indeed XSLT 3.0) as currently defined. And it's not easy to define language constructs with clean semantics. I attempted to design a construct as an XSLT extension instruction - see https://saxonica.com/documentation10/index.html#!extensions/instructions/deep-update -- but I don't think its anywhere near a perfect solution.
@ChristianGruen wrote:
Updates primitives had been defined for JSONiq (https://www.jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#section-json-updates), but I believe they haven’t made it into the reference implementation. They could also be considered for XQuery 4.
@michaelhkay responded:
If I'm not mistaken, maps in JSONiq have object identity, which is not true of XQuery maps (which are pure functional data structures). That makes the semantics of deep update much easier to define, but makes it more difficult to make simple operations such as put() and remove() efficient.
In Slack @liamquin also wrote:
the proposals i've seen for this in the past required that maps and arrays be given identity in some way, but then you have the problem that e.g. map:insert returns a new map, which is not how an XQuery update expression works
@jonathanrobie also wrote:
Yes, but the first question is this: how much will is there to support JSON updates in XQuery update?
I would love to have this. I no longer work for an implementation of XQuery.
@adamretter added:
Sounds like a nice idea
Issue #76 created #created-76
non-deterministic time
The current-date/Time functions are deterministic, so within one query or transformation they always return the same time, which is very confusing to everyone.
There could be actual-current-date/Time functions that return the actual time non-deterministically. Or call it wall-date/Time or system-date/Time.
Issue #75 created #created-75
Support processing HTML 5 template element content
The Problem
The HTML 5 specification introduces a template element [1], [2] where the content of that element doesn't represent children of it, but is part of a content property. The root node of the content property is a DocumentFragment, which is a light-weight document node. These specifications provide some non-normative guidelines for interacting with XSLT and XPath [3].
The DocumentFragment interface is defined in the HTML DOM 4.1 [4] as an instance of a Node. Within the HTML 5 specification, it is only referenced in relation to the template element.
This affects the proposed fn:parse-html function (issue #74) as well as databases and query processors that support storing and accessing HTML5 content via fn:doc and other APIs.
Requirements
- Accurately represent the contents of the template element in the DOM/data model.
- Allow a conforming implementation to process the template content as if it was XML content -- i.e. using the child:: axis to access the content.
- Allow a conforming implementation to process the template content separately from child content -- e.g. if the implementation has support for the HTML DOM.
- Allow authors to select the content of a template element.
- Minimize changes to the data model specification. [*]
[*] I don't believe it is possible to support this without some changes to the data model (see the Design section below).
Design
Storing the content of the template
There are 3 options for handling the content of a template element.
1. As children
Store the content as child elements of the template element.
This is how conforming processors that only understand XML content will process and view the document.
2. As a document node
Store the content as children of a document node, where the parent of the document node is the template element.
This would be the minimal amount of changes needed to make the HTML5 model work. The only change I can see is that this won't conform to section 6.1.2 Accessors of the data model, in that:
dm:parent
Returns the empty sequence
becomes:
dm:parent
If this is a document fragment for a template element, returns the template element. Otherwise, returns the empty sequence.
Implementors using the HTML DOM would need to map DocumentFragment nodes to document-node().
3. As a new document-fragment node
Store the content as children of a new document-fragment node type, where the parent of the document-fragment node is the template element.
This is the option that is most compatible with the HTML DOM, as it mirrors the DocumentFragment interface from that, but it is also the most invasive. It will require (among other things):
- Defining rules in section 6. Nodes of the data model for Document Fragment Nodes -- accessors, construction from infoset and PSVI, and infoset mapping.
- Adding a new document-fragment() KindTest to the supported node/item types.
- Adding subtype-itemtype rules for the document fragment nodes.
- Adding a new document-fragment { ... } computed constructor for XQuery.
Selecting template content
A new forward axis should be added that supports selecting fragment nodes. Some of the possible names include:
- fragment:: -- following the pattern defined by the attribute:: axis; or
- content:: -- following the nomenclature from the HTML specification for the template element contents.
The behaviour will depend on which of the 3 options above is selected for storing the content type:
- If an implementation only supports XML (option 1), the new axis will work the same as child::. The principal node kind is element.
- If option 2 is chosen (reuse the document node), the new axis will match document nodes whose parent is a template element. The principal node kind is document. Note: This has an ambiguity with the reverse axes, as it is checking the parent of the node as well as the node type.
- If option 3 is chosen (create a document fragment node), the new axis will match any document fragment nodes. The principal node kind is document fragment. Note: This makes more sense when the fragment:: name is used for the axis, and would be more generally applicable, such as for computed constructor created fragments, or HTML DocumentFragments created from a JavaScript or web browser XPath/XSLT/XQuery binding such as Saxon-JS.
References
[1] https://www.w3.org/TR/html52/semantics-scripting.html#the-template-element
[2] https://html.spec.whatwg.org/#the-template-element
[3] https://www.w3.org/TR/html52/semantics-scripting.html#interaction-of-template-elements-with-xslt-and-xpath
[4] https://www.w3.org/TR/dom41/#documentfragment
Issue #74 created #created-74
[FO] Support parsing HTML
It is common for applications that use an XQuery database engine to want to parse HTML documents when adding content from HTML pages into a database, or in other applications like generating epub documents from HTML source files. Vendors like MarkLogic (xdmp:tidy via HTML Tidy for HTML4), BaseX (html:parse via TagSoup), Saxon (saxon:parse-html via TagSoup), and eXist-db (util:parse-html via Neko) have provided custom methods to support this.
Q: Should there also be functions to list the supported methods and character encodings?
fn:parse-html
Summary
Parses HTML-based input into an XML document.
Signature
fn:parse-html($input as union(xs:string, xs:hexBinary, xs:base64Binary),
$options as map(*) := map { "method": "html5" }) as document-node()
Properties
This function is ·deterministic·, ·context-independent·, and ·focus-independent·.
Rules
The $options map conforms to record(method as union(enum("html5"), xs:string), encoding as xs:string?, *). A vendor may provide ·implementation-dependent· options that may vary between the different method values.
The method property of $options defines the approach used to convert the HTML document to XML. This specification supports html5 for using the HTML5 parsing rules for HTML content. The exact version of HTML5 used is ·implementation-dependent·.
The encoding property of $options defines the character encoding used to decode binary data. By default, this is an empty sequence. Implementations must support at least utf-8, utf8, ascii, and latin1. Other encoding values are ·implementation-dependent·, but it is recommended that the encodings documented in the WHATWG Encoding specification [3] are supported.
If $input is an xs:string, no character decoding is performed as the input is already decoded.
If $input is an xs:hexBinary or xs:base64Binary, the character encoding used to decode the binary data is determined as follows:
- if the binary data has a valid Unicode Byte Order Mark (BOM), the character encoding specified by that BOM is used;
- if encoding is specified in $options, that value is used;
- if prescanning the first 1024 bytes of data finds a character encoding (using the rules from https://html.spec.whatwg.org/multipage/parsing.html#prescan-a-byte-stream-to-determine-its-encoding), the detected encoding is used;
- if ·implementation-dependent· heuristics (in line with the HTML5 rules) detect a character encoding, that encoding is used;
- otherwise, the encoding is "utf-8".
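The precedence in the list above can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: the prescan step is approximated by a single regular expression for a meta charset declaration rather than the full WHATWG algorithm, the implementation-dependent heuristics step is omitted, and `detect_encoding` is a hypothetical name.

```python
import codecs
import re

# Byte order marks checked first, in the order listed.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]

# Rough stand-in for the prescan rules: look for charset=... inside a meta tag.
META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w-]+)', re.IGNORECASE)

def detect_encoding(data, encoding=None):
    # 1. A byte order mark wins over everything else.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. An explicit encoding option comes next.
    if encoding is not None:
        return encoding
    # 3. Prescan only the first 1024 bytes.
    match = META_CHARSET.search(data[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. (Heuristics omitted.) 5. Fall back to UTF-8.
    return "utf-8"
```

For example, a BOM in the data takes priority even when the caller passes a different encoding option.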
If the detected character encoding name is not supported, an FO###### error is raised. Otherwise, the character encoding method associated with the character encoding is used.
If the parsing method is not supported, an FO###### error is raised.
The $input is then parsed according to the specified parsing method, building an intermediate HTML Document object. The XML document-node is then constructed by mapping the HTML document, element, attribute, text, and comment nodes to their XML equivalents.
If an HTML document contains a template element, the contents of that element are added as children of the template element. It is ·implementation-dependent· whether or not a processor ignores this content when evaluating path expressions on these template elements, and how they are represented in any DOM interfaces.
Notes
The character encoding logic follows the https://html.spec.whatwg.org/multipage/parsing.html#encoding-sniffing-algorithm rules.
HTML does not support processing instructions. They are treated as comments in the HTML5 specification.
The HTML template element is complex as the HTML specification defines its content as being part of a separate document that is associated with the template contents property of that element, not its children. The WHATWG specification provides a non-normative guide for XSLT and XPath interacting with these elements (https://html.spec.whatwg.org/#template-XSLT-XPath).
A conforming implementation may choose to parse and return the HTML into an HTML-based data model (e.g. the HTML DOM) instead of generating an XML infoset or PSVI. This is valid as long as the accessor functions (https://www.w3.org/TR/xpath-datamodel-31/#accessors) and the various syntax that works with XML nodes also works for the HTML nodes. That is, expressions like $html/html/body/p instance of element(p) are supported.
Examples
The expression fn:parse-html("<html>") returns an empty html document constructed using the HTML5 document construction rules.
The expression fn:parse-html($html, encoding: "latin2") uses the latin2 character encoding to parse $html, or generates an FO###### error if the processor does not support that encoding.
The expression fn:parse-html($html, method: "html5", encoding: ()) is equivalent to fn:parse-html($html).
The expression fn:parse-html($html, method: "tidy") uses the tidy method (e.g. from the HTML Tidy application) to parse $html into an XML document if supported by the implementation. Otherwise an FO###### error is raised.
The expression fn:parse-html($html, method: "tagsoup", nons: true()) uses the tagsoup method (e.g. from the TagSoup application) to parse $html into an XML document if supported by the implementation, passing the --nons attribute. Otherwise an FO###### error is raised.
References
- HTML 5.2, W3C.
- HTML Living Standard, WHATWG.
- Encoding Living Standard, WHATWG.
Issue #73 created #created-73
Split a string by graphemes
The new fn:characters function is useful, but doesn't solve the problem of manipulating strings where multiple codepoints correspond to a single grapheme. For example:
- characters with one or more combining characters;
- emoji with skin tone variant selectors;
- emoji with gender variant selectors;
- multi-sequence emoji -- family, wales flag, etc.;
- region indicator pairs for flags.
Getting this right is complex, and implementing it as a regular expression is error-prone.
fn:graphemes
Summary
Splits the supplied string into a sequence of single-grapheme (one or more character) strings.
Signature
fn:graphemes($value as xs:string?) as xs:string*
Properties
This function is ·deterministic·, ·context-independent·, and ·focus-independent·.
Rules
The function returns a sequence of strings, each containing one ·grapheme· from $value, in order. The graphemes are determined by the corresponding Unicode rules for what constitutes a ·grapheme·. The version of Unicode and the Unicode Emoji standards is ·implementation-dependent·.
If $value is a zero-length string or the empty sequence, the function returns the empty sequence.
Examples
The expression fn:graphemes("Thérèse") returns ("T", "h", "é", "r", "è", "s", "e"), irrespective of whether the e characters use combining characters or not.
The expression fn:graphemes("") returns ().
The expression fn:graphemes(()) returns ().
The expression fn:graphemes("👋🏻👋🏼👋🏽👋🏾👋🏿") returns ("👋🏻", "👋🏼", "👋🏽", "👋🏾", "👋🏿").
The expression fn:graphemes("👪") returns ("👪").
The expression fn:graphemes("👨‍🔬👩‍🔬") returns ("👨‍🔬", "👩‍🔬").
The expression fn:graphemes("🇪🇪🇩🇪🇫🇷🏴🇮🇸") returns ("🇪🇪", "🇩🇪", "🇫🇷", "🏴", "🇮🇸").
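To show why this needs dedicated support, here is a deliberately partial Python sketch that handles only the first case in the list above: a base character followed by combining marks. It uses the standard `unicodedata` module; the name `simple_graphemes` is hypothetical. It does not implement the full Unicode segmentation rules, so emoji modifier sequences, joined emoji and regional indicator pairs would still be split incorrectly.

```python
import unicodedata

def simple_graphemes(value):
    # Attach each combining mark to the cluster started by the preceding character.
    clusters = []
    for ch in value or "":
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "Thérèse" written with combining acute and grave accents.
decomposed = "The\u0301re\u0300se"
parts = simple_graphemes(decomposed)
```

Even this small case shows that the number of graphemes (7) differs from the number of codepoints (9) in the decomposed string.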
Issue #72 created #created-72
[FO] Provide better support for URI processing within an expression
Use Case 1: Decode an encoded URI string.
This is difficult to implement correctly, and is a commonly asked question/request on sites like Stack Overflow. Vendors have even implemented their own functions, like xdmp:uri-decode.
Use Case 2: Extracting the hash/parameters from a URI string.
This is common when manipulating URI strings and not using something like RESTXQ to bind the query parameters to function parameters. The API should:
- extract the hash as a string, and the parameters as a name/value map;
- combine parameters with the same name into the same map entry;
- decode the values where necessary.
Use Case 3: Extract the other parts of a URI string.
This can be useful if writing a RESTXQ or similar implementation in XSLT/XQuery. It can also be useful for generating response headers such as Origin, or doing HTTP to HTTPS redirects.
It is easy to make mistakes and wrong assumptions when writing a URI parser by hand. Additionally, it is not easy to implement in XSLT/XQuery, as functions like analyze-string and tokenize are not powerful enough to implement a lexer, and manipulating codepoints is difficult without stateful logic.
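For a sense of what such an API could return, the following Python sketch covers the three use cases with the standard `urllib.parse` module. The function `split_uri` and the shape of its result are assumptions made for illustration, not a proposed signature.

```python
from urllib.parse import parse_qs, unquote, urlsplit

def split_uri(uri):
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # Use case 1: percent-decoding of an encoded component.
        "path": unquote(parts.path),
        # Use case 2: parameters as a name/value map; repeated names are
        # combined into one entry and the values are decoded.
        "parameters": parse_qs(parts.query),
        "hash": unquote(parts.fragment),
    }

uri = "https://example.com/a%20b?x=1&x=2&y=caf%C3%A9#sec%201"
result = split_uri(uri)
```

Here `result["parameters"]` groups both `x` values under one key, which matches the requirement to combine parameters with the same name into the same map entry.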
Issue #71 created #created-71
[XSLT] Use of multiple predicates: order of evaluation
I notice I added an example pattern to the draft XSLT4 spec, match=".[. castable as xs:date][xs:date(.) le current-date()]", which is incorrect because processors are allowed to change the order of predicates, so you can't use the first predicate as a guard to stop the second predicate throwing an error. I've seen users fall over this (Saxon does sometimes reorder predicates). My instinct is to ban reordering of predicates; if you want to allow it, you can use the "and" operator. An alternative would be an "and" operator (say "and-also") with explicit ordering semantics, as in XPath 1.0.
Issue #70 created #created-70
[FO] Built-in function changes to support default values
This issue tracks the changes needed to the built-in functions to allow them to combine the declarations into a single definition with default parameter values.
The general approach to this is to make required arguments optional if they are for a function signature that is not the lowest argument count signature, and move any associated logic into the function.
array:subarray
- Change array:subarray/$length from xs:integer to xs:integer?.
Rules
Except in error cases, the result of the function is the value of the expression op:A2S($array) => fn:subsequence($start, $length) => op:S2A().
Error Conditions
A dynamic error is raised [err:FOAY0001] if $start is less than one or greater than array:size($array) + 1.
A dynamic error is raised [err:FOAY0002] if $length is not an empty sequence and is less than zero.
A dynamic error is raised [err:FOAY0001] if $length is not an empty sequence and $start + $length is greater than array:size($array) + 1.
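The rules and error conditions for an optional $length can be sketched in Python as follows. A list stands in for the array and `None` for the empty sequence; the exception types are arbitrary choices for this sketch, and only the error codes in the messages come from the text above.

```python
def subarray(array, start, length=None):
    size = len(array)
    # FOAY0001: start must lie between 1 and size + 1.
    if start < 1 or start > size + 1:
        raise IndexError("FOAY0001: $start is out of bounds")
    if length is None:
        # Empty sequence: everything from start to the end.
        return array[start - 1:]
    # FOAY0002: a supplied length must not be negative.
    if length < 0:
        raise ValueError("FOAY0002: $length is negative")
    # FOAY0001: the requested range must fit inside the array.
    if start + length > size + 1:
        raise IndexError("FOAY0001: $start + $length is out of bounds")
    return array[start - 1:start - 1 + length]

letters = ["a", "b", "c", "d"]
tail = subarray(letters, 2)
middle = subarray(letters, 2, 2)
```

With this shape, the two-argument and three-argument forms collapse into a single definition, which is the aim of the change.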
fn:concat
This should be modified to use a sequence-variadic signature, either as a 1 parameter function (taking an xs:anyAtomicType* value, allowing 0 and 1 arguments), or a 3 parameter function with the last parameter having the type xs:anyAtomicType*.
fn:differences
The $options parameter should be moved to the end of the parameter list in order to make the function a map-variadic function when default values are applied. This then makes it possible to specify the collation argument using a keyword argument in addition to specifying options as keyword arguments.
fn:resolve-uri
- Change fn:resolve-uri/$base from node() to node()?.
- If the $base argument is not supplied,
+ If the $base argument is the empty sequence,
fn:subsequence
- Change fn:subsequence/$length from xs:double to xs:double?.
When $length is the empty sequence, this function returns:
$input[fn:round($start) le position()]
When $length is not the empty sequence, this function returns:
$input[fn:round($start) le position()
and position() lt fn:round($start) + fn:round($length)]
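The two cases can be mirrored in Python to check the boundary behaviour. This is a sketch under assumptions: `xpath_round` approximates fn:round for finite values only (halves round towards positive infinity), NaN and infinite arguments are not handled, and `None` stands in for the empty sequence.

```python
import math

def xpath_round(x):
    # fn:round rounds halves towards positive infinity, unlike Python's round().
    return math.floor(x + 0.5)

def subsequence(items, start, length=None):
    first = xpath_round(start)
    if length is None:
        # $length is the empty sequence: no upper bound on the position.
        return [item for pos, item in enumerate(items, 1) if first <= pos]
    end = first + xpath_round(length)
    return [item for pos, item in enumerate(items, 1) if first <= pos < end]

seq = ["a", "b", "c", "d", "e"]
rest = subsequence(seq, 2)
window = subsequence(seq, 2, 3)
```

Because positions start at 1, a call such as `subsequence(seq, 0, 2)` selects only the first item: position 0 does not exist, and position 2 is already outside the range.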
fn:substring
- Change fn:substring/$length from xs:double to xs:double?.
More specifically, when $length is not the empty sequence the function returns the characters in $value whose position $p satisfies:
fn:round($start) <= $p and $p < fn:round($start) + fn:round($length)
When $length is the empty sequence the function assumes that $length is infinite and thus returns the ·characters· in $value whose position $p satisfies:
fn:round($start) <= $p
fn:tokenize
- Change fn:tokenize/$pattern from xs:string to xs:string?.
If $pattern is the empty sequence, the $value argument is set to fn:normalize-space($value) and $pattern is set to ' '.
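A Python sketch of the default-pattern rule, with `None` standing in for the empty sequence. It is an approximation: Python's `re` syntax is not the XPath regular expression dialect, and the helper name `tokenize` is simply reused here for illustration.

```python
import re

def tokenize(value, pattern=None):
    if value is None:
        return []
    if pattern is None:
        # Equivalent of fn:normalize-space: collapse runs of XML whitespace
        # and trim both ends, then split on single spaces.
        value = re.sub(r"[ \t\r\n]+", " ", value).strip(" ")
        pattern = " "
    if value == "":
        # A zero-length string yields the empty sequence.
        return []
    return re.split(pattern, value)

words = tokenize("  red   green\tblue ")
fields = tokenize("a,b,,c", ",")
```

Normalizing first is what prevents empty tokens at the start and end, which an explicit pattern such as `,` would still produce for adjacent separators.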
fn:unparsed-text / fn:unparsed-text-available
- Change fn:unparsed-text/$encoding from xs:string to xs:string?.
- Change fn:unparsed-text-available/$encoding from xs:string to xs:string?.
fn:unparsed-text-lines
- Change fn:unparsed-text-lines/$encoding from xs:string to xs:string?.
The result of the function is the same as the result of the expression fn:tokenize(fn:unparsed-text($href, $encoding), '\r\n|\r|\n')[not(position()=last() and .='')].
Collations
- Change fn:collation-key/$collation from xs:string to xs:string?.
- Change fn:compare/$collation from xs:string to xs:string?.
- Change fn:contains/$collation from xs:string to xs:string?.
- Change fn:contains-token/$collation from xs:string to xs:string?.
- Change fn:deep-equal/$collation from xs:string to xs:string?.
- Change fn:differences/$collation from xs:string to xs:string?.
- Change fn:distinct-values/$collation from xs:string to xs:string?.
- Change fn:ends-with/$collation from xs:string to xs:string?.
- Change fn:index-of/$collation from xs:string to xs:string?.
- Change fn:max/$collation from xs:string to xs:string?.
- Change fn:min/$collation from xs:string to xs:string?.
- Change fn:starts-with/$collation from xs:string to xs:string?.
- Change fn:substring-after/$collation from xs:string to xs:string?.
- Change fn:substring-before/$collation from xs:string to xs:string?.
- Change fn:uniform/$collation from xs:string to xs:string?.
- Change fn:unique/$collation from xs:string to xs:string?.
Passing the empty sequence to the $collation argument is equivalent to supplying the default collation to that argument.
Issue #69 created #created-69
fn:document, fn:function-available: default arguments
This issue tracks the changes needed to the built-in functions to allow them to combine the declarations into a single definition with default parameter values.
The general approach to this is to make required arguments optional if they are for a function signature that is not the lowest argument count signature, and move any associated logic into the function.
fn:document
- Change fn:document/$base-node from node() to node()?.
- If $base-node is supplied,
+ If $base-node is not empty,
fn:function-available
- Change fn:function-available/$arity from xs:integer to xs:integer?.
If $arity is the empty sequence, the function-available function returns true if and only if there is at least one available function (with some arity) whose name matches the value of the $function-name argument.
If $arity is not the empty sequence, the function-available function returns true if and only if there is an available function whose name matches the value of the $function-name argument and whose arity matches the value of the $arity argument.
Issue #68 closed #closed-68
Don't attempt to upgrade the host
Pull request #68 created #created-68
Don't attempt to upgrade the host
The CI script shouldn't attempt to upgrade the host. CircleCI have customized some of the packages so upgrading doesn't work. And it shouldn't really be necessary anyway.
Issue #67 created #created-67
Allow optional parameters and keyword arguments on map and sequence variadic functions.
These proposed draft changes seek to address the following issues with, and limitations of, the current draft specification:
- A %variadic("sequence") function where the sequence type uses the + occurrence indicator should not have an implicit default value. That would mean passing () to the sequence, which would generate a coercion error.
- Map-variadic and sequence-variadic functions cannot have user-specified default parameter values with the current draft wording. In this case the map/sequence last parameter needs to be given a default in the function declaration. This allows those to be defaulted to something other than an empty map/sequence, as well as specifying the defaults for other parameters (e.g. in the case where a map is the last of several parameters).
- It should be possible to allow parameters to be specified as keyword arguments for map-variadic functions. For map-variadic functions, a keyword argument will be bound to a parameter if it matches the parameter, or added to the map if not.
Design Note:
It would be nice to support keyword arguments for sequence-variadic functions. The other design notes detail a possible way to implement this logic. This would resolve issue #26, and make the features (keyword arguments in this case) usable in all cases.
Proposal
There are two orthogonal concepts related to variadic functions:
- arity bounds -- the number of required and optional parameters a function has;
- variadic type -- how the function behaves in relation to its last parameter.
Arity Bounds
[Definition: The declared arity of a function is the number of parameters defined in the function declaration.] The declared arity includes both required and optional parameters.
[Definition: An optional parameter is a parameter with a default value.] The default value may either be specified in the function declaration, or determined by the logic described below.
[Definition: A declared optional parameter is an optional parameter specified in the function declaration.] TODO: Define a syntax for specifying declared optional parameters. [Note: see issue #64 for a proposal on doing this.]
The property A is the declared arity of a function.
The property D is the number of optional parameters. This is determined as follows:
- If there are any declared optional parameters, D is the number of declared optional parameters.
- If the last parameter is a MapTest or RecordTest, D is 1.
- If the last parameter is a sequence type with a minimum item occurrence of 0 (e.g. using the * occurrence indicator), D is 1.
- If none of the above apply, D is 0.
The property R is the number of required parameters, and is determined by evaluating A-D.
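The rules for A, D and R above can be sketched in Python. This is only an illustration of the proposed logic: the Param class and its fields (is_map, min_occurs, has_default) are hypothetical names invented for the sketch, not part of any specification.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical model of one declared parameter."""
    name: str
    is_map: bool = False       # True if the type is a MapTest or RecordTest
    min_occurs: int = 1        # minimum item occurrence of the sequence type
    has_default: bool = False  # True for a declared optional parameter


def arity_bounds(params):
    """Return (A, D, R) for a list of Param objects, following the rules above."""
    a = len(params)
    declared = sum(1 for p in params if p.has_default)
    if declared:
        d = declared                 # declared optional parameters win
    elif params and params[-1].is_map:
        d = 1                        # last parameter is a map or record
    elif params and params[-1].min_occurs == 0:
        d = 1                        # last parameter may be an empty sequence
    else:
        d = 0
    return a, d, a - d


# f($x, $options as map(*)) has A = 2, D = 1, R = 1
print(arity_bounds([Param("x"), Param("options", is_map=True)]))
```

The checks are applied in the order the rules are listed, so declared optional parameters take precedence over the map and sequence cases.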
Variadic Type
The variadic type is given by the %variadic(enum("no", "map", "sequence")) annotation. It is determined as follows:
- If the last parameter is a MapTest or RecordTest, %variadic("map") is specified.
- If the last parameter is a sequence type with an unbounded maximum item occurrence (e.g. using the * or + occurrence indicator), %variadic("sequence") is specified.
- If none of the above apply, %variadic("no") is specified.
[Definition: The variadic parameter of a function refers to the last parameter of a map-variadic or sequence-variadic function.]
The values of the MinA/MaxA, MinP/MaxP, and MinK/MaxK properties are given by the following table, where A and R are defined in the arity bounds section.
| %variadic | MinA | MaxA      | MinP | MaxP      | MinK | MaxK      |
|-----------|------|-----------|------|-----------|------|-----------|
| no        | R    | A         | 0    | A         | 0    | A         |
| map       | R    | unbounded | 0    | A         | 0    | unbounded |
| sequence  | R    | unbounded | R    | unbounded | 0    | 0         |
For %variadic("no") and %variadic("map") functions, positional and keyword arguments can be mixed, or the arguments can be specified as either all positional arguments, or all keyword arguments.
Note:
If a keyword argument has the name of the variadic parameter for a map-variadic function, it is used to specify the value of that map, and not a key in a constructed map. In this case, the other keyword arguments must specify parameter names as the value of the variadic parameter has already been specified, and would result in a conflicting value if any of the keyword arguments were specifying keys in the variadic parameter.
For %variadic("sequence") functions, only positional arguments are allowed.
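As a rough illustration of how the table could be used, the following Python sketch checks whether a call with a given number of positional and keyword arguments fits the bounds. It assumes that MinA/MaxA bound the total number of arguments, MinP/MaxP the positional arguments, and MinK/MaxK the keyword arguments; that reading, and the function names, are assumptions made for the sketch.

```python
import math

UNBOUNDED = math.inf


def bounds(variadic, a, r):
    """Return the (min, max) pairs from the table for one %variadic type."""
    table = {
        "no":       {"A": (r, a),         "P": (0, a),         "K": (0, a)},
        "map":      {"A": (r, UNBOUNDED), "P": (0, a),         "K": (0, UNBOUNDED)},
        "sequence": {"A": (r, UNBOUNDED), "P": (r, UNBOUNDED), "K": (0, 0)},
    }
    return table[variadic]


def call_is_allowed(variadic, a, r, positional, keyword):
    """Check the argument counts of a call against all three pairs of bounds."""
    b = bounds(variadic, a, r)
    total = positional + keyword
    return (b["A"][0] <= total <= b["A"][1]
            and b["P"][0] <= positional <= b["P"][1]
            and b["K"][0] <= keyword <= b["K"][1])


# A sequence-variadic function with A = 2 and R = 1 accepts five positional
# arguments, but rejects any keyword argument.
print(call_is_allowed("sequence", 2, 1, positional=5, keyword=0))
print(call_is_allowed("sequence", 2, 1, positional=1, keyword=1))
```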
Design Note:
Keyword arguments could be supported for sequence-variadic functions if the presence of a keyword argument makes it function like %variadic("no"). That is, it is not unbounded in this case. This would work, as keyword arguments occur after positional arguments, and the variadic parameter would need to be specified as a keyword argument.
The tricky part of this is that MinA/MaxA would no longer be statically determinable, in that they would depend on whether the function call used keyword arguments.
The sequence row would be modified as follows:
| %variadic | MinA | MaxA         | MinP | MaxP      | MinK | MaxK |
|-----------|------|--------------|------|-----------|------|------|
| sequence  | R    | variable [1] | 0    | unbounded | 0    | A    |
[1] If the function call has at least one keyword argument, MaxA is A. Otherwise, MaxA is unbounded.
Evaluating Static Function Calls
...
3. Positional argument values are mapped to parameters in the function declaration as follows: Let the number of declared parameters be N.
   - A positional argument with position M, (M < N) corresponds to the parameter in position M.
   - For sequence-variadic functions, the values of arguments in positions greater than or equal to N are concatenated into a sequence, and the resulting sequence is supplied as the value of parameter N. If there are no such arguments (that is, if N-1 arguments are supplied), then the value supplied for parameter N is an empty sequence.
4. Keyword argument values are mapped to parameters in the function declaration as follows: Let the keyword corresponding to a keyword argument be K.
   - If there is a parameter with name K, the keyword argument corresponds to the named parameter K.
   - For map-variadic functions, the keyword argument is assembled into a map. For each keyword argument, the map has an entry whose name is the keyword (as an instance of xs:string) and whose corresponding value is the argument value.
   - For non-variadic functions, an XPST#### error is raised if there is no parameter with name K.
Design Note:
If supporting keyword arguments for sequence-variadic functions, 4/iii would handle them. That is, an error is raised if the keyword name does not match a parameter name.
5. If no argument corresponds to a parameter in the function declaration:
   - If the parameter has a default value, then that value is used. TODO: define how the default value is evaluated, i.e. what context is used.
   - For sequence-variadic functions, the value supplied for parameter N is an empty sequence.
   - For map-variadic functions, the value supplied for parameter N is the map constructed in step 4. If no keyword arguments were used to construct the map, an empty map is used.
   - If none of the above apply, an XPST#### error is raised.
6. If more than one argument corresponds to a parameter in the function declaration, an XPST#### error is raised.
...
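The mapping rules above can be sketched as a small Python function. This is one possible reading rather than a definitive algorithm: every argument is modelled as a single value, a positional argument in position N is bound to parameter N for functions that are not sequence-variadic, and StaticError stands in for the unassigned XPST#### code. All names are hypothetical.

```python
class StaticError(Exception):
    """Stands in for the unassigned XPST#### error."""


def bind_arguments(params, variadic, positional, keyword, defaults=None):
    """Map argument values to parameter names.

    params: parameter names in declaration order; variadic: "no", "map" or
    "sequence"; positional: list of values; keyword: dict of name -> value.
    """
    defaults = defaults or {}
    n = len(params)
    bound, extra = {}, {}   # extra collects keys for a map-variadic parameter

    def bind(name, value):
        if name in bound:
            raise StaticError(f"more than one argument for ${name}")
        bound[name] = value

    # Positional arguments.
    if variadic == "sequence":
        for name, value in zip(params[:n - 1], positional):
            bind(name, value)
        if len(positional) >= n:
            bind(params[-1], list(positional[n - 1:]))
    else:
        if len(positional) > n:
            raise StaticError("too many positional arguments")
        for name, value in zip(params, positional):
            bind(name, value)

    # Keyword arguments.
    for key, value in keyword.items():
        if key in params:
            bind(key, value)
        elif variadic == "map":
            extra[key] = value
        else:
            raise StaticError(f"no parameter named ${key}")

    # Parameters without a corresponding argument.
    for index, name in enumerate(params):
        is_last = index == n - 1
        if name in bound:
            if is_last and variadic == "map" and extra:
                raise StaticError(f"conflicting values for ${name}")
            continue
        if is_last and variadic == "map" and extra:
            bound[name] = extra          # map assembled from keyword arguments
        elif name in defaults:
            bound[name] = defaults[name]
        elif is_last and variadic == "sequence":
            bound[name] = []             # empty sequence
        elif is_last and variadic == "map":
            bound[name] = {}             # empty map
        else:
            raise StaticError(f"no value for ${name}")
    return bound


print(bind_arguments(["a", "rest"], "sequence", [1, 2, 3], {}))
print(bind_arguments(["a", "options"], "map", [1], {"indent": True}))
```

The first call binds $rest to the concatenated trailing values, and the second assembles the unmatched keyword into the options map.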
Issue #66 created #created-66
ThinArrowTarget should use FunctionBody
For consistency with FunctionDecl and InlineFunctionExpr (both of which use FunctionBody for the function body instead of EnclosedExpr), ThinArrowTarget should also use FunctionBody for the inline function call version (e.g. 2 -> { . + 1 }):
ThinArrowTarget ::= "->" ( (ArrowStaticFunction ArgumentList) |
(ArrowDynamicFunction PositionalArgumentList) |
FunctionBody )
Issue #65 created #created-65
Support using different input/output element namespaces
Use Case
There have been requests for specifying the output namespace in XQuery akin to the @xpath-default-namespace attribute in XSLT. With the element and type namespaces now being able to be set independently, it would be a good idea to make this change as well, splitting the input and output default XML namespaces.
Grammar
DefaultNamespaceDecl ::= "declare" "default" ((("input" | "output")? "element") | "type" | "function")
"namespace" URILiteral
New Semantics
The default element namespace static context item is split into a default input element namespace that applies to input element contexts (e.g. path steps), and a default output element namespace that applies to output element contexts (e.g. direct/constructed elements).
The scope of the default namespace declaration is the element, function, input element, output element, or type namespace specified in the declaration.
A default namespace declaration with the element scope will set any of the input element, output element, and type namespaces that have not been set by a corresponding input element, output element, or type scoped default namespace declaration.
Example:
Given declare default input element namespace "A"; declare default element namespace "B";, the output element and type namespaces will be specified by the element scope default namespace declaration "B", and the input element namespace will be specified by the input element scope default namespace declaration "A".
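The fallback rule for the element scope can be sketched in Python as follows. The function name and the use of None for "not declared" are assumptions made for the illustration.

```python
def resolve_default_namespaces(declarations):
    """Resolve default namespaces from (scope, uri) pairs.

    scope is one of "element", "input element", "output element", "type"
    or "function". None in the result means the namespace was not declared.
    """
    resolved = {
        "input element": None,
        "output element": None,
        "type": None,
        "function": None,
    }
    element_default = None
    for scope, uri in declarations:
        if scope == "element":
            element_default = uri
        else:
            resolved[scope] = uri
    # The element scope only fills in namespaces that were not set by a more
    # specific declaration.
    if element_default is not None:
        for scope in ("input element", "output element", "type"):
            if resolved[scope] is None:
                resolved[scope] = element_default
    return resolved


# The example from the text: input element -> "A", output element and type -> "B".
print(resolve_default_namespaces([("input element", "A"), ("element", "B")]))
```

The result does not depend on the order of the declarations, which matches the wording "that have not been set by a corresponding ... declaration".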
TODO
Map all element symbols/contexts as using either the input element or output element default namespace for NCName EQNames.
Issue #64 created #created-64
Specify optional parameters to create bounded variadic functions
The current Editor's Draft for XPath and XQuery defines a %variadic("bounded") function type, but does not define a syntax for specifying these.
Grammar
ParamList ::= RequiredParamList ( "," OptionalParamList )?
RequiredParamList ::= Param ("," Param)*
Param ::= "$" EQName TypeDeclaration?
OptionalParamList ::= OptionalParam ("," OptionalParam)*
OptionalParam ::= Param ":=" ExprSingle
Note:
I've followed the structure of positional and keyword arguments here, so the optional parameters are only valid at the end of the function. If it is decided that optional parameters can be declared anywhere in the parameter list, the grammar simplifies to:
ParamList ::= Param ("," Param)*
Param ::= "$" EQName TypeDeclaration? ( ":=" ExprSingle )?
Semantics
[Definition: a parameter is an optional parameter if it has a default value specified using the := ExprSingle syntax.] Optional parameters affect the value of R (the number of parameters that do not have a default value) in the 4.4.1 Static Functions section.
Notes
There are open questions on what to allow in the default value expression. Specifically, it is unclear how to support things like the context item for functions such as fn:data#0 that use the context item if not specified (e.g. when used at the end of a path expression).
An investigation should be done on the standard functions and vendor built-in functions to see what values they take as defaults.
Issue #63 created #created-63
fn:slice, array:slice: Signatures, Examples
EDIT: 1. is obsolete, 2. and 3. are still up-to-date:
1. The current specification for fn:slice has only one signature.
It might be advisable to also provide signatures with 1 and 2 arguments (especially for users who don’t want to use the new syntax for specifying optional arguments).
2. The last examples look wrong; I would expect the input to be returned as the result:
The expression fn:slice(("a", "b", "c", "d"), 0) returns ().
The expression array:slice(["a", "b", "c", "d"], 0) returns [].
3. The first argument of array:slice should be renamed from $input to $array.
Issue #62 created #created-62
[FO] The parameter types for fn:unique and array:partition are incorrectly specified.
- In both signatures of fn:unique the $values parameter has the type xs:anyAtomicType** which should be xs:anyAtomicType*.
- In array:partition the $input parameter is item(*)* which should be item()*.
Issue #61 created #created-61
[FO] fn:all and fn:some have an xs:integer* return type, but describe an xs:boolean return type
The fn:all function states:
The result of the function is true if and only if the expression every $i in $input satisfies $predicate($i) is true.
but the return type is specified as xs:integer*. It should have a return type of xs:boolean.
A similar issue occurs with fn:some.
Issue #60 created #created-60
[FO] fn:namespace-uri-for-prefix no longer supports passing a prefix by string
The type signature of the $prefix parameter has changed from xs:string? to union(xs:NCName, enum(''))?. This means that passing a prefix like "fn" will no longer work as it is not an xs:NCName and is not a zero-length string (enum('')).
Note: The only other affected function is the new fn:in-scope-namespaces function. It would be useful in some cases to be able to pass the value as an xs:string (e.g. "fn") without having to cast the value.
Issue #59 created #created-59
[FO] fn:replace no longer has the 3 and 4 argument variants
The signature for fn:replace in FO 4.0 [1] only has the new 5 argument variant, whereas FO 3.1 [2] has 3 and 4 argument variants.
[1] https://qt4cg.org/branch/master/xpath-functions-40/Overview-diff.html#func-replace
[2] https://www.w3.org/TR/xpath-functions-31/#func-replace
Issue #58 created #created-58
[XQuery] String Value Templates
A string value template (SVT) is a StringLiteral that supports enclosed expression values and entities. It is written as either T"..." or T'...', where the T stands for "template".
Note: An SVT is similar to an attribute value template or text value template in XSLT.
For instance, the following expression:
for $s in ("one", "two", "red", "blue")
return T"{$s} fish"
evaluates to the sequence ("one fish", "two fish", "red fish", "blue fish").
Note: A string value template T"xyz" is equivalent to the expression <svt t="xyz"/>/@t/string().
Grammar
PrimaryExpr ::= ... | StringValueTemplate
StringValueTemplate ::= ('T"' (EscapeQuot | QuotAttrValueContent)* '"')
| ("T'" (EscapeApos | AposAttrValueContent)* "'")
Note: The T" and T' are a single token/unit (i.e. no whitespace/comments are allowed between the characters), just like the Q{ in BracedURILiterals.
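To make the intended expansion concrete, here is a minimal Python sketch of template evaluation. It is deliberately restricted: an enclosed expression may only be a single variable reference such as {$s}, entity references are not handled, and the doubled braces {{ and }} are treated as escapes in the same way as in attribute value content. The function name and the variables dictionary are hypothetical.

```python
import re

# Matches an escaped brace pair or an enclosed variable reference like {$name}.
_TOKEN = re.compile(r"\{\{|\}\}|\{\s*\$([A-Za-z_][\w.-]*)\s*\}")


def expand_template(template, variables):
    """Expand a simplified string value template against a dict of variables."""
    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        value = variables[match.group(1)]
        # A sequence is rendered as its items separated by single spaces.
        items = value if isinstance(value, (list, tuple)) else [value]
        return " ".join(str(item) for item in items)

    return _TOKEN.sub(replace, template)


# Mirrors the example: T"{$s} fish" for each $s in the sequence.
for s in ("one", "two", "red", "blue"):
    print(expand_template("{$s} fish", {"s": s}))
```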
Issue #57 created #created-57
The item-type(T) syntax is not defined
Section 3.7.2 The judgement subtype-itemtype(A, B) of the XPath 4.0 and XQuery 4.0 specifications mentions item-type(N), as does section 5.19 Item Type Declarations of the XQuery 4.0 specification, but neither defines it. It is also not in the EBNF grammar -- searching for "item-type" only finds the ItemTypeDecl symbol in the XQuery 4.0 EBNF.
This should be defined in section 3.6 Item Types.
Issue #56 created #created-56
Allow item-type to be matched within its definition scope
In https://qt4cg.org/branch/master/xpath-functions-40/Overview-diff.html#func-random-number-generator, the rng item type is defined as:
record(
number as xs:double,
next as (function() as record(number, next, permute, *)),
permute as (function(item()*) as item()*),
*
)
It would be helpful and more type specific if this could be defined as:
record(
number as xs:double,
next as (function() as rng),
permute as (function(item()*) as item()*),
*
)
where the next field references the rng type being defined -- this is like how structures in other languages (C/C++, Java, C#) can reference themselves as property types.
This would also provide an alternative for the .. (self reference) specifier.
Note: the .. syntax is still useful in the case of anonymous record types.
Issue #55 created #created-55
Provide an XML version of the stack trace
While the string version of fn:stack-trace() is useful for debugging and including in log messages, being able to process that (from an XML representation) is also useful.
Use Cases
- providing extended functionality, like implementing a current-function-name() helper function -- e.g. fn:stack-trace("json")?1?function-name;
- customizing the format of the stack trace (e.g. standardizing it across different implementations);
- using the information in libraries/IDEs/editors that call the queries -- e.g. by returning the XML and processing it in the library/IDE/editor, such as mapping the data to stack frames in the IDE/editor. Note: This is what I'm doing in my IntelliJ plugin with the MarkLogic stack XML to process query exceptions and the stack when debugging a query.
fn:stack-trace
fn:stack-trace($format as enum("text", "xml", "json") := "text") as item()
Like the current specification version of this function (with the same default semantics), but also supports XML and JSON formats. The "text" format returns an instance of xs:string in an implementation-defined format, the "xml" format returns an instance of element(fn:stack-trace), and the "json" format returns an instance of array(fn:stack-frame).
Here, fn:stack-frame is defined as:
declare type fn:stack-frame as record(
uri: xs:string,
function-name: xs:QName?,
line-number: xs:integer?,
column-number: xs:integer?,
*
);
The XML version has the same information as elements in the fn: namespace (e.g. fn:uri).
fn:format-stack-trace
fn:format-stack-trace($stack as item(),
$format as enum("text", "xml", "json") := "text") as item()
If $stack is an instance of element(fn:stack-trace), it is converted into the desired output format. (If the output format is "xml", no processing is performed.)
If $stack is an instance of array(fn:stack-frame), it is converted into the desired output format. (If the output format is "json", no processing is performed.)
Otherwise, an err:XPTY0004 error is raised.
fn:parse-stack-trace
fn:parse-stack-trace($stack as xs:string,
$format as enum("xml", "json")) as item()
This function takes a stack trace in the implementation-defined format and parses it to XML or JSON. The "xml" format returns an instance of element(fn:stack-trace), and the "json" format returns an instance of array(fn:stack-frame).
If $stack is not in the correct format, an error (error code TBD) is raised.
Note: This could be useful when processing log messages or similar output.
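As an analogy only, the same idea of one stack in three formats can be sketched with Python's standard traceback module. The stack_trace function, its record keys, and the un-namespaced element names below are assumptions made for the sketch; they mirror the proposed fields but are not the proposed functions.

```python
import traceback
import xml.etree.ElementTree as ET


def stack_trace(fmt="text"):
    """Return the current call stack, innermost caller first, in one of three formats."""
    summary = traceback.extract_stack()[:-1]   # drop this helper's own frame
    frames = [
        {"uri": f.filename, "function-name": f.name, "line-number": f.lineno}
        for f in reversed(summary)
    ]
    if fmt == "json":
        return frames
    if fmt == "xml":
        root = ET.Element("stack-trace")
        for frame in frames:
            node = ET.SubElement(root, "stack-frame")
            for key, value in frame.items():
                ET.SubElement(node, key).text = str(value)
        return root
    if fmt == "text":
        return "\n".join(
            f'{f["uri"]}:{f["line-number"]} in {f["function-name"]}' for f in frames
        )
    raise ValueError(f"unsupported format: {fmt}")


def example():
    # The first frame describes the function that asked for the trace.
    return stack_trace("json")[0]["function-name"]


print(example())
```

Having the structured form makes a helper such as a current function name lookup a one-line access, instead of parsing an implementation-defined string.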
Issue #54 created #created-54
[XPath] [XQuery] Keyword arguments don't work with all parameters/keys in static functions.
The KeywordArgument symbol restricts the argument name to an NCName. This has two issues:
- for non-variadic and bounded-variadic functions, a parameter can be a QName, so may be in a different namespace, or there can be ambiguity if there are multiple parameters with the same local-name in different namespaces;
- for map-variadic functions, a parameter key can contain spaces, so cannot be expressed as an NCName.
Syntax
KeywordArgument ::= KeywordArgumentName ":" ExprSingle
KeywordArgumentName ::= EQName | StringLiteral
NOTE: I'm using the favoured map-based syntax here. If that is not used, then the ":" should be ":=" as it is in the current draft.
Semantics
For non-variadic and bounded-variadic functions, a KeywordArgumentName is matched as follows:
- An EQName matches against the expanded QName of the parameter;
- A StringLiteral is cast to an NCName (with an XPTY0004 error if it is not a valid NCName), which is in no namespace (like other variables such as VarName symbols); the resulting expanded QName then matches against the expanded QName of the parameter.
For map-variadic functions, a KeywordArgumentName is matched as follows:
- An NCName uses the local-name as the key in the constructed map cast to the key type of the map. This follows the XQFO casting rules with the source type of the local-name being xs:NCName and the target type being the map's key type;
- A QName or URIQualifiedName results in an XPTY0004 error as it does not form a valid key name;
- A StringLiteral uses the value of the string as the key in the constructed map.
Issue #53 created #created-53
Allow function keyword inline functions without parameters
The current draft InlineFunctionExpr adds -> as a shorthand. This shorthand allows optional parameter lists (e.g. -> { true() }), but the function keyword version of this requires a parameter list. For consistency, the function keyword version should also have an optional parameter list.
This means that the syntax for InlineFunctionExpr can be simplified to:
InlineFunctionExpr ::= ("function" | "->") FunctionSignature? FunctionBody
Update: From recent discussions, the -> operator as both a thin arrow expression and an inline function definition is confusing. As such, a replacement for -> in the inline function context should be identified.
In the context of the variant without a parameter definition (e.g. when used with arrow operators), the question is how it should work. I suggest:
- it should be a 0 and 1 arity function with the parameter argument defaulting to ();
- if the parameter is a single value, it should bind to the . (context item) and ~ (context value -- https://github.com/qt4cg/qtspecs/issues/129);
- if the parameter is an empty sequence, or multi-valued sequence, it should bind to the ~ (context value -- https://github.com/qt4cg/qtspecs/issues/129) only.
This way, it will be usable in multiple contexts.
Issue #52 created #created-52
Allow record(*) based RecordTests
The other ItemTypes that support specifying information about the type allow type(*) to represent any instance of the type. The new RecordTest ItemType should support this.
Syntax
The:
RecordTest ::= "record" "(" FieldDeclaration ("," FieldDeclaration)* ExtensibleFlag? ")"
symbol should be changed to:
RecordTest ::= AnyRecordTest | TypedRecordTest
AnyRecordTest ::= "record" "(" "*" ")"
TypedRecordTest ::= "record" "(" FieldDeclaration ("," FieldDeclaration)* ExtensibleFlag? ")"
NOTE: This follows the structure of the other any/typed tests (e.g. MapTest).
Semantics
The record(*) item type test is equivalent to map(*).
Issue #51 created #created-51
Generalize lookup operator for function items
The current lookup operator is a specialized expression for maps and arrays. All kinds of data structures can be realized with functions, and maps and arrays are functions as well, so it would be pretty straightforward to extend the lookup operator to arbitrary function items:
Use Cases
Return name elements whose string values contain supplied substrings
declare variable $DOC := <xml>
<name>Jack Daniels</name>
<name>Jim Beam</name>
<name>Johnny Walker</name>
</xml>;
let $names := function($key) {
$DOC//name[contains(string(), $key)]
}
return $names?('Jack', 'Jim', 'Johnny')
(: result :)
<name>Jack Daniels</name>,
<name>Jim Beam</name>,
<name>Johnny Walker</name>
Return squares of supplied integers
let $square := math:pow(?, 2)
return $square?(1 to 5)
(: result :)
1, 4, 9, 16, 25
Remarks
- XPTY0004 must be raised if the wildcard * is specified as key, and if the input is neither a map nor an array.
- The extension could easily be combined with the extension for sequences (see #50).
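The intended behaviour, applying the function to each key in turn and concatenating the results, can be modelled in Python. The lookup helper below is a hypothetical stand-in for the generalized ? operator, with lists playing the role of sequences.

```python
def lookup(function, keys):
    """Call function once per key and concatenate the results in key order."""
    result = []
    for key in keys:
        value = function(key)
        # A list models a sequence result; anything else is a single item.
        result.extend(value if isinstance(value, list) else [value])
    return result


names = ["Jack Daniels", "Jim Beam", "Johnny Walker"]


def by_substring(key):
    """Counterpart of the $names function: names containing the key."""
    return [name for name in names if key in name]


def square(x):
    """Counterpart of math:pow(?, 2)."""
    return x ** 2


print(lookup(by_substring, ["Jack", "Jim", "Johnny"]))
print(lookup(square, range(1, 6)))
```

The two calls correspond to the two use cases above.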
Issue #50 created #created-50
[XPath] Introduce the lookup operator for sequences
In XPath 3.1 it is convenient to use the ? lookup operator on arrays and maps.
It is easy and readable to construct expressions, such as:
[10, 20, 30]?(2, 3, 1, 1, 2)
And this understandably produces the sequence:
20, 30, 10, 10, 20
However, it is not possible to write:
(10, 20, 30)[2, 3, 1, 1, 2]
or
(10, 20, 30)(2, 3, 1, 1, 2)
or
(10, 20, 30)?(2, 3, 1, 1, 2)
This proposal is to allow the use on sequences of the postfix lookup operator ? with the same syntax as it is now used for arrays.
The ? lookup operator will be applied on sequences whose first item isn't an array or a map. The only change would be to allow the type of the left-hand side to be a sequence, in addition to the currently allowed map and array types. At present, applying ? on any such sequence results in an error. In case the first item of the LHS sequence is an array or a map, then the current XPath 3.1 semantics is in force, which applies the RHS to each item in the sequence.
The restriction in the above paragraph can be eliminated if we decide to use a symbol other than ? for this operator, for example ^.
The goal of this feature is achieving conciseness, readability, understandability and convenience.
For example, now one could easily produce from a sequence a projection / rearrangement with any desired multiplicity and ordering.
Thus, it would be easy to express the function reverse() as simply:
$seq?($len to 1 by -1)
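The proposed positional lookup can be modelled in Python as a list selection with 1-based positions. The helper name is hypothetical, and raising an error for an out-of-range position is an assumption, since the proposal does not say what should happen in that case.

```python
def seq_lookup(seq, positions):
    """Select items of seq by 1-based position, keeping order and repeats."""
    result = []
    for position in positions:
        if not 1 <= position <= len(seq):
            raise IndexError(f"position {position} is out of range")
        result.append(seq[position - 1])
    return result


seq = [10, 20, 30]
print(seq_lookup(seq, [2, 3, 1, 1, 2]))             # projection with repeats
print(seq_lookup(seq, range(len(seq), 0, -1)))      # the reverse() example
```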
Issue #49 created #created-49
[XQuery] The 'member' keyword is still present on ForMemberBinding
The latest editor's draft (13 January 2021) moves the member keyword to a new ForMemberClause symbol:
ForMemberClause ::= "for" "member" ForMemberBinding ("," ForMemberBinding)*
With this change, the ForMemberBinding syntax has retained the optional member keyword from the previous change to ForBinding:
ForMemberBinding ::= "member"? "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle
This means that for member member ... and for member $a in [], member $b in [] ... are valid with the current grammar.
The ForMemberBinding grammar should be:
ForMemberBinding ::= "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle
Issue #44 closed #closed-44
[XPath] [XQuery] Support RecordTest self references without occurrence indicators
Pull request #48 created #created-48
Create a schema-for-xslt40.xsd file for the current draft spec.
Note: schema-for-xslt30.xsd is still referenced by other source files, such as xslt-first-cut.xml, so it has not been removed.
Issue #38 closed #closed-38
Create a schema-for-xslt40.xsd file.
Issue #47 created #created-47
[XPath] [XQuery] Allow argument placeholders on keyword arguments
This would allow a user to name the arguments that are used as placeholders, making the code more readable. For example:
let $pow2 := math:pow(2, y: ?)
Syntax
This proposal would change KeywordArgument from:
KeywordArgument ::= NCName ":=" ExprSingle
to:
KeywordArgument ::= NCName ":=" Argument
or (using the proposed : syntax) to:
KeywordArgument ::= NCName ":" Argument
Semantics
A function call with N argument placeholders will create an N-arity function. The order of the argument placeholders corresponds to the order of the parameters in that new function. Those parameters map to the corresponding parameter in the target (partially applied) function, which can be in a different order, or bind to keys in an options map (in the case of functions like fn:serialize). For example:
math:pow(y: ?, x: ?)
would create a function that calculates y^x instead of x^y as the arguments are reversed.
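The placeholder semantics can be modelled in Python with a small partial-application helper. PLACEHOLDER, partial_apply and the power wrapper are hypothetical names for this sketch; power exists only because Python's math.pow does not accept keyword arguments.

```python
import math

PLACEHOLDER = object()   # plays the role of the ? argument placeholder


def partial_apply(function, *positional, **keywords):
    """Build a function with one parameter per placeholder, in placeholder order."""
    slots = [("positional", i) for i, v in enumerate(positional) if v is PLACEHOLDER]
    slots += [("keyword", k) for k, v in keywords.items() if v is PLACEHOLDER]

    def applied(*args):
        if len(args) != len(slots):
            raise TypeError(f"expected {len(slots)} arguments, got {len(args)}")
        pos, kw = list(positional), dict(keywords)
        for (kind, where), arg in zip(slots, args):
            if kind == "positional":
                pos[where] = arg
            else:
                kw[where] = arg
        return function(*pos, **kw)

    return applied


def power(x, y):
    """Keyword-friendly wrapper around math.pow, computing x to the power y."""
    return math.pow(x, y)


pow2 = partial_apply(power, 2, y=PLACEHOLDER)                  # math:pow(2, y: ?)
swapped = partial_apply(power, y=PLACEHOLDER, x=PLACEHOLDER)   # math:pow(y: ?, x: ?)
print(pow2(10))       # 2 to the power 10
print(swapped(3, 2))  # first argument binds to y, second to x
```

In the swapped case the first argument of the new function supplies y and the second supplies x, so the call computes 2 to the power 3.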
Issue #46 created #created-46
xsl:sequence: @as
I'd like to see @as on xsl:sequence. That way I can write, e.g.
<xsl:function name="dc:slice-count" as="xs:integer">
<xsl:param name="toast" as="element(toast)" />
<xsl:for-each select="$toast" as="xs:integer">
<xsl:sequence select="@cooked-slices + @raw-slices" as="xs:integer" />
</xsl:for-each>
</xsl:function>
It would be an error for the for-each to have other than exactly one integer as its result, and the same for the xsl:sequence. In this simple example there's not much scope for that to happen, of course.
Maybe on anything with a select attribute?
Parenthetically, a context-item attribute on xsl:sequence would obviate the XSLT1-ish xsl:for-each there, although $toast/(@a, @b) => sum() would work as well and be XSLT 3-ish.
Issue #45 created #created-45
Second parameter of fn:sum must be neutral element for +
Currently fn:sum specifies the intent of the second parameter in a note:
The second argument allows an appropriate value to be defined to represent the sum of an empty sequence. For example, when summing a sequence of durations it would be appropriate to return a zero-length duration of the appropriate type. This argument is necessary because a system that does dynamic typing cannot distinguish "an empty sequence of integers", for example, from "an empty sequence of durations".
When implementing fn:sum on sequences of billions of items (numerics, or durations, etc), another aspect arises: this second parameter must also be, for this to work and for optimizations to be possible, a neutral element for +.
Indeed, a distributed system like Spark will produce intermediate sums for (possibly empty) subsets, and will naturally use $zero for the sum of an empty subset. Intermediate totals are aggregated in a treewise fashion. For the result to be correct, it must be the case that $zero + $x eq $x for any item in the sequence provided as the first parameter. It is fully aligned with the idea of the note above, but I would suggest making this requirement a bit stricter.
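A small Python sketch shows why the requirement matters. It models a partitioned sum in which an empty partition contributes the zero value, then combines the partial results; the function names and the flat (rather than treewise) combination step are simplifications made for the illustration.

```python
from datetime import timedelta


def partition_sum(items, zero):
    """Sum one partition; an empty partition yields the supplied zero value."""
    return sum(items[1:], items[0]) if items else zero


def distributed_sum(partitions, zero):
    """Sum each partition independently, then combine the partial results."""
    partials = [partition_sum(partition, zero) for partition in partitions]
    return partition_sum(partials, zero)


partitions = [[1, 2], [], [3, 4, 5]]
print(distributed_sum(partitions, 0))    # neutral element: the expected total
print(distributed_sum(partitions, 10))   # not neutral: the empty partition adds 10

durations = [[timedelta(days=1), timedelta(days=2)], [], [timedelta(days=3)]]
print(distributed_sum(durations, timedelta(0)))
```

With a zero value that is not neutral for +, the empty partition changes the total, so the result no longer matches a sequential evaluation.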
Issue #44 created #created-44
[XPath] [XQuery] Support RecordTest self references without occurrence indicators
This would be useful for defining things like binary trees, where the fields are optional but (if supplied) the values are not. So it is more logical to define them as:
declare item-type binary-tree as record(
left? as ..,
right? as ..,
value as item()*
);
Issue #43 created #created-43
Support standard and user-defined composite values using item type definitions
The composite values defined in 4.14.4 Composite Atomic Values are currently specified as a table. This means that it is not possible for users to define their own properties for custom types. It is also harder for editors/IDEs, or other tools to implement as there is an element of hard-coding the logic.
These could be implemented as a properties/values record associated with the defined type. The values of the record could then be arity-1 functions that are called with the supplied value when accessed via maps. For example, in XQuery:
declare %composite-values("composite-values") type xs:date external; (: built-in :)
declare type date-composite-values := record(
year: fn:year-from-date#1,
(: ... :)
);
and XSLT:
<xsl:item-type name="xs:date" composite-values="date-composite-values"/>
<xsl:item-type name="date-composite-values" as="record(
year: fn:year-from-date#1,
(: ... :)
)"/>
So xs:date("1999-10-15")?year would be evaluated as date-composite-values?year(xs:date("1999-10-15")).
Issue #42 created #created-42
Relax type incompatibility in order by clause (impl. dep. instead of XPTY0004)
In the case where XQuery is used with very large sequences (billions/trillions of items or of tuples) with a parallel evaluation [1], the order by clause in its current state is costly to evaluate, because checking the primitive types for compatibility requires an extra step and materialization (in the case of Spark: an additional action to perform this check).
Relaxing this by making the order between different primitive types implementation-dependent (for the purpose of order by) rather than throwing XPTY0004, in case of several incompatible primitive types in the comparison keys, would make parallel implementations more efficient.