@qt4cg statuses in 2021

This page displays status updates about the QT4 CG project from 2021.

See also recent statuses.

Issue #105 created #created-105

26 Dec at 17:54:17 GMT
Maps with Infinite Number of Keys: Total Maps and Decorated maps

1. Total Maps

Maps have become one of the most useful tools for creating readable, short and efficient XPath code. However, a significant limitation of this datatype is that a map can have only a finite number of keys. In many cases we might want a map that covers more than a fixed, finite set of keys.

Here is a typical example (Example 1):
A hotel charges per night differently, depending on how long the customer has been staying. For the first night the price is $100, for the second $90, for the third $80 and for every night after the third $75. We can immediately try to express this pricing data as a map, like this:

map {
1 : 100,
2 : 90,
3 : 80
(:  ??? How to express the price for all eventual next nights? :)
}

We could, if we had a special key, something like "TheRest", which means any other key-value, which is not one of the already specified key-values.

Here comes the first part of this proposal:

  1. We introduce a special key value, which, when specified in a map means: any possible key, different from the other keys, specified for the map. For this purpose we use the string: "\"

Adding such a "discard symbol" makes the map a total function on the set of any possible XPath atomic items.

Now we can easily express the hotel-price data as a map:

map {
1 : 100,
2 : 90,
3 : 80,
'\' : 75
}
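The intended lookup behaviour can be modelled outside XPath. The Python sketch below is only an illustration under stated assumptions: the class name `TotalMap` and its `default` parameter are hypothetical, and a dict with a fallback value stands in for a map that contains the proposed "\" key.

```python
class TotalMap:
    """Hypothetical model of a map with a catch-all entry."""

    def __init__(self, entries, default):
        self._entries = dict(entries)   # the regular, finite set of keys
        self._default = default         # value returned for every other key

    def __call__(self, key):
        # A regular key wins; any other key falls through to the default.
        return self._entries.get(key, self._default)


# Hotel prices from Example 1: nights 1 to 3 are listed, the rest cost 75.
price = TotalMap({1: 100, 2: 90, 3: 80}, default=75)
print([price(n) for n in (1, 2, 3, 4, 10)])  # [100, 90, 80, 75, 75]
```

Note that this model, like the proposal so far, also returns 75 for keys such as -1 or a string.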

Another useful Example (2) is that now we can express any XPath item, or sequence of items as a map. Let's do this for a simple constant, like π:

let $π := map {
'\' : math:pi()
}
 return $π?*   (: produces 3.141592653589793  :)

The map above is empty (has no regular keys) and specifies that for any key value $k it holds that $π($k) eq math:pi().

Going further, we can express even the empty sequence (Example 3) as the following map:

let $Φ := map {
'\' : ()
}
 return $Φ?*   (: produces the empty sequence :)

Using this representation of the empty sequence, we can provide a solution for the "Forgiveness problem" raised by Jarno Jelovirta in the XML.Com #general channel in March 2021:

This expression will raise an error:

[map {"k0": 1}, map{"k0": [1, 2, 3]}]?*?("k0")?*

[XPTY0004] Input of lookup operator must be map or array: 1.

To prevent ("forgive", thus "Forgiveness Problem") the raising of such errors we could accept the rule that in XPath 4.0 any expression that evaluates to something different than a map or an array, could be coerced to the following map, which returns the empty sequence as the corresponding value for any key requested in a lookup:

map {
'\' : ()
}  (: produces the empty sequence  for any lookup:)
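To make the effect of this rule concrete, here is a minimal Python model of a "forgiving" lookup. It is a sketch, not a definition of the proposed semantics: the function name `lookup` is hypothetical, a Python list of items stands in for an XPath sequence, a dict for a map, a nested list for an array, and array members are assumed to be single items.

```python
def lookup(seq, key=None):
    """Apply a lookup to every item of a sequence; key=None models ?*."""
    out = []
    for item in seq:
        if isinstance(item, dict):            # stands in for an XPath map
            if key is None:
                out.extend(item.values())
            elif key in item:
                out.append(item[key])
        elif isinstance(item, list):          # stands in for an XPath array
            if key is None:
                out.extend(item)
            else:
                out.append(item[key - 1])     # XPath arrays are 1-based
        # Any other item contributes the empty sequence instead of an error.
    return out


# Model of: [map {"k0": 1}, map {"k0": [1, 2, 3]}]?*?("k0")?*
data = [[{"k0": 1}, {"k0": [1, 2, 3]}]]       # a sequence holding one array
result = lookup(lookup(lookup(data), "k0"))
print(result)  # [1, 2, 3]
```

The atomic value 1 is silently skipped by the last step, which is exactly the behaviour the coercion rule would give.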

To summarize, what we have achieved so far:

  1. The map constructed in Example 1 is now a total function over the domain of all natural numbers. Any map with a "\" (discard key) is a total function over the value-space of all xs:anyAtomicType values
  2. We can represent any XPath 4.0 item or sequence in an easy and intuitive way as a map.
  3. It is now straightforward to solve the "Forgiveness Problem" by introducing the natural and intuitive rule of coercing any value that is neither a map nor an array to the map shown above, which allows the lookup operator ? to be used anywhere without raising an error.

2. Decorated Maps

Although we already achieved a lot in the first part, there are still use-cases for which we don't have an adequate map solution:

  1. In Example 1 (the hotel prices), we probably shouldn't get $75 for a key such as -1 or even "blah-blah-blah". But the XPath 4.0 language specification allows any atomic value to be a key and thus to be the argument to the map:get() function. If we want to validate the key values actually allowed for a specific map, we need additional processing/functionality.

  2. With a discard symbol we can express only one infinite set of possible keys and group them under the same corresponding value. However, there are problems, the data for which needs several infinite sets of key-values to be projected onto different values. Here is one such problem:

Imagine we are the organizers of a very simple lottery, selling many millions of tickets, identified by their number, which is a unique natural number.

We want to grant prizes with this simple strategy.

  • Any ticket number multiple of 10 wins $10.
  • Any ticket number multiple of 100 wins $20
  • Any ticket number multiple of 1000 wins $100
  • Any ticket number multiple of 5000 wins $1000
  • Any ticket number which is a prime number wins $25000
  • Any other ticket number doesn't win a prize (wins $0)

None of the sets of key-values for each of the 6 categories above can be conveniently expressed with the map that we have so far, although we have merely 6 different cases!

How can we solve this kind of problem still using maps?

Decorators to the rescue...

What is a decorator, what is the decorator pattern, and when is it good to use one? According to Wikipedia:

What solution does it describe? Define Decorator objects that

  • implement the interface of the extended (decorated) object (Component) transparently by forwarding all requests to it
  • perform additional functionality before/after forwarding a request.

This allows working with different Decorator objects to extend the functionality of an object dynamically at run-time.

The idea is to couple a map with a function (the decorator) which can perform any needed preprocessing, such as validation or projection of a supplied value onto one of a predefined small set of values (that are the actual keys of the map). For simplicity, we are not discussing post-processing here, though this can also be part of a decorator, if needed.

Let us see what a decorated-map solution to the lottery problem looks like:

let $prize-table := map {
  "ten" : 10,
  "hundred" : 20,
  "thousand" : 100,
  "five-thousand" : 1000,
  "prime" : 25000,
 "\" : 0
},
$isPrime := function($input as  xs:integer) as xs:boolean
{
  exists(index-of((2, 3, 5, 7, 11, 13, 17, 19, 23), $input)) (: simplified primality checker :)
},
$decorated-map := function($base-map as map(*), $input as xs:anyAtomicType) as item()*
{
  let $raw-result :=
         (
          let $key := 
           if(not($input castable as xs:positiveInteger)) then '\'  (: we can call the error() function here :) 
             else if($input mod 5000 eq 0) then 'five-thousand'
             else if($input mod 1000 eq 0) then 'thousand'
             else if($input mod 100 eq 0) then 'hundred'
             else if($input mod 10 eq 0) then 'ten'
             else if($isPrime($input)) then 'prime'
             else "\"
          return $base-map($key)
         ),
      $post-process := function($x) {$x},  (: using identity here for simplicity :)
      $post-processed := $post-process($raw-result)
    return $post-processed
},

$prizeForTicket := $decorated-map($prize-table, ?),       (: Note: this is exactly the lookup operator  ?    :)
$ticketNumbers := (1, 10, 100, 1000, 5000, 19, -3, "blah-blah-blah")

return $ticketNumbers ! $prizeForTicket(.)          (: produces 0, 10, 20, 100, 1000, 25000, 0, 0 :)
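The same idea can be tried out in Python. This is a rough model under stated assumptions, not part of the proposal: all names are hypothetical, `functools.partial` plays the role of the `?` placeholder, trial division replaces the simplified fixed-list primality check, and only Python integers are accepted as ticket numbers (the XPath version would also accept a string such as "10" that is castable to xs:positiveInteger).

```python
from functools import partial

DEFAULT_KEY = "\\"   # stands in for the proposed catch-all key

PRIZE_TABLE = {
    "ten": 10, "hundred": 20, "thousand": 100,
    "five-thousand": 1000, "prime": 25000, DEFAULT_KEY: 0,
}


def is_prime(n):
    """Trial division; replaces the simplified fixed-list check."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def decorated_map(base_map, ticket):
    """Project the input onto one of the few real keys, then look it up."""
    if isinstance(ticket, bool) or not isinstance(ticket, int) or ticket < 1:
        key = DEFAULT_KEY            # invalid input; could raise instead
    elif ticket % 5000 == 0:
        key = "five-thousand"
    elif ticket % 1000 == 0:
        key = "thousand"
    elif ticket % 100 == 0:
        key = "hundred"
    elif ticket % 10 == 0:
        key = "ten"
    elif is_prime(ticket):
        key = "prime"
    else:
        key = DEFAULT_KEY
    return base_map[key]


prize_for_ticket = partial(decorated_map, PRIZE_TABLE)
tickets = (1, 10, 100, 1000, 5000, 19, -3, "blah-blah-blah")
print([prize_for_ticket(t) for t in tickets])
# [0, 10, 20, 100, 1000, 25000, 0, 0]
```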

Conclusion

  1. In the 2nd part of this proposal, a new type/function -- the decorated-map was described.

  2. We defined the signature of a decorated-map and gave an example of how to construct and use one in solving a specific problem. In particular, the proposal is to have a standard function:

    decorated-map ($base-map as map(*), $input as xs:anyAtomicType) as item()*

  3. Finally, we showed that the lookup operator ? on a decorated map $dm is identical to and should be defined as :

    $dm($base-map, ?)

What remains to be done?

The topic of decorators is extremely important, as it should be possible to define a decorator on any function, not just on maps. This would be addressed in one or more new proposals. Stay tuned 😊

Issue #104 created #created-104

26 Dec at 11:36:08 GMT
name of map:replace/array:replace

The name of map:replace/array:replace is easily confused with fn:replace. One might think that map:replace applies a regular expression to all the keys of a map (which might be a quite useful replacement for JSON object).

It might also be confused with the replace function of Java's HashMap, which only inserts a new value if the key already exists in the map.

One could name it map:put-with-function or map:put-f or map:putf or map:puf for short

Or something else: map:change, map:modify, map:alter

Issue #103 created #created-103

18 Dec at 21:15:13 GMT
fn:all, fn:some

a) the text says the function returns boolean, but the signature says integer*

b) the text considers a case where the second argument is omitted, but there is no one argument function signature

c) fn:some is a wrapper around XQuery's some expression, and fn:all is a wrapper around XQuery's every expression. Is this not confusing? People would expect fn:all to be a wrapper around some kind of all expression, or the function to be called fn:every.

d) I think it is pointless to have such functions when there are already the some/every XQuery expressions

Issue #102 created #created-102

13 Dec at 16:52:25 GMT
[xslt30] Meaning of the term "lexical space"

The XSLT 3.0 specification uses the term "lexical space" rather freely, without definition.

In XSD, the "lexical space" for a data type is the set of lexical representations AFTER any whitespace removal. For example, the lexical space for xs:integer does not allow leading or trailing whitespace. Such whitespace is valid in an instance document, but it is stripped prior to validation by a "pre-lexical" transformation.

In XSLT, as far as I can see, the intended reading of a phrase such as "a string in the lexical space of xs:integer" is "a string that is castable to xs:integer", which includes strings with leading and trailing whitespace. In some cases the text can only be read this way.

The F&O spec gets this right. Section 19.2, relating to casting from xs:string, says "The supplied string is mapped to a typed value of the target type as defined in [XML Schema Part 2: Datatypes Second Edition]. Whitespace normalization is applied as indicated by the whiteSpace facet for the datatype. The resulting whitespace-normalized string must be a valid lexical form for the datatype. The semantics of casting follow the rules of XML Schema validation."

Issue #101 created #created-101

13 Dec at 13:14:48 GMT
fn:serialize line breaks

Normally fn:serialize uses LF for line breaks

But on Windows you want to have CR LF

There could be an option for that

Issue #100 created #created-100

01 Dec at 09:43:22 GMT
[FO] Typo in §17.5.3

implementation-dependant => implementation-dependent

Issue #99 created #created-99

28 Nov at 01:40:19 GMT
Functions that determine equality of two sequences or equality of two arrays

The only standard XPath 3.1 function that compares two arrays or two sequences for equality is the deep-equal() function. It implements "value-based equality", which may not always be the equality one needs to check for. For example, the standard XPath 3.1 operator "is" implements a check for "identity-based equality" on nodes.

Thus for two nodes $n1 and $n2 it is possible that:

deep-equal($n1, $n2) ne ($n1 is $n2)

The functions defined below can be used to verify a more generic kind of equality between two sequences or between two arrays. These functions accept as a parameter a user-provided function $compare(), which is used to decide whether or not two corresponding items of the two sequences, or two constituents of the two arrays are "equal".

fn:sequence-equal($seq1 as item()*, $seq2 as item()*, 
                  $compare as function(item(), item()) as xs:boolean := deep-equal#2) as xs:boolean

fn:array-equal($ar1 as array(*), $ar2 as array(*), 
               $compare as function(item()*, item()*) as xs:boolean := deep-equal#2) as xs:boolean

Examples:

fn:sequence-equal((1, 2, 3), (1, 2, 3))  (: returns true() :)
fn:sequence-equal((1, 2, 3), (1, 2, 5))  (: returns false() :)
fn:sequence-equal((1), (1, 2))  (: returns false() :)
fn:sequence-equal((), ())  (: returns true() :)
let $compare := function($arg1 as xs:integer, $arg2 as xs:integer) {$arg1 mod 2 eq $arg2 mod 2}
   return fn:sequence-equal((1, 2, 3), (5, 6, 7), $compare)  (: returns true() :)

let $compare := function($arg1 as xs:integer, $arg2 as xs:integer) {$arg1 mod 2 eq $arg2 mod 2}
   return fn:sequence-equal((1, 2, 3), (5, 6, 8), $compare)  (: returns false() :)

fn:array-equal([1, 2, 3], [1, 2, 3]) (: returns true() :)
fn:array-equal([1, 2, 3], [1, 2, 5])  (: returns false() :)
fn:array-equal([1], [1, 2])  (: returns false() :) 
fn:array-equal([], [])  (: returns true() :)
fn:array-equal([], [()])  (: returns false() :)

Possible implementations:

  1. Here is a pure XPath implementation of fn:sequence-equal:
let $compare := function($it1 as item(), $it2 as item()) as xs:boolean 
                {deep-equal($it1, $it2)},
    $sequence-equal := function($seq1 as item()*, $seq2 as item()*, 
                                $compare as function(item(), item()) as xs:boolean, 
                                $self as function(*)) as xs:boolean
{
   let $size1 := count($seq1), $size2 := count($seq2)
    return
      if($size1 ne $size2) then false()
      else
         $size1 eq 0
        or
         $compare(head($seq1), head($seq2)) and $self(tail($seq1), tail($seq2), $compare, $self)
}
 return
   $sequence-equal((1, 2, 3), (1, 2, 3), $compare, $sequence-equal)
  2. Below is a pure XPath implementation of fn:array-equal:
let  $compare := function($val1 as item()*, $val2 as item()*) as xs:boolean 
                {deep-equal($val1, $val2)},
     $array-equal := function($ar1 as array(*), $ar2 as array(*), 
                              $compare as function(item()*, item()*) as xs:boolean, 
                              $self as function(*)) as xs:boolean
{
   let $size1 := array:size($ar1), $size2 := array:size($ar2)
    return
      if($size1 ne $size2) then false()
      else
         $size1 eq 0
        or
         $compare(array:head($ar1), array:head($ar2)) and $self(array:tail($ar1), array:tail($ar2), $compare, $self)
}
 return
   $array-equal([], [()], $compare, $array-equal)
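
For readers who want to experiment with the comparator idea outside XPath, the pairwise check can be modelled in a few lines of Python. This is a sketch only: the name `sequence_equal` mirrors the proposed function, Python lists stand in for sequences, and `operator.eq` is assumed as a stand-in for deep-equal#2.

```python
from operator import eq


def sequence_equal(seq1, seq2, compare=eq):
    """True if both sequences have equal length and match pairwise."""
    return len(seq1) == len(seq2) and all(
        compare(a, b) for a, b in zip(seq1, seq2)
    )


def same_parity(a, b):
    """Example comparator: items are 'equal' if they have the same parity."""
    return a % 2 == b % 2


print(sequence_equal([1, 2, 3], [1, 2, 3]))               # True
print(sequence_equal([1], [1, 2]))                        # False
print(sequence_equal([1, 2, 3], [5, 6, 7], same_parity))  # True
print(sequence_equal([1, 2, 3], [5, 6, 8], same_parity))  # False
```

The length check comes first, so the comparator is never called on sequences of different sizes, matching the recursive XPath versions above.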

Issue #98 created #created-98

24 Nov at 17:05:47 GMT
Support ignoring whitespace/indentation differences in fn:deep-equal.

Signatures

fn:deep-equal( $input1 as item()*,
               $input2 as item()*,
               $collation as xs:string? := (),
               $boundary-space as enum("preserve", "strip") := "preserve") as xs:boolean

Notes

If $boundary-space is "preserve", then any whitespace differences are checked and would result in the function returning false().

If $boundary-space is "strip", then any whitespace differences are ignored and would result in the function returning true() if the inputs are otherwise identical.

Use Case

Comparing two XML fragments in a unit test assertion where you don't care about indentation differences.
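
A rough Python model of the use case, using the standard library's ElementTree: it is only a sketch, and it assumes that "strip" means ignoring whitespace-only text between elements, which is one possible reading of the proposal. The function names and the `boundary_space` parameter are hypothetical mirrors of the signature above.

```python
import xml.etree.ElementTree as ET


def _text(value, strip):
    """Normalize a text/tail value; drop it if whitespace-only and strip is set."""
    value = value or ""
    return "" if strip and not value.strip() else value


def deep_equal(a, b, boundary_space="preserve"):
    """Compare two elements recursively, optionally ignoring indentation."""
    strip = boundary_space == "strip"
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and _text(a.text, strip) == _text(b.text, strip)
        and _text(a.tail, strip) == _text(b.tail, strip)
        and len(a) == len(b)
        and all(deep_equal(x, y, boundary_space) for x, y in zip(a, b))
    )


compact = ET.fromstring("<a><b>x</b><c/></a>")
indented = ET.fromstring("<a>\n  <b>x</b>\n  <c/>\n</a>")
print(deep_equal(compact, indented))           # False
print(deep_equal(compact, indented, "strip"))  # True
```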

Issue #97 created #created-97

22 Nov at 18:53:57 GMT
[XPath] Functions symmetric to `head()` and `tail()` for sequences and arrays

In XPath 3.1 we already have head(), tail(), and last()

But there is no function that produces the subsequence of all items of a sequence except the last one. There exists such a function in other programming languages. For example, in Haskell this is the init function.

And the last() function isn't the symmetric opposite of head() -- it doesn't give us the last item in a sequence, just its position. So we need another function: fn:heel() for this.

fn:init($sequence as item()*) as item()*

fn:heel($sequence as item()*) as item()?

init($seq) is a convenient shorthand for subsequence($seq, 1, count($seq) -1)

heel($seq) is a convenient shorthand for slice($seq, -1)

Examples:

fn:init(('a', 'b', 'c')) returns 'a', 'b'

fn:init(('a', 'b')) returns 'a'

fn:init('a') returns ()

fn:init(()) returns ()

fn:heel(('a', 'b', 'c')) returns 'c'

('a', 'b', 'c') => init() => heel() returns 'b'

It makes sense to have fn:init() and fn:heel() defined on arrays, too.

array:init($array as array(*)) as array(*)

array:heel($array as array(*)) as item()*

Examples:

array:init([1, 2, 3, 4, 5]) returns [1, 2, 3, 4]

array:init([1]) returns []

array:heel([1, 2, 3, (4, 5)]) returns (4, 5)

array:heel([()]) returns () (the empty sequence)

array:init([]) produces error

array:heel([]) produces error

[1, 2, 3, (4, 5)] =>array:heel() => heel() returns 5

I would challenge anyone to re-write the last example in an understandable way using fn:slice() 💯
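
The behaviour of the four proposed functions can be modelled in Python as follows. This is a sketch under stated assumptions: Python lists stand in for sequences, an array is modelled as a list of members where each member is itself a list (a sequence), and raising `ValueError` for the empty array is a hypothetical stand-in for the unspecified error.

```python
def init(seq):
    """All items except the last; empty for an empty or singleton sequence."""
    return seq[:-1]


def heel(seq):
    """The last item as a sequence of length 0 or 1."""
    return seq[-1:]


def array_init(members):
    """All members except the last; an empty array is an error."""
    if not members:
        raise ValueError("array:init of an empty array")
    return members[:-1]


def array_heel(members):
    """The last member (a sequence); an empty array is an error."""
    if not members:
        raise ValueError("array:heel of an empty array")
    return members[-1]


print(init(["a", "b", "c"]))        # ['a', 'b']
print(heel(init(["a", "b", "c"])))  # ['b']

# Model of: [1, 2, 3, (4, 5)] => array:heel() => heel()
arr = [[1], [2], [3], [4, 5]]
print(heel(array_heel(arr)))        # [5]
```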

Issue #96 created #created-96

22 Nov at 15:56:45 GMT
[XPath] Functions that determine if a given sequence starts with another sequence or ends with another sequence

It is surprising that we are at version 4 and still are missing:

(1) fn:starts-with-sequence($container as item()*, $maybe-start as item()*, 
                            $compare as function(item(), item()) as xs:boolean := deep-equal#2
                            ) as xs:boolean

and

(2) fn:ends-with-sequence($container as item()*, $maybe-end as item()*, 
                          $compare as function(item(), item()) as xs:boolean := deep-equal#2
                          ) as xs:boolean

(2) above is a shorthand for:

fn:starts-with-sequence(reverse($container), reverse($maybe-end)) 

Examples:

fn:starts-with-sequence(('a', 'b', 'c', 'd'), ('a', 'b')) returns true()

fn:starts-with-sequence(('a', 'b', 'c', 'd'), ('a', 'c')) returns false()

fn:ends-with-sequence(('a', 'b', 'c', 'd'), ('c', 'd')) returns true()

fn:ends-with-sequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns false()

('a', 'b', 'c', 'd') => starts-with-sequence(('a', 'b')) returns true()

('a', 'b', 'c', 'd') => starts-with-sequence(('a', 'c')) returns false()

('a', 'b', 'c', 'd') => ends-with-sequence(('c', 'd')) returns true()

('a', 'b', 'c', 'd') => ends-with-sequence(('b', 'd')) returns false()

One possible implementation:

let $starts-with-sequence := function($seq1 as item()*, $seq2 as item()*, $self as function(*))
{
   empty($seq2)
  or
   head($seq1) eq head($seq2) and $self(subsequence($seq1, 2), subsequence($seq2, 2), $self)
}
  return
    $starts-with-sequence(('a', 'b', 'c', 'd'), ('a', 'b', 'c'), $starts-with-sequence)
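
For comparison, here is a small Python model of both functions, including the optional comparator that the XPath implementation above leaves out. It is a sketch only: Python lists stand in for sequences, and `operator.eq` is assumed as a stand-in for the default deep-equal comparison.

```python
from operator import eq


def starts_with_sequence(container, maybe_start, compare=eq):
    """True if maybe_start matches the leading items of container."""
    if len(maybe_start) > len(container):
        return False
    return all(compare(a, b) for a, b in zip(container, maybe_start))


def ends_with_sequence(container, maybe_end, compare=eq):
    """Defined, as in the proposal, by reversing both sequences."""
    return starts_with_sequence(container[::-1], maybe_end[::-1], compare)


letters = ["a", "b", "c", "d"]
print(starts_with_sequence(letters, ["a", "b"]))  # True
print(starts_with_sequence(letters, ["a", "c"]))  # False
print(ends_with_sequence(letters, ["c", "d"]))    # True
print(ends_with_sequence(letters, ["b", "d"]))    # False
```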

Issue #95 closed #closed-95

22 Nov at 09:23:55 GMT

[XPath] URI validation function

Issue #95 created #created-95

21 Nov at 18:15:41 GMT
[XPath] URI validation function

Apparently there is no function to validate URI syntax against the relevant RFC. I expected that casting to xs:anyURI validates but that seems not to be the case:

Because it is impractical for processors to check that a value is a context-appropriate URI reference, this specification follows the lead of [RFC 2396] (as amended by [RFC 2732]) in this matter: such rules and restrictions are not part of type validity and are not checked by ·minimally conforming· processors. Thus in practice the above definition imposes only very modest obligations on ·minimally conforming· processors.

https://www.w3.org/TR/xmlschema-2/#anyURI

Valid URIs are quite critical in strict RDF output formats such as RDF/XML.

Issue #94 created #created-94

20 Nov at 19:32:01 GMT
Functions that determine if a given sequence is a subsequence of another sequence

It is surprising that we are at version 4 and still are missing:

(1) fn:has-subsequence($container as item()*, $maybe-subsequence as item()*, 
                       $compare as function(item(), item()) as xs:boolean := deep-equal#2
                       ) as xs:boolean

and

(2) fn:has-subsequence($container as item()*, $maybe-subsequence as item()*, $contiguous-subsequence as xs:boolean := true(),
                       $compare as function(item(), item()) as xs:boolean := deep-equal#2
                       ) as xs:boolean

and

(3) fn:has-non-contigous-subsequence($container as item()*, $maybe-subsequence as item()*,
                                     $compare as function(item(), item()) as xs:boolean := deep-equal#2
                                     ) as xs:boolean

(3) above is a shorthand for:

fn:has-subsequence(?, ?, false()) 

Examples:

fn:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'c')) returns true()

fn:has-subsequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns false()

fn:has-non-contigous-subsequence(('a', 'b', 'c', 'd'), ('b', 'd')) returns true()

fn:has-non-contigous-subsequence(('a', 'b', 'c', 'd'), ('d', 'b')) returns false()

('a', 'b', 'c', 'd') => has-subsequence(('b', 'c')) returns true()

('a', 'b', 'c', 'd') => has-subsequence(('b', 'd')) returns false()

('a', 'b', 'c', 'd') => has-non-contigous-subsequence(('b', 'd')) returns true()

('a', 'b', 'c', 'd') => has-non-contigous-subsequence(('d', 'b')) returns false()
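
The two flavours of the check can be modelled in Python as below. This is a sketch, not a reference implementation: Python lists stand in for sequences, the parameter names are hypothetical adaptations of the proposed signature, and `operator.eq` stands in for the default deep-equal comparison.

```python
from operator import eq


def has_subsequence(container, candidate, contiguous=True, compare=eq):
    """Check whether candidate occurs in container, in order."""
    if contiguous:
        n = len(candidate)
        # Try every window of length n in the container.
        return any(
            all(compare(a, b) for a, b in zip(container[i:i + n], candidate))
            for i in range(len(container) - n + 1)
        )
    # Non-contiguous: scan the container once, left to right, so that
    # each candidate item must be found after the previous match.
    remaining = iter(container)
    return all(any(compare(a, b) for a in remaining) for b in candidate)


letters = ["a", "b", "c", "d"]
print(has_subsequence(letters, ["b", "c"]))                    # True
print(has_subsequence(letters, ["b", "d"]))                    # False
print(has_subsequence(letters, ["b", "d"], contiguous=False))  # True
print(has_subsequence(letters, ["d", "b"], contiguous=False))  # False
```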

Issue #93 created #created-93

20 Nov at 16:10:04 GMT
Support order by ascending/descending from a string value.

Use Case

It is a common pattern to have a sort key and direction on requests that support listing or searching an object (e.g. authors).

Currently, in order to switch between ascending/descending in a FLWOR expression two separate expressions need to be written. For example:

if ($sort-order eq "ascending") then
    for $name in $authors order by $name ascending return $name
else
    for $name in $authors order by $name descending return $name

It would be cleaner if this could be rewritten as:

for $name in $authors
order by $name in $sort-order order
return $name

Syntax

OrderModifier ::= ("ascending"  |  "descending"  |  OrderDirection)?
                  ("empty"  ("greatest"  |  "least"))?
                  ("collation" URILiteral)?
OrderDirection ::= "in" ExprSingle "order"

Semantics

If OrderDirection is used, the expression is evaluated.

  1. If the expression is not a single atomic value, an XQST#### error is raised.
  2. If the expression evaluates to the string "ascending", this is the same as using the ascending keyword.
  3. If the expression evaluates to the string "descending", this is the same as using the descending keyword.
  4. Otherwise, an XQST#### error is raised.
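
The four rules above can be modelled in Python as follows. This is a sketch only: the function name `order_by` is hypothetical, and `TypeError`/`ValueError` stand in for the unnamed XQST#### errors.

```python
def order_by(items, direction):
    """Sort items in the direction named by a string value."""
    if not isinstance(direction, str):
        # Rule 1: the direction must be a single atomic (string) value.
        raise TypeError("direction must be a single string")
    if direction == "ascending":
        return sorted(items)                 # Rule 2
    if direction == "descending":
        return sorted(items, reverse=True)   # Rule 3
    # Rule 4: any other value is an error.
    raise ValueError("unknown sort direction: " + direction)


authors = ["Kay", "Dimitre", "Reece"]
print(order_by(authors, "ascending"))   # ['Dimitre', 'Kay', 'Reece']
print(order_by(authors, "descending"))  # ['Reece', 'Kay', 'Dimitre']
```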

Issue #92 created #created-92

24 Oct at 13:58:58 GMT
Simplify rule for attribute values on Extension Instructions used to invoke named templates

Regarding the rule in the current proposal for Invoking Named Templates with Extension Instructions:

The way in which attribute values are handled depends on the type declaration of the template parameter...

I have some problems with this dependency on parameter type (to control whether value is an AVT or XPath expression):

  1. In many cases, an xs:string or xs:boolean value passed as a param will be a variable reference, so a coder needs to enter name="{$myName}" instead of name="$myName" in their XSLT editor.
  2. If passing a literal xs:string type, the syntax: name="first" would be easy for a human reader to misinterpret as a NameTest instead of a StringLiteral.
  3. The dependency on param type means more effort (and thus poorer performance) for a tokenizer or syntax-highlighter as it may need to get type information from included/imported XSLT stylesheet modules or from extension elements declared later in the same XSLT module.

The third point above is most important from my viewpoint as maintainer of an XSLT editor, but I believe the first two points are also valid.

For these reasons, I propose that: all attribute-values on extension instructions used to invoke named templates are treated as XPath expressions.

Issue #91 created #created-91

17 Sep at 17:58:23 GMT
name of map:substitute

map:substitute is a weird name for the function. It sounds as if it would change just one value with a new value, like map:put

Actually it is mapping all values. map:map or map:map-values would be more fitting

Or map:for-each would have been logical. Unfortunately it is already taken. fn:for-each takes a sequence and returns a sequence, array:for-each takes an array and returns an array. map:for-each takes a map and returns a ~map~ sequence, which makes no sense. Anyway, map:for-each-value would also be a good name

Other languages have other names. It could also be called map:transform like C++, or map:apply like pari/gp

Issue #90 created #created-90

16 Sep at 11:35:42 GMT
Simplified simplified stylesheets

A couple of suggestions for making "simplified stylesheets" more useful:

(a) Allow the xsl:version (and therefore the XSLT namespace declaration) to be omitted; the default is supplied by the processor. So this becomes a valid stylesheet:

<out id="{/*/@id}">
  <x>{/thing/foo[1]/x}</x>
  <y>{/thing/foo[2]/x}</y>
</out>

(b) Allow "single-template" stylesheets as an intermediate form between simplified stylesheets and full stylesheets:

<xsl:xslt xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*"/>
  <xsl:param name="id"/>
  <out id="{$id}">
    <x>{/thing/foo[1]/x}</x>
    <y>{/thing/foo[2]/x}</y>
  </out> 
</xsl:xslt>

The last child element of xsl:xslt, if not in the XSLT namespace, is implicitly wrapped in <xsl:template match="/">

Issue #89 created #created-89

02 Sep at 18:58:07 GMT
[XQuery] DirPIConstructor permits ':' in the PI name.

Overview

The PITarget symbol allows a colon in the grammar, but the rest of the spec and XQuery implementations (tested on BaseX, Saxon, and MarkLogic) disallow : in DirPIConstructor productions.

Details

The DirPIConstructor construct is defined as:

[151] DirPIConstructor ::= "<?"  PITarget  (S DirPIContents)?  "?>" /* ws: explicit */
[232] PITarget ::= [http://www.w3.org/TR/REC-xml#NT-PITarget]XML /* xgc: xml-version */

with XML defining PITarget as:

[17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

The "excluding 'xml' in any case insensitive form" part is covered by the 3.9.2 Other Direct Constructors section.

While the XML specification states:

The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

various XQuery processors disallow a colon here in line with the rest of the XQuery specification.

Proposal

Change PITarget to:

[232] PITarget ::= NCName

to reflect actual usage and align it with the rest of the XQuery specification.

Issue #88 created #created-88

16 Aug at 10:06:55 GMT
[XPATH] breaking ancestor or descendant axes

A common issue that I often have to deal with in XPath (within XSLT most of the time) is the need to break the descendant or ancestor axes. To do that I have to use predicates which are sometimes quite complicated, because they have to be applied to every node encountered, which forces me to think really globally. Maybe this is inherent in functional languages, but maybe it would be possible to add a simple feature for this common use case.

A really simple example (not difficult to solve here, but it's sometimes much more complex):

Let's say I want to get all "doc" elements that have a table as content:

<doc id="doc1">
  <header/>
  <table/>
  <doc id="doc2">
    <p/>
    <doc id="doc3">
      <table/>
    </doc>
  </doc>
  <footer>
    <doc id="doc4">
      <table/>
    </doc>
  </footer>
</doc>

In this example doc1, doc3 and doc4 all have a table "as content", but doc2 doesn't, though it has a table as descendant.

The XPath to get all doc elements that have a table as content would be something like:

//doc[let $self := . return exists (descendant::table[ancestor::doc[1] is $self])]

I don't really have a clear idea of how to express a new way to break this axis. Let me suggest a predicate on the axis itself, something like:

//doc[descendant[break-axis-on-matching='self::doc']::table]

This syntax is really not nice, but I guess you see the idea? Maybe I missed a way to achieve this in XPath 3.1?
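
To illustrate what "breaking the axis" means operationally, here is a Python sketch using the standard library's ElementTree on the sample document above. The function names are hypothetical, and the traversal is only one way to model the idea: it walks the descendants of each doc but refuses to descend into nested doc elements.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<doc id="doc1">
  <header/>
  <table/>
  <doc id="doc2">
    <p/>
    <doc id="doc3"><table/></doc>
  </doc>
  <footer>
    <doc id="doc4"><table/></doc>
  </footer>
</doc>
"""


def has_own_table(doc):
    """True if a table is reachable without crossing a nested doc."""
    stack = list(doc)
    while stack:
        node = stack.pop()
        if node.tag == "table":
            return True
        if node.tag != "doc":      # the axis "breaks" at nested doc elements
            stack.extend(node)
    return False


def docs_with_table(root):
    """Ids of all doc elements that have a table as their own content."""
    return [d.get("id") for d in root.iter("doc") if has_own_table(d)]


print(docs_with_table(ET.fromstring(SAMPLE)))  # ['doc1', 'doc3', 'doc4']
```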

Issue #87 created #created-87

16 Aug at 09:16:24 GMT
[XSL] Support for "master files"

Oxygen allows one to set one or more "master files" on a project. This is quite useful when validating or searching for references while editing XSLT "modules" that depend on a main XSLT.

The main use case is when I have a big XSLT that I want to split into modules (typically one module per mode). I could of course gather all global variables, parameters and functions into the same module, but then I have to import it from each module if I want every XSLT to be valid. It would make more sense to import it once from the main XSLT, but then none of the other modules are valid anymore.

Being able to set master files as a common XSLT feature would help with validation. It would have no effect on compilation and would serve only as a validation feature.

What do you think?

Issue #86 created #created-86

09 Aug at 09:14:09 GMT
Fallback for named timezones

§9.8.4.6 says "If no timezone name can be identified, the timezone offset is output using the fallback format +01:01." But "+01:01" is not a valid format. It should say either "01:01" or (preferably, I think) "00:00t".

Issue #85 closed #closed-85

08 Aug at 16:07:15 GMT

New separators (apply-templates, for-each) vs attribute, value-of, serialization's item-separator

Issue #85 created #created-85

08 Aug at 15:32:46 GMT
New separators (apply-templates, for-each) vs attribute, value-of, serialization's item-separator

Don't know if the new separator attributes for apply-templates and for-each are designed to work differently than the existing separator functionality in XSLT 3. According to one of the examples, if the instruction produces sibling text nodes then separators are included between the text nodes. In XSLT 3, sibling text nodes are always merged and separators ignored. Perhaps this inconsistency should be reconsidered, or at least a note added for clarification.

Issue #84 created #created-84

30 Jul at 13:26:48 GMT
Proposal : allow ignorable <xsl:div> wrapper for documentation or organize the code

Hi,

I have long been missing a way to organize XSLT code. Using enclosed modes will help a lot, but that is not their main purpose, and I guess it is not enough to be able to:

  • group templates or functions that go together (according to the author)
  • easily comment out blocks of code for debugging purposes
  • add documentation on any XSLT element, not only on top-level elements as with Oxygen "xd" elements
  • add foreign XML structures that can help with static analysis of the code (e.g. information to help with XSLT Schematron validation, which needs autocompletion with a specific XML schema, meaning processing instructions are not enough here)

What about an xsl:div element (for division)? This is a well-known element name, used in HTML but also in RELAX NG. That element might have a process-content attribute with 2 possible values:

  • true (default): the content should be "applied"; xsl:div elements might be nested with different process-content attribute values
  • false: the content has to be completely skipped at compilation

Example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
  xmlns:local="local"
  xmlns:xslq="https://github.com/mricaud/xsl-quality"
  xmlns:a="http://my-annotations.org"
  version="4.0">
  
  <xsl:div>
    <xsl:div process-content="false">
      <p>This block is about "foo"</p>
    </xsl:div>
    
    <xsl:function name="local:has-foo-child" as="xs:boolean">
      <xsl:param name="e">
        <xsl:div process-content="false"><xd:doc>Any elements</xd:doc></xsl:div>
      </xsl:param>
      <xsl:sequence select="exists($e/foo)"/>
    </xsl:function>
    
    <xsl:template match="foo">
      <xsl:div process-content="false">
        <xslq:schematron ignore-rule="mode-name-must-be-namepace-prefixed"/>
      </xsl:div>
      <xsl:value-of select="normalize-space(.)"><xsl:div process-content="false"><xd:doc>Normalization is needed here</xd:doc></xsl:div></xsl:value-of>
    </xsl:template>
    
  </xsl:div>
  
</xsl:stylesheet>
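
To make the intended effect of process-content concrete, here is a minimal Python sketch of a preprocessing pass over a stylesheet. It is an illustration only: xsl:div is a proposal rather than an existing instruction, the function name strip_divs is hypothetical, and the sketch ignores text and whitespace around the wrappers.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
DIV = f"{{{XSL}}}div"  # expanded name of the proposed xsl:div element


def strip_divs(elem):
    """Resolve every xsl:div wrapper below elem, in place, and return elem."""
    kept = []
    for child in list(elem):
        strip_divs(child)  # resolve nested wrappers first
        if child.tag == DIV:
            if child.get("process-content", "true") == "true":
                kept.extend(list(child))  # unwrap: keep the content
            # process-content="false": drop the wrapper and its content
        else:
            kept.append(child)
    for child in list(elem):
        elem.remove(child)
    elem.extend(kept)
    return elem


source = """<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="4.0">
  <xsl:div>
    <xsl:div process-content="false"><p>This block is about "foo"</p></xsl:div>
    <xsl:template match="foo"/>
  </xsl:div>
</xsl:stylesheet>"""

stylesheet = strip_divs(ET.fromstring(source))
```

After the pass, the stylesheet element contains only the xsl:template declaration; the documentation paragraph and both wrappers are gone.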

Writing this example made me realize that it is a bit verbose. Another proposal would be to declare a set of namespaces that are to be ignored at compilation time, either by skipping their elements or by applying what is inside them. The same example would then look something like this:

<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
  xmlns:local="local"
  xmlns:xslq="https://github.com/mricaud/xsl-quality"
  xmlns:a="http://my-annotations.org"
  version="4.0">
  
  <xsl:ignore-namespaces select="map{
    'http://www.oxygenxml.com/ns/doc/xsl' : 'skip', 
    'http://my-annotations.org' : 'apply',
    'https://github.com/mricaud/xsl-quality' : 'skip'
    }"/>
    
  
  <a:div label="This block is about foo">
    
    <xsl:function name="local:has-foo-child" as="xs:boolean">
      <xsl:param name="e"><xd:doc>Any elements</xd:doc></xsl:param>
      <xsl:sequence select="exists($e/foo)"/>
    </xsl:function>
    
    <xsl:template match="foo" mode="bar">
      <xslq:schematron ignore-rule="mode-name-must-be-namepace-prefixed"/>
      <xsl:value-of select="normalize-space(.)" xd:doc="normalization is needed here"/>
    </xsl:template>
    
  </a:div>
  
</xsl:stylesheet>

Well, there are probably multiple ways to meet this need.

I would be really happy to have such possibilities, but I don't know whether some of you have the same need.

Thanks for any comments, replies, ideas or feedback.

Issue #83 created #created-83

29 Jul at 15:08:06 GMT
[XPath] Proposal: Notation for using an operator as a function

In this thread of (XML slack)#general: https://app.slack.com/client/T011VK9115Z/C011NLXE4DU/thread/C011NLXE4DU-1627497085.455200?cdn_fallback=2 there is this expression:

for-each-pair($aa, $bb, function($x, $y) {$x ne $y}) 
           => index-of(true())

Notice the long and unreadable: function($x, $y) {$x ne $y}

Writing, understanding and maintaining XPath code would be easier if we had a better way of expressing the use of an operator as a function. In Haskell one simply writes:

(/) 4 2
half   = (/2)
(-) 4 2
negate = (0-)
ne     = (/=)

We could accept a similar convention, so the original expression above is written simply as:

for-each-pair($aa, $bb, (ne)) 
           => index-of(true())

Or we could use something less overloaded than parentheses, for example:

`ne`

Then the original expression looks like this:

for-each-pair($aa, $bb, `ne`) 
           => index-of(true())

Regardless of which lexical representation is chosen, being able to represent an operator as a function simplifies the code significantly and improves its readability.
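
For comparison, Python's standard operator module already exposes operators as named functions, so the same idea can be tried out there. The sketch below mirrors the for-each-pair / index-of expression; the function name mismatch_positions is made up for this illustration.

```python
import operator


def mismatch_positions(aa, bb):
    """1-based positions at which the paired items of aa and bb differ."""
    # operator.ne plays the role of the proposed (ne) / `ne` notation.
    flags = map(operator.ne, aa, bb)
    return [pos for pos, differs in enumerate(flags, start=1) if differs]


positions = mismatch_positions([1, 2, 3, 4], [1, 5, 3, 0])
```

No inline function wrapper is needed here; the operator is passed directly as the callback.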

Please, share your thoughts/questions on this proposal.

Issue #82 created #created-82

29 Jul at 12:25:51 GMT
Should the mode attribute for apply-templates in templates of enclosed modes default to #current?

XSLT 4 with enclosed modes allows xsl:template declarations to be nested inside an xsl:mode declaration, in effect wrapping all templates that belong to a certain mode.

On XmlSlack, it was suggested that for such templates, an xsl:apply-templates instruction without a mode attribute should implicitly default to #current, meaning the enclosed mode, and not to the default mode of the stylesheet.

So the section in https://qt4cg.org/branch/master/xslt-40/Overview-diff.html#using-modes saying about the optional mode attribute of xsl:apply-templates that "If the attribute is omitted, the default mode for the stylesheet module is used." needs to be adjusted to say that the enclosed mode is used if the template containing the xsl:apply-templates is declared inside of such a mode.

Issue #81 created #created-81

27 Jul at 09:20:24 GMT
[xslt30] Typo in §4.4

The text of the first Note in §4.4 reads

This list excludes documents passed as the values of stylesheet parameters or parameters of the initial named template or initial function, trees created by functions such as parse-xml, parse-xml-fragment, analyze-string, or json-to-xml, nor values returned from extension functions.

"nor" should be "and".

Issue #80 created #created-80

14 Jun at 11:25:12 GMT
[FO] fn:while (before: fn:until)

Motivation

Similar to fold-left, the function allows for an alternative writing of code that would otherwise be solved recursively, and that would possibly cause stack overflows without tail call optimizations.

In contrast to sequence-processing functions (fold functions, for-each, filter, others), the initial input of fn:while can be arbitrary and does not determine the maximum number of iterations.

Summary

Applies the predicate function $test to $input. If the result is true, $action is invoked with the start value (or, subsequently, with the result of the previous $action call) until the predicate function returns false; the value for which $test returns false is the result.

Signature

Edit: The $input argument (before: $zero) is now defined as first parameter.

fn:while(
  $input  as item()*,
  $test   as function(item()*) as xs:boolean,
  $action as function(item()*) as item()*
) as item()*

Examples / Use Cases

Calculate the square root of a number by iteratively improving an initial guess:

let $input := 3936256
return fn:while(
  $input,
  function($result) { abs($result * $result - $input) >= 0.0000000001 },
  function($guess) { ($guess + $input div $guess) div 2 }
)

Find the first number that does not occur in a sequence:

let $values := (1 to 999, 1001 to 2000)
return while(1, -> { . = $values }, -> { . + 1 })

Equivalent Expression

declare function local:while(
  $input  as item()*,
  $test   as function(item()*) as xs:boolean,
  $action as function(item()*) as item()*
) {
  if($test($input)) then (
    local:while($action($input), $test, $action)
  ) else (
    $input
  )
};
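
The recursion in the equivalent expression can also be written as a plain loop, which is what makes the function attractive on processors without tail call optimization. A minimal Python sketch of that reading follows; the name while_ is only a stand-in for the proposed fn:while, and the two calls repeat the use cases above.

```python
def while_(input_, test, action):
    """Iterative counterpart of local:while: apply action while test holds."""
    value = input_
    while test(value):
        value = action(value)
    return value


# First number that does not occur in a sequence.
values = set(range(1, 1000)) | set(range(1001, 2001))
first_missing = while_(1, lambda n: n in values, lambda n: n + 1)

# Square root by iteratively improving an initial guess.
number = 3936256
root = while_(
    float(number),
    lambda guess: abs(guess * guess - number) >= 1e-10,
    lambda guess: (guess + number / guess) / 2,
)
```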

Issue #79 created #created-79

04 Jun at 07:46:43 GMT
fn:deep-normalize-space($e as node())

Summary: removes redundant whitespace within the content of a given node, leaving the element structure intact.

Example:

 <p>  My <i>crazy</i>
<b> content</b>.
</p> 

becomes

<p>My <i>crazy</i> <b>content</b>.</p>

Rules (expressed informally, and may need refining):

  • The string value of the result is the normalize-space() of the string-value of the input.
  • Every non-whitespace character in the result has the same ancestor path as the corresponding character in the input (for example if it was in an i element in the input, then it will be in an i element in the output).
  • When several adjacent whitespace characters from different elements in the input are combined into a single space in the output, the resulting space will be in a text node whose parent is the result node corresponding to the common ancestor of those different elements.

For example <i>easy </i><b> peasy</b> becomes <i>easy</i> <b>peasy</b>
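
One possible reading of these rules can be sketched in Python. This is an assumption-laden illustration, not a reference implementation: it works on a stream of parser events, drops attributes, and places each collapsed space at the shallowest point between the first and last whitespace character of a run, which corresponds to the common ancestor.

```python
from xml.parsers import expat
from xml.sax.saxutils import escape

XML_SPACE = " \t\r\n"


def deep_normalize_space(xml):
    events = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.extend(("char", c) for c in data)
    parser.Parse(xml, True)

    out, gap, seen_text = [], [], False

    def tag(kind, name):
        return f"<{name}>" if kind == "start" else f"</{name}>"

    def flush(trailing):
        # gap holds the tags and whitespace seen since the last visible character
        spaces = [i for i, e in enumerate(gap) if e[0] == "char"]
        if not spaces or not seen_text or trailing:
            pieces = [e for e in gap if e[0] != "char"]  # no space to emit
        else:
            first, last = spaces[0], spaces[-1]
            middle = [e for e in gap[first:last + 1] if e[0] != "char"]
            depth, best, split = 0, 0, 0
            for i, (kind, _) in enumerate(middle, start=1):
                depth += 1 if kind == "start" else -1
                if depth < best:  # shallowest point so far
                    best, split = depth, i
            pieces = (gap[:first] + middle[:split] + [("char", " ")]
                      + middle[split:] + gap[last + 1:])
        out.extend(" " if k == "char" else tag(k, v) for k, v in pieces)
        gap.clear()

    for event in events:
        if event[0] == "char" and event[1] not in XML_SPACE:
            flush(trailing=False)
            out.append(escape(event[1]))
            seen_text = True
        else:
            gap.append(event)
    flush(trailing=True)
    return "".join(out)


result = deep_normalize_space("<p>  My <i>crazy</i>\n<b> content</b>.\n</p>")
```

With this reading, the first example yields the expected <p>My <i>crazy</i> <b>content</b>.</p>.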

Could perhaps also add an option to word-wrap to a given line length.

Issue #78 created #created-78

30 May at 00:33:20 GMT
Specify strict order of evaluation for a subexpression

As discussed in a related issue #71, given an XPath expression such as (1):

for $d in  ( 10, 2, 3 , current-date())
  return
     $d[. castable as xs:date][xs:date(.) le current-date()]

anyone who expects this expression to be evaluated without errors and to produce as result a sequence of one item, will be disappointed to get an error (as per BaseX 9.5.2):

Error: Stopped at C:/W3C-XPath/DupsSolutions/file, 3/39: [XPTY0004] Cannot convert xs:integer to xs:date: 10.

At present the recommended solution to the problem is to write an expression of this kind instead (2):

for $d in  ( 10, 2, 3, current-date() )
  return
     if($d[. castable as xs:date] eq $d)
       then $d[xs:date(.) le current-date()]
       else()

Evaluating this produces the expected result (a sequence of one item, which is the current-date() ).

There are many challenges with such a recommendation:

  1. The expression above is unreadable.
  2. It is very difficult and error-prone to convert manually (1) to (2)
  3. It would be nearly impossible to transform a more complex expression and it would be tremendously difficult to read, understand and maintain such code.

Proposed solution:

Introduce the strict-order evaluation operator ~

Then achieving a strict-order evaluation for the subexpression (of (1) above): [xs:date(.) le current-date()] would be simply (3):

for $d in  ( 10, 2, 3 , current-date())
  return
     $d[. castable as xs:date]~[xs:date(.) le current-date()]

In this particular case the XPath processor will rewrite the above expression into:

for $d in  ( 10, 2, 3, current-date() )
  return
     $d[. castable as xs:date] => 
                                  (function($x) {
                                                 $x[xs:date($x) le current-date()]
                                                 }
                                  ) ()
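
The effect of strict ordering can be imitated in Python, where comparing an integer with a date also raises a type error. The sketch below is only an analogy for the proposed ~ operator: each predicate is applied to the survivors of the previous one, so the guard always runs first. The helper names are hypothetical.

```python
from datetime import date


def castable_as_date(value):
    """Stand-in for `. castable as xs:date` in this sketch."""
    return isinstance(value, date)


def strict_filter(items, *predicates):
    """Apply predicates strictly left to right; later ones only see survivors."""
    for predicate in predicates:
        items = [item for item in items if predicate(item)]
    return items


today = date.today()
# Evaluating `d <= today` for 10, 2 or 3 would raise TypeError,
# but the guard has already removed those items.
result = strict_filter([10, 2, 3, today], castable_as_date, lambda d: d <= today)
```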

Issue #77 created #created-77

25 May at 22:00:28 GMT
Allow manipulation of maps and arrays

As discussed in the xml.com Slack workspace's xpath-ng channel, there is interest in extending the XQuery Update Facility to allow manipulation of maps and arrays—in effect, to facilitate the editing of large, deep JSON documents.

For example, @DrRataplan provided this use case (the first code snippet can be viewed at fontoxml's playground):

I think XQUF for JSON may have its merit. Editing larger JSON documents using XQuery is not the most elegant. I mean, in JavaScript, changing a value in a deep map is theMap['key']['deeperKey'].push(42). In XPath, it is more like:

$theMap 
=> map:put('key', $theMap?key
   => map:put('deeperKey', array:append($theMap?key?deeperKey, 42)))

In XQUF terms, I think this would look a bit like:

insert 42 as last into $theMap?key?deeperKey

... which is at least a lot shorter.

At some point when working on a project that tried to edit some JSON metadata objects in XQuery I implemented a function that accepted a map, a path of keys, a value and some semantics, such as inserting at the start vs. at the end. It did not work too great in the end and we went for JavaScript functions instead. Just too explicit and hard to debug.
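
A helper of the kind described (a map, a path of keys, and an update) might look like the following Python sketch. It is an assumption about the general shape, not the function from that project: it never mutates its input, which mirrors XQuery maps and arrays being pure functional values.

```python
def deep_update(value, path, update):
    """Return a copy of value with update() applied at path; input is untouched."""
    if not path:
        return update(value)
    key, rest = path[0], path[1:]
    if isinstance(value, dict):
        return {**value, key: deep_update(value[key], rest, update)}
    items = list(value)  # treat anything else as an array
    items[key] = deep_update(items[key], rest, update)
    return items


the_map = {"key": {"deeperKey": [1, 2]}, "other": "unchanged"}
updated = deep_update(the_map, ["key", "deeperKey"], lambda arr: arr + [42])
```

Every map on the path is rebuilt, which is the cost that makes deep updates verbose when written by hand.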

See also this discussion at StackOverflow, where a user was struggling to use map:put or map:remove on deeper entries in a map; asked, "Is XQuery 3.1 designed for advanced JSON editing?"; and worried that XQuery "might not be the right choice" for his use case. Highlights from the responses:

@michaelhkay wrote:

You're correct that doing what I call deep update of a map is quite difficult with XQuery 3.1 (and indeed XSLT 3.0) as currently defined. And it's not easy to define language constructs with clean semantics. I attempted to design a construct as an XSLT extension instruction - see https://saxonica.com/documentation10/index.html#!extensions/instructions/deep-update -- but I don't think it's anywhere near a perfect solution.

@ChristianGruen wrote:

Update primitives had been defined for JSONiq (https://www.jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#section-json-updates), but I believe they haven’t made it into the reference implementation. They could also be considered for XQuery 4.

@michaelhkay responded:

If I'm not mistaken, maps in JSONiq have object identity, which is not true of XQuery maps (which are pure functional data structures). That makes the semantics of deep update much easier to define, but makes it more difficult to make simple operations such as put() and remove() efficient.

In Slack @liamquin also wrote:

the proposals i've seen for this in the past required that maps and arrays be given identity in some way, but then you have the problem that e.g. map:insert returns a new map, which is not how an XQuery update expression works

@jonathanrobie also wrote:

Yes, but the first question is this: how much will is there to support JSON updates in XQuery update?

I would love to have this. I no longer work for an implementation of XQuery.

@adamretter added:

Sounds like a nice idea

Issue #76 created #created-76

24 May at 22:00:46 GMT
non-deterministic time

The current-date/Time functions are deterministic: within one execution they always return the same time, which many users find confusing.

There could be actual-current-date/Time functions that return the actual time non-deterministically. They could also be called wall-date/Time or system-date/Time.
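
The difference between the two behaviours can be sketched in Python. The class and method names are hypothetical; current_datetime stands for the existing deterministic functions and wall_datetime for the proposed non-deterministic ones.

```python
from datetime import datetime, timezone


class ExecutionScope:
    """One query or transformation run."""

    def __init__(self):
        # Captured once, so every later call sees the same instant.
        self._started = datetime.now(timezone.utc)

    def current_datetime(self):
        """Deterministic: the same value for the whole execution."""
        return self._started

    def wall_datetime(self):
        """Non-deterministic: reads the system clock on every call."""
        return datetime.now(timezone.utc)


scope = ExecutionScope()
same = scope.current_datetime() == scope.current_datetime()
```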

Issue #75 created #created-75

14 May at 20:10:41 GMT
Support processing HTML 5 template element content

The Problem

The HTML 5 specification introduces a template element [1], [2] whose content does not represent children of the element but is part of a content property. The root node of the content property is a DocumentFragment, which is a lightweight document node. These specifications provide some non-normative guidelines for interacting with XSLT and XPath [3].

The DocumentFragment interface is defined in the HTML DOM 4.1 [4] as an instance of a Node. Within the HTML 5 specification, it is only referenced in relation to the template element.

This affects the proposed fn:parse-html (issue #74) function as well as databases and query processors that support storing and accessing HTML5 content via fn:doc and other APIs.

Requirements

  1. Accurately represent the contents of the template element in the DOM/data model.
  2. Allow a conforming implementation to process the template content as if it was XML content -- i.e. using the child:: axis to access the content.
  3. Allow a conforming implementation to process the template content separately from child content -- e.g. if the implementation has support for the HTML DOM.
  4. Allow authors to select the content of a template element.
  5. Minimize changes to the data model specification. [*]

[*] I don't believe it is possible to support this without some changes to the data model (see the Design section below).

Design

Storing the content of the template

There are 3 options for handling the content of a template element.

1. As children

Store the content as child elements of the template element.

This is how conforming processors that only understand XML content will process and view the document.

2. As a document node

Store the content as children of a document node, where the parent of the document node is the template element.

This would be the minimal amount of changes needed to make the HTML5 model work. The only change I can see is that this won't conform to section 6.1.2 Accessors of the data model, in that:

dm:parent Returns the empty sequence

becomes:

dm:parent If this is a document fragment for a template element, returns the template element. Otherwise, returns the empty sequence.

Implementors using the HTML DOM would need to map DocumentFragment nodes to document-node().

3. As a new document-fragment node

Store the content as children of a new document-fragment node type, where the parent of the document-fragment node is the template element.

This is the option that is most compatible with the HTML DOM as it mirrors the DocumentFragment interface from that, but is also the one that is the most invasive. It will require (among other things):

  1. Defining rules in section 6. Nodes of the data model for Document Fragment Nodes -- accessors, construction from infoset and PSVI, and infoset mapping.
  2. Adding a new document-fragment() KindTest to the supported node/item types.
  3. Adding subtype-itemtype rules for the document fragment nodes.
  4. Adding a new document-fragment { ... } computed constructor for XQuery.

Selecting template content

A new forward axis should be added that supports selecting fragment nodes. Some of the possible names include:

  1. fragment:: -- following the pattern defined by the attribute:: axis; or
  2. content:: -- following the nomenclature from the HTML specification for the template element contents.

The behaviour will depend on which of the 3 options above is selected for storing the content type:

  1. If an implementation only supports XML (option 1), the new axis will work the same as child::. The principal node kind is element.
  2. If option 2 is chosen (reuse the document node), the new axis will match document nodes whose parent is a template element. The principal node kind is document. Note: This has an ambiguity with the reverse axes, as it is checking the parent of the node as well as the node type.
  3. If option 3 is chosen (create a document fragment node), the new axis will match any document fragment nodes. The principal node kind is document fragment. Note: This makes more sense when the fragment:: name is used for the axis, and would be more generally applicable, such as for computed constructor created fragments, or HTML DocumentFragments created from a JavaScript or web browser XPath/XSLT/XQuery binding such as Saxon-JS.

References

[1] https://www.w3.org/TR/html52/semantics-scripting.html#the-template-element
[2] https://html.spec.whatwg.org/#the-template-element
[3] https://www.w3.org/TR/html52/semantics-scripting.html#interaction-of-template-elements-with-xslt-and-xpath
[4] https://www.w3.org/TR/dom41/#documentfragment

Issue #74 created #created-74

14 May at 15:09:38 GMT
[FO] Support parsing HTML

It is common for applications that use an XQuery database engine to want to parse HTML documents when adding content from HTML pages into a database, or in other applications like generating epub documents from HTML source files. Vendors like MarkLogic (xdmp:tidy via HTML Tidy for HTML4), BaseX (html:parse via TagSoup), Saxon (saxon:parse-html via TagSoup), and eXist-db (util:parse-html via Neko) have provided custom methods to support this.

Q: Should there also be functions to list the supported methods and character encodings?

fn:parse-html

Summary

Parses HTML-based input into an XML document.

Signature

fn:parse-html($input as union(xs:string, xs:hexBinary, xs:base64Binary),
              $options as map(*) := map { "method": "html5" }) as document-node()

Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The $options map conforms to record(method as union(enum("html5"), xs:string), encoding as xs:string?, *). A vendor may provide ·implementation-dependent· options that may vary between the different method values.

The method property of $options defines the approach used to convert the HTML document to XML. This specification supports html5 for using the HTML5 parsing rules for HTML content. The exact version of HTML5 used is ·implementation-dependent·.

The encoding property of $options defines the character encoding used to decode binary data. By default, this is an empty sequence. Implementations must support at least utf-8, utf8, ascii, and latin1. Other encoding values are ·implementation-dependent·, but it is recommended that the encodings documented in the WHATWG Encoding specification [3] are supported.

If $input is an xs:string, no character decoding is performed as the input is already decoded.

If $input is an xs:hexBinary or xs:base64Binary, the character encoding used to decode the binary data is determined as follows:

  1. if the binary data has a valid Unicode Byte Order Mark (BOM), the character encoding specified by that BOM is used;
  2. if encoding is specified in $options, that value is used;
  3. if prescanning the first 1024 bytes of data contains a character encoding (using the rules from https://html.spec.whatwg.org/multipage/parsing.html#prescan-a-byte-stream-to-determine-its-encoding), the detected encoding is used;
  4. if ·implementation-dependent· heuristics (in line with the HTML5 rules) detect a character encoding, that encoding is used;
  5. otherwise, the encoding is "utf-8".
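
The order of these checks can be sketched in Python. This is a simplified illustration under stated assumptions: the function name detect_encoding is hypothetical, the regular expression is only a crude stand-in for the WHATWG prescan algorithm, and step 4 (implementation-dependent heuristics) is left out.

```python
import codecs
import re

BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]
# Crude approximation of the prescan: look for charset=... inside a meta tag.
META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def detect_encoding(data, encoding=None):
    """Pick the encoding for binary input, following steps 1, 2, 3 and 5."""
    for bom, name in BOMS:  # step 1: a BOM wins over everything else
        if data.startswith(bom):
            return name
    if encoding is not None:  # step 2: the encoding option
        return encoding
    match = META.search(data[:1024])  # step 3: prescan the first 1024 bytes
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # step 5: default


chosen = detect_encoding(b'<meta charset="ISO-8859-2"><p>text</p>')
```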

If the detected character encoding name is not supported, an FO###### error is raised. Otherwise, the character encoding method associated with the character encoding is used.

If the parsing method is not supported, an FO###### error is raised.

The $input is then parsed according to the specified parsing method, building an intermediate HTML Document object. The XML document-node is then constructed by mapping the HTML document, element, attribute, text, and comment nodes to their XML equivalents.

If a HTML document contains a template element, the contents of that element are added as children of the template element. It is ·implementation-dependent· whether or not a processor ignores this content when evaluating path expressions on these template elements, and how they are represented in any DOM interfaces.

Notes

The character encoding logic follows the https://html.spec.whatwg.org/multipage/parsing.html#encoding-sniffing-algorithm rules.

HTML does not support processing instructions. They are treated as comments in the HTML5 specification.

The HTML template element is complex as the HTML specification defines its content as being part of a separate document that is associated with the template contents property of that element, not its children. The WHATWG specification provides a non-normative guide for XSLT and XPath interacting with these elements (https://html.spec.whatwg.org/#template-XSLT-XPath).

A conforming implementation may choose to parse and return the HTML into a HTML-based data model (e.g. the HTML DOM) instead of generating an XML infoset or PSVI. This is valid as long as the accessor functions (https://www.w3.org/TR/xpath-datamodel-31/#accessors) and the various syntax that works with XML nodes also works for the HTML nodes. That is, expressions like $html/html/body/p instance of element(p) are supported.

Examples

The expression fn:parse-html("<html>") returns an empty html document constructed using the HTML5 document construction rules.

The expression fn:parse-html($html, encoding: "latin2") uses the latin2 character encoding to parse $html, or generates an FO###### error if the processor does not support that encoding.

The expression fn:parse-html($html, method: "html5", encoding: ()) is equivalent to fn:parse-html($html).

The expression fn:parse-html($html, method: "tidy") uses the tidy method (e.g. from the HTML Tidy application) to parse $html into an XML document if supported by the implementation. Otherwise an FO###### error is raised.

The expression fn:parse-html($html, method: "tagsoup", nons: true()) uses the tagsoup method (e.g. from the TagSoup application) to parse $html into an XML document if supported by the implementation, passing the --nons attribute. Otherwise an FO###### error is raised.

References

  1. HTML 5.2, W3C.
  2. HTML Living Standard, WHATWG.
  3. Encoding Living Standard, WHATWG.

Issue #73 created #created-73

07 May at 07:33:59 GMT
Split a string by graphemes

The new fn:characters function is useful, but doesn't solve a problem of manipulating strings where multiple codepoints correspond to a single grapheme. For example:

  1. characters with one or more combining characters;
  2. emoji with skin tone variant selectors;
  3. emoji with gender variant selectors;
  4. multi-sequence emoji -- family, Wales flag, etc.;
  5. regional indicator pairs for flags.

Getting this right is complex, and a regular-expression implementation is easy to get wrong.

fn:graphemes

Summary

Splits the supplied string into a sequence of single-grapheme (one or more character) strings.

Signature

fn:graphemes($value as xs:string?) as xs:string*

Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

The function returns a sequence of strings, each containing one ·grapheme· of $value, in order. Graphemes are determined by the Unicode rules for what constitutes a ·grapheme·. The versions of Unicode and of the Unicode Emoji standard that are used are ·implementation-dependent·.

If $value is a zero-length string or the empty sequence, the function returns the empty sequence.

Examples

The expression fn:graphemes("Thérèse") returns ("T", "h", "é", "r", "è", "s", "e"), irrespective of whether the e characters use combining characters or not.

The expression fn:graphemes("") returns ().

The expression fn:graphemes(()) returns ().

The expression fn:graphemes("👋🏻👋🏼👋🏽👋🏾👋🏿") returns ("👋🏻", "👋🏼", "👋🏽", "👋🏾", "👋🏿").

The expression fn:graphemes("👪") returns ("👪").

The expression fn:graphemes("👨‍🔬👩‍🔬") returns ("👨‍🔬", "👩‍🔬").

The expression fn:graphemes("🇪🇪🇩🇪🇫🇷🏴󠁧󠁢󠁷󠁬󠁳󠁿🇮🇸") returns ("🇪🇪", "🇩🇪", "🇫🇷", "🏴󠁧󠁢󠁷󠁬󠁳󠁿", "🇮🇸").
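
To show how grapheme splitting differs from splitting by codepoint, here is a deliberately simplified Python sketch. It handles only the first case in the list above (base characters followed by combining marks) and does not implement the full Unicode segmentation rules, so emoji sequences and flags are not grouped correctly. The function name is hypothetical.

```python
import unicodedata


def graphemes_simplified(value):
    """Group each combining mark with the character that precedes it."""
    clusters = []
    for ch in value or "":  # None stands in for the empty sequence
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)
    return clusters


# "Therese" written with combining acute and grave accents: 9 codepoints.
decomposed = "The\u0301re\u0300se"
clusters = graphemes_simplified(decomposed)
```

The decomposed string has nine codepoints but seven clusters, which is the distinction fn:characters alone cannot express.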

Issue #72 created #created-72

04 May at 07:38:50 GMT
[FO] Provide better support for URI processing within an expression

Use Case 1: Decode an encoded URI string.

This is difficult to implement correctly, and is a commonly asked question/request on sites like Stack Overflow. Vendors have even implemented their own functions, like xdmp:uri-decode.

Use Case 2: Extracting the hash/parameters from a URI string.

This is common when manipulating URI strings and not using something like RESTXQ to bind the query parameters to function parameters. The API should:

  1. extract the hash as a string, and the parameters as a name/value map;
  2. combine parameters with the same name into the same map entry;
  3. decode the values where necessary.

Use Case 3: Extract the other parts of a URI string.

This can be useful if writing a RESTXQ or similar implementation in XSLT/XQuery. It can also be useful for generating response headers such as Origin, or doing HTTP to HTTPS redirects.

It is easy to make mistakes and wrong assumptions when writing a URI parser by hand. Additionally, it is not easy to implement in XSLT/XQuery as functions like analyze-string and tokenize are not powerful enough to implement a lexer, and manipulating codepoints is difficult without stateful logic.
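
As a point of comparison, the three use cases map onto existing functions in Python's urllib.parse module. The sketch below shows the kind of result such an API could return; the function name uri_parts and the shape of the returned map are assumptions made for this illustration.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def uri_parts(uri):
    """Split a URI into decoded parts, with parameters as a name/values map."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": unquote(parts.path),      # use case 1: decoding
        "hash": unquote(parts.fragment),  # use case 2: the hash as a string
        "params": parse_qs(parts.query),  # same-name parameters share one entry
    }


parsed = uri_parts("https://example.com/a%20b?x=1&x=2&y=caf%C3%A9#top")
```

Repeated parameters such as x end up in one map entry holding both values, as asked for in Use Case 2.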

Issue #71 created #created-71

13 Apr at 08:42:38 GMT
[XSLT] Use of multiple predicates: order of evaluation

I notice I added an example pattern to the draft XSLT4 spec match=".[. castable as xs:date][xs:date(.) le current-date()]" which is incorrect because processors are allowed to change the order of predicates, so you can't use the first predicate as a guard to stop the second predicate throwing an error. I've seen users fall over this (Saxon does sometimes reorder predicates). My instinct is to ban reordering of predicates; if you want to allow it, you can use the "and" operator. An alternative would be an "and" operator (say "and-also") with explicit ordering semantics, as in XPath 1.0.

Issue #70 created #created-70

12 Apr at 16:47:00 GMT
[FO] Built-in function changes to support default values

This issue tracks the changes needed to the built-in functions to allow them to combine the declarations into a single definition with default parameter values.

The general approach to this is to make required arguments optional if they are for a function signature that is not the lowest argument count signature, and move any associated logic into the function.

array:subarray

  1. Change array:subarray/$length from xs:integer to xs:integer?.

Rules

Except in error cases, the result of the function is the value of the expression op:A2S($array) => fn:subsequence($start, $length) => op:S2A().

Error Conditions

A dynamic error is raised [err:FOAY0001] if $start is less than one or greater than array:size($array) + 1.

A dynamic error is raised [err:FOAY0002] if $length is not an empty sequence and is less than zero.

A dynamic error is raised [err:FOAY0001] if $length is not an empty sequence and $start + $length is greater than array:size($array) + 1.
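
The rules and error conditions above can be checked against a small Python model. It is a sketch only: a list stands in for the array, None stands in for the empty sequence, and Python exceptions stand in for the dynamic errors.

```python
def subarray(array, start, length=None):
    """Model of array:subarray with an optional (empty-sequence) $length."""
    size = len(array)
    if start < 1 or start > size + 1:
        raise IndexError("FOAY0001: $start is out of bounds")
    if length is None:
        return array[start - 1:]  # no length: everything from $start on
    if length < 0:
        raise ValueError("FOAY0002: $length is negative")
    if start + length > size + 1:
        raise IndexError("FOAY0001: $start + $length is out of bounds")
    return array[start - 1:start - 1 + length]


letters = ["a", "b", "c", "d"]
tail = subarray(letters, 2)
middle = subarray(letters, 2, 2)
```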

fn:concat

This should be modified to use a sequence-variadic signature, either as a 1 parameter function (taking an xs:anyAtomicType* value, allowing 0 and 1 arguments), or a 3 parameter function with the last parameter having the type xs:anyAtomicType*.

fn:differences

The $options parameter should be moved to the end of the parameter list in order to make the function a map-variadic function when default values are applied. This then makes it possible to specify the collation argument using a keyword argument in addition to specifying options as keyword arguments.

fn:resolve-uri

  1. Change fn:resolve-uri/$base from node() to node()?.
- If the $base argument is not supplied,
+ If the $base argument is the empty sequence,

fn:subsequence

  1. Change fn:subsequence/$length from xs:double to xs:double?.

When $length is the empty sequence, this function returns:

$input[fn:round($start) le position()]

When $length is not the empty sequence, this function returns:

$input[fn:round($start) le position() 
         and position() lt fn:round($start) + fn:round($length)]
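
The two cases can be folded into one function, as this Python sketch suggests. It assumes finite numeric arguments, uses None for the empty sequence, and models fn:round with floor(x + 0.5) because Python's built-in round() rounds halves to even.

```python
import math


def xpath_round(x):
    """Round to the nearest integer, halves toward positive infinity."""
    return math.floor(x + 0.5)


def subsequence(items, start, length=None):
    """Model of fn:subsequence with an optional (empty-sequence) $length."""
    low = xpath_round(start)
    high = None if length is None else low + xpath_round(length)
    return [
        item
        for position, item in enumerate(items, start=1)
        if low <= position and (high is None or position < high)
    ]


letters = list("abcde")
rest = subsequence(letters, 2)
window = subsequence(letters, 1.5, 2.6)
```

The second call rounds $start to 2 and $length to 3, so it selects positions 2 to 4.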

fn:substring

  1. Change fn:substring/$length from xs:double to xs:double?.

More specifically, when $length is not the empty sequence the function returns the characters in $value whose position $p satisfies:

fn:round($start) <= $p and $p < fn:round($start) + fn:round($length)

When $length is the empty sequence the function assumes that $length is infinite and thus returns the ·characters· in $value whose position $p satisfies:

fn:round($start) <= $p

fn:tokenize

  1. Change fn:tokenize/$pattern from xs:string to xs:string?.

If $pattern is the empty sequence, the $value argument is set to fn:normalize-space($value) and $pattern is set to ' '.
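
A Python sketch of this defaulting rule is shown below. It is an approximation: None stands in for the empty sequence, and Python's re module is used in place of the XPath regular expression dialect, which differs in details.

```python
import re


def normalize_space(value):
    """Collapse runs of XML whitespace and trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


def tokenize(value, pattern=None):
    """Model of fn:tokenize where an empty $pattern means whitespace splitting."""
    if pattern is None:
        value, pattern = normalize_space(value), " "
    if value == "":
        return []  # a zero-length input yields the empty sequence
    return re.split(pattern, value)


words = tokenize("  red   green\tblue ")
fields = tokenize("a,b,,c", ",")
```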

fn:unparsed-text / fn:unparsed-text-available

  1. Change fn:unparsed-text/$encoding from xs:string to xs:string?.
  2. Change fn:unparsed-text-available/$encoding from xs:string to xs:string?.

fn:unparsed-text-lines

  1. Change fn:unparsed-text-lines/$encoding from xs:string to xs:string?.

The result of the function is the same as the result of the expression fn:tokenize(fn:unparsed-text($href, $encoding), '\r\n|\r|\n')[not(position()=last() and .='')].

Collations

  1. Change fn:collation-key/$collation from xs:string to xs:string?.
  2. Change fn:compare/$collation from xs:string to xs:string?.
  3. Change fn:contains/$collation from xs:string to xs:string?.
  4. Change fn:contains-token/$collation from xs:string to xs:string?.
  5. Change fn:deep-equal/$collation from xs:string to xs:string?.
  6. Change fn:differences/$collation from xs:string to xs:string?.
  7. Change fn:distinct-values/$collation from xs:string to xs:string?.
  8. Change fn:ends-with/$collation from xs:string to xs:string?.
  9. Change fn:index-of/$collation from xs:string to xs:string?.
  10. Change fn:max/$collation from xs:string to xs:string?.
  11. Change fn:min/$collation from xs:string to xs:string?.
  12. Change fn:starts-with/$collation from xs:string to xs:string?.
  13. Change fn:substring-after/$collation from xs:string to xs:string?.
  14. Change fn:substring-before/$collation from xs:string to xs:string?.
  15. Change fn:uniform/$collation from xs:string to xs:string?.
  16. Change fn:unique/$collation from xs:string to xs:string?.

Passing the empty sequence to the $collation argument is equivalent to supplying the default collation to that argument.

Issue #69 created #created-69

12 Apr at 16:46:46 GMT
fn:document, fn:function-available: default arguments

This issue tracks the changes needed to the built-in functions to allow them to combine the declarations into a single definition with default parameter values.

The general approach to this is to make required arguments optional if they are for a function signature that is not the lowest argument count signature, and move any associated logic into the function.

fn:document

  1. Change fn:document/$base-node from node() to node()?.
- If $base-node is supplied,
+ If $base-node is not empty,

fn:function-available

  1. Change fn:function-available/$arity from xs:integer to xs:integer?.

If $arity is the empty sequence, the function-available function returns true if and only if there is at least one available function (with some arity) whose name matches the value of the $function-name argument.

If $arity is not the empty sequence, the function-available function returns true if and only if there is an available function whose name matches the value of the $function-name argument and whose arity matches the value of the $arity argument.
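
The two rules can be sketched as a single function with an optional arity. This is a Python model, not XPath; the AVAILABLE registry and its entries are made up for illustration, and None stands for the empty sequence.

```python
# Hypothetical registry of available functions as (name, arity) pairs.
AVAILABLE = {("fn:concat", 2), ("fn:concat", 3), ("fn:true", 0)}


def function_available(function_name, arity=None):
    """Model of the merged signature; arity=None stands for the empty sequence."""
    if arity is None:
        # Any arity will do, as long as the name matches.
        return any(name == function_name for name, _ in AVAILABLE)
    # Both the name and the arity must match.
    return (function_name, arity) in AVAILABLE
```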

Issue #68 closed #closed-68

12 Apr at 14:25:35 GMT

Don't attempt to upgrade the host

Pull request #68 created #created-68

12 Apr at 14:25:29 GMT
Don't attempt to upgrade the host

The CI script shouldn't attempt to upgrade the host. CircleCI have customized some of the packages so upgrading doesn't work. And it shouldn't really be necessary anyway.

Issue #67 created #created-67

09 Apr at 09:59:18 GMT
Allow optional parameters and keyword arguments on map and sequence variadic functions.

These proposed draft changes seek to address the following issues with, and limitations of, the current draft specification:

  1. A %variadic("sequence") function where the sequence type uses the + occurrence indicator should not have an implicit default value. That would mean passing () to the sequence, which would generate a coercion error.
  2. Map-variadic and sequence-variadic functions cannot have user-specified default parameter values with the current draft wording. In this case the map/sequence last parameter needs to be given a default in the function declaration. This allows those to be defaulted to something other than an empty map/sequence, as well as specifying the defaults for other parameters (e.g. in the case where a map is the last of several parameters).
  3. It should be possible to allow parameters to be specified as keyword arguments for map-variadic functions. For map-variadic functions, a keyword argument will be bound to a parameter if it matches the parameter, or added to the map if not.

Design Note:

It would be nice to support keyword arguments for sequence-variadic functions. The other design notes detail a possible way to implement this logic. This would resolve issue #26, and make the features (keyword arguments in this case) usable in all cases.

Proposal

There are two orthogonal concepts related to variadic functions:

  1. arity bounds -- the number of required and optional parameters a function has;
  2. variadic type -- how the function behaves in relation to its last parameter.

Arity Bounds

[Definition: The declared arity of a function is the number of parameters defined in the function declaration.] The declared arity includes both required and optional parameters.

[Definition: An optional parameter is a parameter with a default value.] The default value may either be specified in the function declaration, or determined by the logic described below.

[Definition: A declared optional parameter is an optional parameter specified in the function declaration.] TODO: Define a syntax for specifying declared optional parameters. [Note: see issue #64 for a proposal on doing this.]

The property A is the declared arity of a function.

The property D is the number of optional parameters. This is determined as follows:

  1. If there are any declared optional parameters, D is the number of declared optional parameters.
  2. If the last parameter is a MapTest or RecordTest, D is 1.
  3. If the last parameter is a sequence type with a minimum item occurrence of 0 (e.g. using the * occurrence indicator), D is 1.
  4. If none of the above apply, D is 0.

The property R is the number of required parameters, and is determined by evaluating A-D.
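
A minimal Python model of the A, D and R properties follows. The Param class and its field names are hypothetical, the four rules are applied in the order listed, and treating the ? occurrence indicator as "minimum occurrence of 0" alongside * is an assumption based on the wording of rule 3.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    is_map_or_record: bool = False  # type is a MapTest or RecordTest
    occurrence: str = ""            # "", "?", "*" or "+"
    has_default: bool = False       # declared optional parameter


def arity_bounds(params):
    """Return (A, D, R), applying the four rules for D in order."""
    a = len(params)
    declared = sum(p.has_default for p in params)
    if declared:
        d = declared
    elif params and params[-1].is_map_or_record:
        d = 1
    elif params and params[-1].occurrence in ("?", "*"):
        d = 1
    else:
        d = 0
    return a, d, a - d
```

For example, a function whose last parameter uses the + occurrence indicator gets D = 0, so all of its parameters are required.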

Variadic Type

The variadic type is given by the %variadic(enum("no", "map", "sequence")) annotation. It is determined as follows:

  1. If the last parameter is a MapTest or RecordTest, %variadic("map") is specified.
  2. If the last parameter is a sequence type with an unbounded maximum item occurrence (e.g. using the * or + occurrence indicator), %variadic("sequence") is specified.
  3. If none of the above apply, %variadic("no") is specified.

[Definition: The variadic parameter of a function refers to the last parameter of a map-variadic or sequence-variadic function.]

The values of the MinA/MaxA, MinP/MaxP, and MinK/MaxK properties are given by the following table, where A and R are defined in the arity bounds section.

| %variadic | MinA | MaxA      | MinP | MaxP      | MinK | MaxK      |
|-----------|------|-----------|------|-----------|------|-----------|
| no        | R    | A         | 0    | A         | 0    | A         |
| map       | R    | unbounded | 0    | A         | 0    | unbounded |
| sequence  | R    | unbounded | R    | unbounded | 0    | 0         |

For %variadic("no") and %variadic("map") functions, positional and keyword arguments can be mixed, or the arguments can be specified as either all positional arguments, or all keyword arguments.
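
The table can be read as a count check on a function call. The Python sketch below assumes that MinA/MaxA bound the total number of arguments, MinP/MaxP the positional ones and MinK/MaxK the keyword ones; that reading, and the helper names, are assumptions rather than part of the proposal.

```python
import math

UNBOUNDED = math.inf


def call_bounds(variadic, a, r):
    """Rows of the table above; a and r are the A and R properties."""
    table = {
        "no": dict(MinA=r, MaxA=a, MinP=0, MaxP=a, MinK=0, MaxK=a),
        "map": dict(MinA=r, MaxA=UNBOUNDED, MinP=0, MaxP=a,
                    MinK=0, MaxK=UNBOUNDED),
        "sequence": dict(MinA=r, MaxA=UNBOUNDED, MinP=r, MaxP=UNBOUNDED,
                         MinK=0, MaxK=0),
    }
    return table[variadic]


def counts_allowed(variadic, a, r, positional, keyword):
    """Check only the argument counts of a call, not names or types."""
    b = call_bounds(variadic, a, r)
    total = positional + keyword
    return (b["MinA"] <= total <= b["MaxA"]
            and b["MinP"] <= positional <= b["MaxP"]
            and b["MinK"] <= keyword <= b["MaxK"])
```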

Note:

If a keyword argument has the name of the variadic parameter for a map-variadic function, it is used to specify the value of that map, and not a key in a constructed map. In this case, the other keyword arguments must specify parameter names as the value of the variadic parameter has already been specified, and would result in a conflicting value if any of the keyword arguments were specifying keys in the variadic parameter.

For %variadic("sequence") functions, only positional parameters are allowed.

Design Note:

Keyword arguments could be supported for sequence-variadic functions if the presence of a keyword argument makes it function like %variadic("no"). That is, it is not unbounded in this case. This would work, as keyword arguments occur after positional arguments, and the variadic parameter would need to be specified as a keyword argument.

The tricky part of this is that MinA/MaxA would no longer be statically determinable, in that they would depend on whether the function call used keyword arguments.

The sequence row would be modified as follows:

| %variadic | MinA | MaxA         | MinP | MaxP      | MinK | MaxK |
|-----------|------|--------------|------|-----------|------|------|
| sequence  | R    | variable [1] | 0    | unbounded | 0    | A    |

[1] If the function call has at least one keyword argument, MaxA is A. Otherwise, MaxA is unbounded.

Evaluating Static Function Calls

...

  3. Positional argument values are mapped to parameters in the function declaration as follows: Let the number of declared parameters be N.

    1. A positional argument with position M, (M < N) corresponds to the parameter in position M.
    2. For sequence-variadic functions, the values of arguments in positions greater than or equal to N are concatenated into a sequence, and the resulting sequence is supplied as the value of parameter N. If there are no such arguments (that is, if N-1 arguments are supplied), then the value supplied for parameter N is an empty sequence.
  4. Keyword argument values are mapped to parameters in the function declaration as follows: Let the keyword corresponding to a keyword argument be K.

    1. If there is a parameter with name K, the keyword argument corresponds to the named parameter K.
    2. For map-variadic functions, the keyword argument is assembled into a map. For each keyword argument, the map has an entry whose name is the keyword (as an instance of xs:string) and whose corresponding value is the argument value.
    3. For non-variadic functions, an XPST#### error is raised if there is no parameter with name K.

Design Note:

If supporting keyword arguments for sequence-variadic functions, 4/iii would handle them. That is, an error is raised if the keyword name does not match a parameter name.

  5. If no argument corresponds to a parameter in the function declaration:

    1. If the parameter has a default value, then that value is used. TODO: define how the default value is evaluated, i.e. what context is used.
    2. For sequence-variadic functions, the value supplied for parameter N is an empty sequence.
    3. For map-variadic functions, the value supplied for parameter N is the map constructed in step 4. If no keyword arguments were used to construct the map, an empty map is used.
    4. If none of the above apply, an XPST#### error is raised.
  6. If more than one argument corresponds to a parameter in the function declaration, an XPST#### error is raised.
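
The positional, keyword, missing-argument and duplicate-argument rules above can be modelled in Python roughly as follows. This is a sketch under stated assumptions, not the specified algorithm: StaticError stands in for the unassigned XPST#### error, sequences are modelled as lists and maps as dicts, positional arguments up to position N bind directly for non-sequence functions, and keyword arguments are rejected for sequence-variadic functions as in the main proposal.

```python
class StaticError(Exception):
    """Stands in for the still unassigned XPST#### error."""


def bind_arguments(params, variadic, positional, keyword, defaults=None):
    """Bind arguments to parameter names; variadic is "no", "map" or "sequence"."""
    defaults = defaults or {}
    n = len(params)
    bound, extra_keys = {}, {}

    # Positional arguments.
    if variadic == "sequence":
        if keyword:
            raise StaticError("sequence-variadic calls are positional only")
        bound.update(zip(params[:n - 1], positional))
        if len(positional) >= n:
            # Arguments from position N onwards form the variadic sequence.
            bound[params[-1]] = list(positional[n - 1:])
    elif len(positional) > n:
        raise StaticError("too many positional arguments")
    else:
        bound.update(zip(params, positional))

    # Keyword arguments.
    for key, value in keyword.items():
        if key in params:
            if key in bound:
                raise StaticError(f"parameter {key} bound more than once")
            bound[key] = value
        elif variadic == "map":
            extra_keys[key] = value  # becomes an entry in the options map
        else:
            raise StaticError(f"no parameter named {key}")
    if extra_keys:
        if params[-1] in bound:
            raise StaticError("map parameter supplied directly and via keys")
        bound[params[-1]] = extra_keys

    # Parameters that no argument corresponds to.
    for i, name in enumerate(params):
        if name in bound:
            continue
        if name in defaults:
            bound[name] = defaults[name]
        elif i == n - 1 and variadic == "sequence":
            bound[name] = []
        elif i == n - 1 and variadic == "map":
            bound[name] = {}
        else:
            raise StaticError(f"missing argument for {name}")
    return bound
```

For instance, bind_arguments(["x", "options"], "map", [1], {"indent": True}) binds x to 1 and collects indent into the options map.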

...

Issue #66 created #created-66

30 Mar at 18:28:55 GMT
ThinArrowTarget should use FunctionBody

For consistency with FunctionDecl and InlineFunctionExpr (both of which use FunctionBody for the function body instead of EnclosedExpr), ThinArrowTarget should also use FunctionBody for the inline function call version (e.g. 2 -> { . + 1 }):

ThinArrowTarget ::= "->" ( (ArrowStaticFunction ArgumentList) |
                           (ArrowDynamicFunction PositionalArgumentList) |
                           FunctionBody )

Issue #65 created #created-65

29 Mar at 17:13:45 GMT
Support using different input/output element namespaces

Use Case

There have been requests for specifying the output namespace in XQuery akin to the @xpath-default-namespace attribute in XSLT. With the element and type namespaces now being able to be set independently, it would be a good idea to make this change as well, splitting the input and output default XML namespaces.

Grammar

DefaultNamespaceDecl ::= "declare"  "default"  ((("input" | "output")? "element")  |  "type"  |  "function")
                         "namespace"  URILiteral

New Semantics

The default element namespace static context item is split into a default input element namespace that applies to input element contexts (e.g. path steps), and a default output element namespace that applies to output element contexts (e.g. direct/constructed elements).

The scope of the default namespace declaration is the element, function, input element, output element, or type namespace specified in the declaration.

A default namespace declaration with the element scope will set any of the input element, output element, and type namespaces that have not been set by a corresponding input element, output element, or type scoped default namespace declaration.

Example:

Given declare default input element namespace "A"; declare default element namespace "B";, the output element and type namespaces will be specified by the element scope default namespace declaration "B", and the input element namespace will be specified by the input element scope default namespace declaration "A".
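
The fallback rule in this example can be sketched as a small Python model. The function name and the (scope, uri) representation are hypothetical, the function scope is left out for brevity, and an unset namespace is represented as None.

```python
def resolve_default_namespaces(declarations):
    """Resolve the effective input element, output element and type namespaces.

    declarations: list of (scope, uri) pairs, with scope one of "element",
    "input element", "output element" or "type".
    """
    specific = {}
    general = None
    for scope, uri in declarations:
        if scope == "element":
            general = uri          # fallback for anything not set explicitly
        else:
            specific[scope] = uri
    return {
        scope: specific.get(scope, general)
        for scope in ("input element", "output element", "type")
    }
```

Because the element scope only fills in namespaces that have no more specific declaration, the order of the two declarations in the example does not matter in this model.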

TODO

Map all element symbols/contexts as using either the input element or output element default namespace for NCName EQNames.

Issue #64 created #created-64

12 Mar at 08:55:43 GMT
Specify optional parameters to create bounded variadic functions

The current Editor's Draft for XPath and XQuery defines a %variadic("bounded") function type, but does not define a syntax for specifying these.

Grammar

ParamList ::= RequiredParamList ( "," OptionalParamList )?
RequiredParamList ::= Param ("," Param)*
Param ::= "$" EQName TypeDeclaration?
OptionalParamList ::= OptionalParam ("," OptionalParam)*
OptionalParam ::= Param ":=" ExprSingle

Note:

I've followed the structure of positional and keyword arguments here, so the optional parameters are only valid at the end of the function. If it is decided that optional parameters can be declared anywhere in the parameter list, the grammar simplifies to:

ParamList ::= Param ("," Param)*
Param ::= "$" EQName TypeDeclaration? ( ":=" ExprSingle )?

Semantics

[Definition: a parameter is an optional parameter if it has a default value specified using the := ExprSingle syntax.] Optional parameters affect the value of R (the number of parameters that do not have a default value) in the 4.4.1 Static Functions section.

Notes

There are open questions on what to allow in the default value expression. Specifically, how to support things like the context item for functions such as fn:data#0 that use the context item if not specified (e.g. when used at the end of a path expression).

An investigation should be done on the standard functions and vendor built-in functions to see what values they take as defaults.

Issue #63 created #created-63

02 Mar at 16:02:01 GMT
fn:slice, array:slice: Signatures, Examples

EDIT: 1. is obsolete, 2. and 3. are still up-to-date:

1. The current specification for fn:slice has only one signature.

It might be recommendable to also provide signatures with 1 and 2 arguments (especially for users who don’t want to use the new syntax for specifying optional arguments).

2. The last examples look wrong; I would expect the input to be returned as the result:

The expression fn:slice(("a", "b", "c", "d"), 0) returns (). The expression array:slice(["a", "b", "c", "d"], 0) returns [].

3. The first argument of array:slice should be renamed from $input to $array.

Issue #62 created #created-62

19 Feb at 13:04:37 GMT
[FO] The parameter types for fn:unique and array:partition are incorrectly specified.
  1. In both signatures of fn:unique the $values parameter has the type xs:anyAtomicType** which should be xs:anyAtomicType*.
  2. In array:partition the $input parameter is item(*)* which should be item()*.

Issue #61 created #created-61

19 Feb at 13:01:28 GMT
[FO] fn:all and fn:some have an xs:integer* return type, but describe an xs:boolean return type

The fn:all function states:

The result of the function is true if and only if the expression every $i in $input satisfies $predicate($i) is true.

but the return type is specified as xs:integer*. -- It should have a return type of xs:boolean.

A similar issue occurs with fn:some.

Issue #60 created #created-60

19 Feb at 12:58:38 GMT
[FO] fn:namespace-uri-for-prefix no longer supports passing a prefix by string

The type signature of the $prefix variable has changed from xs:string? to union(xs:NCName, enum(''))?. This means that passing a prefix like "fn" will no longer work as it is not an xs:NCName and is not a zero-length string (enum('')).

Note: The only other affected function is the new fn:in-scope-namespaces function. It would be useful in some cases to be able to pass the value as an xs:string (e.g. "fn") without having to cast the value.

Issue #59 created #created-59

19 Feb at 12:48:10 GMT
[FO] fn:replace no longer has the 3 and 4 argument variants

The signature for fn:replace in FO 4.0 [1] only has the new 5 argument variant, whereas FO 3.1 [2] has 3 and 4 argument variants.

  1. https://qt4cg.org/branch/master/xpath-functions-40/Overview-diff.html#func-replace
  2. https://www.w3.org/TR/xpath-functions-31/#func-replace

Issue #58 created #created-58

15 Feb at 12:06:08 GMT
[XQuery] String Value Templates

A string value template (SVT) is a StringLiteral that supports enclosed expression values and entities. It is written as either T"..." or T'...', where the T stands for "template".

Note: An SVT is similar to an attribute value template or text value template in XSLT.

For instance, the following expression:

for $s in ("one", "two", "red", "blue")
return T"{$s} fish"

evaluates to the sequence ("one fish", "two fish", "red fish", "blue fish").

Note: A string value template T"xyz" is equivalent to the expression <svt t="xyz"/>/@t/string().
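
To make the expansion concrete, here is a deliberately restricted Python model. It is only a sketch: it handles plain variable references such as {$s} and the doubled-brace escapes used by attribute value templates, whereas a real SVT would accept arbitrary enclosed expressions. The function name expand_svt and the variables dictionary are hypothetical.

```python
import re

# Matches an escaped brace pair or a simple {$name} variable reference.
_TOKEN = re.compile(r"\{\{|\}\}|\{\$([A-Za-z_][\w.-]*)\}")


def expand_svt(template, variables):
    """Expand {$name} references and the {{ / }} escapes in a template string."""
    def replace(match):
        text = match.group(0)
        if text == "{{":
            return "{"
        if text == "}}":
            return "}"
        # Atomize the variable value to a string, as an AVT would.
        return str(variables[match.group(1)])
    return _TOKEN.sub(replace, template)
```

Mapping expand_svt("{$s} fish", {"s": s}) over the four strings above reproduces the sequence shown in the example.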

Grammar

PrimaryExpr ::= ... | StringValueTemplate
StringValueTemplate ::= ('T"' (EscapeQuot | QuotAttrValueContent)* '"')
                      | ("T'" (EscapeApos | AposAttrValueContent)* "'")

Note: The T" and T' are a single token/unit (i.e. no whitespace/comments are allowed between the characters), just like the Q{ in BracedURILiterals.

Issue #57 created #created-57

04 Feb at 21:09:21 GMT
The item-type(T) syntax is not defined

Section 3.7.2 The judgement subtype-itemtype(A, B) of the XPath 4.0 and XQuery 4.0 specifications mentions item-type(N), as does section 5.19 Item Type Declarations of the XQuery 4.0 specification. The syntax is also not in the EBNF grammar -- searching for "item-type" only finds the ItemTypeDecl symbol in the XQuery 4.0 EBNF.

This should be defined in section 3.6 Item Types.

Issue #56 created #created-56

04 Feb at 20:59:19 GMT
Allow item-type to be matched within its definition scope

In https://qt4cg.org/branch/master/xpath-functions-40/Overview-diff.html#func-random-number-generator, the rng item type is defined as:

record(
    number   as xs:double,
    next     as (function() as record(number, next, permute, *)),
    permute  as (function(item()*) as item()*),
    *
)

It would be helpful and more type specific if this could be defined as:

record(
    number   as xs:double,
    next     as (function() as rng),
    permute  as (function(item()*) as item()*),
    *
)

where the next field references the rng type being defined -- this is like how structures in other languages (C/C++, Java, C#) can reference themselves as property types.

This would also provide an alternative for the .. (self reference) specifier.

Note: the .. syntax is still useful in the case of anonymous record types.

Issue #55 created #created-55

04 Feb at 19:45:01 GMT
Provide an XML version of the stack trace

While the string version of fn:stack-trace() is useful for debugging and including in log messages, being able to process that (from an XML representation) is also useful.

Use Cases

  1. providing extended functionality, like implementing a current-function-name() helper function -- e.g. fn:stack-trace("json")?1?function-name;
  2. customizing the format of the stack trace (e.g. standardizing it across different implementations);
  3. using the information in libraries/IDEs/editors that call the queries -- e.g. by returning the XML and processing it in the library/IDE/editor, such as mapping the data to stack frames in the IDE/editor. Note: This is what I'm doing in my IntelliJ plugin with the MarkLogic stack XML to process query exceptions and the stack when debugging a query.

fn:stack-trace

fn:stack-trace($format as enum("text", "xml", "json") := "text") as item()

Like the current specification version of this function (with the same default semantics), but also supports XML and JSON formats. The "text" format returns an instance of xs:string in an implementation-defined format, the "xml" format returns an instance of element(fn:stack-trace), and the "json" format returns an instance of array(fn:stack-frame).

Here, fn:stack-frame is defined as:

declare type fn:stack-frame as record(
    uri: xs:string,
    function-name: xs:QName?,
    line-number: xs:integer?,
    column-number: xs:integer?,
    *
);

The XML version has the same information as elements in the fn: namespace (e.g. fn:uri).

fn:format-stack-trace

fn:format-stack-trace($stack as item(),
                      $format as enum("text", "xml", "json") := "text") as item()

If $stack is an instance of element(fn:stack-trace), it is converted into the desired output format. (If the output format is "xml", no processing is performed.)

If $stack is an instance of array(fn:stack-frame), it is converted into the desired output format. (If the output format is "json", no processing is performed.)

Otherwise, an err:XPTY0004 error is raised.

fn:parse-stack-trace

fn:parse-stack-trace($stack as xs:string,
                     $format as enum("xml", "json")) as item()

This function takes a stack trace in the implementation-defined format and parses it to XML or JSON. The "xml" format returns an instance of element(fn:stack-trace), and the "json" format returns an instance of array(fn:stack-frame).

If $stack is not in the correct format, an error (error code TBD) is raised.

Note: This could be useful when processing log messages or similar output.
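
The conversions between the record and XML forms might look like the following Python model. It is a sketch under assumptions: a list of dicts stands for array(fn:stack-frame), the fn:stack-frame child element name is a guess (the text only says the XML uses elements such as fn:uri), the text layout is made up because the real one is implementation-defined, and TypeError stands in for err:XPTY0004.

```python
import xml.etree.ElementTree as ET

FN = "http://www.w3.org/2005/xpath-functions"  # namespace of the fn: prefix


def frames_to_xml(frames):
    """Build an fn:stack-trace element from a list of frame records."""
    root = ET.Element(f"{{{FN}}}stack-trace")
    for frame in frames:
        frame_el = ET.SubElement(root, f"{{{FN}}}stack-frame")
        for key, value in frame.items():
            if value is not None:  # optional fields are simply omitted
                ET.SubElement(frame_el, f"{{{FN}}}{key}").text = str(value)
    return root


def xml_to_frames(root):
    """Inverse of frames_to_xml; numeric fields are converted back to int."""
    frames = []
    for frame_el in root:
        frame = {child.tag.split("}", 1)[1]: child.text for child in frame_el}
        for key in ("line-number", "column-number"):
            if key in frame:
                frame[key] = int(frame[key])
        frames.append(frame)
    return frames


def format_stack_trace(stack, fmt="text"):
    """Convert a stack trace to "text", "xml" or "json" (list of records)."""
    if isinstance(stack, ET.Element):
        if fmt == "xml":
            return stack  # already in the requested form
        frames = xml_to_frames(stack)
    elif isinstance(stack, list):
        frames = stack
    else:
        raise TypeError("XPTY0004: unsupported stack value")
    if fmt == "json":
        return frames
    if fmt == "xml":
        return frames_to_xml(frames)
    if fmt != "text":
        raise ValueError(f"unknown format {fmt}")
    # Made-up text layout: uri:line:column function-name
    return "\n".join(
        f"{f['uri']}:{f.get('line-number')}:{f.get('column-number')} "
        f"{f.get('function-name')}"
        for f in frames
    )
```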

Issue #54 created #created-54

26 Jan at 08:55:08 GMT
[XPath] [XQuery] Keyword arguments don't work with all parameters/keys in static functions.

The KeywordArgument symbol restricts the argument name to an NCName. This has two issues:

  1. for non-variadic and bounded-variadic functions, a parameter can be a QName, so may be in a different namespace, or there can be ambiguity if there are multiple parameters with the same local-name in different namespaces;
  2. for map-variadic functions, a parameter key can contain spaces, so cannot be expressed as an NCName.

Syntax

KeywordArgument ::= KeywordArgumentName  ":"  ExprSingle
KeywordArgumentName ::= EQName | StringLiteral

NOTE: I'm using the favoured map-based syntax here. If that is not used, then the ":" should be ":=" as it is in the current draft.

Semantics

For non-variadic and bounded-variadic functions, a KeywordArgumentName is matched as follows:

  1. An EQName matches against the expanded QName of the parameter;
  2. A StringLiteral is cast to an NCName (with an XPTY0004 error if it is not a valid NCName), which is in no namespace (like other variables such as VarName symbols); the resulting expanded QName then matches against the expanded QName of the parameter.

For map-variadic functions, a KeywordArgumentName is matched as follows:

  1. An NCName uses the local-name as the key in the constructed map cast to the key type of the map. This follows the XQFO casting rules with the source type of the local-name being xs:NCName and the target type being the map's key type;
  2. A QName or URIQualifiedName results in an XPTY0004 error as it does not form a valid key name;
  3. A StringLiteral uses the value of the string as the key in the constructed map.

Issue #53 created #created-53

23 Jan at 10:43:02 GMT
Allow function keyword inline functions without parameters

The current draft InlineFunctionExpr adds -> as a shorthand. This shorthand allows optional parameter lists (e.g. -> { true() }), but the function keyword version of this requires a parameter list. For consistency, the function keyword version should also have an optional parameter list.

This means that the syntax for InlineFunctionExpr can be simplified to:

InlineFunctionExpr ::= ("function" | "->")  FunctionSignature?  FunctionBody

Update: From recent discussions, the -> operator as both a thin arrow expression and an inline function definition is confusing. As such, a replacement for -> in the inline function context should be identified.

In the context of the variant without a parameter definition (e.g. when used with arrow operators), the question is how should it work. I suggest:

  1. it should be a 0 and 1 arity function with the parameter argument defaulting to ();
  2. if the parameter is a single value, it should bind to the . (context item) and ~ (context value -- https://github.com/qt4cg/qtspecs/issues/129);
  3. if the parameter is an empty sequence, or multi-valued sequence, it should bind to the ~ (context value -- https://github.com/qt4cg/qtspecs/issues/129) only.

This way, it will be usable in multiple contexts.

Issue #52 created #created-52

21 Jan at 12:29:14 GMT
Allow record(*) based RecordTests

The other ItemTypes that support specifying information about the type allow type(*) to represent any instance of the type. The new RecordTest ItemType should support this.

Syntax

The:

RecordTest ::= "record"  "("  FieldDeclaration  (","  FieldDeclaration)*  ExtensibleFlag?  ")"

symbol should be changed to:

RecordTest ::= AnyRecordTest | TypedRecordTest
AnyRecordTest ::= "record"  "("  "*"  ")"
TypedRecordTest ::= "record"  "("  FieldDeclaration  (","  FieldDeclaration)*  ExtensibleFlag?  ")"

NOTE: This follows the structure of the other any/typed tests (e.g. MapTest).

Semantics

The record(*) item type test is equivalent to map(*).

Issue #51 created #created-51

18 Jan at 11:07:46 GMT
Generalize lookup operator for function items

The current lookup operator is a specialized expression for maps and arrays. All kinds of data structures can be realized with functions, and maps and arrays are functions as well, so it would be pretty straightforward to extend the lookup operator to arbitrary function items:

Use Cases

Return name elements whose string values contain supplied substrings

declare variable $DOC := <xml>
  <name>Jack Daniels</name>
  <name>Jim Beam</name>
  <name>Johnny Walker</name>
</xml>;

let $names := function($key) {
  $DOC//name[contains(string(), $key)]
}
return $names?('Jack', 'Jim', 'Johnny')

(: result :)
<name>Jack Daniels</name>,
<name>Jim Beam</name>,
<name>Johnny Walker</name>

Return squares of supplied integers

let $square := math:pow(?, 2)
return $square?(1 to 5)

(: result :)
1, 4, 9, 16, 25

Remarks

  • XPTY0004 must be raised if the wildcard * is specified as key, and if the input is neither a map nor an array.
  • The extension could easily be combined with the extension for sequences (see #50).
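
The intended behaviour of both use cases can be modelled in Python as "apply the function to each key and concatenate the results". This is a sketch of one plausible reading, with XPath sequences modelled as lists; the helper name lookup is hypothetical.

```python
def lookup(function, keys):
    """Model of $function?(keys): call per key, then flatten one level."""
    result = []
    for key in keys:
        value = function(key)
        # A sequence result is spliced in; a single item is appended.
        result.extend(value if isinstance(value, list) else [value])
    return result


NAMES = ["Jack Daniels", "Jim Beam", "Johnny Walker"]


def names(key):
    """Counterpart of the $names function: names containing the substring."""
    return [name for name in NAMES if key in name]


def square(x):
    """Counterpart of math:pow(?, 2), using integers for simplicity."""
    return x ** 2
```

Here lookup(names, ["Jack", "Jim", "Johnny"]) yields the three names in order, and lookup(square, range(1, 6)) yields the first five squares.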

Issue #50 created #created-50

16 Jan at 21:06:19 GMT
[XPath] Introduce the lookup operator for sequences

In XPath 3.1 it is convenient to use the ? lookup operator on arrays and maps.

It is easy and readable to construct expressions, such as:

  [10, 20, 30]?(2, 3, 1, 1, 2)

And this understandably produces the sequence:

20, 30, 10, 10, 20

However, it is not possible to write:

(10, 20, 30)[2, 3, 1, 1, 2]

or

(10, 20, 30)(2, 3, 1, 1, 2)

or

(10, 20, 30)?(2, 3, 1, 1, 2)

This proposal is to allow the use on sequences of the postfix lookup operator ? with the same syntax as it is now used for arrays.

The ? lookup operator will be applied on sequences whose first item isn't an array or a map. The only change would be to allow the type of the left-hand side to be a sequence, in addition to the currently allowed map and array types. At present, applying ? on any such sequence results in an error. If the first item of the LHS sequence is an array or a map, then the current XPath 3.1 semantics apply, and the RHS is applied to each item in the sequence.

The restriction in the above paragraph can be eliminated if we decide to use a symbol other than ? for this operator, for example ^.

The goal of this feature is achieving conciseness, readability, understandability and convenience.

For example, now one could easily produce from a sequence a projection / rearrangement with any desired multiplicity and ordering.

Thus, it would be easy to express the function reverse() as simply:

$seq?($len to 1 by -1)
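
A small Python model of the proposed projection, with sequences as lists and 1-based positions. The proposal does not say what happens for positions outside the sequence, so raising IndexError here is an assumption, and seq_lookup is a hypothetical name.

```python
def seq_lookup(seq, positions):
    """Model of $seq?(positions): pick items by 1-based position, in order."""
    result = []
    for position in positions:
        if not 1 <= position <= len(seq):
            # Out-of-range behaviour is not specified by the proposal.
            raise IndexError(f"position {position} is out of range")
        result.append(seq[position - 1])
    return result


def reverse(seq):
    """Counterpart of $seq?($len to 1 by -1)."""
    return seq_lookup(seq, range(len(seq), 0, -1))
```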

Issue #49 created #created-49

16 Jan at 11:42:44 GMT
[XQuery] The 'member' keyword is still present on ForMemberBinding

The latest editor's draft (13 January 2021) moves the member keyword to a new ForMemberClause symbol:

ForMemberClause           ::=          "for" "member" ForMemberBinding ("," ForMemberBinding)*

With this change, the ForMemberBinding syntax has retained the optional member keyword from the previous change to ForBinding:

ForMemberBinding          ::=          "member"? "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle

This means that for member member ... and for member $a in [], member $b in [] ... are valid with the current grammar.

The ForMemberBinding grammar should be:

ForMemberBinding          ::=          "$" VarName TypeDeclaration? PositionalVar? "in" ExprSingle

Issue #44 closed #closed-44

16 Jan at 11:30:26 GMT

[XPath] [XQuery] Support RecordTest self references without occurrence indicators

Pull request #48 created #created-48

14 Jan at 16:11:06 GMT
Create a schema-for-xslt40.xsd file for the current draft spec.

Note: schema-for-xslt30.xsd is still referenced by other source files, such as xslt-first-cut.xml, so it has not been removed.

Issue #38 closed #closed-38

14 Jan at 15:59:45 GMT

Create a schema-for-xslt40.xsd file.

Issue #47 created #created-47

13 Jan at 07:50:56 GMT
[XPath] [XQuery] Allow argument placeholders on keyword arguments

This would allow a user to name the arguments that are used as placeholders, making the code more readable. For example:

let $pow2 := math:pow(2, y: ?)

Syntax

This proposal would change KeywordArgument from:

KeywordArgument ::= NCName  ":="  ExprSingle

to:

KeywordArgument ::= NCName  ":="  Argument

or (using the proposed : syntax) to:

KeywordArgument ::= NCName  ":"  Argument

Semantics

A function call with N argument placeholders will create an N-arity function. The order of the argument placeholders corresponds to the order of the parameters in that new function. Those parameters map to the corresponding parameter in the target (partially applied) function, which can be in a different order, or bind to keys in an options map (in the case of functions like fn:serialize). For example:

math:pow(y: ?, x: ?)

would create a function that calculates y^x instead of x^y as the arguments are reversed.
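
The ordering rule can be illustrated with a Python model of keyword placeholders. PLACEHOLDER stands for the ? token and partial_keywords is a hypothetical helper; unlike the math:pow(2, y: ?) example, this sketch only handles keyword arguments.

```python
PLACEHOLDER = object()  # stands for the ? argument placeholder


def partial_keywords(function, **keyword_args):
    """Partially apply function; placeholders become parameters in call order."""
    slots = [name for name, value in keyword_args.items()
             if value is PLACEHOLDER]
    fixed = {name: value for name, value in keyword_args.items()
             if value is not PLACEHOLDER}

    def applied(*args):
        if len(args) != len(slots):
            raise TypeError(f"expected {len(slots)} arguments")
        return function(**fixed, **dict(zip(slots, args)))

    return applied


def power(x, y):
    """Counterpart of math:pow($x, $y)."""
    return x ** y
```

With flipped = partial_keywords(power, y=PLACEHOLDER, x=PLACEHOLDER), the call flipped(3, 2) binds y to 3 and x to 2, so the arguments are supplied in the reverse of the declared order.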

Issue #46 created #created-46

13 Jan at 00:36:15 GMT
xsl:sequence: @as

I'd like to see @as on xsl:sequence. That way I can write, e.g.

<xsl:function name="dc:slice-count" as="xs:integer">
  <xsl:param name="toast" as="element(toast)" />
  <xsl:for-each select="$toast" as="xs:integer">
    <xsl:sequence select="@cooked-slices + @raw-slices"  as="xs:integer" />
  </xsl:for-each>
</xsl:function>

It would be an error for the for-each to have other than exactly one integer as its result, and the same for the xsl:sequence. In this simple example there's not much scope for that to happen, of course.

Maybe on anything with a select attribute?

Parenthetically, a context-item attribute on xsl:sequence would obviate the XSLT1-ish xsl:for-each there, although $toast/(@a, @b) => sum() would work as well and be XSLT 3-ish.

Issue #45 created #created-45

12 Jan at 12:56:55 GMT
Second parameter of fn:sum must be neutral element for +

Currently fn:sum specifies the intent of the second parameter in a note:

The second argument allows an appropriate value to be defined to represent the sum of an empty sequence. For example, when summing a sequence of durations it would be appropriate to return a zero-length duration of the appropriate type. This argument is necessary because a system that does dynamic typing cannot distinguish "an empty sequence of integers", for example, from "an empty sequence of durations".

When implementing fn:sum on sequences of billions of items (numerics, or durations, etc), another aspect arises: this second parameter must also be, for this to work and for optimizations to be possible, a neutral element for +.

Indeed, a distributed system like Spark will produce intermediate sums for (possibly empty) subsets, and will naturally use $zero for the sum of an empty subset. Intermediate totals are aggregated in a treewise fashion. For the result to be correct, it must be the case that $zero + $x eq $x for any item in the sequence provided as the first parameter. It is fully aligned with the idea of the note above, but I would suggest making this requirement a bit stricter.
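
The effect can be reproduced with a toy treewise aggregation in Python. This is only a model of the idea, not of Spark: tree_sum is a hypothetical helper that starts every partial sum from zero, the way a distributed fold would.

```python
from datetime import timedelta
from functools import reduce


def tree_sum(items, zero, chunk_size=2):
    """Sum items in chunks, starting each partial sum from zero, then recurse."""
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    if len(items) <= chunk_size:
        return reduce(lambda a, b: a + b, items, zero)
    chunks = [items[i:i + chunk_size]
              for i in range(0, len(items), chunk_size)]
    partials = [reduce(lambda a, b: a + b, chunk, zero) for chunk in chunks]
    return tree_sum(partials, zero, chunk_size)


# A neutral element gives the expected total.
total = tree_sum([1, 2, 3, 4, 5], 0)

# A non-neutral "zero" is added once per partial sum and skews the result.
skewed = tree_sum([1, 2, 3, 4, 5], 1)

# Durations work as long as zero is a zero-length duration.
days = tree_sum([timedelta(days=1), timedelta(days=2)], timedelta(0))
```

In this model total is 15 while skewed is 21, because the non-neutral value is folded in at every level of the tree.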

Issue #44 created #created-44

06 Jan at 17:35:41 GMT
[XPath] [XQuery] Support RecordTest self references without occurrence indicators

This would be useful for defining things like binary trees, where the fields are optional but (if supplied) the values are not. So it is more logical to define them as:

declare item-type binary-tree as record(
    left? as ..,
    right? as ..,
    value as item()*
);

Issue #43 created #created-43

06 Jan at 15:37:58 GMT
Support standard and user-defined composite values using item type definitions

The composite values defined in 4.14.4 Composite Atomic Values are currently specified as a table. This means that it is not possible for users to define their own properties for custom types. It is also harder for editors/IDEs, or other tools to implement as there is an element of hard-coding the logic.

These could be implemented as a properties/values record associated with the defined type. The values of the record could then be arity-1 functions that are called with the supplied value when accessed via maps. For example, in XQuery:

declare %composite-values("composite-values") type xs:date external; (: built-in :)
declare type date-composite-values := record(
    year: fn:year-from-date#1,
    (: ... :)
);

and XSLT:

<xsl:item-type name="xs:date" composite-values="date-composite-values"/>
<xsl:item-type name="date-composite-values" as="record(
    year: fn:year-from-date#1,
    (: ... :)
)"/>

So xs:date("1999-10-15")?year would be evaluated as date-composite-values?year(xs:date("1999-10-15")).
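
The same dispatch can be sketched in Python: a registry maps a type to a record of arity-1 accessor functions, and a lookup on a value calls the accessor with that value. COMPOSITE_VALUES and lookup_component are hypothetical names, and the month and day entries are additions for illustration (the proposal only shows year).

```python
from datetime import date

# Hypothetical registry: type -> record of arity-1 accessor functions.
COMPOSITE_VALUES = {
    date: {
        "year": lambda d: d.year,
        "month": lambda d: d.month,  # illustrative addition
        "day": lambda d: d.day,      # illustrative addition
    },
}


def lookup_component(value, key):
    """Model of $value?key: find the accessor for the type, then apply it."""
    accessors = COMPOSITE_VALUES[type(value)]
    return accessors[key](value)
```

Under this model lookup_component(date(1999, 10, 15), "year") plays the role of xs:date("1999-10-15")?year, and users could register records for their own types in the same way.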

Issue #42 created #created-42

05 Jan at 09:49:45 GMT
Relax type incompatibility in order by clause (impl. dep. instead of XPTY0004)

In the case where XQuery is used with very large sequences (billions/trillions of items or of tuples) with a parallel evaluation [1], the order by clause in its current state is costly to evaluate, because checking the primitive types for compatibility requires an extra step and materialization (in the case of Spark: an additional action to perform this check).

Relaxing this by making the order between different primitive types implementation-dependent (for the purpose of order by) rather than throwing XPTY0004, in case of several incompatible primitive types in the comparison keys, would make parallel implementations more efficient.

[1] http://www.vldb.org/pvldb/vol14/p498-muller.pdf
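
One way an implementation could realise such a relaxed ordering is to rank the primitive types and compare values only within a type. The Python sketch below shows the idea; the particular ranking (booleans, then numbers, then strings) is an arbitrary choice standing in for "implementation-dependent", and order_key is a hypothetical name.

```python
def order_key(value):
    """Sort key that never fails on mixed primitive types."""
    if isinstance(value, bool):      # checked first: bool is a subclass of int
        return (0, value)
    if isinstance(value, (int, float)):
        return (1, value)
    if isinstance(value, str):
        return (2, value)
    raise TypeError(f"unsupported key type: {type(value).__name__}")


mixed = ["b", 2, "a", 1.5, True]
ordered = sorted(mixed, key=order_key)
```

Since each key is computed from a single value, no separate pass over all keys is needed to check type compatibility before sorting.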