@qt4cg statuses in 2022

This page displays status updates about the QT4 CG project from 2022.

See also recent statuses.

Issue #297 created #created-297

28 Dec at 23:51:51 GMT
Lookup in deeply nested JSON, an abbreviated syntax for map:find

In XML, you can select all X nodes with an abbreviated syntax //X

There is no abbreviated syntax for JSON

I propose to add a ?? syntax. Like / is doubled for //, it doubles the ? lookup operator.

The syntax is basically the same as for ?:

[200] UnaryLookupRecursion ::= "??" KeySpecifier
[143] LookupRecursion ::= "??" KeySpecifier
[144] KeySpecifier ::= NCName | IntegerLiteral | StringLiteral | VarRef | ParenthesizedExpr | "*"

For the semantics, it can call map:find, except for * and $varref:

Unary variant:

??"string"    becomes map:find(. , "string")
??NCName      becomes map:find( ., "NCName")
??123         becomes map:find(., 123)

??*           Recursively every member/value of every array/map underneath .
              E.g. for `[{"a": {"x": 123}}, 456]`: `{"a": {"x": 123}}, {"x": 123}, 123, 456`

??$varref     calls ?$varref on every nested array/map.
              Like (.,??*)?$varref   (except for type errors)

Postfix variant:

E??S would be E!??S if it is atomic, or let $s := data(S) return E!??$s if S is parenthesized

This probably conflicts with #171
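
The unary semantics proposed in this issue can be sketched outside XPath. The following is a minimal Python illustration, assuming JSON maps are modelled as dicts and arrays as lists; the function names `lookup_all` and `find_key` are hypothetical stand-ins for `??*` and `map:find`, not part of any specification.

```python
def lookup_all(value):
    """Sketch of ??*: yield every member/value nested underneath `value`."""
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return  # atomic values have nothing underneath them
    for child in children:
        yield child
        yield from lookup_all(child)


def find_key(value, key):
    """Sketch of map:find(value, key): collect values stored under `key` at any depth."""
    found = []
    if isinstance(value, dict):
        for k, v in value.items():
            if k == key:
                found.append(v)
            found.extend(find_key(v, key))
    elif isinstance(value, list):
        for member in value:
            found.extend(find_key(member, key))
    return found


data = [{"a": {"x": 123}}, 456]
everything = list(lookup_all(data))
xs = find_key(data, "x")
```

With the example from the issue, `everything` holds the four items in the order listed above for `??*`, and `xs` holds the single value stored under "x".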

Issue #296 created #created-296

22 Dec at 10:44:18 GMT
Default namespace for elements; especially in the context of HTML

There can be little doubt that the fact that an unprefixed name in XPath fails to select an unprefixed element in the source document is one of the major gotchas, causing massive bewilderment to all newbie users.

The XPath 2.0 solution of using a default element namespace in the static context is a partial solution; its main drawback is that it doesn't help the newbies who didn't know about the problem or its solution.
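
The gotcha is easy to reproduce in other XPath-like APIs. As a rough illustration only, the sketch below uses Python's standard `xml.etree.ElementTree`, whose path language is a small XPath subset rather than a conforming XPath processor; the sample document and the prefix `h` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# An unprefixed <p> in the source is still in the XHTML namespace,
# because of the default namespace declaration on <html>.
doc = ET.fromstring(
    f'<html xmlns="{XHTML_NS}"><body><p>Hello</p></body></html>'
)

# An unprefixed name in the path looks for elements in no namespace.
unprefixed = doc.findall(".//p")

# Binding a prefix to the XHTML namespace selects the element.
qualified = doc.findall(".//h:p", {"h": XHTML_NS})
```

Here `unprefixed` is empty while `qualified` contains the paragraph, which is the behaviour that surprises newcomers.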

The HTML "living standard" introduces a "wilful violation" of the XPath 1.0 spec to address the issue. Given that most elements in an HTML DOM will be in the XHTML namespace, it states:

If the QName has no prefix and the principal node type of the axis is element, then the default element namespace is used. Otherwise if the QName has no prefix, the namespace URI is null. The default element namespace is a member of the context for the XPath expression. The value of the default element namespace when executing an XPath expression through the DOM3 XPath API is determined in the following way:

If the context node is from an HTML DOM, the default element namespace is "http://www.w3.org/1999/xhtml". Otherwise, the default element namespace URI is null.

It then adds a note which is blatantly untrue:

This is equivalent to adding the default element namespace feature of XPath 2.0 to XPath 1.0, and using the HTML namespace as the default element namespace for HTML documents. It is motivated by the desire to have implementations be compatible with legacy HTML content while still supporting the changes that this specification introduces to HTML regarding the namespace used for HTML elements, and by the desire to use XPath 1.0 rather than XPath 2.0.

Since the XPath 2.0 facility picks up the default namespace from the static context, while the HTML "wilful violation" picks it up dynamically from a property of the context node (namely "being from an HTML DOM") there is no way these can be considered equivalent.

(Note also, there's a significant ambiguity in the "wilful violation" rules: what exactly is the "context node" that determines this behaviour? I think they're suggesting it is the context node at the point of XPath API invocation, not the context node for the specific axis step. This makes it rather unclear how the rule is supposed to apply to XSLT. And: if an XSLT stylesheet creates a temporary tree with nodes in the XHTML namespaces, do we consider those nodes as being "from an HTML DOM"?)

Nevertheless, the intent of the "violation" is worthy, and it would be nice if we can find a solution to this problem that works both for HTML and for other vocabularies.

Our current proposal for fn:parse-html is that HTML elements should go in the XHTML namespace and this means that users familiar with XPath 1.0 implementations in the browser will trip over this problem. A lot.

Issue #170 closed #closed-170

21 Dec at 00:03:29 GMT

XPath "otherwise" operator

Issue #191 closed #closed-191

21 Dec at 00:01:46 GMT

Definition of "dynamic type"

Issue #213 closed #closed-213

20 Dec at 23:57:40 GMT

Lookup/Indexing operator for sequences (supersedes #50)

Issue #295 created #created-295

20 Dec at 23:43:38 GMT
Extend support for self-reference in record types

We currently allow a field in a record to have type "..", that is, the same type as the containing record definition.

This isn't good enough for the fn:random-number-generator, where we need something like:

random-number-generator-record:
record(
   number as xs:double,
   next as function() as #random-number-generator-record,
   permute as function(item()*) as item()*,
   *,
)

There are two ways we could tackle this. We could extend the syntax to allow ".." here, so it becomes "next as function() as ..". Or we could allow named item types to refer to themselves:

<xsl:item-type name="random-number-generator-record"
   as="record(
   number as xs:double,
   next as function() as type(random-number-generator-record),
   permute as function(item()*) as item()*,
   *,
)">

We haven't really reviewed the proposal for named item types. It's easy enough to declare them in XQuery and XSLT (and not really very difficult to define the rules under which self-referential definitions are allowed). Free-standing XPath is a bit more of a problem.
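
For comparison, a self-referential record of this shape is straightforward in languages with named types. The following Python sketch is only an analogy, not a proposal: the class name, the `make_generator` helper and the seeding strategy are all assumptions made for illustration, and do not reflect how fn:random-number-generator is specified.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RandomNumberGenerator:
    number: float
    # The return type refers to the containing record type by name.
    next: Callable[[], "RandomNumberGenerator"]
    permute: Callable[[list], list]


def make_generator(seed: int) -> RandomNumberGenerator:
    """Hypothetical constructor: derive a number and a follow-on seed from `seed`."""
    rng = random.Random(seed)
    number = rng.random()
    next_seed = rng.getrandbits(32)

    def permute(items: list) -> list:
        shuffled = list(items)
        random.Random(seed).shuffle(shuffled)
        return shuffled

    return RandomNumberGenerator(number, lambda: make_generator(next_seed), permute)


generator = make_generator(1)
following = generator.next()
```

The forward reference in quotes plays the role that "type(random-number-generator-record)" plays in the second option above.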

QT4 CG meeting 016 draft minutes #minutes-12-20

20 Dec at 17:26:00 GMT

Draft minutes published.

Issue #294 created #created-294

20 Dec at 15:48:55 GMT
fn:remove removing multiple items

map:remove and array:remove take a list of keys/positions to be removed; fn:remove only accepts one. I propose changing fn:remove to bring it into line.

Without this, removing multiple items is tricky because removing one item changes the positions of the others.

Example use case:

let $p := index-where($persons, ->{@status='retired'})
return $persons => remove($p)

(Of course, this could always be done with a filter. But removing a small number of items from a large sequence might be more efficient than a filter)
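
The position-shifting problem mentioned above can be shown with a short Python sketch. It assumes 1-based positions as in XPath; both function names are hypothetical and exist only for this illustration.

```python
def remove_one_at_a_time(items, positions):
    """Naive approach: each removal shifts the positions of later items."""
    result = list(items)
    for position in positions:
        del result[position - 1]
    return result


def remove_positions(items, positions):
    """Remove all listed 1-based positions in a single pass."""
    drop = set(positions)
    return [item for index, item in enumerate(items, start=1) if index not in drop]


persons = ["a", "b", "c", "d"]
naive = remove_one_at_a_time(persons, [2, 3])
all_at_once = remove_positions(persons, [2, 3])
```

The naive loop removes "b" and then "d" (because "c" has moved to position 2), whereas the single-pass version removes "b" and "c" as intended.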

Issue #293 created #created-293

19 Dec at 20:14:27 GMT
Error in fn:doc-available specification

Observed on the XML.com slack (Oct 6 2022):

A rule in the XQFO 3.1 specification seems to be inconsistent. In the error summary, it’s stated that:

err:FODC0005, Invalid argument to fn:doc or fn:doc-available. Raised (optionally) by fn:doc and fn:doc-available if the argument is not a valid URI reference.

The rules for fn:doc-available say: “[…] In all other cases this function returns false. This includes the case where an invalid URI is supplied, and also the case where a valid relative URI reference is supplied, and cannot be resolved, for example because the static base URI is absent.”

Pull request #292 created #created-292

19 Dec at 18:43:21 GMT
Merge signatures with optional params

This addresses issue #291, concerned with the validity of function-catalog.xml against its XSD schema, and the validity of the intermediate file xpath-functions-40.xml against the xmlspec DTD (as amended for QT).

It also fixes the trivial error raised in issue #288, and addresses issue #70 by making each function have a single prototype with default values where appropriate. (This is not purely an editorial change, it enables you for example to supply () as the $length argument of fn:substring).

There's more work on issue #257 (editorial improvements to handling of named record types).

This commit is mainly for technical review by NDW. It combines changes to the stylesheets, schema, and content, and may therefore require picking apart before it can be applied.

Issue #291 created #created-291

19 Dec at 11:31:43 GMT
DTD validity of F&O spec

Despite PR #228, I'm still seeing a lot of validation errors when (using Oxygen) I apply DTD validation to build/expanded/xpath-functions-40/xpath-functions-40.xml

Many of these relate to style attributes not being allowed on table, td, and code elements.

The merge-function-specs.xsl explicitly creates elements with these attributes:

See

  • <xsl:template match="fos:options"> line 382
  • <xsl:template match="fos:option"> line 408
  • <xsl:template match="processing-instruction('local-function-index')"> line 347

It all works because xmlspec-2016.xsl handles these attributes even though the DTD doesn't allow them: see <xsl:template name="style-attributes"> at line 2096.

There seem to be three possible options:

  • Extend the DTD to allow these attributes
  • Change merge-function-specs.xsl to not generate these attributes, instead perhaps generating a role attribute which gets picked up in the final styling
  • Sweep the issue under the carpet; avoid validating the intermediate file.

The main problem with the third approach is that it allows other errors to go unnoticed, for example putting text directly within an <item> without a containing <p>.

Pull request #290 created #created-290

17 Dec at 19:21:51 GMT
Fix issue #18 (function type hierarchy)

Editorial change to fix a technical error in the data model spec (issue #18). Does not change the specification.

Pull request #289 created #created-289

17 Dec at 18:45:49 GMT
Proposal to add fallback behaviour to map:get and array:get

Note: I got a build failure trying to build this, it's doing DTD validation but there are invalidities, unrelated to the changes I made.

Issue #100 closed #closed-100

17 Dec at 18:35:21 GMT

[FO] Typo in §17.5.3

Issue #88 closed #closed-88

17 Dec at 17:53:14 GMT

[XPATH] breaking ancestor or descendant axes

Issue #124 closed #closed-124

17 Dec at 17:41:13 GMT

[XPath] [XQuery] Incorrect subtype-itemtype rules for pure and local union types

Issue #242 closed #closed-242

16 Dec at 13:44:39 GMT

Coercion rules used to convert function result to expected type

QT4 CG meeting 016 draft agenda #agenda-12-20

16 Dec at 09:41:30 GMT

Draft agenda published.

Issue #288 created #created-288

15 Dec at 14:49:30 GMT
Error in fn:path specification

On the XML.com slack, Phil Fearon observes:

The XPath 3.1 specification has an error in the definition of fn:path

The properties section states:

The one-argument form of this function is ·deterministic·, ·context-dependent·, and [·focus-dependent·] (https://www.w3.org/TR/xpath-functions-31/#dt-focus-dependent). The two-argument form of this function is ·deterministic·, ·context-independent·, and ·focus-independent·.

The term one-argument form should be zero-argument form and consequently, two-argument form should be one-argument form

Issue #277 closed #closed-277

14 Dec at 14:13:47 GMT

Overriding functions using xsl:import

Issue #279 closed #closed-279

14 Dec at 14:13:46 GMT

Rewrite XSLT §10.3.4 (function overriding) for clarity

Issue #287 closed #closed-287

14 Dec at 14:13:45 GMT

PR #279 with merge conflicts resolved

Pull request #287 created #created-287

14 Dec at 13:57:39 GMT
PR #279 with merge conflicts resolved

Close #279 Close #277

Issue #225 closed #closed-225

14 Dec at 12:27:54 GMT

[XDM] Terminology around "Atomic value" and "Type Annotation"

Pull request #286 created #created-286

13 Dec at 21:06:36 GMT

Spec changes to allow child::(a|b|c) - Issue 107

Issue #114 closed #closed-114

13 Dec at 20:09:56 GMT

[fo] array:index-where

Issue #258 closed #closed-258

13 Dec at 20:08:52 GMT

Issue #114 - add array:index-where() function

Issue #265 closed #closed-265

13 Dec at 17:58:27 GMT

Type hierarchy tables/diagrams

Issue #268 closed #closed-268

13 Dec at 17:58:26 GMT

New type-hierarchy images / descriptions

QT4 CG meeting 015 draft minutes #minutes-12-13

13 Dec at 17:23:01 GMT

Draft minutes published.

QT4 CG meeting 015 draft agenda #agenda-12-13

09 Dec at 07:44:30 GMT

Draft agenda published.

Issue #285 created #created-285

08 Dec at 11:34:57 GMT
Stability of collections

The specification for fn:collection says:

By default, this function is [·deterministic·]. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is [·implementation-defined·]. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003].

I think this is unrealistic. The cost of making fn:collection deterministic is disproportionate to the benefits. It's very rare in practice for a query or stylesheet to process the same collection more than once, and retaining the information needed to deliver the identical results on these rare occasions is expensive (typically it means holding a long-term lock on the data, or keeping a copy of the entire collection in memory). It also inhibits techniques such as multi-threaded evaluation.

I would like to relax this requirement.

Issue #1 closed #closed-1

07 Dec at 13:00:11 GMT

[FO] Conversion between xs:QName and Q{uri}local format

Issue #229 closed #closed-229

07 Dec at 12:36:54 GMT

Proposal: Add the missing functions for arrays: array:exists() and array:empty()

Pull request #284 created #created-284

07 Dec at 11:55:51 GMT
Add grammar for "if (test) then {expr}" with no else

As discussed in issue #234. In reviewing this PR, I suggest we consider it together with the existing proposals for ternary conditionals (x ?? y !! z) and the "otherwise" operator.

Issue #97 closed #closed-97

07 Dec at 10:05:05 GMT

[XPath] Functions symmetric to `head()` and `tail()` for sequences and arrays

Issue #250 closed #closed-250

07 Dec at 10:04:27 GMT

New functions fn:foot, fn:truncate, array:foot, array:truncate

QT4 CG meeting 014 draft minutes #minutes-12-06

06 Dec at 17:21:01 GMT

Draft minutes published.

Issue #283 created #created-283

05 Dec at 11:23:06 GMT
Enumeration types

The draft specification includes a proposal to provide enumeration types. The proposal is incomplete, for example it does not include all the rules for conversions and casting. This issue is raised in order to outline where we are, what needs to be done, and to elicit consensus on whether we want to proceed with this.

What's the motivation? Primarily, making function signatures more expressive.

Q0: do we really need this?

The essence of the proposal is an ItemType that matches an enumerated set of xs:string values:

[131] EnumerationType ::= "enum" "(" StringLiteral ("," StringLiteral)* ")"

For example, the type enum("red", "green", "blue") matches the string "green".

Q1: why restrict it to strings? I think my main reason was that the syntax gets complicated if we try to do it for data types that have no literal representation.

Subtyping is based on the value space. An enumeration type E is a subtype of another enumeration type F if the set of strings in E is a subset of those in F. All enumeration types are subtypes of xs:string.

Strings are not "labelled" as belonging to an enumeration type, matching is purely based on the value ("datum"). This doesn't provide very strong typing. If "violet" and "pink" are allowed by both the enumeration types colour and flower, then it's not intrinsically an error to use a variable of type flower where a colour is expected. Equally, a variable of type xs:string can be used where a colour is expected.
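
The value-based matching and subtyping rules described above can be modelled with plain sets. This is a minimal Python sketch under that assumption; the names `matches` and `is_subtype` and the sample value sets are invented for illustration.

```python
def matches(value, enum_values):
    """A value matches an enumeration type if it is a string in the enumerated set."""
    return isinstance(value, str) and value in enum_values


def is_subtype(e, f):
    """E is a subtype of F if every string allowed by E is also allowed by F."""
    return frozenset(e) <= frozenset(f)


colour = {"red", "green", "blue", "violet", "pink"}
flower = {"rose", "violet", "pink"}

# Matching is purely by value, so "violet" is both a colour and a flower.
violet_is_both = matches("violet", colour) and matches("violet", flower)
```

Because nothing labels a string as belonging to one enumeration, a value taken from a flower variable is accepted wherever a colour is expected whenever the string happens to be in both sets.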

Q2: should enumerations be more strongly typed?

I would expect that an enumeration type can be used as the target of "cast" or "castable", but this is not currently in the spec.

The coercion rules as currently written say that if the expected type is an enumeration type, then the rules are largely the same as for any other subtype of xs:string: casting from xs:untypedAtomic applies, promotion from xs:anyURI does not apply, 1.0 compatibility mode rules do not apply.

Operators and functions on enumeration types are the same as for xs:string. This means, for example, that there are no special rules for comparison: values are ordered as strings and are compared using the default collation.

Issue #271 closed #closed-271

05 Dec at 10:26:51 GMT

Amendments to data model spec as per minutes of 2022-11-22

QT4 CG meeting 014 draft agenda #agenda-12-06

05 Dec at 10:20:13 GMT

Draft agenda published.

Issue #40 closed #closed-40

05 Dec at 10:16:09 GMT

[XPath] [XQuery] The modified SingleType EBNF symbol is redundant.

Issue #167 closed #closed-167

05 Dec at 10:08:26 GMT

XSLT Conditional Instructions

Issue #247 closed #closed-247

05 Dec at 09:51:51 GMT

Actions QT4CG-011-01 and QT4CG-011-03.

Issue #282 closed #closed-282

05 Dec at 09:51:49 GMT

Copy of MK actions-2022-11-15 branch by NW

Issue #249 closed #closed-249

05 Dec at 09:49:53 GMT

Issue 213: new function fn:items-at

Pull request #282 created #created-282

05 Dec at 09:46:42 GMT
Copy of MK actions-2022-11-15 branch by NW

Resolves merge conflicts in #247

Issue #155 closed #closed-155

04 Dec at 00:17:42 GMT

Proposal to support optional parameter values on static functions.

Issue #157 closed #closed-157

04 Dec at 00:16:21 GMT

Proposal to support optional parameters that bind to the context item.

Issue #281 created #created-281

03 Dec at 22:12:41 GMT
XPath: Short-circuiting Functions and Lazy Evaluation Hints

Short-circuiting Functions and Lazy Evaluation Hints


1. Introduction

As shown in Wikipedia, most contemporary programming languages offer reasonable support for short-circuit evaluation (also known as minimal or McCarthy evaluation), including several standard language short-circuit operators.

Short-circuiting, as we will call the above in this document, is commonly used to achieve:

  1. Avoiding undesired side effects of evaluating the second argument, such as excessive evaluation time or throwing an error

Usual example, using a C-based language:

   int denom = 0;
   if (denom != 0 && num / denom)
   {
   ...//ensures that calculating num/denom never results in divide-by-zero error
   }

Consider the following example:

   int a = 0;
   if (a != 0 && myfunc(b))
   {
     do_something();
   }

In this example, short-circuit evaluation guarantees that myfunc(b) is never called. This is because a != 0 evaluates to false. This feature permits two useful programming constructs.

  1. If the first sub-expression checks whether an expensive computation is needed and the check evaluates to false, one can eliminate expensive computation in the second argument.

  2. It permits a construct where the first expression guarantees a condition without which the second expression may cause a run-time error.

  2. Idiomatic conditional construct

Perl idioms:

   some_condition or die; # Abort execution if some_condition is false

   some_condition and die; # Abort execution if some_condition is true


2. Short-circuiting in XPath

In short (pun intended) there is no such thing mentioned in any officially-published W3C version (<= 3.1) of XPath.

This topic was briefly mentioned in the discussion of another proposal: that of providing the capability to specify strictly the order of evaluation.

Aspects of incorporating hints for lazy evaluation (a topic related to short-cutting) were discussed also in the thread to this question on the Xml.com Slack.

The situation at present is that the XPath processor being used decides whether or not to perform shortcutting, even in obvious cases. Thus, the difference in evaluation time from one XPath processor to another can be dramatic. For example, the following XPath expression is evaluated by BaseX (ver. >= 10.3) in 0 seconds, whereas Saxon ver. 11 takes about 100 seconds to evaluate the same expression.


let $fnAnd := function($x)
   {
     function($y)
     {
      if(not($x)) then false()
                  else $y
     }
   }
   return
      $fnAnd(false())(some $b in ( ((1 to 1000000000000000000) !true()) )  satisfies not($b)   )


3. Analysis

We can define the term “function with shortcutting” (just for a 2-argument function, but this can be extended for N-argument function where N >= 2) in the following way:

Given a function $f($x, $y), we denote in XPath its partial application for a given value of $x (say let $x := $t) as:

$f($t, ?)

The above is a function of one argument. By definition:

$f($x, $y) is equivalent to $f($x, ?) ($y), for every pair $x and $y.

That is, the partial application of the 2-argument function $f with fixed 1st argument is another function $g which when applied on the 2nd argument ($y) of $f($x, $y) produces the same value as $f($x, $y):

If $g is defined as $f($x, ?), then $g($y) produces the same value as $f($x, $y) for every pair $x and $y.
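
The equivalence between a full call and a partial application followed by a call can be checked mechanically. The sketch below uses Python's `functools.partial` as a stand-in for the XPath placeholder syntax `$f($x, ?)`; the function `f` is an arbitrary example chosen for illustration.

```python
from functools import partial


def f(x, y):
    """An arbitrary two-argument function (logical conjunction)."""
    return x and y


# g corresponds to $f($x, ?) with $x fixed to False.
g = partial(f, False)

# f(x, y) and partial(f, x)(y) agree for every pair of booleans.
equivalent = all(
    partial(f, x)(y) == f(x, y)
    for x in (True, False)
    for y in (True, False)
)
```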

Let us take a specific function:

let $fAnd := function($x as xs:boolean, $y as xs:boolean) as xs:boolean
                     { $x and $y}

Then one equivalent way of defining $fAnd is:

let $fAnd := function($x as xs:boolean, $y as xs:boolean) as xs:boolean
                     {
                       let $partial := function($x as xs:boolean) as function(xs:boolean) as xs:boolean
                                               {
                                                  if(not($x)) then ->(){false()}
                                                              else ->($t) {$t}
                                               }
                         return $partial($x)($y)
                    }
   return
       $fAnd(false(), true())

The $partial function is the result of the partial application $fAnd($x, ?) and by definition this is a function of arity 1, which when applied on the 2nd argument of $fAnd, produces the same result as $fAnd($x, $y)

From the code above we see that actually there exists a value of $x (the value false() ) for which $fAnd($x, ?) is not a function of one argument, but a constant function (of 0 arguments) – that produces the value false().

Definition:

We say that a function f(x, y) allows shortcutting if there exists at least one value t such that

f(t, ?) is a constant.


4. Solution

How can an XPath processor treat a function with shortcutting?

Obviously, if the XPath processor knows that f(x, y) allows shortcutting, then it becomes possible to delay the evaluation of the 2nd argument y and only perform this evaluation if the arity of the function returned by f(t, ?) is 1, and not 0.

How can an XPath processor know that a given function allows shortcutting?

  • One way to obtain this knowledge is to evaluate f(t, ?) and get the arity of the resulting function. XPath 3.1 allows getting the arity of any function item with the function fn:function-arity(). However, doing this on every function call could be expensive and deteriorate performance.

  • Another way of informing the XPath processor that a given function f(x, y) allows shortcutting is if the language provides hints for lazy evaluation:
    let $fAnd := function($x as xs:boolean, lazy $y as xs:boolean) as xs:boolean

    Only in the case when there is a lazy hint specified the XPath processor will check the arity of f(x, ?) and will not need to evaluate the y argument if this arity is 0.
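
The effect of a lazy hint can be approximated in an eagerly evaluated language by passing the second argument as a zero-argument callable (a thunk). The following Python sketch is an analogy under that assumption; `f_and`, `expensive` and the `calls` log are hypothetical names, and nothing here reflects how an XPath processor would implement the hint.

```python
calls = []


def expensive():
    """Stands in for a costly second argument; records that it was evaluated."""
    calls.append("evaluated")
    return True


def f_and(x, lazy_y):
    """Conjunction whose second argument is only evaluated when it is needed."""
    if not x:
        return False  # short-circuit: lazy_y is never called
    return bool(lazy_y())


short_circuited = f_and(False, expensive)
calls_after_short_circuit = list(calls)
evaluated = f_and(True, expensive)
```

After the first call the log is still empty, which is the behaviour the lazy hint is intended to make available to the processor.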

Let us return to the original example:

let $fAnd := function($x as xs:boolean, $y as xs:boolean) as xs:boolean
                     {
                       let $partial := function($x as xs:boolean) as function(xs:boolean) as xs:boolean
                                               {
                                                  if(not($x)) then ->(){false()}
                                                              else ->($t) {$t}
                                               }
                         return $partial($x)($y)
                    }
   return
       $fAnd(false(), true())

Executing this with an XPath 3.1 processor, an error is raised: “1 argument supplied, 0 expected: function() as xs:boolean { false() }”.

But according to the updated “Coercion Rules / Function Coercion” in XPath 4.0, no error will occur:

If F has lower arity than the expected type, then F is wrapped in a new function that declares and ignores the additional argument; the following steps are then applied to this new function.

For example, if the expected type is function(node(), xs:boolean) as xs:string, and the supplied function is fn:name#1, then the supplied function is effectively replaced by function($n as node(), $b as xs:boolean) as xs:string {fn:name($n)}

This is exactly the place where the XPath processor will call the lower-arity function without providing to it the ignored, and not needed to be evaluated, additional argument.

Thus, according to this rule, an XPath 4.0 processor will successfully evaluate the above expression and will not issue the error shown above.
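
The wrapping step in the quoted coercion rule can be sketched as follows. This Python illustration only shows the arity adaptation: Python evaluates arguments eagerly, so it does not demonstrate skipping evaluation of the ignored argument. The helper name `coerce_to_arity` is an assumption made for the example.

```python
import inspect


def coerce_to_arity(func, expected_arity):
    """Wrap a lower-arity function so it accepts and ignores extra arguments."""
    actual_arity = len(inspect.signature(func).parameters)
    if actual_arity > expected_arity:
        raise TypeError("function has higher arity than the expected type")

    def wrapper(*args):
        if len(args) != expected_arity:
            raise TypeError(f"{expected_arity} arguments expected, {len(args)} supplied")
        # Pass on only the arguments the wrapped function declares.
        return func(*args[:actual_arity])

    return wrapper


constant_false = coerce_to_arity(lambda: False, 1)
identity = coerce_to_arity(lambda t: t, 1)
```

Calling `constant_false(True)` returns False without an arity error, mirroring the zero-argument branch of `$partial` in the example above.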

Finally, we can put the lazy hint on a function declaration or on a function call, or on both places:

let $fAnd := function($x as xs:boolean, lazy $y as xs:boolean) as  xs:boolean
   {
     let $partial := function($x as xs:boolean) as function(lazy xs:boolean) as xs:boolean
                           {
                              if(not($x)) then ->(){false()}
                                          else ->($t) {$t}
                           }
      return $partial($x)( lazy $y)
   }
   return
       $fAnd(false(), lazy true())

How to write short-circuiting functions?

The code above is a good example of how to write a short-circuiting function. When evaluating it, the XPath processor is aware that a short-circuit is happening and, instead of signaling an arity error as an XPath 3.1 processor does, logically ignores the unneeded 2nd argument.

Issue #280 created #created-280

01 Dec at 15:14:06 GMT
Why is resolve-uri forbidden from resolving against a URI that contains a fragment identifier?

The 3.1 F&O spec says, of fn:resolve-uri():

A dynamic error is raised [err:FORG0002] if $base is not a valid IRI according to the rules of RFC3987, extended with an implementation-defined subset of the extensions permitted in LEIRI, or if it is not a suitable IRI to use as input to the chosen resolution algorithm (for example, if it is a relative IRI reference, if it is a non-hierarchic URI, or if it contains a fragment identifier).

(emphasis added by me)

What in the name of all things is that about? I've never noticed that before, and I haven't seen any other API (the URL API in Node and the browser for example) that cares.

resolve-uri('test.xml', 'http://example.com/path/file.xml#foo') === http://example.com/path/test.xml

I don't see why the presence of a fragment identifier should matter in the least.

e.g.

>> let url = new URL("http://example.com/path/file.xml#foo")
>> console.log(url.href)
http://example.com/path/file.xml#foo
>> let resolved = new URL("test.xml", url);
>> console.log(resolved.href)
http://example.com/path/test.xml
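
The same behaviour can be observed in other URL libraries. As one more data point, this short sketch uses `urljoin` from Python's standard `urllib.parse` module with the URLs from the example above.

```python
from urllib.parse import urljoin

base = "http://example.com/path/file.xml#foo"

# The fragment identifier on the base is simply discarded during resolution.
resolved = urljoin(base, "test.xml")
```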

Issue #270 closed #closed-270

30 Nov at 10:57:04 GMT

Incorrect statement about named modes

Issue #273 closed #closed-273

30 Nov at 10:56:10 GMT

Issue270 xslt mode visibility

Pull request #279 created #created-279

29 Nov at 21:59:02 GMT
Rewrite XSLT §10.3.4 (function overriding) for clarity

Essentially editorial - clarifies the existing rules, as described in issue #277

Issue #254 closed #closed-254

29 Nov at 17:29:00 GMT

Improvements/fixes for the coercion rules

Issue #278 created #created-278

29 Nov at 17:16:53 GMT
array bound checking

Similar functions on arrays and sequences have different behaviour as regards bound checking. For example, fn:head() returns an empty sequence if the input is empty, while array:head() throws an error.

Sometimes we want the error, sometimes we don't, but this should be orthogonal to whether we are using sequences or arrays.

Is there a way we can adapt the sequence functions to throw an error, or adapt the array functions so they don't?

Various ideas have been put forward, including:

  • add extra optional parameters to functions to select the behaviour
  • mirror the relevant functions (e.g. into a different namespace) to create an alternative version with different behaviour
  • add options to the static context (array-bound-checking=yes|no, sequence-bound-checking=yes|no) to switch the behaviour (with appropriate mechanisms in XQuery and XSLT -- and perhaps XPath -- to set these options)
  • do nothing, let users solve the problem for themselves by writing user-defined functions.

Note 1: the difference in behaviour affects operators as well as functions: contrast $seq[0] (which returns ()) with $array(0) (throws error). Making $seq[0] throw an error would mean we have to define it in a different way, since the formulation $seq[position()=$N] is intrinsically error-free.

Note 2: if $array(0) doesn't throw an error, it's not immediately obvious what it should do. Returning an empty sequence isn't ideal because the empty sequence is a valid entry in an array. For maps we have the same problem, which is why we have two functions map:get() and map:contains().
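
The first idea in the list (an extra optional parameter that selects the behaviour) can be sketched in Python. This is an illustration only, using 1-based positions; the function `item_at`, its `fallback` parameter and the sentinel are hypothetical and do not correspond to any proposed function signature.

```python
_RAISE = object()  # sentinel meaning "no fallback supplied"


def item_at(items, position, fallback=_RAISE):
    """Return the item at a 1-based position; raise unless a fallback is given."""
    if 1 <= position <= len(items):
        return items[position - 1]
    if fallback is _RAISE:
        raise IndexError(f"position {position} is out of bounds")
    return fallback


strict_hit = item_at(["a", "b"], 2)
lenient_miss = item_at([], 1, fallback=None)

# Note 2 in practice: an array whose only member is empty gives the same
# answer as an out-of-bounds lookup that falls back to an empty value.
present_but_empty = item_at([[]], 1, fallback=[])
absent = item_at([[]], 2, fallback=[])
```

The last two results are indistinguishable, which is the ambiguity that motivates having both map:get() and map:contains() for maps.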

Issue #277 created #created-277

29 Nov at 12:50:27 GMT
Overriding functions using xsl:import

In XSLT 3.0, you can have a module M that contains an xsl:function F#2, and in another module you can import M, and declare another xsl:function F#3.

In the 4.0 spec we appear to disallow this with the paragraph (in §10.3.4):

A stylesheet function may be overridden by another stylesheet function with the same name that has higher [import precedence]. This is only allowed, however, if the [arity range] of the overriding function includes the totality of the arity range of the overridden function.

In fact, the error conditions we go on to define (XTSE0769 and XTSE0770) are more carefully worded and do not make the above situation an error. XTSE0769 says that if F has higher import precedence than G, then either the arity ranges of F and G must be disjoint, or the arity range of F must include the totality of G. XTSE0770 says that if F and G have the same import precedence, then their arity ranges must be disjoint.

The paragraph cited should be replaced with:

A stylesheet function may be overridden by another stylesheet function with the same name that has higher [import precedence]. This is only allowed, however, if either (a) the [arity range] of the overriding function includes the totality of the arity range of the overridden function, or (b) the two arity ranges are non-overlapping.
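
The two error conditions as summarised above can be expressed as a small check. This Python sketch assumes an arity range is an inclusive (min, max) pair and import precedence is an integer where larger means higher; the function names are hypothetical, and the logic follows the summary in this issue rather than the normative text.

```python
def disjoint(a, b):
    """True if two inclusive arity ranges share no arity."""
    return a[1] < b[0] or b[1] < a[0]


def includes(outer, inner):
    """True if `outer` covers the totality of `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def check_override(f_range, f_precedence, g_range, g_precedence):
    """Return None if the two same-named functions may coexist, else an error code."""
    if disjoint(f_range, g_range):
        return None
    if f_precedence == g_precedence:
        return "XTSE0770"
    if f_precedence > g_precedence:
        higher, lower = f_range, g_range
    else:
        higher, lower = g_range, f_range
    return None if includes(higher, lower) else "XTSE0769"


# The XSLT 3.0 scenario from the start of this issue: imported F#2, importing F#3.
xslt30_case = check_override((3, 3), 2, (2, 2), 1)
```

The XSLT 3.0 scenario passes because the two arity ranges do not overlap, which is case (b) in the proposed replacement paragraph.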

There is also scope for editorial improvement to §10.3.4. It describes three separate scenarios:

(a) overriding functions using xsl:import and import precedence

(b) overriding functions using xsl:use-package and xsl:override

(c) overriding extension/external functions using xsl:function (or vice versa)

and it would be much easier to read the section if these were clearly distinguished.

Issue #276 closed #closed-276

27 Nov at 17:42:58 GMT

Make './gradlew publish' work on Windows #255

Issue #255 closed #closed-255

27 Nov at 17:42:58 GMT

Build error running gradlew publish on Windows

Pull request #276 created #created-276

27 Nov at 17:37:17 GMT
Make './gradlew publish' work on Windows #255

Fix #255

Repeat after me, "filenames are not URIs." Not on some platforms, anyway.

I still get a warning about "correctness" because of the interaction between a couple of tasks. That doesn't happen on a *nix platform so I don't know if it's related to the difference between forward and backward slashes or if it's a consequence of the build changes I made to support the new SVG (that might be) in data model. They seem harmless for the moment.

Issue #275 created #created-275

26 Nov at 22:57:37 GMT
Problems with nt/xnt links to grammar terms

I'm trying to work out why we're getting linking errors during the build when linking to grammatical terms.

In etc/XT40.xml (after changing extract.xsl to produce tidier namespace declarations), I'm seeing entries like:

   <nt def="doc-xpath40-SequenceType" xlink:type="simple">SequenceType</nt>
   <nt def="doc-xpath40-ItemType" xlink:type="simple">ItemType</nt>
   <nt def="doc-xpath40-OccurrenceIndicator" xlink:type="simple">OccurrenceIndicator</nt>
   <nt def="prod-xpath40-AnyItemTest" xlink:type="simple">AnyItemTest</nt>
   <nt def="doc-xpath40-TypeName" xlink:type="simple">TypeName</nt>
   <nt def="doc-xpath40-KindTest" xlink:type="simple">KindTest</nt>

and the problem entries seem to be the ones prefixed "prod-" rather than "doc-".

The extract.xsl stylesheet simply copies what it finds in xpath-assembled.xml.

The decision seems to be made in grammar2spec.xsl, for example line 523 reads

<xsl:param name="result_id_docprod_part"/> <!-- 'doc-' or 'prod-' -->

The parameter is set to "prod-" in add-non-terminals (line 473) and in add-terminals (line 488), and is set to "doc-" in show-prod (line 513).

Looking more carefully, all productions listed in XT40.xml have a "prod-" entry (at least one...) and most of them also have a "doc-" entry. The problem cases are those that do not have a "doc-" entry. Which suggests that show-prod is not selecting them.

In a small number of cases, when processing the XSLT specification, the show-prod template is outputting the message

WARNING!! production with name="MapTest" not found

This is produced for MapTest, ArrayTest, EnumerationType, and NamedItemType. The problem here appears to be that the production is referenced in the XSLT spec but in the grammar file it is not shown with if="xslt40-patterns", so it is not present in the pattern grammar.

For productions like AnyItemTest, the problem is different. It looks to me as if show-prod is not being called for these terms.

show-prod is called from one place only: assemble-spec.xsl line 208. This is in a template rule with match="prodrecap"

Sure enough, AnyItemTest does not appear to have a prodrecap in the XPath spec.

Issue #274 created #created-274

26 Nov at 10:47:08 GMT
What would it take/would it be possible to build a module repository for QT?

We have an ever-growing list of proposed convenience functions. I am not opposed, in principle, to adding convenience functions, but we don’t have any principled criteria (AFAICT) for which ones to add and which ones to reject. That’s not surprising, and I’m also not opposed to that. But I’m sure there are hundreds, perhaps thousands, of such functions. At some point, we’re going to start to resist adding more simply because we’ve added so many. Some of us may already be nearing that point.

It seems to me that the alternative is to do what TeX, Perl, Python, Node, etc. do: make it easy for users to download, install, and use libraries. (I’m carefully using the term “library” here where I might prefer to use “package” or “module” because we already have “package” and “module” which mean other things.)

What would it take to make that possible?

One problem we have is that there are two (perhaps three, or more, depending on how you count) different QT languages and they aren’t all mutually interoperable. My XSLT implementation of fn:parse-uri, for example, isn’t directly usable by an XQuery product that doesn’t implement XSLT or by some other product that only uses XPath.

Suppose we added an import library declaration to XPath, similar to the import module declaration in XQuery

LibraryImport := "import" "library"
                   ("namespace" NCName "=")?
                   "at" URILiteral

and a corresponding <xsl:library> instruction to XSLT.

<xsl:library
  namespace = uri
  href = uri />

The semantics of each is that it searches an implementation-defined set of locations for a module that matches the URI. If it finds one, it loads the functions declared in that library. If a namespace is given, it loads only the functions in the namespace provided.

We’d expect all implementations to be able to load libraries that only used XPath constructions. An XSLT processor might also be able to load XSLT constructions. An XQuery processor might also be able to load XQuery constructions.

We could define a library file format that allowed an implementor to provide several different implementations of a function, where the processor could choose the best one (in some implementation-dependent way). This would also give us a place to hang version numbers and other relevant metadata.

With that much in place, would it be more practical to use XPath extension modules?

Consider the following scenario. I want to use a URI relativization function (as requested in #269). Dimitre provided a pure XPath implementation, so we don’t actually have to implement it as a native function, we just have to make it easy to use. Imagine that EXPath.org (for example) provided a machine-readable list of libraries.

I run a hypothetical “expath” command to search the machine-readable directory.

$ expath search relative
xpath uri-relativize -- returns the relative location between two URIs
xslt  doc-relative -- convenience functions for accessing “uncles”, “aunts”, etc.
xpath relative-rank -- functions to score XML documents

It found three libraries that matched “relative”. That first one sounds promising.

$ expath show uri-relativize
The uri-relativize library provides uri-relativize(), an XPath
function that resolves one absolute URI relative to another.

That sounds like what I want, so I install it.

$ expath install uri-relativize
Downloading uri-relativize … installing … done.

Then in my stylesheet I simply add

<xsl:library href="uri-relativize"/>

Or in a language that only uses XPath, I add

import library at "uri-relativize"

and I can use the uri-relativize() function.

I think the important parts are that the implementation searches for libraries so that I don’t have to identify precisely where they were installed and that we somehow make it practical to use them without, though it pains me to say this, explicit namespace bindings.

Perhaps we could allow libraries to “inject” functions into the default function namespace, or we could have a function namespace search list and maybe libraries could extend that.

The format of the library might be something like this:

<library xmlns="xpath-library" name="uri-relativize" version="1.0.3"
         namespace="http://example.com/my/namespace">
<provides>
function uri-relativize($path1 as xs:anyURI, $path2 as xs:anyURI) as xs:anyURI
</provides>
<xpath version="3.0">
…xpath implementation…
</xpath>
<xslt version="4.0">
…xslt 4.0 implementation…
</xslt>
<xslt version="3.0">
…xslt 3.0 implementation…
</xslt>
</library>

But I’m sure if we looked closely at the metadata provided in other system’s packages, we’d see I’ve left a bunch of stuff out. You’d probably, for example, want some way of saying one package depends on another and having the processor load those automatically.
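To make the "processor chooses the best implementation" idea concrete, here is a minimal Python sketch that reads a library file in the hypothetical format above and picks an implementation. The function choose_implementation, the preference-ordered dictionary of supported languages, and the rule "highest version not exceeding what the processor supports" are all assumptions for illustration. The proposal deliberately leaves the choice implementation-dependent.

```python
import xml.etree.ElementTree as ET

NS = "{xpath-library}"  # namespace used by the hypothetical library format

LIBRARY = """<library xmlns="xpath-library" name="uri-relativize" version="1.0.3"
         namespace="http://example.com/my/namespace">
  <provides>function uri-relativize($path1 as xs:anyURI, $path2 as xs:anyURI) as xs:anyURI</provides>
  <xpath version="3.0">xpath implementation</xpath>
  <xslt version="4.0">xslt 4.0 implementation</xslt>
  <xslt version="3.0">xslt 3.0 implementation</xslt>
</library>"""


def choose_implementation(library_xml, supported):
    """Pick one implementation from a library file.

    `supported` maps a language element name ("xslt", "xpath", ...) to the
    highest version the processor supports, in order of preference.
    Returns (language, version, body), or None if nothing is usable.
    """
    root = ET.fromstring(library_xml)
    for language, max_version in supported.items():
        usable = [el for el in root.findall(NS + language)
                  if float(el.get("version")) <= max_version]
        if usable:
            best = max(usable, key=lambda el: float(el.get("version")))
            return language, best.get("version"), best.text.strip()
    return None


# An XSLT 3.0 processor prefers XSLT but cannot use the 4.0 implementation.
print(choose_implementation(LIBRARY, {"xslt": 3.0, "xpath": 3.1}))
# A processor without XSLT falls back to the pure XPath implementation.
print(choose_implementation(LIBRARY, {"xquery": 4.0, "xpath": 4.0}))
```

A real format would also need the dependency and metadata information mentioned above. The sketch covers only implementation selection.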

Pull request #273 created #created-273

25 Nov at 18:09:34 GMT
Issue270 xslt mode visibility

This PR fixes issue #270 concerning the visibility of XSLT modes.

It also deals with a lot of editorial issues, some highlighted in issue #275.

It changes the manual change markup in the XSLT spec to use at="date" format rather than at="draft-number" (dates are more useful for the incremental development process we are following). And it fixes some cross-spec-reference issues, and some violations of hyphenation diktats.

Also bundled with this bug fix are other editorial changes to fix cross-spec linking errors; for details see the individual commit messages.

QT4 CG meeting 013 draft agenda #agenda-11-29

25 Nov at 14:32:27 GMT

Draft agenda published.

Issue #253 closed #closed-253

25 Nov at 12:19:24 GMT

Fix xnt references

Issue #237 closed #closed-237

25 Nov at 11:53:54 GMT

Issue167 xsl conditionals

Issue #272 created #created-272

25 Nov at 10:14:59 GMT
Setting parameter values in xsl:use-package

It's possible for two different components of an application to use the same library package (via xsl:use-package) and in principle each of them should be able to configure that package (by setting its global parameters) in different ways. Currently though it's not at all clear how the global parameters of a used package should be set (and there are some inconsistencies in the spec concerning how the visibility attribute on such parameters is supposed to work).

I think that it's fairly straightforward to plug this gap by allowing xsl:use-package to have xsl:with-param children, naming the stylesheet parameters in the used package and assigning them values. For static parameters the values must be assigned using static expressions; for non-static parameters any expression can be used: because of the scoping rules and the syntactic constraints on xsl:use-package, the value of the expression can only depend on global variables and parameters in the using stylesheet package.

Pull request #271 created #created-271

24 Nov at 16:10:40 GMT
Amendments to data model spec as per minutes of 2022-11-22

Changes to data model spec, see actions QT4CG-012-01, -02, -03, -04, -06.

Issue #270 created #created-270

24 Nov at 13:56:42 GMT
Incorrect statement about named modes

On the description of the 'visibility' attribute here it says:

A named mode is not eligible to be used as the initial mode if its visibility is private.

But, if the mode is designated as the default mode of the implicit or explicit xsl:package then it's eligible as an initial mode. Having private visibility does not affect its initial status. Furthermore, the unnamed mode is always private, and always initial.

Issue #269 created #created-269

24 Nov at 13:20:05 GMT
Function for URI relativization

Signature:

relativize($uri as xs:anyURI, $base as xs:anyURI) as xs:anyURI

Example: URI::relativize in Java
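For readers unfamiliar with the Java method, the following Python sketch approximates the prefix-based behaviour that java.net.URI.relativize documents: if the scheme and authority match and the base path is a prefix of the target path, the remainder is returned, otherwise the target URI is returned unchanged. It is only an illustration of one possible semantics, not the proposed definition. It treats the base path as a directory and does not normalize either URI, both of which are simplifying assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def relativize(uri: str, base: str) -> str:
    """Return `uri` relative to `base` when `base` is a path prefix of it;
    otherwise return `uri` unchanged."""
    u, b = urlsplit(uri), urlsplit(base)
    if (u.scheme, u.netloc) != (b.scheme, b.netloc):
        return uri  # different scheme or authority: nothing to relativize
    # Assumption: the base identifies a directory, so compare against "path/".
    base_path = b.path if b.path.endswith("/") else b.path + "/"
    if not u.path.startswith(base_path):
        return uri  # the base path is not a prefix of the target path
    remainder = u.path[len(base_path):]
    # Query and fragment are taken from the target URI.
    return urlunsplit(("", "", remainder, u.query, u.fragment))


print(relativize("http://example.com/a/b/c.xml?x=1#f", "http://example.com/a/"))
print(relativize("http://example.com/x/y.xml", "http://example.com/a/"))
```

Note that this style of relativization never produces "../" steps. Whether the proposed function should do so is a design question the signature alone does not settle.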

Pull request #268 created #created-268

24 Nov at 10:28:19 GMT
New type-hierarchy images / descriptions

Close #265

This is my attempt to improve the type-hierarchy images.

  1. There were graphics already, but they were accidentally not being copied into the right place.
  2. But, they're for 3.1 and they're in some format I don't recognize, so
  3. I created new SVG ones anyway, that folks may or may not like
  4. I replaced the big, ugly yellow tables with prose. I concluded that accessibility was the only reason they were present.
  5. The colors are a little different, and a little "off", but they're explicitly chosen from a palette that offers unambiguously different colors for the three most common forms of color-blindness.
  6. One of the SVG images is too wide, I'm not sure what to do about that
  7. The labels are text, so you can search for them, and xs:anyAtomicType is a link. We could make more links.

Issue #267 closed #closed-267

24 Nov at 10:16:29 GMT

Stylesheet updates for inline-SVG in the data model

Pull request #267 created #created-267

24 Nov at 10:08:49 GMT
Stylesheet updates for inline-SVG in the data model

I need to commit this first, and separately, so that the build will be correct.

Also fixed an obvious CSS typo.

Issue #266 created #created-266

23 Nov at 18:49:48 GMT
Add an option on xsl:copy-of to copy a subtree with a change of namespace

It's a common requirement to copy a subtree with a change of namespace.

It can be done easily enough in XSLT with apply-templates in a custom mode, but an option on xsl:copy-of could make it a lot easier. It could also potentially be a lot more efficient.

Alternatively, this could be provided as a function, or an option on the copy-of function.

Or it could be a new higher-order function copy-renaming($node, function($name){ fn:QName('new uri', local-name-from-QName($name)) }).

There's a danger of course of packing in too much functionality and making it just as complex/inefficient as using a custom mode.
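The higher-order variant can be modelled outside XSLT to show the intended shape. This Python sketch uses ElementTree, where element names are stored as "{uri}local" strings. The names copy_renaming and to_namespace are hypothetical, and the sketch renames element names only (attributes are left alone), which is an assumption about scope.

```python
import copy
import xml.etree.ElementTree as ET


def copy_renaming(root, rename):
    """Deep-copy a subtree, passing every element name through `rename`."""
    result = copy.deepcopy(root)  # the source tree is left untouched
    for element in result.iter():
        if isinstance(element.tag, str):  # skip comments and PIs, if any
            element.tag = rename(element.tag)
    return result


def to_namespace(new_uri):
    """Build a renaming function that keeps the local name and swaps the namespace."""
    def rename(tag):
        local_name = tag.rsplit("}", 1)[-1]  # "{uri}local" -> "local"
        return "{%s}%s" % (new_uri, local_name)
    return rename


source = ET.fromstring('<a xmlns="urn:old"><b/><c><d/></c></a>')
renamed = copy_renaming(source, to_namespace("urn:new"))
print([element.tag for element in renamed.iter()])
```

Passing a function rather than a fixed namespace keeps the option open for other renamings, at the cost of the extra complexity the paragraph above warns about.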

Issue #265 created #created-265

23 Nov at 12:46:18 GMT
Type hierarchy tables/diagrams

On 22 November, we mentioned in passing that the type hierarchy tables in the Data Model spec are hard to read. They're a garish yellow for no obvious reason and the font size is very small. Would replacing them with diagrams like this be an improvement?

[Image: proposed type hierarchy diagram]

(This is the largest and most difficult to represent, I think. I've grouped some of the atomic types together to make the drawing more aesthetically pleasing. I don't think that interferes with comprehension, though might need to be explained.)

There's no particular rhyme or reason to the order of the items in the "second column" except to leave room for the longer hierarchies further to the right.

Issue #232 closed #closed-232

23 Nov at 09:52:58 GMT

Issue 225 - Data model clarifications

Issue #264 closed #closed-264

23 Nov at 09:36:05 GMT

THIS IS JUST A TEST IGNORE THIS

Pull request #264 created #created-264

23 Nov at 09:29:41 GMT

THIS IS JUST A TEST IGNORE THIS

QT4 CG meeting 012 draft minutes #minutes-11-22

22 Nov at 17:33:38 GMT

Draft minutes published.

Issue #263 closed #closed-263

22 Nov at 12:29:47 GMT

Exclude spec XSL from PRs

Pull request #263 created #created-263

22 Nov at 12:22:15 GMT
Exclude spec XSL from PRs

Letting the specification-specific XSL through means you can get a mismatch with the common XSL. So don't do that.

QT4 CG meeting 012 draft agenda #agenda-11-22

21 Nov at 11:34:37 GMT

Draft agenda published.

Issue #251 closed #closed-251

21 Nov at 10:02:57 GMT

Use external CSS for styling

Issue #248 closed #closed-248

21 Nov at 09:49:21 GMT

Editorial: update change logs and status

Issue #262 created #created-262

20 Nov at 23:10:00 GMT
Navigation in deep-structured arrays

At present there is no convenient way to navigate down a deep-structured array (whose members are themselves arrays and maybe even (recursively) deep-structured arrays themselves).

For example, given the array:

 [1, [2, 3], [4, [5, 6]], (7, 8, 9) ]

we cannot navigate to 6 with a single function call, nor do we have a convenient search mechanism that will give us all index(es)-paths that navigate to 6, in this case just the single index-path (3, 2, 2).

This proposal is to extend the array:get() function and the array lookup using function call syntax to accept as their last argument not just a single integer position, but a sequence of integers.

The sequence of integer positions is called an "item-navigation-path", or simply a "navigation-path".

For the array defined above, the navigation path to the contained item 6 is:

(3, 2, 2)

XPath implementation:

let $ar := [1, [2, 3], [4, [5, 6]], (7, 8, 9) ],
    $get := -> ($input as array(*), $indices as xs:integer*)
            {
              let $getHelper := -> ($input as array(*), $indices as xs:integer*, $self as function(*))
              {
                let $headIndex := head($indices), $restindices := tail($indices)
                  return
                    if(exists($restindices))
                      then $self($input($headIndex),$restindices, $self)
                      else if(exists($headIndex))
                             then $input($headIndex)
                             else $input
               }
              return $getHelper($input, $indices, $getHelper)
            }
  return
    $ar => $get((3, 2, 2))
     (: (: Or alternatively : :) $get($ar, (3, 2, 2)) :)

When the above expression is evaluated, the expected, correct result is produced:

6

Goals

  1. To allow a simple and intuitive deep-indexing navigation with a single function call.

  2. To allow for sophisticated deep-searching functionality (like the current array:index-where() and array:index-of(), but not just scratching the surface) to return the navigation paths to wanted items of interest, which then could be stored, passed to other functions and easily retrieved.
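The second goal, a deep search that returns navigation paths, is not spelled out in the proposal. One possible reading, sketched in Python under stated assumptions: arrays are modelled as lists, the sequence member (7, 8, 9) as a tuple, positions are 1-based as in XPath, and find_paths is a hypothetical name.

```python
def get(array, path):
    """Follow a 1-based navigation path down a nested array."""
    current = array
    for position in path:
        current = current[position - 1]
    return current


def find_paths(array, target, prefix=()):
    """Return every navigation path that leads to a member equal to `target`."""
    paths = []
    for position, member in enumerate(array, start=1):
        here = prefix + (position,)
        if member == target:
            paths.append(here)
        elif isinstance(member, list):  # descend only into nested arrays
            paths.extend(find_paths(member, target, here))
    return paths


ar = [1, [2, 3], [4, [5, 6]], (7, 8, 9)]
print(find_paths(ar, 6))   # [(3, 2, 2)]
print(get(ar, (3, 2, 2)))  # 6
```

A path returned by the search can be stored and later passed back to get, which is the round trip the goal describes.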

Other examples:

let $ar := [1, [2, 3], [4, [5, 6]], (7, 8, 9) ],
    $get := -> ($input as array(*), $indices as xs:integer*)
            {
              let $getHelper := -> ($input as array(*), $indices as xs:integer*, $self as function(*))
              {
                let $headIndex := head($indices), $restindices := tail($indices)
                  return
                    if(exists($restindices))
                      then $self($input($headIndex),$restindices, $self)
                      else if(exists($headIndex))
                             then $input($headIndex)
                             else $input
               }
              return $getHelper($input, $indices, $getHelper)
            }
  return
     ( $get($ar, (3, 2, 2)), $get($ar, 2), $get($ar, (3,1)),   $get($ar, (3,2)), $get($ar, ())  )

When the above expression is evaluated, all the expected, correct results are produced:

6
[2,3]
4
[5,6]
[1,[2,3],[4,[5,6]],(7,8,9)]


Pull request #261 created #created-261

20 Nov at 09:48:56 GMT
Proposed fn:char function - see issue 121

Although discussion on issue #121 did not converge on a consensus, this PR proposes a new function which I believe meets the requirements expressed.

Issue #260 created #created-260

19 Nov at 23:54:43 GMT
array:index-of

Seems we are missing the corresponding array function to the standard (on sequences) fn:index-of().

Summary

Returns a sequence of positive integers giving the positions within the array $input of members that are equal to $searched-member.

Signature

array:index-of(
               $input  as array(*),
               $searched-member as item()*,
               $compare($x as item()*, $y as item()*) as xs:boolean := fn:deep-equal#2
                 ) as xs:integer*

Properties

This function is deterministic, context-independent, and focus-independent

Rules

The result of the function is a sequence of integers, in monotonic ascending order, representing the 1-based positions in the input array of those members $mem for which $compare($mem, $searched-member) is true().

More formally, the function returns the result of the XPath expression:

  (1 to array:size($input)) ! (-> {  .[$compare($input(.), $searched-member)] }) (.)
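The same rule can be modelled in a few lines of Python, which may help when checking edge cases. This is a sketch only: arrays are modelled as lists, a sequence-valued member as a tuple, and Python equality stands in for fn:deep-equal as the default comparison.

```python
import operator


def array_index_of(input_array, searched_member, compare=operator.eq):
    """Return the 1-based positions of members for which
    compare(member, searched_member) is true, in ascending order."""
    return [position
            for position, member in enumerate(input_array, start=1)
            if compare(member, searched_member)]


print(array_index_of([10, [20, 30], 10, (1, 2)], 10))      # [1, 3]
print(array_index_of([10, [20, 30], 10, (1, 2)], (1, 2)))  # [4]
# A custom comparison: positions of members greater than the searched value.
print(array_index_of([1, 5, 3, 7], 3, compare=operator.gt))  # [2, 4]
```

Because the whole member is compared, a member that is itself a sequence only matches a searched value that is the same sequence.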

Examples

[Image: example]

And also this:

[Image: example]

Pull request #259 created #created-259

18 Nov at 23:24:25 GMT
Issue #74 - add the fn:parse-html function

This PR makes the following changes:

  1. Add an .editorconfig file to allow editors and IDEs to indent the XML source documents consistently.
  2. Group the XML and JSON parsing and serialization functions into a common top-level section.
  3. Update the html5 bibref to the WHATWG specification -- the old HTML5 link redirects there.
  4. Add a new fn:parse-html function specification.
  5. Define a mapping between the HTML DOM and the XDM nodes.

The function specification itself is complete. The HTML DOM to XDM node mapping currently contains only the overview. I'm going to write the rest as part of this PR; this is just to start the ball rolling in specifying this function.

Pull request #258 created #created-258

18 Nov at 19:25:33 GMT
Issue #114 - add array:index-where() function

The function is symmetric with fn:index-where.

Issue #257 created #created-257

18 Nov at 18:02:00 GMT
Improving the styling/presentation/representation of the record types in the F&O spec

I'm not keen on the † symbol being used to indicate and link record types. This is not used elsewhere, and when a record type is used as a type, there is a link to the record type definition. I think that that link is sufficient.

The presentation of the record type definition has the following format:

†type-name:
record(
    ...
)

The record part is fine, but I find the †type-name: part clunky. For functions, omitting the declare function part from the XQuery syntax makes sense as the function signature is readable without that.

The XQuery 4.0 draft spec has the following (as of yet unapproved) syntax for defining type aliases: https://qt4cg.org/specifications/xquery-40/xquery-40.html#id-item-type-declaration. I would suggest using something similar, e.g.:

type-name as record(
    ...
)

That is in line with the way function declarations are specified and looks more readable to me.

Finally, omitting the id from the record element results in †: before the record. -- It would be nice if instead of:

        <div3 id="html-parser-options">
           <head>HTML parser options</head>
           <example role="record">
              <record id="parse-html-options">
                 <arg name="method" type="union(enum(&quot;html5&quot;), xs:string)"/>
                 <arg name="*"/>
              </record>
           </example>

you could write something like:

        <div3 id="parse-html-options">
           <head>HTML parser options</head>
           <example role="record">
              <record type-name="parse-html-options">
                 <arg name="method" type="union(enum(&quot;html5&quot;), xs:string)"/>
                 <arg name="*"/>
              </record>
           </example>

Issue #256 created #created-256

18 Nov at 16:52:36 GMT
Function declarations: static and dynamic context for default parameter values

In the new text for default values on parameters in XQuery function declarations, we don't say clearly what the static context for the default value expression is. In particular we don't say that it excludes the other parameters of the function. The XSLT spec has similar (though slightly different) shortcomings.

There's a slight complication in that we say the dynamic context for the default value expression is the dynamic context of the function call. But what if the default value is a variable reference $x? Statically, this will (presumably, though we don't currently say) be bound to a global variable $x. Now, we say (under "dynamic context") that the dynamic "variable values" contains the same [expanded QNames] as the [in-scope variables] in the [static context] for the expression. But, the static context for the default value expression and the static context for the function call have different in-scope variables and they must therefore have different variable values in the dynamic context, so it's wrong to say that the dynamic context for the default value expression is the same as that of the function call.

My first instinct would be to restrict the default value to being what XSLT calls a "static expression" (this isn't defined in XQuery, but it could be defined easily enough). However, that would disallow using "." as the default value expression, which is something we wanted to permit.

The next option would be to say that the dynamic context for a default value expression is the same as the dynamic context of the function call except that "variable values" contains bindings for global variables only. This feels rather kludgey, but it's workable in principle. (It's worth noting, and might be worth noting in the spec, that in XQuery all components of the dynamic context except the focus and the variable values are typically immutable within an execution scope. The same isn't true in XSLT, where we have additional dynamic context components like regex-group() and current-output-uri() to worry about.)

Issue #41 closed #closed-41

18 Nov at 16:44:04 GMT

[XQuery] The TypeswitchExpr and CaseClause symbols have repeated VarNames

Issue #5 closed #closed-5

18 Nov at 16:41:48 GMT

[FO] The math:atan2 notes incorrectly defines its behaviour.

Issue #255 created #created-255

18 Nov at 12:01:42 GMT
Build error running gradlew publish on Windows

When running ./gradlew publish in Windows (via Git Bash) or running the publish gradle task in an IntelliJ Run/Debug configuration, I get the following error:

> Task :xquery_assemble_xpath
Transforming specifications/xquery-40/src/xpath.xml...
Error at char 9 in expression in xsl:variable/@select on line 168 column 60 of assemble-spec.xsl:
  FODC0005  Invalid URI
  file:/D:/Projects/xquery-xslt/qtspecs/build/xquery-40/src/xpath-preprocessed.xml/../D:\Projects\xquery-xslt\qtspecs\build/xquery-40/temp-xpath-grammar.xml. Caused by java.net.URISyntaxException: Illegal character in opaque part at index 2: D:\Projects\xquery-xslt\qtspecs\build/xquery-40/temp-xpath-grammar.xml
  In template rule with match="prodrecap" on line 152 of assemble-spec.xsl
     invoked by xsl:apply-templates at file:/D:/Projects/xquery-xslt/qtspecs/style/assemble-spec.xsl#51
  In template rule with match="node()[fn:empty(...)]" on line 48 of assemble-spec.xsl
     invoked by xsl:apply-templates at file:/D:/Projects/xquery-xslt/qtspecs/style/assemble-spec.xsl#51
  In template rule with match="node()[fn:empty(...)]" on line 48 of assemble-spec.xsl
     invoked by xsl:apply-templates at file:/D:/Projects/xquery-xslt/qtspecs/style/assemble-spec.xsl#51
  In template rule with match="node()[fn:empty(...)]" on line 48 of assemble-spec.xsl
     invoked by xsl:apply-templates at file:/D:/Projects/xquery-xslt/qtspecs/style/assemble-spec.xsl#51
  In template rule with match="node()[fn:empty(...)]" on line 48 of assemble-spec.xsl
     invoked by xsl:apply-templates at file:/D:/Projects/xquery-xslt/qtspecs/specifications/xquery-40/style/assemble-xquery.xsl#26
  In template rule with match="/" on line 13 of assemble-xquery.xsl
Invalid URI file:/D:/Projects/xquery-xslt/qtspecs/build/xquery-40/src/xpath-preprocessed.xml/../D:\Projects\xquery-xslt\qtspecs\build/xquery-40/temp-xpath-grammar.xml

I've tracked this down to the grammar-file option passed to the XSLT in the build.gradle file, but I'm not currently sure what the fix should be.

Note: running this within Linux via WSL works, so the issue looks like it is due to handling Windows paths as file URIs.
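To illustrate the distinction, here is a small Python sketch (not the actual build fix, which would live in build.gradle) that turns an absolute Windows path like the one in the error into a file URI before it is handed to anything that expects a URI. The helper name windows_path_to_file_uri is hypothetical.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_file_uri(path: str) -> str:
    """Convert an absolute Windows path (drive letter form) into a file: URI."""
    windows_path = PureWindowsPath(path)
    if not windows_path.is_absolute():
        raise ValueError("expected an absolute Windows path: " + path)
    # as_posix() turns backslashes into forward slashes; quote() escapes
    # characters such as spaces while leaving "/" and the drive colon alone.
    return "file:///" + quote(windows_path.as_posix(), safe="/:")


grammar_file = r"D:\Projects\xquery-xslt\qtspecs\build/xquery-40/temp-xpath-grammar.xml"
print(windows_path_to_file_uri(grammar_file))
```

Passed as a raw path, the "D:" prefix ends up in a position where a URI parser rejects it, which matches the URISyntaxException in the log. Converting first avoids that.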

Pull request #254 created #created-254

17 Nov at 23:35:14 GMT
Improvements/fixes for the coercion rules

This PR is concerned with the coercion rules (formerly known as function conversion rules).

It fixes issue #242, a minor omission where we failed to say that coercion rules are used to convert the result of a static function call to the required return type.

It implements the proposal in issue #189, to apply the coercion rules to variable declarations/bindings as well as to function arguments and results. This brings XQuery into line with XSLT, and thus paves the way to allowing type declarations on "let" clauses in XPath.

It does some further editorial tidying up of the rules, such as improving the definition of the term "coercion rules", adding clarifying notes, etc.

In approving this PR, I am requesting approval of the change (which was already in the draft, but has not been discussed) to introduce "down-casting" or "relabelling". This allows a function to declare a parameter with required type xs:positiveInteger, and for a caller to supply the value 42 without explicitly casting it to xs:positiveInteger.
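A toy Python model of the relabelling idea, for readers who want to see what is being approved: a value carries a type label, and coercion to xs:positiveInteger succeeds when the value is an integer in the value space of that type. The class TypedInteger, the function name, and the exact conditions are assumptions for illustration. The normative rules are those in the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedInteger:
    """An integer value together with its (simplified) type annotation."""
    value: int
    type_name: str


def coerce_to_positive_integer(item: TypedInteger) -> TypedInteger:
    """Relabel an integer as xs:positiveInteger if it lies in that value space."""
    if item.type_name not in ("xs:integer", "xs:positiveInteger"):
        raise TypeError("cannot relabel " + item.type_name)
    if item.value < 1:
        raise TypeError("value %d is not a positive integer" % item.value)
    return TypedInteger(item.value, "xs:positiveInteger")


# The caller supplies the plain xs:integer 42; no explicit cast is needed.
print(coerce_to_positive_integer(TypedInteger(42, "xs:integer")))
```

In this model a supplied value of 0 is still a type error, because it lies outside the value space of the required type.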

Issue #149 closed #closed-149

17 Nov at 11:12:48 GMT

Functions for splitting a sequence (or array) based on predicate matching

Pull request #253 created #created-253

17 Nov at 10:07:21 GMT
Fix xnt references

This PR doesn't make any technical changes[*]. It fixes some bugs in how cross-spec references are resolved. I failed to delete some out-of-date index files when I updated the code to generate them automatically. That led to weird URIs. And I failed to update the stylesheets to correct for a change from doc-xpath- to doc-xpath40 in the targets. Thanks to @michaelhkay for a detailed bug report.

[*] That's not 100% true, I did change two <xterm> references to BracedURILiteral into <xnt> references in function-catalog.xml.

Issue #252 closed #closed-252

17 Nov at 09:45:40 GMT

Fix typo in CODEOWNERS

Pull request #252 created #created-252

17 Nov at 09:38:26 GMT

Fix typo in CODEOWNERS

Pull request #251 created #created-251

17 Nov at 09:37:24 GMT
Use external CSS for styling

This PR doesn't change anything technically, it moves around some big inline-blobs of CSS into external files so that we can have more consistency more easily. Every spec includes qtspecs.css and another spec-specific CSS if it needs one. The diff markup is also in a separate CSS that's included in the diff versions.

I don't think I've broken the styling anywhere, but if you look around and find something that looks odd, please let me know.

Pull request #250 created #created-250

16 Nov at 23:07:38 GMT
New functions fn:foot, fn:truncate, array:foot, array:truncate

New functions fn:foot, fn:truncate, array:foot, array:truncate, as proposed in issue #97. Note that not everyone was happy with the name "truncate".

Pull request #249 created #created-249

16 Nov at 22:26:52 GMT
Issue 213: new function fn:items-at

Add new function fn:items-at in response to issue 213

Pull request #248 created #created-248

16 Nov at 18:43:38 GMT
Editorial: update change logs and status

This PR is purely editorial: it updates change logs and status information to reflect the current state of play. In particular, the change logs in appendices now attempt to distinguish changes approved by the WG from changes that were in the baseline draft prepared by the editor, but which have not yet been reviewed or approved.

Pull request #247 created #created-247

16 Nov at 17:06:07 GMT
Actions QT4CG-011-01 and QT4CG-011-03.

Actions QT4CG-011-01 and QT4CG-011-03. (Note, there are still some infrastructure problems with xnt references)

Issue #246 closed #closed-246

16 Nov at 17:02:01 GMT

Give ednote a distinctive appearance

Pull request #246 created #created-246

16 Nov at 16:52:46 GMT
Give ednote a distinctive appearance

Next, move all the CSS into a common location

Issue #245 closed #closed-245

16 Nov at 16:49:58 GMT

Updates to parse-uri and build-uri functions

Pull request #245 created #created-245

16 Nov at 16:29:43 GMT
Updates to parse-uri and build-uri functions

Complete actions QT4CG-011-04 and QT4CG-011-05.

Issue #222 closed #closed-222

16 Nov at 11:52:37 GMT

Sequence comparison (starts, ends, contains) - issues 94, 96

Issue #244 closed #closed-244

16 Nov at 11:52:14 GMT

Sequence comparison (starts, ends, contains) - issues 94, 96 (fix 222)

Issue #80 closed #closed-80

16 Nov at 11:50:46 GMT

[FO] fn:while (before: fn:until)

Pull request #244 created #created-244

16 Nov at 11:35:32 GMT
Sequence comparison (starts, ends, contains) - issues 94, 96 (fix 222)

This is another attempt to fix the merge conflicts in #222

Issue #243 closed #closed-243

16 Nov at 11:29:23 GMT

Sequence comparison (starts, ends, contains) - issues 94, 96 with merge conflicts resolved

Pull request #243 created #created-243

16 Nov at 11:12:47 GMT
Sequence comparison (starts, ends, contains) - issues 94, 96 with merge conflicts resolved

I believe this is PR #222 with merge conflicts against the current master branch resolved.

Issue #228 closed #closed-228

16 Nov at 10:56:46 GMT

Updates to make FO valid

Issue #202 closed #closed-202

16 Nov at 10:02:51 GMT

NW for MK: subtyping (196)

Issue #242 created #created-242

16 Nov at 00:57:27 GMT
Coercion rules used to convert function result to expected type

We have separated the rules for static and dynamic function calls, which was necessary because static calls can now use keywords, whereas dynamic calls can't.

The new rules say that the coercion rules are used to convert the result to the expected result type in a dynamic call, but they fail to say that this also happens on a static call.

It would be nicer (and less error prone) if we could combine the rules. Ideally we should define that a static function call F(A, B, C) is equivalent to the dynamic call F#3(A, B, C). This in turn would be easier if we can find a way to extend function items to have optional parameters and to permit keyword arguments.

Note also that the definition of the term "coercion rules" is deficient. It doesn't say what the rules are, it just gives an example of one situation where they are used.

Issue #153 closed #closed-153

15 Nov at 20:48:25 GMT

Explicitly mention subtypes for arrays and maps

QT4 CG meeting 011 draft minutes #minutes-11-15

15 Nov at 18:00:42 GMT

Draft minutes published.

Issue #230 closed #closed-230

15 Nov at 17:52:26 GMT

New proposed text resolving issue 71 (Guarded expressions)

Issue #72 closed #closed-72

15 Nov at 17:52:03 GMT

[FO] Provide better support for URI processing within an expression

Issue #215 closed #closed-215

15 Nov at 17:52:02 GMT

First attempt at parse-uri and build-uri functions

Issue #207 closed #closed-207

15 Nov at 17:50:36 GMT

Issue 1. New expanded-QName function; new fn:QName#1 variant

Issue #210 closed #closed-210

15 Nov at 17:30:22 GMT

Issue 80: fn:iterate-while (before: fn:while)

Issue #197 closed #closed-197

15 Nov at 17:29:44 GMT

NW for MK: Variadicity (166)

Issue #241 created #created-241

15 Nov at 11:23:14 GMT
Functions integer-to-string and string-to-integer with radix

I propose (in response to a suggestion from Joel Kalvesmaki on the XML Slack) two functions

fn:integer-to-string($value as xs:integer, $radix as xs:integer := 10) as xs:string

fn:string-to-integer($value as xs:string, $radix as xs:integer := 10) as xs:integer

that produce/accept string representations of signed integers in a base other than 10. Proposed range for $radix is 2 to 32.

Note: I considered extending fn:format-integer(), but the result looked clumsy.
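
A minimal Python sketch of what the two proposed functions might do, assuming the proposed radix range of 2 to 32. The digit alphabet (0-9 followed by lowercase a-v) and the case-insensitive parsing are assumptions made for illustration; the proposal does not specify them.

```python
# Hypothetical digit alphabet: enough symbols for radix 2..32.
DIGITS = "0123456789abcdefghijklmnopqrstuv"


def _check_radix(radix: int) -> None:
    if not 2 <= radix <= 32:
        raise ValueError("radix must be between 2 and 32")


def integer_to_string(value: int, radix: int = 10) -> str:
    """Render a signed integer in the given radix."""
    _check_radix(radix)
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    n = abs(value)
    out = []
    while n:
        n, rem = divmod(n, radix)
        out.append(DIGITS[rem])
    return sign + "".join(reversed(out))


def string_to_integer(value: str, radix: int = 10) -> int:
    """Parse a signed integer written in the given radix."""
    _check_radix(radix)
    negative = value.startswith("-")
    digits = value[1:] if negative else value
    if not digits:
        raise ValueError("no digits supplied")
    result = 0
    for ch in digits.lower():  # assumption: letters are case-insensitive
        d = DIGITS.find(ch)
        if d < 0 or d >= radix:
            raise ValueError(f"invalid digit {ch!r} for radix {radix}")
        result = result * radix + d
    return -result if negative else result
```

With these definitions, integer_to_string(255, 16) gives "ff" and string_to_integer("-1010", 2) gives -10.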

Issue #240 closed #closed-240

14 Nov at 10:17:38 GMT

Subtyping

Pull request #240 created #created-240

13 Nov at 22:41:57 GMT

Subtyping

Issue #239 created #created-239

13 Nov at 11:50:39 GMT
Terminology concerning function items and their access to static and dynamic context

See this StackOverflow question: https://stackoverflow.com/questions/74408887

The language describing how the static and dynamic context are bound in function items delivered by expressions such as xs:QName#1 or xs:QName(?) is pretty impenetrable, and I don't think it is very accurate. For example, for the latter case, the relevant rule asks "If [the implementation of F] is not an XPath 3.1 expression..." which seems to suggest that the behaviour varies depending on whether the implementation of xs:QName() is written in XPath or in some other language, which cannot be right.

For named function references, the spec talks of "if the function is context dependent, then the returned function is associated with the static context of the named function reference and the dynamic context in which it is evaluated", but what exactly does it mean for the function to be context dependent?

The behaviour of partial function application is said to depend on whether "F's implementation is already associated with contexts", and if F was the result of a named function reference, then this seems to depend on whether the [static] function that's the target of the reference is context dependent, but how is that determined?

This issue is raised essentially to take note that there's a lot of flakiness in the specification in this area. It's not obvious what the solution is, but there is surely scope for making things much clearer than they are currently.

Issue #238 created #created-238

12 Nov at 08:30:38 GMT
Support Invisible XML

I propose that we support Invisible XML by means of a function

fn:invisible-xml($grammar as xs:string) as (function($string) as document-node())

The function takes as input a string defining an invisible XML grammar in ixml format, and returns as output a function that can be used to parse strings conforming to that grammar, converting them into XDM document nodes.

As a "dog-food" use case, we could use this for rendering function signatures in the F&O specification. Rather than using manual markup to define the signature of each function, we could define an IXML grammar for function signatures, and use this as the basis for formatting the representation in the spec. This would be particularly beneficial as we start to introduce more complex signatures involving record types.

Pull request #237 created #created-237

11 Nov at 22:28:24 GMT
Issue167 xsl conditionals

Update following CG review on 8 November; improvements to notes and examples, no substantive change.

Issue #236 created #created-236

11 Nov at 12:37:17 GMT
map:build: sequence of keys

In resolving PR #203 to address issue #151, the group decided not to immediately address all of the concerns raised by Martin Honnen in an email discussion where he wrote:

When map:group-by was introduced I found the restriction of a single key
instead of a sequence of keys unnecessarily restrictive (some people on
Slack agreed), the same appears in my view to be the case for the new
map:build, I think it could be easily adapted to handle a sequence of
keys by using e.g.

  fold-left($input, map{}, ->($map, $next) {

   fold-left($key($next), $map, ->($map, $key) {
    let $nextValue := $value($next) return
    if (map:contains($map, $key))
    then map:put($map, $key, $combine($map($key), $nextValue))
    else map:put($map, $key, $nextValue)})
  }
)

as the implementation body/definition of the result of

map:build(
$input    as item()*,
$key    as function(item()) as xs:anyAtomicType*    := fn:identity#1,
$value    as function(item()) as item()*    := fn:identity#1,
$combine    as function(item()*, item()*) as item()*    := fn:op(',')
) as map(*)


Use case in my view is the classical example of the XSLT 3 spec where in
e.g.

<titles>
     <title>A Beginner's Guide to <ix>Java</ix></title>
     <title>Learning <ix>XML</ix></title>
     <title>Using <ix>XML</ix> with <ix>Java</ix></title>
</titles>

you want to group the "title" elements by the "ix" child elements, with
the proposed change above that would give e.g.

map {
   "Java": (<title>A Beginner's Guide to <ix>Java</ix></title>, <title>Using <ix>XML</ix> with <ix>Java</ix></title>),
   "XML": (<title>Learning <ix>XML</ix></title>, <title>Using <ix>XML</ix> with <ix>Java</ix></title>)
}

We'll come back to this later.

Creating this issue satisfies action QT4CG-008-05 on NW.
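
The multi-key grouping described in the quoted email can be sketched in Python as a nested loop that mirrors the two fold-left calls. The name build_map and the tuple representation of the titles are hypothetical, and sequences are modelled as Python lists; this is an illustration of the idea, not the specified behaviour of map:build.

```python
def build_map(items, key, value=lambda item: [item], combine=lambda a, b: a + b):
    """Group items under every key returned by key(item)."""
    result = {}
    for item in items:                 # outer fold over the input
        next_value = value(item)
        for k in key(item):            # inner fold over the item's keys
            if k in result:
                result[k] = combine(result[k], next_value)
            else:
                result[k] = next_value
    return result


# Each title is paired with the text of its <ix> children.
titles = [
    ("A Beginner's Guide to Java", ["Java"]),
    ("Learning XML", ["XML"]),
    ("Using XML with Java", ["XML", "Java"]),
]

index = build_map(titles, key=lambda t: t[1], value=lambda t: [t[0]])
```

Here index maps "Java" to the first and third titles and "XML" to the second and third, matching the grouping shown in the email.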

QT4 CG meeting 011 draft agenda #agenda-11-15

11 Nov at 12:30:12 GMT

Draft agenda published.

Issue #177 closed #closed-177

11 Nov at 12:00:05 GMT

Items before etc

Issue #166 closed #closed-166

11 Nov at 11:56:18 GMT

Variadicity

Issue #196 closed #closed-196

11 Nov at 11:55:52 GMT

Subtyping

Issue #199 closed #closed-199

11 Nov at 10:03:45 GMT

NW for MK: Items before, etc. (177)

Issue #96 closed #closed-96

10 Nov at 21:41:11 GMT

[XPath] Functions that determine if a given sequence starts with another sequence or ends with another sequence

QT4 CG meeting 010 draft minutes #minutes-11-08

10 Nov at 17:37:37 GMT

Draft minutes published.

Issue #235 created #created-235

10 Nov at 09:36:27 GMT
Add multiple=true() option to fn:parse-json and fn:json-doc

It is common practice (though not, I believe, covered by any standard) to have files that contain multiple JSON objects. Often these will be arranged one per line, as in our own qt3tests use case R31 at https://github.com/w3c/qt3tests/blob/master/app/UseCaseR31/sales.json . In that example, the file can be parsed using unparsed-text-lines()!parse-json(). But in the more general case, where each object may itself be multi-line, there's no easy way of handling this.

I propose an option multiple=true() on fn:parse-json and fn:json-doc that enables parsing of an input containing multiple (zero or more) concatenated JSON texts. When this option is present, the result will always be delivered as an array, containing one member for each JSON text in the input. The wrapper array will be present even if the number of JSON texts in the input is zero or one.

If a JSON text ends with a letter or digit and the next JSON text starts with a letter or digit then they must be separated by whitespace.
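
As a rough illustration of the proposed behaviour, the following Python sketch parses zero or more concatenated JSON texts and always returns a list, using the standard library's json.JSONDecoder.raw_decode. The function name parse_json_multiple is hypothetical. Note one difference from the proposal: this sketch does not enforce the whitespace-separation rule, so it is more lenient than the proposed option would be.

```python
import json


def parse_json_multiple(text: str) -> list:
    """Parse concatenated JSON texts; the result is always a list."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while True:
        # Skip JSON whitespace between (and after) the individual texts.
        while pos < len(text) and text[pos] in " \t\r\n":
            pos += 1
        if pos == len(text):
            return results
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        results.append(value)
```

An empty input yields an empty list, and a single JSON text yields a one-member list, in line with the rule that the wrapper array is always present.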

Issue #182 closed #closed-182

09 Nov at 18:20:17 GMT

Should we allow vendor-defined optional parameters on built-in functions?

Issue #234 created #created-234

08 Nov at 20:56:21 GMT
If Without Else

This is based on the discussions in today's QT4CG meeting, as suggested by @dnovatchev.

Use Case

It is common to have an if condition where the else branch does nothing (i.e. yields the empty sequence). In BaseX, this requirement has led to them making the else branch optional.

It is also conceivable to want to elide the then branch, as a way of avoiding having to negate the if condition expression.

In the XSLT 4.0 draft, both of these are possible in the changes to support optional @then (or possibly @select) and @else attributes. Part of this is due to requiring backward compatibility with XSLT 3.0 that only allows the then branch within child elements. It would be nice to be able to support this in XPath and XQuery for parity between the languages.

Design

The rationale for not allowing an optional else is to avoid the dangling else problem from C and other languages that have optional else statements. That is:

if ($c) then
    if ($d)
    then 1
    else 2

is ambiguous as the else could be part of the $d if statement or the $c if statement. This would require parentheses around the if statement to resolve the ambiguity, but the syntax should not require that. -- That is, it should be clear to the reader what the if statement will do.

As such, I propose the following variants:

(1) An if statement with a then and else expression -- currently supported:

if ($c) then 1 else 2

(2) An if statement with an else expression, but not a then expression -- new, should be unambiguous:

if ($c) else 2

(3) An if statement with a then expression, but not an else expression -- to resolve the dangling else, I propose to use return instead of then:

if ($c) return 1

This works analogously to the existing FLWOR, switch, typeswitch and other expressions that use return to denote the return/result expression.

For the dangling else, both cases are clear:

(a) when the $c (outer) if has the elided else expression:

if ($c) return
    if ($d)
    then 1
    else 2

(b) when the $d (inner) if has the elided else expression:

if ($c)
then if ($d) return 1
else 2

Syntax

Replace:

IfExpr ::= "if"  "("  Expr  ")"  "then"  ExprSingle  "else"  ExprSingle

with:

IfExpr ::= IfClause (( ThenClause? ElseClause ) | ReturnClause)
IfClause ::= "if"  "("  Expr  ")"
ThenClause ::= "then"  ExprSingle
ElseClause ::= "else"  ExprSingle

Design note: There is a ReturnClause symbol that is defined as "return" ExprSingle. Because of this, and to indicate that the then and else parts are not expressions, I've opted for the term clause. -- This matches the use in FLWORExpr, SwitchExpr, TypeswitchExpr, etc. that all use the term clause. I've also separated them out to make the IfExpr more readable now that it has optional parts.

Semantics

Replace:

The expression following the if keyword is called the test expression, and the expressions following the then and else keywords are called the then-expression and else-expression, respectively.

with:

The expression between the parentheses after the if keyword is called the test expression.

The expression after the then or return keyword is called the then-expression. If this is missing, it defaults to the empty sequence.

The expression after the else keyword is called the else-expression. If this is missing, it defaults to the empty sequence.

Those should be the only required changes.
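
The proposed semantics (a missing branch defaults to the empty sequence) can be sketched in Python. This models evaluation only, not the grammar: sequences are Python lists, branches are zero-argument callables so that only the chosen branch is evaluated, and Python truthiness stands in for the effective boolean value. All names are hypothetical.

```python
def if_expr(test, then=None, otherwise=None):
    """Evaluate a conditional whose then/else branches may be elided."""
    branch = then if test else otherwise
    # An elided branch yields the empty sequence.
    return branch() if branch is not None else []


# (a) if ($c) return if ($d) then 1 else 2
def case_a(c, d):
    return if_expr(c, then=lambda: if_expr(d, lambda: [1], lambda: [2]))


# (b) if ($c) then if ($d) return 1 else 2
def case_b(c, d):
    return if_expr(c, lambda: if_expr(d, then=lambda: [1]), lambda: [2])
```

In case (a) the else belongs to the inner conditional, so a false $c gives the empty sequence; in case (b) it belongs to the outer one, so a false $c gives 2.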

Issue #233 created #created-233

08 Nov at 14:25:09 GMT
Declare the result type of a mode, via @as

Initial message to support discussion on the proposal for the introduction of @as in <xsl:mode> to define the output of a particular mode. Current: https://qt4cg.org/specifications/xslt-40/Overview-diff.html#mode-result-type

QT4 CG meeting 010 draft agenda #agenda-11-08

07 Nov at 17:20:10 GMT

Draft agenda published.

Pull request #232 created #created-232

05 Nov at 19:29:48 GMT
Issue 225 - Data model clarifications

This PR attempts to clean up some of the definitions of concepts in the data model spec, including fixing some minor errors.

Issue #231 created #created-231

05 Nov at 18:42:56 GMT
for expression: "at" keyword

In the comment thread to #181 it was noticed that, to be in line with that proposal, we also need to be able to access the index of any member variable within a "for expression".

This can be done in various ways, such as with an index() function, as in the code example below:

for $x in (1,  3, 5, 7, 9, 11),
    $y in (1,  3, 5, 7, 9, 11)
 return
     if(index($x) le index($y)) then [$x, $y]
       else ()

But fortunately, in XQuery there is already the "at" keyword serving exactly the same purpose. Thus the above expression can be written as:

for $x at $ind-x in (1,  3, 5, 7, 9, 11),
    $y at $ind-y in (1,  3, 5, 7, 9, 11)
 return
     if($ind-x le $ind-y) then [$x, $y]
       else ()

Thus, I am proposing just to allow the "at" keyword in XPath for-expressions.
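
For readers more familiar with other languages, the positional variable bound by "at" behaves much like Python's enumerate with a 1-based start. The sketch below reproduces the second expression above as a list comprehension; it is an analogy, not a statement about XPath evaluation.

```python
values = [1, 3, 5, 7, 9, 11]

# ind_x and ind_y play the role of $ind-x and $ind-y (1-based positions).
pairs = [
    [x, y]
    for ind_x, x in enumerate(values, start=1)
    for ind_y, y in enumerate(values, start=1)
    if ind_x <= ind_y
]
```

Each pair is kept only when the position of $x does not exceed the position of $y, so the result holds the 21 pairs on or above the diagonal.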

Pull request #230 created #created-230

03 Nov at 11:33:58 GMT
New proposed text resolving issue 71 (Guarded expressions)

The discussion on issue 71 was very wide-ranging and went off into a number of tangents. This PR attempts to resolve the issue initially raised (order of evaluation of predicates) in a general way by introducing the notion of guarded expressions. This also provides a proposed solution to the question of "short-cutting" and/or expressions - essentially, an optimiser can change the order of evaluation, but doing so must not introduce dynamic errors that would not occur in a short-cutted evaluation.

QT4 CG meeting 009 draft minutes #minutes-11-01

01 Nov at 17:30:00 GMT

Draft minutes published.

Issue #214 closed #closed-214

01 Nov at 15:20:11 GMT

FLWOR without FL

Issue #229 created #created-229

31 Oct at 19:02:26 GMT
Proposal: Add the missing functions for arrays: array:exists() and array:empty()

The functions over sequences: fn:exists and fn:empty are amongst the most useful and well-understood sequence functions.

In case we need to know whether or not an array has at least one member, or none, these functions cannot help us: an array is itself a single item, so fn:exists([]) returns true() and fn:empty([]) returns false().


To fill this gap, these two functions are proposed:

array:exists:

Signature

array:exists($arg as array(*)) as xs:boolean

Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If the value of $arg is a non-empty array, the function returns true(); otherwise, the function returns false().

Examples

The expression array:exists(array:remove(["hello"], 1)) returns false().

The expression array:exists(array:remove(["hello", "world"], 1)) returns true().

The expression array:exists([]) returns false().

The expression array:exists([ () ]) returns true().

array:empty:

Signature


array:empty($arg as array(*)) as xs:boolean

Properties

This function is ·deterministic·, ·context-independent·, and ·focus-independent·.

Rules

If the value of $arg is the empty array, the function returns true(); otherwise, the function returns false().

Examples

The expression array:empty(array:remove(["hello"], 1)) returns true().

The expression array:empty(array:remove(["hello", "world"], 1)) returns false().

The expression array:empty([]) returns true().

The expression array:empty([ () ]) returns false().
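
The distinction the proposal relies on can be modelled in a few lines of Python. In this sketch a sequence is a Python list, and the hypothetical class XArray stands in for an XDM array: a single item whose members are themselves sequences. The function names mirror the proposed ones but are illustrative only.

```python
class XArray:
    """Stand-in for an XDM array; each member is a sequence (a list)."""

    def __init__(self, *members):
        self.members = list(members)


def fn_exists(sequence):
    return len(sequence) > 0


def fn_empty(sequence):
    return len(sequence) == 0


def array_exists(arr):
    return len(arr.members) > 0


def array_empty(arr):
    return len(arr.members) == 0


def array_remove(arr, position):
    """Return a copy of arr without the member at the 1-based position."""
    kept = arr.members[:position - 1] + arr.members[position:]
    return XArray(*kept)
```

Here XArray() models [] and XArray([]) models [ () ]. The sequence functions see the empty array as one item, so fn_exists([XArray()]) is True, while array_exists(XArray()) is False.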

Pull request #228 created #created-228

31 Oct at 17:07:00 GMT
Updates to make FO valid

This PR doesn't (intentionally) make any substantive changes, it simply fixes the FO spec and the tools that build it so that they produce valid markup. We have accidentally been running without validation and an unpleasantly large number of errors have crept in.

I made some ad hoc changes to the DTDs (already committed to the master branch) where it seemed like it would be better to allow the new markup (var in code, p in td, etc.) than attempt to redraft the specification in some new way.

But that left a bunch of places where p or other wrappers were left out, a number of places where HTML markup instead of xmlspec was inserted, some broken links, etc.

Merging this into the repository while we have active PRs in flight for F&O is going to be ugly so I'm not proposing to merge this right away.

We should probably pause new PRs against FO until this has been sorted out.

Issue #227 closed #closed-227

31 Oct at 16:22:48 GMT

Add the Data Model specification; rename Serialization

Pull request #227 created #created-227

31 Oct at 16:16:25 GMT
Add the Data Model specification; rename Serialization
  1. Added the Data Model specification
  2. Renamed the Serialization specification from 3.1 to 4.0
  3. Reworked the way the cross reference index files (the /etc/ directory) are handled. The 4.0 versions are now generated every time, and should not be committed to the repository. (I haven't removed the old ones yet because this PR build will fail if I do. They're coming out next.)

I expect that the old ant build scripts will no longer work. I'm probably going to pull all that infrastructure out Real Soon Now™ as we've been working successfully with the new system.

Issue #226 closed #closed-226

31 Oct at 15:24:23 GMT

Add the Data Model specification; rename Serialization

Pull request #226 created #created-226

31 Oct at 15:13:30 GMT
Add the Data Model specification; rename Serialization

ATTENTION

There are a whole bunch of markup errors in the F&O specification. The build system has not been validating it before publication 😞 😢 😭 . I have fixed them in this PR. I fixed the markup where it was clear how to do that. I hacked at the DTDs to allow some new markup patterns. It's kind of a mess and if you've got an open PR on the F&O spec, I'm afraid it's going to be painful.

Aside from that horror show (hey, it's Halloween, maybe it's appropriate), I:

  1. Added the Data Model specification
  2. Renamed the Serialization specification from 3.1 to 4.0
  3. Reworked the way the cross reference index files (the /etc/ directory) are handled. The 4.0 versions are now generated every time, and should not be committed to the repository.

I expect that the old ant build scripts will no longer work. I'm probably going to pull all that infrastructure out Real Soon Now™ as we've been working successfully with the new system.

Issue #225 created #created-225

30 Oct at 07:40:50 GMT
[XDM] Terminology around "Atomic value" and "Type Annotation"

In response to the comments (see https://github.com/qt4cg/qtspecs/pull/202) against issue 196, I propose that we tighten up the terminology associated with atomic values.

In XDM, §2.7.5 says:

An atomic value can be constructed from a lexical representation. Given a string and an atomic type, the atomic value is constructed in such a way as to be [consistent with schema validation]. If the string does not represent a valid value of the type, an error is raised. When xs:untypedAtomic is specified as the type, no validation takes place. The details of the construction are described in [Section 18 Constructor functions ] and the related [Section 19 Casting ] section of [[XQuery and XPath Functions and Operators 3.1]].

The actual definition of "atomic value" is found in 2.1 Terminology and reads:

[Definition: An atomic value is a value in the value space of an [atomic type] and is labeled with the name of that atomic type.]

(Oddly, this isn't linked from 2.7.5)

There's another little issue here, which is that an atomic value created by atomizing a schema-validated node may end up having an anonymous type, in which case its most specific type is inexpressible using XPath ItemType syntax. Is the type annotation in this case an atomic type, or is it the name of an atomic type? And should we avoid assuming that the concept of "item type" is something synonymous with the ItemType construct in the XPath grammar? As a start, I would prefer to say that the type annotation is a type, rather than a name.

In a non-normative note in §2.7, XDM also says:

Values including element and attribute nodes, and atomic values, have a property called a type annotation whose value is a type: this is a reference to a type definition in the Schema Component Model.

It goes on to say (normatively, but rather informally):

Every [item] in the data model has both a value and a type. In addition to nodes, the data model can represent atomic values like the number 5 or the string “Hello World.” For each of these atomic values, the data model contains both the value of the item (such as 5 or “Hello World”) and its type. The property that holds the type is sometimes referred to as the type annotation: its value is a type definition component as defined in the Schema Component Model. This may be a built-in type (a type with a name such as xs:integer or xs:string), or a user-defined type.

This statement is misleading in a number of ways. Firstly, some items such as empty maps and arrays conform to many types, but they do not have a single defining type that trumps all the others. Secondly, the way the term "type annotation" is introduced fails to make clear that the type annotation of a node is something quite different from the type annotation of an atomic value; if an element named N has a type annotation of my:part-number, then its (most specific) type is element(N, my:part-number), while if an atomic value has a type annotation of my:part-number, then its most specific type is my:part-number. It also fails to make it clear that items like maps and arrays do not have a type annotation; their type is inferred from their content.

The fuzziness of some of these definitions makes it very difficult to be sufficiently formal elsewhere in the language. For example in 4.0 we're proposing to allow "down-casting" (or "relabelling") of atomic values in the coercion rules, and it's very hard to describe this operation formally without a better model.

I would like to start by changing the definition to

An atomic item (also known as an atomic value) is a pair (T, D) where T (the "type annotation") is an atomic type, and D (the "datum") is a point in the value space of T.

followed by references to atomic types and value spaces as concepts defined in XSD.

I don't expect we will want to use the term "datum" very often, but it's useful to have a name for the concept when we need it. We currently tend to call it the "value" which is very confusing, because if the (T, D) pair is a value, then D can't also be a value.
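
The (T, D) model, and the "relabelling" operation mentioned earlier, can be sketched as a small Python data structure. Everything here is an illustrative assumption: in particular, the string used for the type annotation is only a stand-in for a type object (the text above argues the annotation should be a type, not a name), and the value-space check is supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class AtomicItem:
    type_annotation: str  # T: stand-in for an atomic type
    datum: Any            # D: a point in the value space of T


def relabel(item: AtomicItem, new_type: str,
            in_value_space: Callable[[Any], bool]) -> AtomicItem:
    """Return an item with the same datum but a different type annotation."""
    if not in_value_space(item.datum):
        raise TypeError(f"datum {item.datum!r} is not in the value space of {new_type}")
    return AtomicItem(new_type, item.datum)


five = AtomicItem("xs:integer", 5)
positive_five = relabel(five, "xs:positiveInteger",
                        lambda d: isinstance(d, int) and d > 0)
```

The datum is unchanged by relabelling; only T differs, which is the kind of statement that is hard to make when D is also called "the value".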

QT4 CG meeting 009 draft agenda #agenda-11-01

28 Oct at 08:21:36 GMT

Draft agenda published.

Issue #224 created #created-224

28 Oct at 09:01:41 GMT

Infrastructure changes/improvements

Issue #223 closed #closed-223

27 Oct at 16:04:10 GMT

Allow eg in fos:expression and fos:result

Pull request #223 created #created-223

27 Oct at 15:59:51 GMT
Allow eg in fos:expression and fos:result

This PR formats eg in fos:expression and fos:result as a block. I don't think this will have any detrimental effects as it's done during the "merging" process.

Issue #217 closed #closed-217

26 Oct at 08:32:40 GMT

Fix issue 41 (grammar bug in Typeswitch)

Pull request #222 created #created-222

25 Oct at 21:56:48 GMT
Sequence comparison (starts, ends, contains) - issues 94, 96

Add 3 functions starts-with-sequence, ends-with-sequence, contains-sequence as per issues #96 and part of #94
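
A rough Python sketch of what the three functions might compute, with sequences modelled as lists. The parameter names, the default comparison (operator.eq), and the reading of contains-sequence as a contiguous match are assumptions made for illustration; the PR itself defines the actual signatures and semantics.

```python
import operator


def _matches(window, subsequence, compare):
    return all(compare(a, b) for a, b in zip(window, subsequence))


def starts_with_sequence(input_seq, subsequence, compare=operator.eq):
    n = len(subsequence)
    return n <= len(input_seq) and _matches(input_seq[:n], subsequence, compare)


def ends_with_sequence(input_seq, subsequence, compare=operator.eq):
    n = len(subsequence)
    if n > len(input_seq):
        return False
    return _matches(input_seq[len(input_seq) - n:], subsequence, compare)


def contains_sequence(input_seq, subsequence, compare=operator.eq):
    n = len(subsequence)
    # Slide a window of length n over the input; an empty range means no match.
    return any(
        _matches(input_seq[i:i + n], subsequence, compare)
        for i in range(len(input_seq) - n + 1)
    )
```

Passing a different compare function changes the notion of equality, for example a case-insensitive comparison for strings.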

Issue #221 created #created-221

25 Oct at 20:35:33 GMT
Expose op:same-key() as a user-visible function

In issues #94, #96, and #99 a set of functions for comparing sequences are proposed; all of them are parameterized with a function for comparing items.

A useful function for comparing items is op:same-key() (useful because it is error-free, context-free, and transitive). For this and other reasons, it would be useful to expose op:same-key() as a user-visible function.

The only tricky question is what to call it. I suggest "atomic-equal".

Alternatively, we could overload the "is" operator to compare atomic values using op:same-key(), and use op("is") as the argument to the sequence comparison functions.

QT4 CG meeting 008 draft minutes #minutes-10-25

25 Oct at 16:47:40 GMT

Draft minutes published.

Issue #219 closed #closed-219

25 Oct at 11:45:33 GMT

Options on fn:deep-equal

Issue #220 created #created-220

25 Oct at 09:09:57 GMT
Encapsulation

Let's suppose that I want to provide a library to do complex number arithmetic, and I want to hide how complex numbers are actually implemented (it could be an array of two doubles, or a map containing two doubles, it could be in polar coordinates, etc).

How might I go about doing this? One approach is like this (please be patient):

module namespace complex...;

declare %public type alias complex:number as function(*);

declare %private variable $SECRET := <e/>;

declare %private function complex:wrap($content as record(re as xs:double, im as xs:double)) as complex:number {
    function($key as node()) { if ($key is $SECRET) then $content else error() }
};

declare %private function complex:unwrap($cx as complex:number) as record(re as xs:double, im as xs:double) {
   $cx($SECRET)
};

declare %public function complex:new($re as xs:double, $im as xs:double) as complex:number {
  complex:wrap(map{'re':$re, 'im':$im})
}

declare %public function complex:real($x as complex:number) as xs:double {
  complex:unwrap($x)?re
}

declare %public function complex:add($x as complex:number, $y as complex:number) {
  etc;
}

What's going on here? We're representing a complex number as an arity-1 function; the function "unlocks" the complex number to reveal its internal implementation, but it can only be called if you know the secret argument to pass, and this is encapsulated within the library module that implements the functionality. So we've successfully encapsulated the implementation, and we've done it without any data model changes. Within the provider module, the functions wrap() and unwrap() are available to convert from the internal representation to the external representation; these functions are not available to the caller.
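
The same "locked function" pattern can be sketched in Python with a closure and a private sentinel object. This only illustrates the shape of the idea: Python does not enforce module privacy, so the encapsulation is by convention here. The names are hypothetical, and the body of add is filled in for illustration since the module above leaves it out.

```python
_SECRET = object()  # plays the role of $SECRET; compared by identity


def _wrap(content):
    """Hide content inside a function that only opens for the secret key."""
    def locked(key):
        if key is _SECRET:
            return content
        raise PermissionError("wrong key")
    return locked


def _unwrap(cx):
    return cx(_SECRET)


def new(re, im):
    return _wrap({"re": float(re), "im": float(im)})


def real(x):
    return _unwrap(x)["re"]


def imag(x):
    return _unwrap(x)["im"]


def add(x, y):
    # Illustrative body: componentwise complex addition.
    a, b = _unwrap(x), _unwrap(y)
    return new(a["re"] + b["re"], a["im"] + b["im"])
```

A caller sees only an opaque callable: real(add(new(1, 2), new(3, 4))) gives 4.0, while calling the value with any other key raises an error.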

It works, but it's hardly elegant.

From the point of view of the user of the library, it's quite usable: they just import the library and call the functions. They probably don't care that the type complex:number is actually a function, and if they do care, there's not much they can do with the knowledge.

From the point of view of the author of the library, there's a lot of boilerplate. One thing we could do to improve matters would be to provide some syntactic sugar. Instead of a module declaration, use a class declaration:

class complex:number wraps record(re as xs:double, im as xs:double);

declare %public function complex:real($x as complex:number) as xs:double {
  complex:unwrap($x)?re
}

declare %public function complex:add($x as complex:number, $y as complex:number) {
  etc;
}

The boilerplate, especially the wrap and unwrap functions, is now implicitly declared.

And we could provide "import class" for the caller as an alternative to "import module".

We could go further and give the magic function used to represent an encapsulated value some kind of special status in the data model. But do we need to?

Issue #219 created #created-219

24 Oct at 14:11:51 GMT
Options on fn:deep-equal

The function fn:deep-equal makes some arbitrary decisions about how values are compared, and many applications need slightly different comparison semantics. This issue proposes adding an options parameter to customise the comparison rules.

The following options are proposed (detailed syntax TBA):

  • For comparing atomic values, use op:same-key rather than eq. Note this implies ignoring the collation.
  • When comparing children of a node, don't ignore comments
  • When comparing children of a node, don't ignore processing instructions
  • When comparing children of an element, ignore whitespace text nodes (except where the element node has a simple type)
  • When ignoring comments and processing instructions, merge adjacent text nodes
  • When comparing strings (etc), perform Unicode normalization
  • When comparing strings (etc), perform whitespace normalization
  • Compare as if untyped (ignore type annotations and typed value)
  • Compare as typed (type annotation must match)
  • Require the same in-scope namespace bindings on every element
  • Require elements and attributes to have matching prefixes
  • Compare the is-ID and is-IDREF properties on elements and attributes

QT4 CG meeting 008 draft agenda #agenda-10-25

22 Oct at 09:49:46 GMT

Draft agenda published.

Issue #218 created #created-218

19 Oct at 09:11:13 GMT
Function library for maps with composite keys: and thoughts on encapsulation

I propose creating a function library that handles maps with composite keys. I'll start by defining a set of functions, then we'll consider how to package these, in particular issues of encapsulation and types. Even if we don't want to implement this particular feature, it might help to generate infrastructure that makes it easier to create such function libraries.

For the sake of a name, I'll call the data structure we are manipulating an atlas. (Like a map, but generalized.) An atlas is a mapping from sequences of atomic values to values. Two sequences of atomic values are equal if they are the same length and their constituent atoms are pairwise equal according to op:same-key(). Note that keys are variable length and it's permitted to have both ("a") and ("a", "b") as entries in the atlas.

We'll start by defining the API we want to offer. The semantics for these functions are essentially the same as the corresponding functions for maps. The type of $key is always xs:anyAtomicType*. We'll discuss the type of $atlas later.

  • atlas:build($input, $keyFunction, $valueFunction, $onDuplicates) => atlas

  • atlas:get($atlas, $key) => value

  • atlas:contains($atlas, $key) => boolean

  • atlas:put($atlas, $key, $value) => atlas

  • atlas:keys($atlas) => array(key)

  • atlas:for-each($atlas, function(key, value)) => value

  • atlas:size($atlas) => integer

plus some more specific functions:

  • atlas:get-branch($atlas, $partial-key) => atlas

returns an atlas containing all those entries in the supplied atlas whose keys start with the supplied partial-key; the returned atlas contains the remaining parts of the keys.

  • atlas:put-branch($atlas, $partial-key, $atlas branch) => atlas

Similarly, grafts one atlas as a subtree into another.

It's not too hard to come up with an implementation of this API that uses nested maps. For example: An atlas-node is a map from atomic-values to record(leaf? as value, branch? as atlas-node), where leaf contains the value if the atlas contains the relevant key in full, while branch contains a nested atlas if the atlas contains nodes starting with the relevant prefix; either or both may be present.
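
A minimal Python sketch of the nested-map representation just described, covering a few of the API functions. It makes several simplifying assumptions: keys are non-empty tuples, Python dictionary equality stands in for op:same-key(), None stands in for the empty sequence, and the function names are hypothetical.

```python
def atlas_put(atlas, key, value):
    """Return a new atlas with key (a non-empty tuple of atoms) bound to value."""
    head, rest = key[0], key[1:]
    entry = dict(atlas.get(head, {}))
    if rest:
        entry["branch"] = atlas_put(entry.get("branch", {}), rest, value)
    else:
        entry["leaf"] = value
    updated = dict(atlas)
    updated[head] = entry
    return updated


def atlas_contains(atlas, key):
    head, rest = key[0], key[1:]
    entry = atlas.get(head)
    if entry is None:
        return False
    if rest:
        return atlas_contains(entry.get("branch", {}), rest)
    return "leaf" in entry


def atlas_get(atlas, key):
    head, rest = key[0], key[1:]
    entry = atlas.get(head, {})
    if rest:
        return atlas_get(entry.get("branch", {}), rest)
    return entry.get("leaf")  # None when the key is absent


def atlas_get_branch(atlas, partial_key):
    """Return the entries whose keys extend partial_key, keyed by the remainder."""
    node = atlas
    for atom in partial_key:
        node = node.get(atom, {}).get("branch", {})
    return node


def atlas_size(atlas):
    return sum(("leaf" in e) + atlas_size(e.get("branch", {})) for e in atlas.values())
```

Both ("a",) and ("a", "b") can be present at once, because the entry for "a" can carry a leaf and a branch. In this sketch atlas_get_branch drops the entry for the partial key itself, since its remainder would be the empty key.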

Now the tricky part: how to handle encapsulation. There are two parts to this: (a) defining a type that can be used to represent an "atlas" in the API, and (b) preventing users subverting the API by accessing the implementation objects (atlas-nodes) directly.

There are features we can build on: the proposal for named item type aliases, and the sketchy definition of "external objects" in XSLT.

We could handle the first part by allowing the declaration of a named item type (or alias) to specify something like visibility=closed. So the function library comes with an item type whose name is atlas:atlas, and this type name is available for use by anyone who imports the function library, but they never get to know exactly what an atlas:atlas is other than the fact that it's an item, so they can never know what other functions they might be able to call on it.

But that doesn't stop them guessing (or asking, using if ($atlas instance of map(*)) interrogatives).

To deal with that, we could consider expanding the existing half-defined concept of an "external object". XSLT 3.0 §24.1.3 sketches out the idea:

An implementation may allow an extension function to return an object that does not have any natural representation in the XDM data model, whether as an atomic value, a node, or a function item. For example, an extension function sql:connect might return an object that represents a connection to a relational database; the resulting connection object might be passed as an argument to calls on other extension functions such as sql:insert and sql:select.

The way in which such objects are represented in the type system is implementation-defined. They might be represented by a completely new datatype, or they might be mapped to existing datatypes such as integer, string, or anyURI.

So atlas could be an external object type. The module that implements the function library could contain private methods that wrap a map as an atlas, or that unwrap an atlas to reveal the underlying map. I think the only additional support needed is a construct to wrap a value as an encapsulated object of a given type, and to unwrap an encapsulated object to return the original value.

Note: I can imagine the same mechanism being useful to handle "parcels", and no doubt we will find many other applications for it.

Pull request #217 created #created-217

19 Oct at 00:00:25 GMT
Fix issue 41 (grammar bug in Typeswitch)

Fixes the broken typeswitch grammar pointed out in issue #41

Issue #83 closed #closed-83

18 Oct at 23:35:21 GMT

[XPath]Proposal: Notation for using an operator as a function

QT4 CG meeting 007 draft minutes #minutes-10-18

18 Oct at 17:01:35 GMT

Draft minutes published.

Issue #206 closed #closed-206

18 Oct at 17:10:05 GMT

Corrections to math:atan2 identified in issue 5.

Issue #151 closed #closed-151

18 Oct at 17:09:22 GMT

map:build() function

Issue #203 closed #closed-203

18 Oct at 17:09:07 GMT

Issue 151: map:build

Issue #200 closed #closed-200

18 Oct at 17:06:53 GMT

NW for MK: Drop xsl:match instruction (185)

Issue #185 closed #closed-185

18 Oct at 16:04:02 GMT

Issue118 drop xsl match

Issue #165 closed #closed-165

18 Oct at 15:26:45 GMT

Keyword arguments: ":=" or ":"?

Issue #212 closed #closed-212

18 Oct at 10:52:15 GMT

Loss of OS-specific line breaks

Issue #216 created #created-216

18 Oct at 10:48:38 GMT
fn:unparsed-text: End-of-line characters

UPDATED PROPOSAL (2023-10-31):

Change the default behavior and normalize the input as known from XML (https://www.w3.org/TR/xml/#sec-line-ends). Even though backward-incompatible, the change should be noninvasive enough, and it allows us to dispense with adding options to this and other functions.


ORIGINAL PROPOSAL:

Motivation

fn:unparsed-text can only be used if the input is valid. Next, the result may contain OS-specific newlines that may need to be normalized in a subsequent step. Two additional parameters might simplify things:

  • normalize: normalizes line endings in the input, as done by fn:doc, fn:parse-xml and fn:parse-xml-fragment and as defined in https://www.w3.org/TR/xml/#sec-line-ends (inspired by the discussion in #212).
  • fallback: Process characters that are not valid in the version of XML supported by the implementation (inspired by fn:parse-json)

Signature

fn:unparsed-text(
  $href    as xs:string?,
  $options as map(*)
) as xs:string?

Rules

The entries that may appear in the $options map are as follows:

Key | Meaning
-- | --
encoding | The name of an encoding … (see existing rules)
normalize-newlines | false by default. If set to true, two-character sequences #xD#xA and any #xD that are not followed by #xA are translated to single #xA characters.
fallback | A function which is called when the input contains an escape sequence that represents a character that is not valid […] (see fn:parse-json).
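The normalize-newlines rule can be sketched in Python as a single substitution. The function name normalize_newlines is made up for this illustration; it models only the translation rule stated above, not the rest of fn:unparsed-text.

```python
import re


def normalize_newlines(text: str) -> str:
    """Translate #xD#xA pairs and lone #xD characters to a single #xA."""
    # "\r\n?" matches a CR plus an optional following LF, so a CRLF pair is
    # replaced as one unit rather than producing two newlines.
    return re.sub(r"\r\n?", "\n", text)


sample = "a\r\nb\rc\nd"
normalized = normalize_newlines(sample)
```

Existing #xA characters are left alone, so applying the function twice gives the same result as applying it once.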

Pull request #215 created #created-215

17 Oct at 16:00:35 GMT
First attempt at parse-uri and build-uri functions

Close #72

This is a first attempt to resolve issue 72 by describing parse-uri and build-uri functions.

I decided to try an approach that leverages the ABNF defined in RFC 3986. Attempting to redefine the rules just opens up the possibility that the rules we write will differ from the rules in RFC 3986 which would be wrong.

I added an options map and specified a "details" option that will further parse the path and query components. This gives the caller easy access to the percent-decoded forms of those elements without making the parse function irreversible.
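For comparison only, the same split between raw components and decoded details can be seen with Python's standard urllib.parse module. This is not the proposed parse-uri function and may differ from it in edge cases; the example URI is invented.

```python
from urllib.parse import parse_qsl, unquote, urlsplit

uri = "https://example.com/a%20b/c?x=1&y=two%2Fthree#frag"

# Top-level parse: components are kept in their raw, percent-encoded form,
# so the original string can be rebuilt from them.
parts = urlsplit(uri)

# "Details": percent-decoded path segments and query parameters, derived
# from the raw components without replacing them.
path_segments = [unquote(segment) for segment in parts.path.split("/")]
query_params = parse_qsl(parts.query)

rebuilt = parts.geturl()
```

Keeping the raw path and query alongside the decoded details is what allows the parse to be reversed: decoding "%2F" to "/" inside a query value would otherwise be indistinguishable from a literal separator.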

Notes:

  1. More examples are needed
  2. I tried to use eg in the results so that they'd be formatted nicely, but it didn't work. Is there any markup for verbatim text in examples?
  3. There's an open question about whether non-hierarchical URIs can have fragment identifiers or queries. I think not, but I need to review the RFC a little more carefully.

Issue #50 closed #closed-50

17 Oct at 09:20:41 GMT

[XPath] Introduce the lookup operator for sequences

Issue #214 created #created-214

16 Oct at 19:48:27 GMT
FLWOR without FL

Resulting from thoughts in https://github.com/qt4cg/qtspecs/pull/210: Do we want to make let and for optional?

From the query optimization perspective, this seems reasonable as especially let clauses may become obsolete after inlining variables.

From a user point of view, I’m not convinced that the resulting syntax would be intuitive:

return 123

…could indicate that the return keyword has some special meaning, or people might be confused that/why it's equivalent to 123, or might start using the keyword esp. in function bodies.

where test()
return 'ok'

…could be used as an alternative to an if expression with an empty else branch; but people might be confused why if cannot be used without else.

What do others think?

Issue #213 created #created-213

16 Oct at 18:50:56 GMT
Lookup/Indexing operator for sequences (supersedes #50)

This proposal attempts to take over where issue #50 left off: that issue contains a lengthy discussion and many alternative suggestions, and seemed to end with a concrete proposal which I summarise here. I propose that issue #50 now be closed.

The proposal is for an expression which I will call a subscript-expression, taking the form

SubscriptExpression ::= ExprSingle "[#" Expr "]"

The first operand evaluates to an arbitrary sequence. The second operand evaluates to a sequence of integers (or is coerced to a sequence of integers using the coercion rules). Both operands are evaluated in the context of the containing expression (despite the similarity to filter expressions, the predicate does not have its own focus).

The result of the expression A [# B] (assuming B is indeed a sequence of integers) is for $i in B return A[$i].

Examples

$input[#1] - this is synonymous with $input[1]

$input[#1 to 5] - equivalent to $input[position() = (1 to 5)]

$input[#reverse(1 to 5)] - returns the first 5 items in reverse order
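The rule "for $i in B return A[$i]" can be modelled in Python as below. The function name items_at is borrowed from Alternative 2 of this proposal; the sketch assumes B has already been coerced to integers.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def items_at(a: Sequence[T], b: Iterable[int]) -> List[T]:
    """Model A[# B]: for each $i in B, return A[$i] (1-based positions)."""
    result: List[T] = []
    for i in b:
        # A[$i] is the empty sequence when $i is out of range, so such
        # positions contribute nothing to the result.
        if 1 <= i <= len(a):
            result.append(a[i - 1])
    return result


data = list("abcdefg")
first = items_at(data, [1])
first_five = items_at(data, range(1, 6))
reversed_five = items_at(data, reversed(range(1, 6)))
```

Because the result follows the order of B rather than the order of A, positions can be repeated or reversed, which the existing A[B] predicate cannot express.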

The main differences from the existing A[B] syntax are:

(a) there is no overloading, the semantics do not depend on the dynamic type of B.

(b) the value of the predicate can be a sequence of integers, not just a single integer

(c) the focus for the predicate is the same as the outer focus, so expressions such as *[#1 to count(*) idiv 2] make sense.

Alternative 1: use the syntax A #[ B ].

Alternative 2: use a function items-at(A, B)

Issue #212 created #created-212

16 Oct at 18:25:36 GMT
Loss of OS-specific line breaks

When an XML document is parsed, CR is discarded and replaced by LF. The serialized result won’t have CR either:

let $CR := '&#xd;', $LF := '&#xa;'
let $input := '<a>' || $CR || '</a>'
let $xml := parse-xml($input)
let $output := serialize($xml)
return (
  contains($xml, $CR), contains($output, $CR),
  contains($xml, $LF), contains($output, $LF)
)

This is no proposal, but something we should keep in mind when discussing #101 and #121: Do we want to pursue and motivate an asymmetric approach by focusing on the processed and output data (i.e. insertion of CR characters in strings; new serialization parameter for custom newline strings), or should we rather strive for a solution that also takes care of the input/import? Which could be:

  1. Provide newline parameters for fn:parse-xml, fn:doc, fn:collection, possibly others; or
  2. Use OS-specific newline strings

Interestingly, CR is preserved if fn:unparsed-text is used.

QT4 CG meeting 007 draft agenda #agenda-10-18

14 Oct at 16:44:32 GMT

Draft agenda published.

Issue #209 closed #closed-209

14 Oct at 16:29:48 GMT

Modify test generator to reference a real source file

Issue #211 created #created-211

14 Oct at 10:20:34 GMT
XSLT streaming: capturing accumulators

A Saxon extension to XSLT streaming that has proved very convenient is the notion of a capturing accumulator, indicated by setting xsl:accumulator/@capture = yes.

From the Saxon documentation:

This attribute may be set on an xsl:accumulator-rule element. It has no effect unless the accumulator is declared streamable and the rule has phase="end". It is intended for use when the accumulator rule matches an element node. The value is a boolean with the default "no".

The effect of setting this attribute is that the code for computing the new accumulator value when the element end tag is encountered now has access to a snapshot copy of the matched element (as if taken using the fn:snapshot function), and is no longer required to be motionless.

This means that if you want access to the typed value or string value of an element, you can now get this directly with a rule that matches the element, rather than having to write rules that match the element's text node children.

It also opens up additional possibilities. For example:

  • If a large document has a short header section containing metadata, you can capture a copy of the header in an accumulator, and the header then becomes available throughout the rest of the document processing using the accumulator-after() function.
  • If you want to produce a sorted index of terms that are marked up using glossary elements scattered throughout the document, you can define an accumulator that retains snapshot copies of all the glossary entries (use select="($value, .)" in the accumulator rule). At the end of processing, you can sort and group these glossary entries, and because they are snapshot copies retaining all the attributes of ancestors, you can link them to the id anchors of the containing sections.

Note: the snapshot captured as a result of the use of saxon:capture differs from the result of the fn:snapshot() function in that accumulator values associated with the copied nodes are not retained in the snapshot.

We could consider making this the default, since its effect is essentially to remove an unnecessary restriction.

Pull request #210 created #created-210

13 Oct at 11:42:16 GMT

Issue 80: fn:iterate-while (before: fn:while)

Pull request #209 created #created-209

13 Oct at 11:28:41 GMT
Modify test generator to reference a real source file

Modify the generated test set to use a real source file. The source/content construct for inline documents works in the XSLT test suite but not in QT4.

Issue #208 closed #closed-208

13 Oct at 09:41:31 GMT

Fix the function finder

Pull request #208 created #created-208

13 Oct at 09:35:36 GMT
Fix the function finder

Per MK's request. :-)

I also fixed the thing where the function names are too long and there was no space between the select elements!

Pull request #207 created #created-207

12 Oct at 23:22:54 GMT

Issue 1. New expanded-QName function; new fn:QName#1 variant

Pull request #206 created #created-206

12 Oct at 22:04:52 GMT

Corrections to math:atan2 identified in issue 5.

Issue #205 created #created-205

12 Oct at 12:15:46 GMT
Make higher-order-function support mandatory

I think higher-order-functions are becoming a core part of XQuery and XSLT; we shouldn't encourage the mindset that they should be avoided because not all implementations will support them. I therefore propose dropping this as an optional feature; the relevant capabilities should become a core part of the language in 4.0, supported by all conformant implementations.

Issue #126 closed #closed-126

12 Oct at 10:33:57 GMT

Mathematical Operator Unicode Symbols

Issue #204 created #created-204

12 Oct at 10:07:49 GMT
Non-ascii alternative operator symbols

Appendix B.3 of the draft specification proposes that we provide non-ASCII synonyms for many of the operator symbols:

Operator | Symbol | Codepoint
-- | -- | --
and | ∧ | x2227
or | ∨ | x2228
eq | ≐ | x2250
ne | ≠ | x2260
lt | ⋖ | x22D6
gt | ⋗ | x22D7
le | ≤ | x2264
ge | ≥ | x2265
div | ÷ | xF7
mod |   |  
idiv | ⨸ | x2A38
union (|) | ∪ | x222A
intersect | ∩ | x2229
except | ∖ | x2216
is | ≡ | x2261
<< (precedes) | ≪ | x226A
>> (follows) | ≫ | x226B
otherwise | ⊩ | x22A9
some | ∃ | x2203
every | ∀ | x2200
satisfies | ⧴ | x29F4

This issue is raised to enable discussion of this proposal.

I have to confess I'm going rather cool on the idea. Some of the proposed symbols are rather obscure, and they aren't always rendered very clearly on display; some can easily be confused with other symbols. It's going to take a lot of WG time to agree the details, and it will probably cause more usability problems than it solves.

I've always felt that it was high time for programming to break loose from ASCII, but there are probably good reasons it hasn't done so.

The fact that we have two sets of comparison operators (and that some of the symbols clash with XML reserved characters) doesn't help. Providing alternative ways of writing them can only add to the confusion.

I'm going to propose dropping this unless someone else wants to champion it.

Pull request #203 created #created-203

12 Oct at 06:30:58 GMT
Issue 151: map:build

Drops the proposed map:group-by function, replacing it with the more powerful map:build.

QT4 CG meeting 006 draft minutes #minutes-10-11

11 Oct at 16:47:33 GMT

Draft minutes published.

Issue #173 closed #closed-173

11 Oct at 16:56:50 GMT

Add specification for fn:op() (issue #83)

Issue #188 closed #closed-188

11 Oct at 16:56:00 GMT

Editorial

Issue #198 closed #closed-198

11 Oct at 16:40:57 GMT

NW for MK: fn:op (173)

Issue #201 closed #closed-201

11 Oct at 16:31:01 GMT

NW for MK: editorial (188)

Pull request #202 created #created-202

10 Oct at 09:11:04 GMT
NW for MK: subtyping (196)

MK: improve pres of XPath §3.7 (subtyping) and clarify notion of dynamic type

This PR is intended to be technically equivalent to #196.

I've reworked a series of MK commits so that they are independent; I fear we would wind up with merge conflicts otherwise and cleaning up the merge conflicts, without accidentally making some other change, seemed riskier than just teasing them apart and making them independent before we accept and merge them.

Pull request #201 created #created-201

10 Oct at 09:09:23 GMT
NW for MK: editorial (188)

This PR is intended to be technically equivalent to #188.

I've reworked a series of MK commits so that they are independent; I fear we would wind up with merge conflicts otherwise and cleaning up the merge conflicts, without accidentally making some other change, seemed riskier than just teasing them apart and making them independent before we accept and merge them.

Pull request #200 created #created-200

10 Oct at 09:08:40 GMT
NW for MK: Drop xsl:match instruction (185)

This PR is intended to be technically equivalent to #185.

I've reworked a series of MK commits so that they are independent; I fear we would wind up with merge conflicts otherwise and cleaning up the merge conflicts, without accidentally making some other change, seemed riskier than just teasing them apart and making them independent before we accept and merge them.

Close #119

Pull request #199 created #created-199

10 Oct at 09:07:18 GMT
NW for MK: Items before, etc. (177)

This PR is intended to be technically equivalent to #177.

I've reworked a series of MK commits so that they are independent; I fear we would wind up with merge conflicts otherwise and cleaning up the merge conflicts, without accidentally making some other change, seemed riskier than just teasing them apart and making them independent before we accept and merge them.

Pull request #198 created #created-198

10 Oct at 09:05:55 GMT
NW for MK: fn:op (173)

This PR is intended to be technically equivalent to #173.

I've reworked a series of MK commits so that they are independent; I fear we would wind up with merge conflicts otherwise and cleaning up the merge conflicts, without accidentally making some other change, seemed riskier than just teasing them apart and making them independent before we accept and merge them.

Pull request #197 created #created-197

10 Oct at 08:55:27 GMT
NW for MK: Variadicity (166)

This PR is intended to be technically equivalent to #166.

I've reworked a series of MK commits so that they are independent; I fear we would wind up with merge conflicts otherwise and cleaning up the merge conflicts, without accidentally making some other change, seemed riskier than just teasing them apart and making them independent before we accept and merge them.

Pull request #196 created #created-196

09 Oct at 19:08:57 GMT
Subtyping

Editorial. Improve the presentation of XPath section §3.7 on subtype relationships, including supplying some rules that were previously marked TODO. To enable this, migrate some DTD and stylesheet extensions for var elements previously used for the xslt spec only. Add some clarifications concerning dynamic type, see issue #191

Issue #195 closed #closed-195

09 Oct at 14:10:12 GMT

Added a CODEOWNERS file

Pull request #195 created #created-195

09 Oct at 12:36:29 GMT
Added a CODEOWNERS file

This should have the effect that the Contributors team will automatically be added as reviewers to new PRs. Plus a few odds and ends where I should be notified (changes to build stuff).

Issue #194 closed #closed-194

09 Oct at 09:50:37 GMT

More norm testing

Pull request #194 created #created-194

09 Oct at 09:18:56 GMT
More norm testing

Ignore this too.

Issue #193 closed #closed-193

09 Oct at 09:16:34 GMT

Moved these changes into the DeltaXML pipeline; just serialize to HTML 5

Pull request #193 created #created-193

09 Oct at 09:16:26 GMT

Moved these changes into the DeltaXML pipeline; just serialize to HTML 5

Issue #192 closed #closed-192

08 Oct at 18:07:20 GMT

This is a test. This is only a test.

Pull request #192 created #created-192

08 Oct at 17:00:02 GMT
This is a test. This is only a test.

Had this been a real emergency, we would have fled in terror and you would not have been informed.

Issue #191 created #created-191

08 Oct at 16:57:07 GMT
Definition of "dynamic type"

The term "dynamic type" is defined in §2.2.3.2 as

[Definition: A dynamic type is associated with each value as it is computed. The dynamic type of a value may be more specific than the [static type] of the expression that computed it (for example, the static type of an expression might be xs:integer*, denoting a sequence of zero or more integers, but at evaluation time its value may have the dynamic type xs:integer, denoting exactly one integer.)]

Now, for a start, this is a pretty poor definition. It gives some properties of a dynamic type, but it doesn't say what it actually is: for example, what range of values it can take. Is it always a SequenceType that is expressible using the SequenceType grammar? We should be told.

It's also wrong. When you construct a sequence using the "," operator, or a map using the map:put() function, there's nothing in the specification of those operations that tells you what the dynamic type of the result is. For example, there's nothing in the specification that says what the dynamic type of the sequence (23, <e/>) is.

In §4.4.2.1 the error is compounded where it says "the converted argument value retains its most specific dynamic type". At least this recognizes that a value actually conforms to more than one type. But it's wrong in suggesting that one of those types is necessarily the "most specific" (presumably meaning that it's a subtype of all the others). For example, the empty map (map{}) belongs to every map type, but there is no map type that is a subtype of all other map types (unless we conjecture that the dynamic type may be one that is inexpressible using SequenceType syntax).

With the introduction of record tests the map map{"a":12, "b":14, 1:19} conforms both to record(a as xs:integer, b as xs:integer, *) and to map(xs:anyAtomicType, xs:integer), neither of which is more specific than the other, and there is no type expressible using the SequenceType grammar that is a subtype of both of these. So the introduction of record tests increases the importance of getting rid of this error.

It's not easy to fix issues like this, but it's important we should try, because errors in the description of the fundamentals can easily turn into bugs in the specification of concrete language constructs. We certainly need to fix it if we're going to add features like that proposed in issue #148, or to make changes to the type system to allow variadic dynamic function calls.

See also https://github.com/w3c/qtspecs/issues/14

Issue #190 closed #closed-190

08 Oct at 16:31:17 GMT

Make DeltaXML diffs if possible

Pull request #190 created #created-190

08 Oct at 16:25:38 GMT
Make DeltaXML diffs if possible

This PR adds the ability to generate mechanical diffs with Delta XML.

Issue #189 created #created-189

07 Oct at 18:07:44 GMT
Adopt the coercion rules for variables in XQuery

Currently XSLT applies the coercion rules (formerly the function conversion rules) both when supplying values to arguments of a function declaring a required type, and when binding values to variables declaring a required type. This is pretty much essential because it allows xsl:variable and xsl:param to have the same semantics.

XQuery applies the coercion rules only for function arguments, and not for variable bindings. For variables, the supplied value must already be a value of the required type. For example, you can't write let $x as xs:decimal := @version, you have to write let $x as xs:decimal := xs:decimal(@version). (Or if you find that annoying, you can write let $x as xs:decimal := +@version, because unary plus invokes coercion).

This is the main reason declared types are not allowed in XPath: the XSLT community felt that if they were allowed, the rules had to be the same as the rules for XSLT variable bindings. We could eliminate this objection if XQuery adopted the coercion rules for variable bindings.

I have some hesitation in proposing this, because when I proposed it before, it was quite strongly opposed by some of the original XQuery designers, whose views I respect. But although they argued their case strongly, I confess I did not understand their position. I simply see no reason why assignments to function parameters and assignments to variables should be handled differently. It works in XSLT, and makes the language more usable, and I see no reason why it should not work in XQuery.

It's not uncommon for languages to impose different rules for the two cases. Java, for example, has a variety of invocation contexts (strict and loose), and variable assignment allows conversions such as narrowing of integer constants that are not allowed on method calls (interestingly, Java allows more conversions on variable assignment than on method calling). In XQuery, however, I can see no rationale for the rules being different.

We're proposing syntax for XQuery (function calls with keyword parameters) that makes parameter assignment look more like variable assignment. I think we should match this by making them semantically more similar.

Pull request #188 created #created-188

07 Oct at 16:41:47 GMT

Editorial

Issue #187 created #created-187

07 Oct at 11:34:13 GMT
Add a 'while' clause to FLWOR expressions

This proposal adds a new clause, the while clause, to FLWOR expressions. I'll start with the proposal, and then give some rationale.

Proposal

We add a new WhileClause which can appear anywhere a WhereClause can appear. The semantics are deliberately almost identical to the WhereClause.

3.12.x While Clause

[60] WhileClause ::= "while" ExprSingle

A while clause serves as a filter for the tuples in its input tuple stream. The expression in the while clause, called the while-expression, is evaluated once for each of these tuples. If the effective boolean value of the while-expression is true, the tuple is retained in the output tuple stream; otherwise the tuple and all subsequent tuples in the stream are discarded.

Examples:

This example illustrates the effect of a while clause on a tuple stream:

Input tuple stream:

($a = 13, $b = 11)
($a = 91, $b = 42)
($a = 17, $b = 30)
($a = 85, $b = 63)

while clause: while $a > $b

Output tuple stream:

($a = 13, $b = 11)
($a = 91, $b = 42)
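The difference from a where clause can be seen by modelling the tuple stream in Python, with itertools.takewhile standing in for the proposed while clause and a list comprehension standing in for where. The dictionaries are only a stand-in for FLWOR tuples.

```python
from itertools import takewhile

# The input tuple stream from the example above.
tuples = [
    {"a": 13, "b": 11},
    {"a": 91, "b": 42},
    {"a": 17, "b": 30},
    {"a": 85, "b": 63},
]


def condition(t: dict) -> bool:
    """The while-expression $a > $b, evaluated against one tuple."""
    return t["a"] > t["b"]


# while: stop at the first failing tuple and discard everything after it.
kept_by_while = list(takewhile(condition, tuples))

# where: test every tuple independently; later tuples can still pass.
kept_by_where = [t for t in tuples if condition(t)]
```

The last tuple satisfies the condition, so where keeps it, but while has already stopped at the third tuple.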

The following query illustrates how a while clause might be used to extract all items in an input sequence before the first one that fails to satisfy some condition. In this case it selects the leading para elements in the input sequence, stopping before the first element that is not a para element.

for $x in $section/*
while $x[self::para]
return $x

Note: Although the semantics are described in terms of discarding all the tuples following the first one that fails to match the condition, a practical implementation is likely to avoid evaluating those tuples, thus giving an "early exit" from the iteration performed by the FLWOR expression.

Justification

FLWOR expressions remain the primary control construct in XQuery, despite the introduction of higher-order functions. The inability to process input selectively until some condition is true remains one of their biggest limitations. The window clause provides a workaround for some use cases, but it is very complex, and although it can partition an input sequence based on conditions found in the data, it has no direct way of stopping processing when a condition is found. Also, it only operates on input sequences, not on the tuple stream, so it isn't able to handle a composite condition involving multiple variables, for example for $i at $p in //emp while ($i/location = 'x' and $p < 50) return $i.

That particular example could be written with a where clause. But if a where clause is used, both the human reader and the query optimiser need to do some thinking to work out that early exit is possible (once $p reaches 50, no further tuples are going to have values less than 50).

Issue #186 closed #closed-186

07 Oct at 10:43:46 GMT

Update build automation to publish tests

Pull request #186 created #created-186

07 Oct at 10:36:18 GMT
Update build automation to publish tests

Recent changes (#184) from MK have added a new automatically generated test to the F&O build.

This PR automates the construction and publication of new test sets from the qtspecs repository to the qt4tests repository.

N.B. The repository where tests are published (qt4cg/qt4tests in this case) must be the value of the TEST_REPOSITORY secret. If you want to automate publication to your own fork, just set the value of that secret accordingly.

Pull request #185 created #created-185

07 Oct at 10:20:51 GMT
Issue118 drop xsl match

Removes the proposed new feature xsl:match from the draft spec; the proposal has been withdrawn

Issue #118 closed #closed-118

07 Oct at 10:07:03 GMT

xsl:match - can we do better

Issue #183 closed #closed-183

07 Oct at 07:35:30 GMT

Editorial

Issue #184 closed #closed-184

07 Oct at 07:34:33 GMT

Editorial changes from PR 183

Pull request #184 created #created-184

07 Oct at 07:28:37 GMT
Editorial changes from PR 183

This PR contains the editorial changes from #183 without the commits that add fn:op and other notes.

I believe I can fairly commit this PR without CG approval.

Pull request #183 created #created-183

06 Oct at 17:34:59 GMT
Editorial

Branch for changes that are purely editorial in nature.

Issue #182 created #created-182

06 Oct at 11:13:07 GMT
Should we allow vendor-defined optional parameters on built-in functions?

Now that we allow optional parameters on functions, with invocation by keyword, we could permit vendors to extend the function signature with optional parameters in a vendor-defined namespace. It would be required (obviously) that such parameters are optional.

A caveat is that under the current specification, (a) we can't stop the value of such a parameter being supplied positionally (perhaps unintentionally), and (b) the availability of such a parameter changes the arity range of the function, which potentially brings it into conflict with other like-named functions (for example functions introduced in the future).

A possible way of resolving this conflict is to propose that parameters whose name is namespaced can ONLY be supplied by keyword, and don't contribute to the arity range of the function. However, for user-defined functions, this rule would not be backwards compatible.

And a possible way around that is to say that the extensibility policy for a function is a property of its namespace; and to adopt different extensibility policies for built-in function namespaces and user-defined function namespaces.

Perhaps this is over-elaborate, and we should just make sure that functions that vendors might wish to extend have an options parameter, even if no standardised options are defined.

Issue #181 created #created-181

05 Oct at 10:55:18 GMT
HOF Sequence Functions with Positional Arguments

Motivated by @michaelhkay’s question in https://github.com/qt4cg/qtspecs/issues/80#issuecomment-1253495999, I wondered if we should ~~add optional positional arguments to built-in sequence functions that take higher-order function arguments~~ (thanks, Dimitre:) extend the types of the functions that are themselves parameters to certain standard functions, by adding to their signature one more parameter, which is the index of the item that is also passed as an argument to this function.

Some examples:

(: fn:filter: Accept items that are identical to their predecessors :)
fn:filter(
  $sequence,
  fn($item, $pos) { $item = $sequence[$pos - 1] }
)

(: fn:for-each: Create enumerated strings for all items :)
for-each(
  $sequence,
  fn($item, $pos) { $pos || '. ' || $item }
)

(: fn:for-each-pair: Create enumerated strings for the maximum value of all item pairs :)
for-each-pair(
  $seq1, $seq2,
  fn($item1, $item2, $pos) { $pos || '. ' || max(($item1, $item2)) }
)

(: fn:fold-left: Return positions of the items matching a specific value :)
let $input := (11 to 21, 21 to 31)
let $search := 21
return fold-left($input, (), fn($seq, $curr, $pos) { $seq, $pos[$curr = $search] })
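A Python model of the same idea, for readers who want to check the expected results. The helper names filter_pos, for_each_pos and fold_left_pos are hypothetical; each passes the 1-based position as an extra final argument to the callback, as the proposal suggests.

```python
from typing import Any, Callable, List, Sequence


def filter_pos(seq: Sequence[Any], pred: Callable[[Any, int], bool]) -> List[Any]:
    """filter, with the callback also receiving the item's position."""
    return [item for pos, item in enumerate(seq, start=1) if pred(item, pos)]


def for_each_pos(seq: Sequence[Any], action: Callable[[Any, int], Any]) -> List[Any]:
    """for-each, with the callback also receiving the item's position."""
    return [action(item, pos) for pos, item in enumerate(seq, start=1)]


def fold_left_pos(seq: Sequence[Any], zero: Any,
                  f: Callable[[Any, Any, int], Any]) -> Any:
    """fold-left, with the callback also receiving the item's position."""
    acc = zero
    for pos, item in enumerate(seq, start=1):
        acc = f(acc, item, pos)
    return acc


letters = ["a", "a", "b", "b", "b", "c"]
# Items identical to their predecessor. The pos > 1 guard is needed because
# Python would wrap around on index -1, whereas $sequence[0] is simply empty.
repeats = filter_pos(letters, lambda item, pos: pos > 1 and item == letters[pos - 2])

numbered = for_each_pos(["x", "y"], lambda item, pos: f"{pos}. {item}")

numbers = list(range(11, 22)) + list(range(21, 32))  # (11 to 21, 21 to 31)
search = 21
positions = fold_left_pos(
    numbers, [], lambda acc, curr, pos: acc + [pos] if curr == search else acc
)
```

In the fold-left case the value 21 ends the first range and starts the second, so it is found at two adjacent positions.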

Edit: I was wrong with my claim that the filter function could be used to replace index-where, and I have modified the example.

Issue #180 closed #closed-180

05 Oct at 10:17:12 GMT

Implement function signatures as tables with default values

Pull request #180 created #created-180

05 Oct at 10:10:18 GMT

Implement function signatures as tables with default values

Issue #179 closed #closed-179

05 Oct at 09:00:08 GMT

Link fixes

Pull request #179 created #created-179

05 Oct at 08:28:23 GMT
Link fixes

This PR fixes some of the linking and cross-reference issues.

  1. I fixed some actual links in a few places
  2. I added ID attributes to some new sections
  3. I investigated, and finally punted on, the question of why some production cross-references in the XSLT 4.0 spec came out wrong. I "fixed" it by making the stylesheet more dynamic. There are still some that don't work, so we'll need to find a better solution eventually. But at least this helps.
  4. I discarded an old draft that I don't believe was in use anywhere.

Issue #178 closed #closed-178

05 Oct at 07:05:14 GMT

Fix the function finder

Pull request #178 created #created-178

05 Oct at 07:00:47 GMT
Fix the function finder
  1. Determine which functions cannot be resolved and disable them in the pull down
  2. Adjust the CSS so that the function finder doesn't scroll out-of-view

QT4 CG meeting 006 draft agenda #agenda-10-11

04 Oct at 16:56:16 GMT

Draft agenda published.

QT4 CG meeting 005 draft minutes #minutes-10-04

04 Oct at 16:56:16 GMT

Draft minutes published.

Pull request #177 created #created-177

04 Oct at 17:15:50 GMT

Items before etc

Issue #163 closed #closed-163

04 Oct at 16:04:01 GMT

New intersperse function now accepted

Issue #176 closed #closed-176

03 Oct at 14:58:17 GMT

parse-json() - option to return numbers as xs:decimal

Issue #176 created #created-176

03 Oct at 14:43:16 GMT
parse-json() - option to return numbers as xs:decimal

In response to a user request (https://saxonica.plan.io/issues/5708) I propose addition of an option to fn:parse-json(), say number-type=double|decimal|string, which causes numbers in JSON to be returned as xs:decimal or xs:string values rather than xs:double. Use of values with more digits of precision than xs:double supports is common in JSON practice, despite warnings in the RFC that it's not interoperable; and the fallback to xs:string is handy for people who have the misfortune to be processing incoming JSON where leading zeroes are significant (for example, Belgian VAT numbers).

The change also affects fn:json-doc()
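
For comparison, Python's standard json module offers a similar choice through its parse_int and parse_float hooks. The sketch below is only an analogy for the proposed number-type option: the helper name parse_json and its number_type parameter are hypothetical and are not part of any specification.

```python
import json
from decimal import Decimal


def parse_json(text, number_type="double"):
    """Hypothetical helper: parse JSON, returning numbers as float, Decimal or str."""
    converters = {"double": float, "decimal": Decimal, "string": str}
    if number_type not in converters:
        raise ValueError(f"unknown number_type: {number_type}")
    convert = converters[number_type]
    # json passes the lexical form of each number to these hooks as a string
    return json.loads(text, parse_int=convert, parse_float=convert)


SOURCE = '{"amount": 1234567890.12345678901234567890}'

as_double = parse_json(SOURCE)["amount"]              # precision is lost
as_decimal = parse_json(SOURCE, "decimal")["amount"]  # all digits retained
as_string = parse_json(SOURCE, "string")["amount"]    # lexical form retained
```

The decimal and string variants keep every digit of the lexical form, whereas the double variant rounds to the nearest binary floating-point value.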

Issue #174 closed #closed-174

03 Oct at 14:17:14 GMT

Should we allow f(x,,y) when calling a function where parameter 2 is optional?

Issue #175 created #created-175

03 Oct at 10:33:39 GMT
In XQuery, allow a semicolon at the end of the module

A trivial little enhancement to remove an irritant: allow a semicolon after the query body in a query. Specifically, change

QueryBody ::= Expr

to

QueryBody ::= Expr Separator?

Issue #174 created #created-174

03 Oct at 10:08:29 GMT
Should we allow f(x,,y) when calling a function where parameter 2 is optional?

If the second and third parameters of a function are optional, should we allow the syntax

f(x,,y)

to call the function using the default value for the second parameter?

I can see this being a handy convention for calling fn:sort, for example, where defaulting the second parameter (the collation) is commonplace.

The current proposal does not allow this, but I see no good reason not to.

Pull request #173 created #created-173

01 Oct at 00:00:47 GMT

Add specification for fn:op() (issue #83)

QT4 CG meeting 005 draft agenda #agenda-10-04

30 Sep at 16:50:29 GMT

Draft agenda published.

Issue #2 closed #closed-2

30 Sep at 16:20:59 GMT

[FO] fn:intersperse

Issue #24 closed #closed-24

30 Sep at 16:18:34 GMT

[XPath] [XQuery] The unknown ArgumentPlaceHolder EBNF symbol is referred to in several places

Issue #27 closed #closed-27

30 Sep at 16:16:36 GMT

Fix references to ArgumentPlaceholder in the specifications.

Issue #172 created #created-172

30 Sep at 08:09:25 GMT
Record Tests

The draft XPath specification includes a proposal for Record Tests - see section 3.6.4.3. This issue seeks WG endorsement of this enhancement, and also for the use of Record Tests as match patterns in XSLT.

The purpose of the feature is to provide more useful type checking for common use cases where maps are used to represent heterogeneous data. The feature enhances the type system but doesn't change the data model (that is, the new types represent a new way of placing constraints on the value space of maps, without introducing new kinds of values or new operations).

The type subsumption rules for record tests proposed in §3.7.2 clauses (m) et seq include a couple of TODO entries that need completing.

Issue #171 created #created-171

30 Sep at 07:59:07 GMT
XPath ternary conditional operator

The draft XPath specification proposes a ternary conditional operator

C ?? X !! Y

with the same semantics as

if (C) then X else Y

This issue seeks WG endorsement of this enhancement.

The choice of operator symbols is from Perl; no other obvious candidates are available.

The proposal is free-standing; it has no dependencies or interactions.

The proposal adds no new functionality, but in some contexts (especially in the middle of an expression that uses symbolic operators rather than keywords) it aids readability:

//employee[@salary > (@currency = 'GBP' ?? 10000 !! 12000)]/@name

There is room for debate about the operator precedence.

Issue #170 created #created-170

30 Sep at 07:48:18 GMT
XPath "otherwise" operator

The draft specification proposes an "otherwise" operator. This issue seeks WG endorsement of this addition.

See section 4.18 of the XPath draft specification.

The semantics are that (X otherwise Y) returns X, except in the case where X evaluates to an empty sequence, in which case it returns Y. A typical usage would be //employee ! (@location otherwise "Denver").

There are two questions the WG might choose to discuss:

  • How should the operator be spelled? The symbol "?:" has been proposed as an alternative to the keyword "otherwise".
  • What should the operator precedence be?

I propose that once approved, we also add this operator to the list recognised by the fn:op function.
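
The semantics described above can be modelled outside XPath. The following Python sketch represents sequences as lists; the function name otherwise and the sample data are illustrative assumptions, and, unlike an operator, the Python function evaluates both operands eagerly.

```python
def otherwise(x, y):
    """Return x unless it is the empty sequence, in which case return y."""
    return x if len(x) > 0 else y


# Each employee's location is modelled as a sequence of zero or one values.
employees = [
    {"name": "Ann", "location": ["Boston"]},
    {"name": "Bob", "location": []},  # no location recorded
]

# Analogue of: //employee ! (@location otherwise "Denver")
locations = [
    loc
    for employee in employees
    for loc in otherwise(employee["location"], ["Denver"])
]

# The test is for emptiness, not truthiness: a sequence holding 0 is kept.
kept_zero = otherwise([0], [1])
```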

Issue #169 created #created-169

30 Sep at 07:01:16 GMT
Handling of duplicate keys in xsl:map

The draft specifications include proposals to enhance the handling of duplicate key values in xsl:map. This issue seeks WG endorsement of these changes.

See XSLT section 21.2.1

map:merge currently provides a set of fixed "policy" options for handling of duplicates; xsl:map always raises an error.

Rather than adopt the fixed set of policies in map:merge, I propose that xsl:map should accept a callback function to process duplicates. For example, forming a sequence-concatenation of the values can be achieved using on-duplicates="op(',')". This approach allows all the options provided by map:merge, and more; for example the duplicate entries can be combined using string-join() if so desired.

Issue #168 created #created-168

30 Sep at 06:49:11 GMT
XSLT Extension Instructions invoking Named Templates

This feature is already present in the draft specification; the issue is being raised to seek WG endorsement of the change.

See section 10.1.3 of the draft.

The change is self contained with no dependencies. It adds no new functionality, merely syntactic convenience.

Issue #167 created #created-167

30 Sep at 06:37:33 GMT
XSLT Conditional Instructions

These changes are already present in the draft specification: this issue is being raised to seek WG endorsement of the changes.

See section 8 of the specification.

  1. In xsl:if, add optional then and else attributes
  2. In xsl:when and xsl:otherwise, add an optional select attribute which can be used in place of the sequence constructor
  3. Add a new xsl:switch instruction

Justification: these changes do not add any functional capability to the language, they simply provide a small improvement in usability. The changes are self-contained with no dependencies on other changes.

Issue #154 closed #closed-154

29 Sep at 20:33:10 GMT

Namespaces for Functions

Pull request #166 created #created-166

29 Sep at 20:28:48 GMT
Variadicity

Two main separate but related features: (a) allow function declarations in XQuery and XSLT to define optional parameters, (b) allow static function calls (including partial function application) to take keyword arguments.

Issue #165 created #created-165

29 Sep at 11:38:03 GMT
Keyword arguments: ":=" or ":"?

In function calls using keyword arguments, should we use the syntax name := value, or name : value?

Regarding :=:

  • Function declarations in XQuery will presumably use the syntax name := value to define the default value for a parameter, by analogy with initial values in global variable declarations, so it seems consistent to use name := value in function calls as well
  • The syntax name := value is also used in XPath let expressions to bind a value to a variable name.

Regarding ::

  • It matches the use in map constructors
  • It's used this way in C#
  • There is a need to use whitespace to prevent ambiguity, for example in a construct like f:display(first: first, last: last); moreover, if the space is omitted, there is a danger that the expression will still be valid but will have an unintended meaning.
  • Map constructors also require the disambiguating whitespace, but with map constructors, use of a bare name on the LHS is unusual (more common is a string literal in quotes), and failure to include the whitespace is almost certain to lead to a syntax error.

My feeling is that := is the better choice.

Note: = (as used in Python) is not an option, because a call like not(x = y) already has a well-defined meaning.

Issue #164 closed #closed-164

29 Sep at 10:37:23 GMT

Fix error in input paths for XSLT

Pull request #164 created #created-164

29 Sep at 10:32:42 GMT
Fix error in input paths for XSLT

The XSLT 4.0 spec didn't build correctly because the input paths in the build script were missing the /specification/ directory :-(

Pull request #163 created #created-163

27 Sep at 22:06:02 GMT
New intersperse function now accepted

and correct the fos.xsd schema for history/version information

Issue #162 created #created-162

27 Sep at 19:20:35 GMT
Support unbounded variadic functions on map parameter keys

Where a parameter is defined as a map, then named/keyword arguments bind to keys in that map if the named/keyword argument does not match a parameter name in that function. For example, fn:serialize($data, method: "json").

Motivation

There are several functions that take an option map in XQFO. It can be clunky to construct the map options when calling these functions.

The option map is a common pattern in other implementations. Specifically, MarkLogic makes use of that pattern.

Note

If the argument name is an NCName, the xs:NCName value should be cast to the map's key type using the XQFO type/value casting rules.

The argument name should support string literals, as map keys can contain spaces. In this case, the xs:string type is cast to the map's key type using the XQFO type/value casting rules.

The #159 proposal allows QNames for the argument names in order to support parameters that are QNames. When binding to a map key an error (static or dynamic) should be raised.
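
Python's **kwargs mechanism behaves in a broadly similar way and may help to visualise the proposal. In this sketch, serialize is a hypothetical stand-in for fn:serialize (it is not the real function), and any keyword that matches no declared parameter is folded into the options map.

```python
import json


def serialize(data, options=None, **extra_keys):
    """Hypothetical stand-in for fn:serialize($data, $options)."""
    merged = dict(options or {})
    # Keywords that match no declared parameter become entries in the map.
    merged.update(extra_keys)
    if merged.get("method") == "json":
        return json.dumps(data)
    return str(data)


explicit_map = serialize({"a": 1}, {"method": "json"})
keyword_form = serialize({"a": 1}, method="json")
```

Keys that are not valid identifiers (for example, keys containing hyphens or spaces) cannot be written as bare keywords in Python either, which is comparable to the case for string-literal argument names noted above.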

Issue #161 created #created-161

27 Sep at 19:00:27 GMT
Support unbounded variadic functions on sequence parameters

Where a parameter is defined as a 0-or-more sequence (T*) or a 1-or-more sequence (T+) and that parameter is indicated as behaving variadically, then positional (non-keyword) arguments from that parameter onward are bound to that sequence.

Motivation

The XQFO specification supports the fn:concat function which takes 2 or more arguments. The "or more" part is defined as "..." with a short sentence stating that this is the only function that supports two or more arguments. As such, the behaviour of this function is loosely defined.

Other implementors and specifications (BaseX, EXQuery RESTXQ, MarkLogic) make use of these variadic sequence types in several functions.

In all of these cases (and for user-defined variadic sequence parameters) the semantics of the functions should be well defined.

Note

In the case where the 0-or-more sequence is the last parameter (or the parameter receiving the unbounded argument values), that parameter would be an optional value as if it was defined as $param as T* := ().

One possibility with the interaction with named arguments is to allow that to break up two sequence parameters that behave variadically. For example, given f($a as xs:int*, $b as xs:string, $c as xs:int*) it may be possible to call it like f(1, 2, 3, b: "4", 5, 6, 7) which would be equivalent to f((1, 2, 3), "4", (5, 6, 7)) currently.

This should be defined such that calling the function with the same number of arguments as the function has parameters then none of the parameters behave variadically. This has the intention of not breaking existing code.
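
As a rough analogy, Python's *args collects trailing positional arguments into a single sequence. The sketch below is a simplified model, not a definition of the proposed semantics: concat imitates the "two or more arguments" shape of fn:concat, and total is a hypothetical function whose variadic parameter defaults to the empty sequence.

```python
def concat(first, second, *rest):
    """Join two or more values as strings, in the spirit of fn:concat."""
    return "".join(str(value) for value in (first, second, *rest))


def total(*values):
    """Sum zero or more numbers; with no arguments the sequence is empty."""
    return sum(values)


joined = concat("a", "b", "c", 1)
empty_total = total()
some_total = total(1, 2, 3)
```

Python does not model the rule that a call supplying exactly as many arguments as there are parameters is non-variadic, so that part of the proposal is not illustrated here.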

Issue #160 created #created-160

27 Sep at 18:35:59 GMT
Support named arguments on dynamic function calls

This proposal extends #159 to dynamic functions. For example, let $x := fn:tokenize#3 return $x("abcbdCef", pattern: "c", flags: "i").

Motivation

This is the same as the motivation for #159, but in the context of a) inline functions, and b) named function references. This is specifically useful for XPath expressions where defining the functions in XSLT is cumbersome and makes it harder to share the implementation between XSLT and XQuery.

Notes

If the resolved function (static or inline) from the dynamic expression does not have a parameter with the specified name, a suitable error should be raised.

Issue #159 created #created-159

27 Sep at 18:13:03 GMT
Support named arguments on static function calls

This proposal adds the ability to specify the names (referred to here as "keywords") of the parameters of a function at the point at which the function is called. This uses the form name: value where name is a QName and value is a single expression, for example fn:tokenize("abcbdCef", pattern: "c", flags: "i").

Motivation

If a function takes several arguments of the same type (xs:boolean, xs:integer, xs:string, etc.) it can be easy to mix up the parameters. As such, it can be useful to specify these by name when calling the function, e.g. f(a: true(), b: false(), c: true()) to both aid readability and to prevent bugs when the values are specified in the wrong order to how they are declared (such as swapping the $pattern and $flags parameters in fn:tokenize).

If a function takes several optional parameters and the caller only wants to override one of the later parameters (such as the $collation in fn:differences), specifying the name of the parameter avoids specifying a value for the other arguments. This makes it clearer what is happening (as the user does not need to think about the supplied default arguments), and reduces errors (such as specifying a wrong default value for the arguments the caller is not interested in).

Notes

Because function parameter names are EQNames, the keyword should also be an EQName. If the keyword is an NCName, then it should follow the same rules as parameter NCNames such that the namespace URI is empty.

Like with maps, if the value starts with an element name (EQName) then there must be a space after the :. This should use the same note as the one in the map constructor section.

An alternative name := value syntax has also been suggested. An informal consensus is in favour of using name: value.
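
Python already supports keyword arguments, so it can illustrate the first motivation. This is a simplified sketch: tokenize below is a hypothetical approximation of fn:tokenize that only understands the "i" flag.

```python
import re


def tokenize(value, pattern, flags):
    """Simplified approximation of fn:tokenize; only the "i" flag is handled."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.split(pattern, value, flags=re_flags)


positional = tokenize("abcbdCef", "c", "i")
by_keyword = tokenize("abcbdCef", pattern="c", flags="i")
reordered = tokenize("abcbdCef", flags="i", pattern="c")

# Two string arguments are easy to swap by accident when passed positionally.
swapped = tokenize("abcbdCef", "i", "c")
```

With keywords, the order in which the arguments are written no longer matters, whereas the positional call with swapped arguments silently produces a different result.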

QT4 CG meeting 004 draft minutes #minutes-09-27

27 Sep at 16:40:49 GMT

Draft minutes published.

Issue #158 created #created-158

27 Sep at 17:22:36 GMT
Support optional parameters on dynamic functions

This proposal extends #155 to dynamic functions.

Motivation

Within XPath there isn't a defined mechanism for declaring static functions. Declaring a function outside the XPath expression (for example, defining it within XSLT) can be more cumbersome, and it makes the resulting expression less portable between XSLT and XQuery.

There are 2 cases where this applies:

  1. Defining inline functions -- These should work like declaring default values for static functions, with the rules for applying default values working at the dynamic call site.

  2. Specifying named function references for functions with additional default arguments, such as fn:tokenize#2("lorem ipsum") where the fn:tokenize#2 binds the $flags default value and the dynamic call applies the default value for $pattern. -- This is possible to define as creating an inline function such that any default parameters are preserved along with the parameter types at the corresponding parameter index.

Note

For FunctionTest (i.e. function types), I'm not sure of a use case where extending optional parameters to that would be useful, as any higher-order function would always call the input function with the correct number of arguments. If this is useful, then that should be moved to a separate proposal.

Issue #157 created #created-157

27 Sep at 17:00:34 GMT
Proposal to support optional parameters that bind to the context item.

This proposal formalizes the definition of the 0-arity versions of fn:string, fn:data, etc. that bind to the context item.

Motivation

This allows fn:string to be defined and implemented in a single function of the form:

declare function fn:string($item as item()? := .) as xs:string { ... }

Notes

When the default value is specified, the function is ·context-independent·, and ·focus-independent· at this point.

When the default value is not specified, the function is ·context-dependent·, and ·focus-dependent· at this point.

This allows users to define their own functions that can be used at the end of path expressions, e.g. /color/hexcode/to-int(16).

Once this proposal has been accepted, we should go through the XSLT and XQFO specifications and use defaults for all the relevant functions.

Issue #156 closed #closed-156

27 Sep at 16:50:45 GMT

Changes agreed 2022-09-20, and add history info to the spec for each function

Pull request #156 created #created-156

27 Sep at 16:45:09 GMT
Changes agreed 2022-09-20, and add history info to the spec for each function

Approved on 27 September

Issue #134 closed #closed-134

27 Sep at 16:43:05 GMT

Make agreed changes to fn:all and fn:some functions.

Issue #152 closed #closed-152

27 Sep at 16:42:36 GMT

Mike's changes to fn:all and fn:some

Issue #155 created #created-155

27 Sep at 16:36:31 GMT
Proposal to support optional parameter values on static functions.

This proposal allows users to define optional parameters for a function. When calling the function statically (e.g. f(1,2,3)), or creating a named function reference to it (e.g. f#3), any optional parameters that are not specified take the default values.

Motivation

Defining functions that can take one or more optional arguments (such as in the Functions and Operators specification) adds a lot of boilerplate code. For example, fn:tokenize could be written as:

declare function fn:tokenize($value as xs:string?) as xs:string* {
   fn:tokenize($value, "\s+")
};

declare function fn:tokenize($value as xs:string?, $pattern as xs:string) as xs:string* {
   fn:tokenize($value, $pattern, "")
};

declare function fn:tokenize($value as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:string* {
   ...
};

In this proposal, the above can be written more simply and concisely as:

declare function fn:tokenize(
   $value as xs:string?,
   $pattern as xs:string := "\s+",
   $flags as xs:string := ""
) as xs:string* {
   ...
};

Both of these definitions of fn:tokenize are equivalent.

Note

Once this proposal has been accepted, we should go through the XSLT and XQFO specifications and use defaults for all the relevant functions.
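
The same contrast can be seen in a language that already has default parameter values. In this Python sketch, the names tokenize_1, tokenize_2 and tokenize_3 are hypothetical stand-ins for the three arities, and the implementation is a simplified approximation of fn:tokenize that only handles the "i" flag.

```python
import re


# Without defaults: one delegating definition per arity.
def tokenize_1(value):
    return tokenize_2(value, r"\s+")


def tokenize_2(value, pattern):
    return tokenize_3(value, pattern, "")


def tokenize_3(value, pattern, flags):
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.split(pattern, value, flags=re_flags)


# With defaults: a single definition covers all three arities.
def tokenize(value, pattern=r"\s+", flags=""):
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.split(pattern, value, flags=re_flags)


one_arg = tokenize("lorem ipsum")
two_args = tokenize("a,b,c", ",")
three_args = tokenize("aXbxc", "x", "i")
```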

Issue #154 created #created-154

27 Sep at 08:16:13 GMT
Namespaces for Functions

The current F+O function library is divided into four namespaces: fn, map, array, and math.

The use of separate namespaces serves little purpose, because they are all controlled by the same naming authority.

But it has some significant disadvantages:

(a) The proliferation of namespace declarations and namespace prefixes in user programs causes visual clutter.

(b) Putting additional namespaces into the static context causes semantic clutter: such namespaces become available for unwanted purposes, such as casting strings to QNames, which also means that the namespace bindings need to be retained at run time, which bloats compiled code.

(c) It's not always obvious, either to the specification developers or to users, what namespace a new function belongs in.

I've attempted various ways of solving this problem, such as having a "namespace search list" used to resolve unprefixed function names, in place of a single default namespace. This idea falls down because we already have duplication across the current namespaces, for example fn:remove and map:remove.

Polymorphism (deciding which function to call based on the type of the first argument) also seems unpromising as a way forward; our type system is not classically object-oriented.

I propose instead that all functions in F+O should have an alias/synonym in the fn namespace:

  • map:remove() becomes available as m.remove(), etc
  • array:size() becomes available as a.size(), etc
  • math:tan() becomes available as tan() etc

I also propose removing the rule that user-declared functions have to be in a namespace. Instead, a function declared with no prefix (at least in XSLT, not sure exactly how this works in XQuery) should be in no namespace; and the binding rule for unprefixed names in function calls is to search first for no-namespace names, then for names in the default (usually fn) namespace. Of course this creates a risk of binding to the "wrong" function, but users can defend against this in a number of ways: they can avoid use of the feature, they can use the fn: prefix for disambiguation when necessary, they can adopt a naming convention like calling their own functions my.sum(). We should treat users as adults, able to make their own decisions on such matters.

We will still encourage third-party developers of function libraries to put their functions in their own namespace, and of course this applies to EXPath modules such as file and binary.

Pull request #153 created #created-153

26 Sep at 12:14:32 GMT
Explicitly mention subtypes for arrays and maps

Reword section 3.6.4 for map and array type tests to explicitly mention subtyping and link to the relevant section.

I believe it is important to add examples for subtypes of arrays and maps. Can someone advise how and where to add them?

map subtype example (could be added to 3.7.2.4 d.):

<note>
  <p><code>map(xs:integer, element(fruit))</code> is a subtype of <code>map(xs:anyAtomicType, node())</code></p>
</note>

array subtype example (could be added to 3.7.2.4 i.):

<note>
  <p><code>array(xs:positiveInteger)</code> is a subtype of <code>array(xs:decimal)</code></p>
</note>

QT4 CG meeting 004 draft agenda #agenda-09-27

25 Sep at 17:48:07 GMT

Draft agenda published.

Pull request #152 created #created-152

22 Sep at 09:03:20 GMT
Mike's changes to fn:all and fn:some

This is a re-application of Mike's PR onto the latest sources and build environment.

Issue #110 closed #closed-110

21 Sep at 17:34:27 GMT

JSON templates

Issue #151 created #created-151

21 Sep at 17:34:06 GMT
map:build() function

This proposal replaces https://github.com/qt4cg/qtspecs/issues/110

map:build($input as item()*,
                  $makeKey as function(item()) as xs:anyAtomicType,
                  $makeValue as function(item()) as item()*,
                  $duplicates as function(item()*, item()*) as item()*) as map(*)

The fourth argument is optional and defaults to fn:op(",").

The third argument is optional and defaults to fn:identity#1

Example 1:

map:build(//employee, ->{@ssn}, identity#1, ->($x,$y){error()})

Constructs an index of employees with the atomised @ssn attribute as key, and the employee element as the associated value, failing if there are any duplicates.

This can be abbreviated to map:build(//employee, ->{@ssn}) if it is known that there are no duplicates, or if the default action on duplicates is acceptable.

Example 2:

map:build(//employee, ->{location/city}, ->{xs:decimal(salary)}, op(","))

Constructs a map whose keys are the locations of employees and whose associated values are the salaries of employees at that location. The effect of the $duplicates argument is that if there are several employees at the same location, their salaries are combined into a sequence (using the "," operator). If the last argument were op("+"), the map would contain the sum of the salaries for each location.

Rules

The function creates a map.

Informally, the function processes each item in the input sequence in turn in sequence order. It calls the $makeKey function on that item to obtain a key value, and the $makeValue function to obtain an associated value. If the key is not already present in the map, it creates a new key-value pair with that key and that value. If the key is already present, it combines the existing value for the key with the new value using the $duplicates function, and replaces the entry with this combined value.

More formally, the result of the function is the result of the following expression:

fold-left($input, map{}, ->($old, $next){
    let $key := $makeKey($next)
    let $value := $makeValue($next)
    return 
       if (map:contains($old, $key))
       then map:put($old, $key, $duplicates(map:get($old, $key), $value))
       else map:put($old, $key, $value)
   })

NOTE: although defined to process the input sequence in order, the implementation may be optimised (for example to work in parallel) if it is known that the $duplicates function is symmetric, that is, if $duplicates($a, $b) produces the same result as $duplicates($b, $a).
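
The fold above translates fairly directly into other languages. The following Python sketch is an illustration under stated assumptions rather than a reference implementation: sequences are modelled as lists, so the default duplicates handler (list concatenation) plays the role of op(","), and the names map_build and employees are invented for the example.

```python
from functools import reduce


def map_build(items, make_key, make_value=lambda item: [item],
              duplicates=lambda old, new: old + new):
    """Model of map:build: fold the input into a dict, combining duplicates."""
    def step(old_map, item):
        key = make_key(item)
        value = make_value(item)
        new_map = dict(old_map)  # maps are immutable in XPath, so copy
        if key in old_map:
            new_map[key] = duplicates(old_map[key], value)
        else:
            new_map[key] = value
        return new_map

    return reduce(step, items, {})


employees = [
    {"city": "Oslo", "salary": 100},
    {"city": "Rome", "salary": 200},
    {"city": "Oslo", "salary": 50},
]

# Analogue of Example 2: salaries per city, combined into a sequence.
by_city = map_build(employees, lambda e: e["city"], lambda e: [e["salary"]])

# With an addition handler, the values are summed instead.
totals = map_build(employees, lambda e: e["city"], lambda e: e["salary"],
                   lambda old, new: old + new)
```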

Issue #150 created #created-150

21 Sep at 03:10:12 GMT
fn:ranks: Produce all ranks in applying a function on the items of a sequence

We all know the value and usefulness of functions such as fn:highest() and fn:lowest().

Sometimes, when we need to see all rankings of a particular sorting result, for example the rankings of a sport competition, we realize that highest and lowest are just the highest and lowest ranking-groups from all the rankings.

We define these three overloads for fn:ranks:

fn:ranks($input as item()*) as array(item()*)*

fn:ranks($input as item()*, $collation as xs:string?) as array(item()*)*

fn:ranks($input as item()*, $collation as xs:string?, $key as function(item()) as xs:anyAtomicType*) as array(item()*)*

The rules and semantics for the arguments are the same as those for fn:highest, fn:lowest, fn:sort. What is different is just the result.

Here is one possible XPath implementation and also a complete example:

let $ranks := function(
                $input as item()*,
                $collation as xs:string?,
                $key as function(item()) as xs:anyAtomicType*) as array(item()*)*
 {
    for $v in sort(distinct-values($input ! $key(.)),  $collation)
     return [$input[$key(.) eq $v]]
 },
 
   $inp := (3, 2, 4),
   $keyfun := function($n) {$n mod 2},
   $theRanks :=  $ranks($inp, (), $keyfun),
   $theHighest := $theRanks[last()], 
   $theLowest := $theRanks[1]
 return
 
   ( "Ranks:", $theRanks,
     "=================",
     "Highest:",
     $theHighest,
     "=================",
     "Lowest:",
     $theLowest
   )

The result, as intended, is all the rankings (in this case they are just 2 groups of equally-ranked items), then the highest and lowest, extracted as the last and first of the rankings:

Ranks:
[(2, 4)]
[3]
=================
Highest:
[3]
=================
Lowest:
[(2, 4)]
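
For readers who want to experiment outside an XPath processor, the same grouping can be sketched in Python. This is a simplified model: the collation argument is omitted, the key function is assumed to return a single comparable value, and each rank is a list rather than an array.

```python
def ranks(items, key=lambda item: item):
    """Group items by key and return the groups in ascending key order."""
    distinct_keys = sorted({key(item) for item in items})
    return [[item for item in items if key(item) == k] for k in distinct_keys]


the_ranks = ranks([3, 2, 4], key=lambda n: n % 2)
the_highest = the_ranks[-1]
the_lowest = the_ranks[0]
```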

Issue #149 created #created-149

20 Sep at 20:12:28 GMT
Functions for splitting a sequence (or array) based on predicate matching

This is concerned with use cases like "How do I select all the paragraphs before the first H2?" or "How do I select items between and ?".

Currently in the draft spec we have proposals for:

range-from($input, $predicate): Returns a sequence containing items from an input sequence, starting with the first item that matches a supplied predicate.

range-to($input, $predicate): Returns a sequence containing items from an input sequence, ending with the first item that matches a supplied predicate.

These both include the matching item, on the theory that it's easier to drop it if it's not wanted, than to add it if it's needed.

I've also proposed (as an alternative) a family of four functions items-before, items-to, items-from, items-after giving four combinations of taking the subsequence before/after the first match of the predicate, and including or not including the matched item.

It's worth pointing out that these can all be defined in terms of index-where. For example range-to (assuming at least one item matches the predicate) is subsequence($input, 1, index-where($input, $predicate)).
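
To make the relationship with index-where concrete, here is a Python sketch of the family of four functions. The behaviour when no item matches is an assumption made for this example (the "before" and "to" forms return the whole input, the "from" and "after" forms return nothing); the text above does not settle that case.

```python
def index_where(items, predicate):
    """1-based positions of the items that satisfy the predicate."""
    return [pos for pos, item in enumerate(items, start=1) if predicate(item)]


def first_match(items, predicate):
    """Position of the first matching item, or None (helper for this sketch)."""
    positions = index_where(items, predicate)
    return positions[0] if positions else None


def items_before(items, predicate):
    pos = first_match(items, predicate)
    return list(items) if pos is None else items[:pos - 1]


def items_to(items, predicate):
    pos = first_match(items, predicate)
    return list(items) if pos is None else items[:pos]


def items_from(items, predicate):
    pos = first_match(items, predicate)
    return [] if pos is None else items[pos - 1:]


def items_after(items, predicate):
    pos = first_match(items, predicate)
    return [] if pos is None else items[pos:]


doc = ["p1", "p2", "h2", "p3", "h2", "p4"]


def is_h2(name):
    return name == "h2"
```

For instance, items_before(doc, is_h2) yields the paragraphs before the first h2, and items_to(doc, is_h2) additionally includes the matched item.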

These functions all treat the first match of the predicate as special: they partition the sequence before or after the first item that matches the predicate. An alternative, inspired by XSLT's for-each-group group-ending|starting-with, would be to partition the sequence breaking immediately before or after every item that matches the predicate:

group-breaking-after($input, $predicate)
group-breaking-before($input, $predicate)

But these logically return a sequence of sequences, which would typically be presented either as an array of sequences or a sequence of arrays, neither of which is ideal. (An alternative would be to return a sequence of arity-0 functions)

Having reviewed the options, I think my preference remains having a family of four functions which I have called items-before, items-to, items-from, items-after. But I'm certainly open to other options. The logical names would probably be subsequence-before etc, but that's a bit of a mouthful.

Whatever family of functions we decide upon, there's logically a requirement to offer the same for arrays.

Michael Kay

QT4 CG meeting 003 draft minutes #minutes-09-20

20 Sep at 18:25:39 GMT

Draft minutes published.

Issue #148 created #created-148

20 Sep at 13:24:59 GMT
Get the type of a value

This could be a language construct like type of $a similar to $a instance of <type> or a function fn:type-of(item()*).

It returns the string representation of the type of a value.

  • For atomic values, return the local name of the most specific built-in (XSD) type of which the value is an instance.
  • For nodes, return node-kind(name) where node-kind is for example "element" or "attribute" and name (for nodes that have a name) is the node name in Q{uri}local notation.
  • For other functions return a string representation of the function signature that (a) conforms to XPath SequenceType syntax, and (b) uses EQName notation for qualified names.
  • For empty maps and arrays return map(*) and array(*) respectively
  • For arrays and maps with members and entries the implementation can attempt to find the most specific shared type like array(xs:string), map(xs:integer, element(banana))
    • The inspection should stop as early as possible, when the first mismatching type is encountered.

type of "", (: yields "xs:string" :)
type of [], (: yields "array(*)" :)
type of [map{"a": true(), "b": false()}, map{"c": false()}], (: yields "array(map(xs:string, xs:boolean))" :)
type of ([], map{}, function () {}), (: yields "function(*)" :)
type of [map{"a": true(), "b": "false"}, map{"c": false()}] (: yields "array(map(xs:string, xs:anyAtomicType))" or "array(map(xs:string, *))" :)

I wrote an implementation in XQuery as a PoC utilizing typeswitch:

https://github.com/line-o/xbow/commit/ca6e593f869c15b1fb372d24653715abfbda5cf8

BaseX provides an implementation of this as inspect:type. The two-argument form, inspect:type#2, allows additional options to be set, and these should be considered.

Issue #147 created #created-147

19 Sep at 10:50:22 GMT
Terse syntax for map entries

Allow variables to be used to construct map entries with

  • the key being the variable name cast to xs:string and
  • the value to be the value of the variable

The syntax would change to something along the lines of:

mapExpression ::= 'map' '{' ( expr ':' expr | variableReference ) [ ',' ( expr ':' expr | variableReference ) ]* '}'

Example:

let $my-var := <root />
let $other := map { "a": [ 1, 2, 3 ] }
return map {
  $my-var,
  $other
}

evaluates to

map {
  "my-var": <root />,
  "other": map { "a": [ 1, 2, 3 ] }
}

This is the inverse operation of destructuring a map as proposed in https://github.com/qt4cg/qtspecs/issues/37

let ${my-var, other} := map {
  "my-var": <root />,
  "other": map { "a": [ 1, 2, 3 ] }
}

In the Slack discussion John Lumley and Liam Quinn raised concerns that this construct might be error-prone when both methods of constructing an entry can be mixed:

map { $a : $b, $c }
map:for-each($m, function ($k, $v as xs:anyAtomicType) { map{ $v : $k } }),
map:for-each($m, function ($k, $v as xs:anyAtomicType) { map{ $v, $k } })

Issue #146 created #created-146

18 Sep at 18:57:37 GMT
fn:apply with last two arguments (array, map) for the positional and keyword args in a func-call

The existing fn:apply() in XPath 3.1 "Makes a dynamic call on a function with an argument list supplied in the form of an array."

However in XPath 4.0 a function may have both (required) positional and variadic arguments, expressed as keyword arguments if the function is map-variadic. Thus there are cases when it is impossible to provide the arguments of the function just within an array.

For such cases, we need a new overload:

fn:apply($function as function(*), $array as array(*), $map as map(*)) as item()*

The result of the function is obtained by creating and invoking the same dynamic call that would be the result of a function-call to $function with positional arguments taken from the (ordered) members of the supplied array $array and keyword arguments taken from the (unordered) entries (KVPs - keyword-value pairs) of $map.

The effect of calling fn:apply($f, [$a, $b, $c, ...], map{"k1" : v1, "k2" : v2, ...}) is the same as the effect of the dynamic function call $f($a, $b, $c, ..., $k1 = v1, $k2 = v2, ...). The function conversion rules are applied to the supplied arguments in the usual way.

Dependencies

  1. The new overload of fn:apply() is needed if XPath 4.0 has function calls with both positional and keyword arguments. In other words, this proposal depends on the existence of this new XPath feature.

  2. Some newly proposed features, such as Function Decorators, depend on the existence of this new overload of fn:apply().

Example:

The expression:

let $data := <a b="3"/>
 return
    fn:apply(fn:serialize#2, [$data], map{"method":"xml", "omit-xml-declaration":true()})

returns:

'<a b="3"/>'
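As a rough analogy only, the proposed three-argument overload behaves like argument unpacking in Python: the array supplies the positional arguments and the map supplies the keyword arguments. In this sketch, `apply_with_keywords` and `serialize` are hypothetical stand-ins, not part of any specification, and the keyword name uses an underscore because Python identifiers cannot contain hyphens.

```python
def apply_with_keywords(function, array, mapping):
    """Call `function` with positional arguments taken from `array`
    and keyword arguments taken from `mapping`."""
    return function(*array, **mapping)


def serialize(value, method="xml", omit_xml_declaration=False):
    """Hypothetical stand-in for a function that accepts keyword options."""
    declaration = "" if omit_xml_declaration else '<?xml version="1.0"?>'
    return declaration + value


result = apply_with_keywords(
    serialize,
    ['<a b="3"/>'],                                   # positional arguments
    {"method": "xml", "omit_xml_declaration": True},  # keyword arguments
)
print(result)
```

An array alone cannot express the second group of arguments, which is the gap the new overload is meant to close.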

Issue #145 closed #closed-145

17 Sep at 13:28:16 GMT

Cleaned up infrastructure

Pull request #145 created #created-145

17 Sep at 13:27:49 GMT
Cleaned up infrastructure

I couldn't leave well enough alone. This "finishes" the conversion to a more modern build system, fixes a number of small editorial issues (incorrect bibliographic links, etc.), and suppresses some warnings by default.

Issue #144 closed #closed-144

16 Sep at 19:35:50 GMT

Fix links

Pull request #144 created #created-144

16 Sep at 19:30:28 GMT
Fix links

My bad. At some point, not too recently, I changed from using /branch/master to hold the current specifications to /specifications. The old links persisted in the index page for qt4cg.org and got (quite reasonably) copied into the README but I didn't notice.

Yesterday, I think, I was cleaning up some things and deleted the stale /branch/master files.

QT4 CG meeting 003 draft agenda #agenda-09-20

16 Sep at 15:59:58 GMT

Draft agenda published.

Issue #17 closed #closed-17

15 Sep at 17:04:04 GMT

readme

Issue #138 closed #closed-138

15 Sep at 16:56:53 GMT

Update README.md

Issue #143 closed #closed-143

15 Sep at 16:53:57 GMT

Infrastructure updates

Pull request #143 created #created-143

15 Sep at 16:51:59 GMT
Infrastructure updates

This PR doesn't make any technical changes (and it probably won't build correctly as a PR, so I'm just going to merge it).

  1. I tinkered a little bit with the URIs on the specifications so that they point to more accurate URIs.
  2. I've tried to make a few of the XSLT files "less chatty". I'd like to make it easier to see what the errors and warnings are so that we can resolve them. It's the sort of thing that bugs me, so if no one else does it, I'll probably chip away at them as we go.
  3. I reworked the way that the XPath Functions and XSLT specifications are built. I've pulled the tasks out of the ant files and implemented them directly in Gradle. (The XPath and XQuery specifications are as yet unchanged, they're more complicated and this already took longer than I had anticipated.)
  4. The specification/{specid}/build and specification/{specid}/html directories are no longer used. All of the build and intermediate files are under build. (For XPath Functions and XSLT.)
  5. I stopped trying to build all the namespace documents. Those aren't useful unless they're installed in the "real" URIs and I have no idea if/how we'll be able to update those. When we're closer to the end, I can get the namespace documents building again. For now, it's just faster not to.

It's now easier to see what is happening and Gradle will do a better job of managing the dependencies. So if you are building it locally, it should do less work on each build.

Everything should just build the same way it used to. Changes to the sources or the stylesheets should cause the right builds to run. It is, of course, possible that I've broken something. I tried to check everything, but there are a lot of moving pieces here. Comments, questions, and complaints most welcome.

Issue #142 closed #closed-142

15 Sep at 11:26:17 GMT

Improve workflows

Pull request #142 created #created-142

15 Sep at 11:26:02 GMT
Improve workflows

This PR updates the github workflows so that all branches are published and so that the PR path is exposed in the environment.

Issue #140 closed #closed-140

14 Sep at 14:52:50 GMT

Make agreed changes to fn:all and fn:some functions (NW testing)

Issue #141 closed #closed-141

14 Sep at 13:15:20 GMT

Fix branch name

Pull request #141 created #created-141

14 Sep at 13:15:15 GMT

Fix branch name

Pull request #140 created #created-140

14 Sep at 13:11:39 GMT

Make agreed changes to fn:all and fn:some functions (NW testing)

Issue #137 closed #closed-137

14 Sep at 13:04:44 GMT

A copy of Mike's fn-all-some changes for testing

Issue #139 closed #closed-139

14 Sep at 13:02:21 GMT

Fix markup error introduced yesterday

Pull request #139 created #created-139

14 Sep at 13:01:09 GMT
Fix markup error introduced yesterday

A commit yesterday introduced a markup error.

Pull request #138 created #created-138

14 Sep at 12:43:46 GMT
Update README.md

First draft of revising readme.md. I would propose also adding instructions detailing the build process for local builds of the repo.

Pull request #137 created #created-137

14 Sep at 12:05:34 GMT
A copy of Mike's fn-all-some changes for testing

This should create a formatted PR!

Issue #136 closed #closed-136

14 Sep at 12:02:44 GMT

Workflow changes

Issue #133 closed #closed-133

14 Sep at 12:01:35 GMT

Norm never plans to merge this, it's a test

Pull request #136 created #created-136

14 Sep at 12:00:02 GMT
Workflow changes

Add two workflows, update the build script to create an index page

Issue #135 created #created-135

13 Sep at 18:21:18 GMT
Arrays' counterparts for functions on sequences, and vice versa

Whenever we propose a function f1 that takes a sequence argument, we should also have (or propose, if it does not yet exist) a corresponding function f2 that takes an array argument in place of f1's sequence argument.

For example:

all($input as item()*, $predicate as function(item()) as xs:boolean)

The above function accepts a sequence as its first argument. Since there is no existing function all for arrays, we will define and propose it together with the function above:

array:all($input as array(*), $predicate as function(item()*) as xs:boolean)

For consistency and clarity, and to avoid confusing readers of the spec (who might otherwise be left wondering why the corresponding second function is missing), we shall as a rule always provide such functions in pairs: one defined on sequences and one defined on arrays.

Even when we are not proposing a new function, if such an "orphan" function already exists, we will add its counterpart.
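The pairing can be sketched in Python under one modelling assumption: a sequence is a flat list of items, and an array is a list whose members are themselves lists (since an array member may be any sequence). The names `all_items` and `array_all` are illustrative only.

```python
def all_items(sequence, predicate):
    """Sequence variant: the predicate sees one item at a time."""
    return all(predicate(item) for item in sequence)


def array_all(array, predicate):
    """Array variant: the predicate sees one member (a whole sequence) at a time."""
    return all(predicate(member) for member in array)


every_item_positive = all_items([1, 2, 3], lambda item: item > 0)
every_member_non_empty = array_all([[1, 2], [3], []], lambda member: len(member) > 0)
print(every_item_positive, every_member_non_empty)
```

The only difference between the two is what the predicate receives, which mirrors the function(item()) versus function(item()*) signatures above.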

QT4 CG meeting 002 draft minutes #minutes-09-13

13 Sep at 16:30:29 GMT

Draft minutes published.

Pull request #134 created #created-134

13 Sep at 17:27:47 GMT

Make agreed changes to fn:all and fn:some functions.

Pull request #133 created #created-133

13 Sep at 16:43:34 GMT
Norm never plans to merge this, it's a test

This is a test. This is only a test. Had this been a real emergency, we would have fled in terror and you would not have been informed.

QT4 CG meeting 002 draft agenda #agenda-09-13

12 Sep at 11:48:44 GMT

Draft agenda published.

QT4 CG meeting 001 draft minutes #minutes-09-06

06 Sep at 16:38:31 GMT

Draft minutes published.

Issue #132 created #created-132

06 Sep at 08:46:16 GMT
Clarify if redirects should be followed

If a user requests a document (using doc, document, unparsed-text, etc) with an HTTP(S) URI, and a redirect response is returned, what should the processor do?

QT4 CG meeting 001 draft agenda #agenda-09-06

05 Sep at 14:45:39 GMT

Draft agenda published.

Issue #131 created #created-131

23 Aug at 12:49:52 GMT
Expression for binding the Context Value

REVISED: I’ve incorporated the feedback from the comments (thanks).

We have no expression yet to bind a value to the context value. Such an expression would be useful, among other things, to extend the focus function to sequences (fn { . }, see #129).

Here are 3 possible constructs for that, ordered by my personal preference:

1. Value Map Expression

ValueExpr      ::=  ValidateExpr | ExtensionExpr | ValueMapExpr
ValueMapExpr   ::=  SimpleMapExpr ("~" SimpleMapExpr)*
SimpleMapExpr  ::=  PathExpr ("!" PathExpr)*

(: Example :)
//flower ~ (count(.) || ' flowers: ' || string-join(name, ', '))

The expression would be similar to the simple map expression (which we could rename to item map expression). The following equivalents would then exist for simple FLWOR expressions:

for $i in (1 to 5) return string($i)  ≍  (1 to 5) ! string(.)
let $i := (1 to 5) return count($i)   ≍  (1 to 5) ~ count(.)

fn { E } could be rewritten to fn($c) { $c ~ E }.

2. Context Value Declaration

ContextExpr  ::=  "context" "{" Expr "}" EnclosedExpr

(: Example :)
context { //flower } {
  count(.) || ' flowers: ' || string-join(name, ', ')
}

The result of the first expression defines the context value, the second expression can reference the context.

fn { E } could be rewritten to fn($c) { context { $c } { E } }.

3. Enhanced FLWOR expression (for the sake of completeness)

Similar to variables, the dot could be used to bind and reference the context:

LetBinding  ::=  ("." | ("$" VarName)) TypeDeclaration? ":=" ExprSingle
ForBinding  ::=  ("." | ("$" VarName)) TypeDeclaration? AllowingEmpty? PositionalVar? "in" ExprSingle

(: Example :)
let . := //flower
return count(.) || ' flowers: ' || string-join(name, ', ')

fn { E } could be rewritten to fn($c) { let . := $c return E }.

Assessment

  • The first solution looks most appealing to me. I like the analogy with the existing syntax for single items.
  • We could choose the second solution if we believe that the expression will be rarely used.
  • I’ve backed away from the third solution; I think it would be too pervasive.
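To make the difference between the existing `!` operator and the proposed `~` operator concrete, here is a minimal Python model. It assumes sequences are represented as lists; `item_map` and `value_map` are hypothetical names chosen for this sketch.

```python
def item_map(sequence, expression):
    """Model of E1 ! E2: evaluate the expression once per item
    and concatenate the results."""
    result = []
    for item in sequence:
        result.extend(expression(item))
    return result


def value_map(sequence, expression):
    """Model of the proposed E1 ~ E2: evaluate the expression once,
    with the whole sequence bound as the context value."""
    return expression(sequence)


numbers = list(range(1, 6))
per_item = item_map(numbers, lambda item: [str(item)])   # (1 to 5) ! string(.)
whole = value_map(numbers, lambda value: [len(value)])   # (1 to 5) ~ count(.)
print(per_item, whole)
```

In this model `!` corresponds to a `for` clause and `~` to a `let` clause, matching the two FLWOR equivalences given for the first construct.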

Issue #130 created #created-130

22 Aug at 12:23:52 GMT
New super/union type xs:binary?

I know we want to avoid in-depth changes in the type system. Still, now that we have xs:numeric, is there an obvious reason why we have no super or union type for xs:base64Binary and xs:hexBinary? It would simplify the definition of many extension functions a lot (e.g. in the EXPath Binary or File Module).

Issue #127 closed #closed-127

19 Aug at 09:50:42 GMT

[XPath] [XQuery] Function items: Align "function" and "->"

Issue #129 created #created-129

18 Aug at 16:59:16 GMT
Context item → Context value?

This has already been discussed before at various places, I’d like to raise it again: What about generalizing the context item and allowing it to reference sequences? Are there definitive showstoppers?

The Context Item

As its name says, the context item is a container for a single item in the current context. A value that is bound to the context item is referenced with the Context Item Expression, the single dot (.).

The context item shares many similarities with variables. The main difference is that it currently cannot be used for sequences. I propose to generalize the semantics and introduce a “context value”:

  • Items that have formerly been bound to the context item (via the Context Item Declaration, within predicates, the simple map operator, path expressions, the transform with expression, etc.) are now bound to the context value.
  • The revised Context Item Expression returns sequences instead of single items.
  • We cannot drop context items completely – for example, we have a Context Item Declaration in the prolog of XQuery expressions, which uses the item keyword – but we can treat it as a secondary concept.

Context Value Declaration

It has become a common pattern to use declare context item to bind a document to the context item and process queries on that item:

declare context item := doc('flowers');
.//flower[name = 'Tigridia']

If data is distributed across multiple documents (which is often, if not usually, the case in databases), this approach does not work. It would work if we could bind sequences:

declare context value := collection('flowers');
.//flower[name = 'Tigridia']

External Bindings

Many processors allow users to bind external values to the context item. This approach is particularly restrictive for databases, in which data is often distributed across multiple documents. With the generalized concept, it would become possible to bind sequences and collections to the context. Paths like the following could be used, no matter whether the contents are stored in a single document or in a collection:

//flower[name = 'Iridaceae']

Focus Functions

The focus function provides a compact syntax for common arity-one functions. The single argument is bound to the context item:

sort($flowers, (), function { @petals })

With the generalization to values, we could easily enhance focus functions to accept arbitrary sequences:

array:sort($flower-species, (), function { count(.) })

let $flowers := array:join(
  for $flower in //flower
  group by $_ := $flower/name
  return [ $flower ]
)
(: some $p in petals satisfies $p gt 4 :)
return array:filter($flowers, function { petals > 4 })

Use Case: Arrow Expressions

The arrow expression provides an intuitive syntax for performing multiple subsequent operations on a given input. With the context value generalization, we could also process chained sequences:

//flower[name = 'Psychotria']
=> function { count(.) || ' flower(s) found' }()

Issue #128 created #created-128

17 Aug at 10:29:05 GMT
fn:replace: Tweaks

Some suggestions for fn:replace:

A. Simplify/unify retrieval of the replacement value

  • If $action is present, call the function.
  • If $replacement is present, obtain its value.
  • If no argument is present, use the empty sequence.

See https://github.com/qt4cg/qtspecs/issues/104#issuecomment-1210506639 for similar replacements in maps and arrays.

B. Simplify actions by relaxing types:

  1. Pass on matches as xs:untypedAtomic items.
  2. Apply fn:string to the result of the action (or even fn:string-join, if we allow sequences as return values).

C. Always use xs:string? for $flags

…affects fn:matches, fn:tokenize, and fn:analyze-string.

Resulting function signature

fn:replace(
  $value        as xs:string?,
  $pattern      as xs:string,
  $replacement  as xs:string?                                                  := '',
  $flags        as xs:string?                                                  := '',
  $action       as (function(xs:untypedAtomic, xs:untypedAtomic*) as item()?)? := ()
) as xs:string

Examples

…taken from the specification draft and simplified/fixed:

fn:replace("Chapter 9", "[0-9]+", action := function { . + 1 }),

fn:replace(
  "57°43′30″",
  "([0-9]+)°([0-9]+)′([0-9]+)″",
  action := function($matches) {
    ($matches[1] + $matches[2] div 60 + $matches[3] div 3600) || '°'
  }
)
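For comparison, Python's `re.sub` already accepts a callback that computes each replacement. The sketch below models the proposed `$action` parameter under the assumption that the action receives the matched text and the captured groups and that its result is converted to a string; `replace` is a hypothetical helper, and rounding to three decimals is added here only to keep the output stable.

```python
import re


def replace(value, pattern, action):
    """Replace every match with str(action(matched_text, groups))."""
    return re.sub(
        pattern,
        lambda m: str(action(m.group(0), m.groups())),
        value,
    )


chapter = replace("Chapter 9", r"[0-9]+", lambda match, groups: int(match) + 1)


def to_decimal_degrees(match, groups):
    """Convert degrees, minutes and seconds to decimal degrees."""
    degrees, minutes, seconds = (int(g) for g in groups)
    return f"{round(degrees + minutes / 60 + seconds / 3600, 3)}°"


angle = replace("57°43′30″", r"([0-9]+)°([0-9]+)′([0-9]+)″", to_decimal_degrees)
print(chapter)  # Chapter 10
print(angle)    # 57.725°
```

Note that the explicit `int(...)` conversions are what suggestion B.1 would make unnecessary: matches passed as xs:untypedAtomic would be cast implicitly in arithmetic.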

Issue #127 created #created-127

16 Aug at 09:40:31 GMT
[XPath] [XQuery] Function items: Align "function" and "->"

I think that function and -> should be 100% interchangeable:

Current Rule:
[83] InlineFunctionExpr  ::=  (("function" FunctionSignature) | ("->" FunctionSignature?)) FunctionBody

Proposed:
[83] InlineFunctionExpr  ::=  (("function" | "->") FunctionSignature?) FunctionBody

And an editorial note: In https://qt4cg.org/branch/master/xquery-40/xquery-40-diff.html#id-inline-func, the expanded syntax needs to be changed from

function($x as item()) as item()* {$x -> {EXPR}}

to

function($x as item()) as item()* { $x ! EXPR }

Issue #126 created #created-126

16 Aug at 09:25:02 GMT
Mathematical Operator Unicode Symbols

As programming fonts with ligatures get more and more common, we should ensure that Unicode symbols that are defined as equivalents for ASCII operators will not be mixed up with ligatures.

In particular, it’s the aliases for value comparisons that I believe should be used for general comparison operators instead:

  • <= (instead of le)
  • >= (instead of ge)
  • != (instead of ne)
  • , , …drop them?

The full list of the currently defined aliases: https://qt4cg.org/branch/master/xquery-40/xquery-40-diff.html#id-math-symbols

Issue #125 created #created-125

15 Aug at 10:02:26 GMT
array:partition → fn:partition: empty results; examples

1. I think that an empty sequence should be returned if array:partition is invoked with an empty array. Compare fn:tokenize(''), which also returns an empty sequence rather than a zero-length string.

2. The last two example queries need to be revised, e.g. as follows:

array:partition(
  tokenize("In the beginning was the word"),
  function($previous, $current) { sum(($previous, $current) ! string-length()) gt 10 }
)

array:partition(
  (1, 2, 3, 6, 7, 9, 10),
  function($seq, $new) { not($new = $seq[last()] + 1) }
)

Maybe an affirming $add-when function is more intuitive than $break-when, in particular for function bodies with sequence operations and general comparisons.
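The intended behaviour of the two revised queries can be checked with a small Python model. It assumes that the first item always opens the first partition, that `break_when` is called with the current partition and the next item, and that an empty input yields no partitions (as suggested in point 1). The name `partition` and the list-of-lists result are choices made for this sketch.

```python
def partition(items, break_when):
    """Split items into partitions; start a new partition whenever
    break_when(current_partition, next_item) returns True."""
    partitions = []
    for item in items:
        if partitions and not break_when(partitions[-1], item):
            partitions[-1].append(item)
        else:
            partitions.append([item])
    return partitions


words = "In the beginning was the word".split()
by_length = partition(
    words,
    lambda previous, current: sum(len(w) for w in previous + [current]) > 10,
)

runs = partition(
    [1, 2, 3, 6, 7, 9, 10],
    lambda seq, new: new != seq[-1] + 1,
)

print(by_length)  # [['In', 'the'], ['beginning'], ['was', 'the', 'word']]
print(runs)       # [[1, 2, 3], [6, 7], [9, 10]]
```

An affirming `$add-when` variant would simply drop the `not` in the loop and invert each predicate.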

Issue #10 closed #closed-10

11 Aug at 23:20:11 GMT

[FO] fn:filter with a function returning empty sequence

Issue #61 closed #closed-61

11 Aug at 22:57:10 GMT

[FO] fn:all and fn:some have an xs:integer* return type, but describe an xs:boolean return type

Issue #81 closed #closed-81

11 Aug at 22:46:08 GMT

[xslt30] Typo in §4.4

Issue #102 closed #closed-102

11 Aug at 22:42:30 GMT

[xslt30] Meaning of the term "lexical space"

Issue #117 closed #closed-117

11 Aug at 22:22:19 GMT

Downcasting (relabelling) in the coercion rules

Issue #120 closed #closed-120

11 Aug at 22:11:23 GMT

Typo: "stremability"

Issue #124 created #created-124

10 Aug at 19:16:07 GMT
[XPath] [XQuery] Incorrect subtype-itemtype rules for pure and local union types

Looking at https://github.com/qt4cg/qtspecs/issues/122, I've identified a possible gap in the logic for pure union types and LocalUnionTypes. Specifically, the rules are defined for when A is one of these union types but not when B is one of these union types.

That is, under the current 4.0 draft rules:

  1. subtype-itemtype(union(xs:string, xs:integer), xs:string) is defined and will return false (xs:integer is not a subtype of xs:string).
  2. subtype-itemtype(xs:string, union(xs:string, xs:integer)) is not defined and will return false even though the union type supports xs:string as one of its member types.

Note that in earlier versions of the standard, the pure union type case is handled by derives-from(AT, ET):

  1. ET is a pure union type of which AT is a member type

with derives-from(AT, ET) only being applied when both are atomic types.

Draft Wording

  1. Conditions for atomic and union types:
       a. A and B are AtomicOrUnionTypes, and derives-from(A, B) returns true.
       b. A is a LocalUnionType in the form union(T1, T2, ...) and every type T in (T1, T2, ...) satisfies subtype-itemType(T, B).
       c. B is a LocalUnionType in the form union(T1, T2, ...) and any type T in (T1, T2, ...) satisfies subtype-itemType(A, T).

Design Note: There is no need for a rule when A is a pure union type as that is covered by the "There is a type MT such that derives-from( AT, MT ) and derives-from( MT, ET )" rule for derives-from.
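The three draft conditions can be exercised with a small Python model. This is only a sketch: the `PARENT` table covers a tiny, hand-picked slice of the atomic type hierarchy, `Union` is a hypothetical stand-in for LocalUnionType, and checking the union cases before falling back to derives-from is an ordering chosen for this example.

```python
from dataclasses import dataclass

# Hand-picked slice of the atomic type hierarchy (child -> parent).
PARENT = {
    "xs:integer": "xs:decimal",
    "xs:decimal": "xs:anyAtomicType",
    "xs:string": "xs:anyAtomicType",
}


@dataclass(frozen=True)
class Union:
    """Stand-in for a LocalUnionType such as union(xs:string, xs:integer)."""
    members: tuple


def derives_from(a, b):
    """True if atomic type a is b or is derived from b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


def subtype_itemtype(a, b):
    if isinstance(a, Union):   # condition (b): every member of A must fit B
        return all(subtype_itemtype(t, b) for t in a.members)
    if isinstance(b, Union):   # condition (c): A must fit some member of B
        return any(subtype_itemtype(a, t) for t in b.members)
    return derives_from(a, b)  # condition (a)


string_or_integer = Union(("xs:string", "xs:integer"))
print(subtype_itemtype(string_or_integer, "xs:string"))  # False
print(subtype_itemtype("xs:string", string_or_integer))  # True
```

With condition (c) in place, the second case from the list above returns true instead of being undefined.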

Issue #123 created #created-123

09 Aug at 17:33:59 GMT
fn:duplicate-values

Motivation

The new function fn:all-equal can be used for consistency checks (e.g., to verify if IDs are distinct). Often, however, developers rather need to find the actual values that exist more than once. A fn:duplicate-values (or fn:duplicates) function could fill that gap.

Summary

Returns values that appear more than once in a sequence. Values are compared according to the rules of the fn:distinct-values function.

Signature

fn:duplicate-values(
  $values     as xs:anyAtomicType*,
  $collation  as xs:string?        := ()
) as xs:anyAtomicType*

Use Case

Find the values of duplicate IDs in a sequence:

let $ids := duplicate-values(//@id)
where exists($ids)
return error((), 'Duplicate IDs found: ' || string-join($ids, ', '))

Examples

Query | Result
--- | ---
fn:duplicate-values((1, '2', 3.0)) | empty sequence
fn:duplicate-values(('id1', 'id10', 'id1')) | id1
fn:duplicate-values((1, 1 to 2, 1 to 3)) | 1, 2
fn:duplicate-values((1, 1.0, 1e0)) | 1 (as with fn:distinct-values, items may be of different type)
fn:duplicate-values(1 to 1000000000000000000) | empty sequence
fn:duplicate-values(()) | empty sequence

Equivalent Expression

for $group in $values
group by $value := $group (: collation 'value of $collation' :)
where count($group) > 1
return $value
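The same logic can be sketched in Python in a single pass. This model ignores collations, relies on Python's own equality (where 1 == 1.0, but also True == 1, which differs from XPath), and reports each duplicate once in the order in which its first repetition is seen; the proposal itself does not fix the result order.

```python
def duplicate_values(values):
    """Return each value that occurs more than once, reported a single time."""
    seen = set()
    reported = set()
    duplicates = []
    for value in values:
        if value in seen and value not in reported:
            duplicates.append(value)
            reported.add(value)
        seen.add(value)
    return duplicates


print(duplicate_values([1, "2", 3.0]))           # []
print(duplicate_values(["id1", "id10", "id1"]))  # ['id1']
print(duplicate_values([1, 1, 2, 1, 2, 3]))      # [1, 2]
```

Unlike the grouping expression, this version can stop tracking a value once it has been reported, which hints at why a built-in function may be easier to optimize.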

Advantages vs. Drawbacks

  + simple and intuitive to understand as proposed
  + behavior analogous to distinct-values
  – narrow focus (similar to distinct-values)
  – “yet another convenience function”

Maybe it boils down to whether we want to have more easy-to-use convenience functions, or whether we would rather keep the set of functions limited.

Issue #122 created #created-122

09 Aug at 12:20:01 GMT
Support general union sequence types

Use Case

There are a number of cases where it is beneficial to define a type more precisely (specifically in parameters and return types) as a union of item or sequence types, for example:

  1. a binary type over xs:hexBinary and xs:base64Binary;
  2. an element that accepts ol or ul html list element names;
  3. an options parameter that accepts strings (xs:string*), an element (element(options)), or a map;
  4. a function that takes JSON types (map, array, xs:integer, xs:decimal, xs:string).

There are a number of MarkLogic APIs that make use of this. Several EXPath and EXQuery specifications can take advantage of this. I've also used this in my XQuery IntelliJ plugin when defining vendor APIs that have changed over the different versions.

Examples

(: BaseX API change in 8.5 :)
declare function archive:options($archive as xs:base64Binary)
     as (element(archive:options) | map(*)) external;

declare function html:list($list as (element(ol) | element(ul))) { ... };

(: https://docs.marklogic.com/cts:classify -- MarkLogic defines this as `(element() | map:map)?` :)
declare function cts:classify($data-nodes as node()*,
                              $classifier as element(cts:classifier),
                              $options as (element()? | map:map?))
     as element(cts:label)* external;

(: https://docs.marklogic.com/cts:search :)
declare function cts:search($expression as node()*,
                            $query as cts:query?,
                            $options as (cts:order* | xs:string*))
     as node()* external;

Existing Support

  1. Local Union Types -- This handles support for unions over atomic types.
  2. https://github.com/qt4cg/qtspecs/issues/23 -- This provides a more concise syntax for unions over element or attribute names.
  3. Types -- The Formal Semantics specification defines union types.
  4. Sequence Type Union -- This is the definition in my XQuery IntelliJ plugin.

Note: Due to SequenceTypeUnion being present in typeswitch expressions XQuery implementations will have existing code to handle matching these unioned types.

Syntax

3.4 Sequence Types

SequenceTypeUnion ::= SequenceType  ("|"  SequenceType)*
SequenceType ::= EmptySequenceType | (ItemType OccurrenceIndicator?) | ParenthesizedSequenceType
EmptySequenceType ::= "empty-sequence" "(" ")"
ParenthesizedSequenceType ::= "(" SequenceTypeUnion ")"
ItemType ::= AnyItemTest | TypeName | KindTest | FunctionTest | MapTest | ArrayTest |
             AtomicOrUnionType | RecordTest | LocalUnionType | EnumerationType

Design Note: SequenceTypeUnion is an existing BNF symbol used in typeswitch expressions that is unchanged in this issue.

3.6 Item Types

ItemTypeUnion ::= ItemType  ("|"  ItemType)*
ParenthesizedItemType ::= "("  ItemTypeUnion  ")"
ParenthesizableItemType ::= ItemType | ParenthesizedItemType

Design Note: ItemTypeUnion mirrors SequenceTypeUnion, allowing the non-sequence unions to be used in the contexts where only item types are allowed. Implementations can make use of the SequenceTypeUnion logic after the syntax/parser validates the item type restriction in those contexts.

Design Note: An alternative to this -- in order to minimize grammar changes -- would be to replace the ItemType with an ItemTypeBase symbol (or appropriately named alternative), and then define ItemType accordingly:

ItemTypeBase ::= AnyItemTest | TypeName | KindTest | ...
ItemTypeUnion ::= ItemTypeBase ("|" ItemTypeBase)*
ItemType ::= ItemTypeBase | ParenthesizedItemType
SequenceType ::= EmptySequenceType | (ItemTypeBase OccurrenceIndicator?) | ParenthesizedSequenceType

Other Changes

Design Notes: If ItemType is changed to ParenthesizableItemType, these are the other areas in the current XPath/XQuery 4.0 grammar that need changing.

ContextItemDecl ::= "declare"  "context"  "item"  ("as"  ParenthesizableItemType)?
                    ((":="  VarValue)  |  ("external"  (":="  VarDefaultValue)?))
ItemTypeDecl ::= "item-type" EQName "as" ParenthesizableItemType
TypedMapTest ::= "map" "(" ParenthesizableItemType "," SequenceType ")"
LocalUnionType ::= "union" "(" ParenthesizableItemType ("," ParenthesizableItemType)* ")"

Text

4.22.2 Typeswitch

The effective case definition is defined as:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches a SequenceType in the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

In order to make that fit this proposal, the wording should be updated to something like:

The effective case in a typeswitch expression is the first case clause in which the value of the operand expression matches the SequenceTypeUnion of the case clause, using the rules of SequenceType matching.

3.7.2 The judgement subtype-itemtype(A, B)

Section (2) Conditions for atomic and union types: should add the following rules:

  1. A is an ItemTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype-itemType(T, B).
  2. B is an ItemTypeUnion in the form (T1 | T2 | ...) and any type T in (T1, T2, ...) satisfies subtype-itemType(A, T).

3.7.1 The judgement subtype(A, B)

The first paragraph in this section shall be replaced by:

The judgement subtype(A, B) determines if the sequence type A is a subtype of the sequence type B. A can either be empty-sequence(), xs:error, an ItemType, Ai, possibly followed by an occurrence indicator, or a SequenceTypeUnion. Similarly B can either be empty-sequence(), xs:error, an ItemType, Bi, possibly followed by an occurrence indicator, or a SequenceTypeUnion.

The result of the subtype(A, B) judgement can be determined as follows:

  1. If A is a SequenceTypeUnion in the form (T1 | T2 | ...) and every type T in (T1, T2, ...) satisfies subtype(T, B), then subtype(A, B) is true.
  2. If B is a SequenceTypeUnion in the form (T1 | T2 | ...) and any type T in (T1, T2, ...) satisfies subtype(A, T), then subtype(A, B) is true.
  3. Otherwise, the result of the subtype(A, B) judgement can be determined from the table below, which makes use of the auxiliary judgement subtype-itemtype(Ai, Bi) defined in 3.7.2 The judgement subtype-itemtype(A, B) .

Issue #103 closed #closed-103

09 Aug at 09:24:12 GMT

fn:all, fn:some

Issue #121 created #created-121

08 Aug at 19:58:48 GMT
[FO] fn:nl, fn:tab, fn:cr

The most popular custom functions in BaseX, and the most boring ones, allow users to insert new line and tab characters. It would be nice to see official variants added to the spec:

Function | Returned character
--- | ---
fn:nl() as xs:string | end of line (&#10;, &NewLine;)
fn:tab() as xs:string | character tabulation (&#9;, &Tab;)
fn:cr() as xs:string | carriage return (&#13;)

The third function can possibly be dropped.

Issue #120 created #created-120

25 Jul at 13:14:18 GMT
Typo: "stremability"

I noticed it in 3.5.3.3 Overriding Components from a Used Package.

Issue #119 created #created-119

16 Jul at 22:25:38 GMT
Allow a map's key value to be any sequence

Since being introduced in XSLT 3.0 and later in XPath 3.1 the map datatype has become a powerful and expressive tool for programming in XPath.

At present the value of a key of a map can be "an arbitrary atomic value", thus a sequence of zero or more than one atomic items cannot be used in a map-key specification.

Besides giving us an almost 1 : 1 correspondence to a JSON object (when used with arrays, which themselves can be thought of as maps) maps are useful for expressing the tabular representation of a function that has one argument of type xs:anyAtomicType.

It is not possible using a map to naturally express the tabular representation of a function having two or more (or 0) arguments. While something like this can be done using nested maps as in the example below, this technique is cumbersome and error-prone even when having two arguments, and almost prohibitively difficult when applied to expressing functions with more than 2 arguments.

Here is how we could express one possible tabular form of the function M**N (M to the power of N), where the two arguments are of type xs:positiveInteger:

let $m1 := map {1 : 1, 2 : 2, 3 : 3, 4 : 4, 5 : 5, 6 : 6, 7 : 7, 8 :  8, 9 : 9, 10 :10},
    $m2 := map {1 : 1, 2 : 4, 3 : 9, 4 : 16, 5 : 25, 6 : 36, 7 : 49, 8 : 64, 9 : 81, 10 :100},
    $m3 := map {1 : 1, 2 : 8, 3 : 27, 4 : 64, 5 : 125, 6 : 216, 7 : 343, 8 : 512, 9 : 729, 10 :1000},
    (:  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . :)  
  
    $M := map{ 1: $m1,  2: $m2,  3: $m3 (: ... :)  }
  return $M(3)(4)

Here evaluating $M(3)(4) produces the value 4**3 (4 to the power of 3), that is: 64

This proposal is to expand the allowed value-space for a key of a map to any sequence.

Thus one will be able to write:

let $M := map{ (2, 2) : 4, (3, 2) : 9, (: . . . :)
               (2, 3) : 8, (3, 3) : 27 (: . . . :)
}
 return
    $M((2, 3))

And evaluating this returns the correct result 8.
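Python dictionaries offer a loose analogy, because tuples can serve as dictionary keys. The sketch below contrasts the nested-map workaround with a single table keyed by pairs; it illustrates the ergonomics only and says nothing about how op:same-key would compare sequences.

```python
# Nested workaround: outer key is the exponent n, inner key is the base m.
nested = {n: {m: m ** n for m in range(1, 11)} for n in range(1, 4)}

# Proposed style: one table keyed by the pair (m, n), holding m ** n.
power_table = {(m, n): m ** n for m in range(1, 11) for n in range(1, 11)}

print(nested[3][4])         # 64, i.e. 4 ** 3
print(power_table[(2, 3)])  # 8,  i.e. 2 ** 3
```

The flat table extends to three or more arguments by lengthening the key, whereas the nested form needs another level of maps for each extra argument.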

Possible implementation: The implementation is very straightforward: have op:same-key use as its comparison function an improved variant of fn:deep-equal(), which behaves similarly but never raises errors, is context-free, and is transitive. For lack of imagination I called this function deep-equal-safe() and its description is here

Note: No new datatypes and no changes to the XDM are necessary in order to implement this proposal.

Issue #118 created #created-118

21 Jun at 16:38:15 GMT
xsl:match - can we do better

The draft spec proposes an instruction xsl:match to test whether a given item matches a specified pattern, returning a boolean.

While this fills a gap, it's rather clumsy, especially because instructions that return atomic values typically have to be wrapped in xsl:variable or xsl:function to be useful.

It might be better to try and make xsl:apply-templates work more nicely as a function.

The current <xsl:match select="N" match="P"/> is roughly equivalent to <xsl:apply-templates mode="test-pattern" select="N"/> where

<xsl:mode name="test-pattern" as="xs:boolean">
   <xsl:template match="P"><xsl:sequence select="true()"/></xsl:template>
   <xsl:template match="."><xsl:sequence select="false()"/></xsl:template>
</xsl:mode>

Can we improve this?

(a) we could allow <xsl:mode name="test-pattern" function-name="my:test"/> so that a function call my:test(X) is equivalent to the instruction <xsl:apply-templates mode="test-pattern" select="X"/>: that is, each mode can declare the name of a function whose effect is to apply templates in that mode (with no parameters).

(b) we could allow a select attribute on xsl:template to provide a quick way of returning the result, avoiding xsl:sequence.

(c) we could allow <xsl:mode on-no-match="return false()"/> to avoid the fallback template rule.

The use-case would then become

<xsl:mode name="test-pattern" as="xs:boolean" on-no-match="return false()" function-name="my:test">
   <xsl:template match="P" select="true()"/>
</xsl:mode>

Issue #117 created #created-117

21 Jun at 16:16:30 GMT
Downcasting (relabelling) in the coercion rules

The proposed coercion rules (aka function conversion rules) permitting down-casting (aka "relabelling") introduce a backwards incompatibility.

For example in XSLT3 test case as-1711 we have

<xslt:variable name="var1" select="/doc-schemaas/elem-NMTOKEN" as="xs:token"/>

<xslt:template match="/doc-schemaas">
   <xslt:value-of select="$var1 instance of xs:NMTOKEN"/>
</xslt:template>

where the result of atomising elem-NMTOKEN is of type xs:NMTOKEN.

Under the new rules this is "relabelled" as xs:token, causing the "instance of" test to return false, where in XSLT 2.0/3.0 it returned true.

I think the relabelling rules should probably be amended so that if the supplied value is already an instance of the required type, no relabelling takes place - it retains its existing type.
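
A minimal sketch of the suggested amendment, using Python classes as hypothetical stand-ins for the schema types (`XsString`, `XsToken`, `XsNMTOKEN` and `coerce` are illustrative names, not part of any specification, and validity checks for the target type are omitted):

```python
# Hypothetical stand-ins for the derivation chain xs:string > xs:token > xs:NMTOKEN.
class XsString(str):
    pass

class XsToken(XsString):
    pass

class XsNMTOKEN(XsToken):
    pass

def coerce(value, required_type):
    """Amended rule: a value that already conforms keeps its existing type label."""
    if isinstance(value, required_type):
        return value                 # no relabelling takes place
    return required_type(value)      # otherwise relabel to the required type

var1 = coerce(XsNMTOKEN("abc"), XsToken)
print(isinstance(var1, XsNMTOKEN))   # True, as in XSLT 2.0/3.0
```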

Issue #116 created #created-116

19 May at 09:52:50 GMT
Clarify the fn:transform function() wrt multiple top-level elements

The fn:transform function should clarify the expected behavior when the stylesheet node passed in has two top-level xsl:stylesheet elements.

Personally, I think this should be an error.

Issue #115 created #created-115

04 May at 18:55:31 GMT
Lookup operator on arrays of maps

I've been converting the XMark data files and queries from XML to JSON.

Here's part of Q20 in its XML form:

 <preferred>
  {count (/site/people/person/profile[@income >= 100000])}
 </preferred>

which becomes this, when we access the JSON form of the data:

 <preferred>
  {count (?people?*?profile[?income >= 100000])}
 </preferred>

It can get worse, for example Q16 has

exists ($a?annotation?description?parlist?*?parlist?*?text?*?emph?*?keyword?*?("§"))

These paths arise because a structure derived from JSON often includes map entries whose values are arrays.

It's very hard to get these paths right, and it's hard to produce good diagnostics when you get them wrong.

I'd like to allow the ?*? "operators" to be replaced with a simple "?". This isn't difficult. Currently the rules for the lookup operator say:

If the context item is an array:
If the KeySpecifier is an NCName, the UnaryLookup operator (https://www.w3.org/TR/xpath-31/#doc-xpath31-UnaryLookup) raises a type error

All that's needed is to change this to say that if the context item is an array, and the KeySpecifier is an NCName, then the array must be an array of maps and the lookup is applied to these maps.
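
The changed rule can be modelled roughly as below, with Python dicts and lists standing in for maps and arrays. The function names `lookup` and `path` and the sample `site` data are hypothetical; only the NCName-on-array branch reflects the proposal, and wildcard lookups are left out of the sketch.

```python
def lookup(context_items, key):
    """Apply a '?key' step to each context item and concatenate the results."""
    results = []
    for item in context_items:
        if isinstance(item, dict):                 # map: ordinary lookup
            if key in item:
                results.append(item[key])
        elif isinstance(item, list):               # array
            if isinstance(key, int):               # integer key: 1-based member access
                results.append(item[key - 1])
            else:                                  # proposed: name key is applied to each member map
                for member in item:
                    if not isinstance(member, dict):
                        raise TypeError("name lookup requires an array of maps")
                    if key in member:
                        results.append(member[key])
        else:
            raise TypeError("lookup requires a map or an array")
    return results

def path(root, *keys):
    """Chain several lookup steps, e.g. path(site, 'people', 'profile')."""
    items = [root]
    for key in keys:
        items = lookup(items, key)
    return items

# Hypothetical data in the spirit of the XMark JSON conversion.
site = {"people": [
    {"name": "A", "profile": {"income": 120000}},
    {"name": "B", "profile": {"income": 50000}},
    {"name": "C"},
]}
preferred = [p for p in path(site, "people", "profile") if p["income"] >= 100000]
print(len(preferred))  # 1
```

With this rule the step sequence people, profile plays the role of ?people?*?profile in the current syntax.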

Issue #114 created #created-114

04 May at 11:07:20 GMT
[fo] array:index-where

There is a need for an array:index-where() function to operate on arrays in the same way as fn:index-where() operates on sequences.

Use case: consider XMark query Q4, which uses the << operator:

(: Q4. List the reserves of those open auctions where a
       certain person issued a bid before another person. :)

for    $b in /site/open_auctions/open_auction
where  $b/bidder/personref[@person="person18829"] <<
            $b/bidder/personref[@person="person10487"]
return <history>{ $b/reserve }</history>

With the bidders held in an array like this:

     "bidders": [
        { "increase":18, "time":"13:16:15", "date":"2001-06-13", "personref":"person0" },
        { "increase":12, "time":"11:29:44", "date":"2000-09-18", "personref":"person23" },
        { "increase":18, "time":"10:23:59", "date":"1998-01-07", "personref":"person14" },
        { "increase":4.5, "time":"14:00:39", "date":"2001-07-10", "personref":"person16" }
      ],

the best way of expressing this query seems to be:

for    $b in ?open_auctions?*
let    $bidders := $b?bidders
where  array:index-where($bidders, ->($bidder) {$bidder?personref="person18829"}) <
            array:index-where($bidders, ->($bidder) {$bidder?personref="person10487"})
return <history>{ $b?reserve }</history>
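
The intended behaviour can be sketched in Python as follows. This is an illustration under assumptions, not a definition of the function: `index_where` and `bid_before` are hypothetical names, positions are 1-based as in XPath, and the comparison of two position lists follows general-comparison semantics (true if any pair satisfies it). The data is the bidders array shown above.

```python
def index_where(members, predicate):
    """Return the 1-based positions of the members that satisfy the predicate."""
    return [pos for pos, member in enumerate(members, start=1) if predicate(member)]

bidders = [
    {"increase": 18, "time": "13:16:15", "date": "2001-06-13", "personref": "person0"},
    {"increase": 12, "time": "11:29:44", "date": "2000-09-18", "personref": "person23"},
    {"increase": 18, "time": "10:23:59", "date": "1998-01-07", "personref": "person14"},
    {"increase": 4.5, "time": "14:00:39", "date": "2001-07-10", "personref": "person16"},
]

def bid_before(bidders, first, second):
    """True if some bid by `first` appears before some bid by `second`."""
    firsts = index_where(bidders, lambda b: b["personref"] == first)
    seconds = index_where(bidders, lambda b: b["personref"] == second)
    return any(i < j for i in firsts for j in seconds)

print(bid_before(bidders, "person23", "person14"))  # True
```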

Issue #113 created #created-113

04 May at 07:53:23 GMT
[xslt] Constructing arrays

I've felt for a while that the current proposal for xsl:array is messy. It's both semantically and syntactically messy with its composite=yes|no attribute and the xsl:array-member child. I've been using it doing XML to JSON conversion and you get a lot of stuff like this:

<xsl:template match="closed_auctions">
      <xsl:array>
         <xsl:for-each select="closed_auction">
            <xsl:map>
               <xsl:apply-templates select="*"/>
            </xsl:map>
         </xsl:for-each>
      </xsl:array>
   </xsl:template>

Almost invariably, xsl:array has xsl:for-each or xsl:apply-templates as a child. So how about allowing:

<xsl:template match="closed_auctions">
         <xsl:for-each select="closed_auction" form="array">
            <xsl:map>
               <xsl:apply-templates select="*"/>
            </xsl:map>
         </xsl:for-each>
   </xsl:template>

The semantics here is that xsl:for-each delivers an array in which there is one member for each item in the input sequence. This cleanly eliminates the need for composite=yes|no and xsl:array-member: you can create a "composite" array using

<xsl:for-each select="1 to 5" form="array">
   <xsl:sequence select="., .+1"/>
</xsl:for-each>

which delivers [(1,2), (2,3), (3,4), (4,5), (5,6)].

The attribute form="array" can also appear on xsl:apply-templates and xsl:for-each-group. In the latter case each group produces one member of the resulting array:

<xsl:for-each-group select="0 to 9" group-adjacent=". idiv 5" form="array">
   <xsl:sequence select="current-group()"/>
</xsl:for-each-group>

delivers [(0,1,2,3,4), (5,6,7,8,9)]

The attribute form="sequence" is the default and specifies the current behaviour.
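
The two results quoted above can be reproduced with a small Python model, offered only as a sketch: lists stand in for arrays, tuples for the sequence-valued members, and the helper names are hypothetical. The grouping key used here is the integer quotient of each item by 5, which is the key that yields the two groups shown.

```python
from itertools import groupby

def for_each_array(items, body):
    """Model of form="array" on xsl:for-each: one array member per input item."""
    return [tuple(body(item)) for item in items]

def for_each_group_array(items, key, body):
    """Model of form="array" with group-adjacent: one array member per group."""
    return [tuple(body(list(group))) for _, group in groupby(items, key=key)]

composite = for_each_array(range(1, 6), lambda i: (i, i + 1))
grouped = for_each_group_array(range(0, 10), lambda i: i // 5, lambda group: group)

print(composite)  # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
print(grouped)    # [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
```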

I've been wondering also about extending this to form="map". In most cases when you construct a map from an input sequence, both the key and the value are functions of the input item. So instead of:

<xsl:template match="regions">
      <xsl:map>
         <xsl:for-each select="*">
            <xsl:map-entry key="name()">
               <xsl:array>
                  <xsl:for-each select="item">
                     <xsl:array-member>
                        <xsl:apply-templates select="."/>
                     </xsl:array-member>
                  </xsl:for-each>
               </xsl:array>
            </xsl:map-entry>
         </xsl:for-each>
      </xsl:map>
   </xsl:template>

we could write:

<xsl:template match="regions">
         <xsl:for-each select="*" form="map" key="name()">
                  <xsl:apply-templates select="item" form="array"/>
         </xsl:for-each>
 </xsl:template>

which strikes me as an improvement...

Issue #112 created #created-112

30 Mar at 15:28:37 GMT
Abbreviate `map:function($someMap)` to `$someMap?function()`

The title says it all.

We already have a proposal to be able to have (when $m is a map and $x is a variable of type xs:string):

$m?$x

instead of: map:get($m, $x)

The next logical step is to allow the RHS of the ? operator to be any of the standard functions in the namespace "http://www.w3.org/2005/xpath-functions/map" (with the standard prefix "map:").

Thus, instead of map:keys($m) or $m => map:keys()

one would simply write:

$m?keys()

Issue #111 created #created-111

04 Feb at 23:02:52 GMT
FLWOR tracing

I've been developing a complex query involving grouping, windowing, and sorting, and finding it very hard to debug.

I propose a new clause that can be included anywhere in a FLWOR expression

TraceClause ::= "trace" Expr

for example

for $item in //items
trace "input: ", $item/@id
group by $sku := $item/sku
trace "group: ", data($sku), " items: ", data($item/@id)
return count($item) 

As with fn:trace(), the precise output is implementation-dependent.

The trace clause passes the incoming tuple stream unchanged to the next clause in the pipeline, with the side effect of evaluating an expression in the context of the variables defined in that tuple stream and displaying the value of the expression in an implementation-defined way.
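
The pass-through behaviour resembles a generator that reports a value and then yields its input unchanged. The sketch below assumes a tuple stream modelled as an iterator of dicts; `trace_clause`, the `sink` parameter and the sample items are hypothetical names chosen for illustration.

```python
def trace_clause(tuples, expr, sink=print):
    """Pass each tuple through unchanged, reporting expr(tuple) as a side effect."""
    for t in tuples:
        sink(expr(t))   # how the value is displayed is left to the implementation
        yield t

items = [
    {"id": "a1", "sku": "X"},
    {"id": "a2", "sku": "Y"},
    {"id": "a3", "sku": "X"},
]
log = []
stream = trace_clause(iter(items), lambda t: "input: " + t["id"], sink=log.append)
passed = list(stream)   # the downstream clause sees exactly the incoming tuples
```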

Issue #110 created #created-110

23 Jan at 11:20:13 GMT
JSON templates

JSON templates are introduced as a convenient way of constructing maps and arrays, especially for use in a stylesheet designed to deliver serialised JSON output (but not restricted to that use case).

A new kind of XPath expression called a JSON template is introduced. The following specification is informal.

The syntax is

json-template ::= "[#" parameterized-json "#]"

The syntax of parameterized-json is a modified form of the JSON syntax.

There are three modifications:

(a) single quotes can be used instead of double-quotes

(b) wherever a JSON value is permitted (including a string used as a key), we allow a parenthesised XPath expression

For example

{ "sum": ($x + $y),
  "difference": ($x - $y),
  ($extra) : true }

The rules for what the XPath expression may return are essentially the same as the JSON serialization rules, except that if the value is a node, it is atomised rather than serialized.

(c) within an array, wherever an array element may appear, we allow the syntax "*(" expression ")" to deliver a sequence of array elements. For example

[ 1, 2, *(5 to 10)]

returns the array [1, 2, 5, 6, 7, 8, 9, 10]

Similarly, within an object, wherever a member may appear, we allow the syntax "*(" expression ")" to deliver a sequence of members. The XPath expression must deliver a map or sequence of maps, whose entries are used as the members of the target object. For example

{ "A":1, "B":2, *(//book ! map{isbn : price}) }

Entries with duplicate keys result in an error.

Note: an XPath expression appearing within a json-template may of course contain nested json-templates.
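
Python happens to use a similar star syntax for spreading values into a list, which gives a quick way to illustrate modification (c). In this sketch `build_object` is a hypothetical helper that merges member maps and rejects duplicate keys, and the `books` data is invented for the example.

```python
# Array case: the spread contributes its values as separate members.
array = [1, 2, *range(5, 11)]

def build_object(*parts):
    """Merge member maps into one object; a duplicate key is an error."""
    out = {}
    for part in parts:
        for key, value in part.items():
            if key in out:
                raise ValueError(f"duplicate key: {key!r}")
            out[key] = value
    return out

# Object case: each book contributes one member, keyed by its ISBN.
books = [{"isbn": "111", "price": 10.0}, {"isbn": "222", "price": 12.5}]
obj = build_object({"A": 1, "B": 2}, *({b["isbn"]: b["price"]} for b in books))
```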

JSON templates can be embedded in XPath using the syntax "[#" parameterized-json "#]", and we also allow them to be embedded in XSLT using the instruction:

<xsl:json-template> parameterized-json </xsl:json-template>

Issue #109 created #created-109

20 Jan at 20:40:36 GMT
[xslt4] xsl:note for structured documentation

I propose that we add an element xsl:note whose intended use is for structured documentation.

The element may appear anywhere and may have any attributes and content, including elements in the XSLT namespace. Any xsl:note elements in the stylesheet are stripped (together with their attributes and descendants) at the same time as comments and processing instructions are stripped. (Rules such as "xsl:param must come first" or "xsl:apply-imports must be empty" thus apply to the stylesheet AFTER xsl:note elements are stripped.)

Traditionally, structured documentation comments have been written in a third-party namespace. This approach has two disadvantages:

(a) they can only appear in a limited number of places (typically as children of xsl:stylesheet)

(b) they require an extra namespace to appear on the xsl:stylesheet element, and this namespace has semantics that affect the stylesheet execution. It even needs to be carried through to run-time, in case someone (for example) tries to cast a dynamic string to a QName.

The spec would say very little about xsl:note, except that it is permitted and ignored by the XSLT processor. It might offer some usage suggestions:

(a) an xsl:note appearing as the first child of xsl:package or xsl:stylesheet should be taken as pertaining to the package or stylesheet as a whole; an xsl:note appearing anywhere else should be taken as pertaining to the first following sibling element that is not an xsl:note (if there is one)

(b) third party software that takes account of xsl:note (for example, a tool that generates documentation) should only recognise xsl:note elements that specify target=XXX where XXX is a string that they decide upon.

(c) XSLT processors are discouraged from using xsl:note elements to modify the behaviour of the processor in any way, for example in ways that change the output or the performance; they are expressly not allowed to use xsl:note elements to trigger non-conformant behaviour.
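
The stripping step could be prototyped along these lines with Python's standard xml.etree.ElementTree module. This is only a sketch of the idea: `strip_notes` is a hypothetical helper, and reattaching the text that follows a removed note is a choice made here, not something the proposal specifies.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
NOTE = f"{{{XSL}}}note"

def strip_notes(element):
    """Remove every xsl:note below element (with attributes and descendants), in place."""
    previous = None
    for child in list(element):
        if child.tag == NOTE:
            # Keep any text that followed the note by attaching it to what precedes it.
            if child.tail:
                if previous is None:
                    element.text = (element.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            element.remove(child)
        else:
            strip_notes(child)
            previous = child
    return element

source = (
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="4.0">'
    '<xsl:note target="docs">About this stylesheet</xsl:note>'
    '<xsl:template match="/"><xsl:note>inner note</xsl:note><out/></xsl:template>'
    '</xsl:stylesheet>'
)
root = strip_notes(ET.fromstring(source))
```

Rules such as "xsl:param must come first" would then be checked against `root`, the tree after stripping.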

Issue #108 created #created-108

19 Jan at 08:30:58 GMT
Template match using values of [tunnel] parameters

It would often be useful to make a template match conditional on the values of the supplied parameters, especially tunnel parameters.

This is especially the case when matching JSON-derived structures (maps and arrays), as no context information is then available via the ancestor axis.

Obviously, the parameters are not in scope within the match pattern, and I don't propose to change that. Instead I propose that xsl:param (when used in a template rule) should have a test="expression" attribute. The expression may refer to the parameter being declared (and to no other parameters or local variables). For the template rule to match, any parameters having a test attribute must be satisfied: specifically, the test expression must have an effective boolean value of true. If no value is supplied for the parameter, then if required="yes" is specified, the template rule does not match; if required="no" is specified, the test is applied to its default value. The focus for evaluating the test expression is absent. The existence of the test has no effect on the priority of the template rule. An error evaluating the test expression means that the template rule does not match.

For example, this template rule

<xsl:template match="record(long, lat)">
  <xsl:param name="country" test="$country = 'UK'" tunnel="yes" required="yes"/>
  ....
</xsl:template>

matches only if the tunnel parameter $country is present with the value "UK".
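
The matching rules described above can be summarised in a short Python model. It is a sketch under assumptions: `Param`, `rule_matches` and `is_long_lat` are hypothetical names, the pattern is an ordinary predicate, and rule priority is not modelled because the test has no effect on it.

```python
from dataclasses import dataclass
from typing import Any, Callable

_MISSING = object()

@dataclass
class Param:
    name: str
    test: Callable[[Any], bool]
    required: bool = False
    default: Any = None

def rule_matches(pattern, params, item, supplied):
    """A rule matches if its pattern matches and every parameter test is satisfied."""
    if not pattern(item):
        return False
    for param in params:
        value = supplied.get(param.name, _MISSING)
        if value is _MISSING:
            if param.required:
                return False          # required parameter not supplied: no match
            value = param.default     # optional parameter: test the default value
        try:
            if not param.test(value):
                return False
        except Exception:
            return False              # an error in the test means no match
    return True

def is_long_lat(item):
    """Stand-in for the pattern record(long, lat)."""
    return isinstance(item, dict) and set(item) == {"long", "lat"}

country = Param("country", test=lambda c: c == "UK", required=True)
point = {"long": -0.1, "lat": 51.5}
print(rule_matches(is_long_lat, [country], point, {"country": "UK"}))  # True
```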

In some cases this capability can substitute for modes, except that the values are entirely dynamic.

Issue #107 created #created-107

04 Jan at 22:01:47 GMT
Allow self::(a|b|c)

After an explicit axis specifier (including the abbreviated axis specifier "@", but not including the default axis specifier), allow a composite NodeTest that consists of a |-separated list of NodeTests in parentheses. For example

@(id|name)

ancestor::(section|chapter)

child::(comment()|processing-instruction())

The composite NodeTest matches a node if any of its constituent NodeTests matches.

Issue #106 created #created-106

02 Jan at 02:25:38 GMT
Decorators' support

Decorators

A Decorator is a tool to wrap (extend/change/modify) the behavior of one or more existing functions without modifying their code.

There are many cases when we want to handle a call to a specific function f() and do some or all of the following:

  1. Perform some initial action based on the context and on the arguments in the call to f().
  2. Transform the set of the actual arguments on the call to some other set of argument values -- substituted arguments.
  3. Decide whether or not to invoke f(), passing to it the actual arguments or the substituted ones, created in the previous step.
  4. If we invoked the function in the previous step, we could do something with its result.
  5. Perform some final processing that may (but does not have to) depend on the result of the invocation of f().

Here is a small, typical problem that is well handled with decorators:

We want to have a tool to help us with debugging any function. When a function is called (and if we are in Debug mode), this tool will:

  • Tell us that a call to the function was performed and will list the function name and the parameters passed to the function
  • Tell us what the result of the call to the function was

We will be able to do such tracing not just with one function but with every function whose behavior we want to observe.

let $trace-decorator := function($debug as xs:boolean, $f as function(*))
    {
      let $theDecorator := function($args as array(*), $kw-args as map(*))
      {
        if($debug)
        then 
          let $func-name := (function-name($f), $kw-args("$funcName"))[1],
            $pre-msg := "Calling function " || $func-name || " with params: " || array:flatten(($args))
            || "," || map:for-each( $kw-args, function($key as xs:anyAtomicType, $val)
                                          {if($key ne "$funcName")
                                            then (" "|| string($key)||": " || string($val))
                                            else ()
                                          }),
            
            $result := $f($args, $kw-args),
            $post-msg := "Got result: " || string($result) || "&#xA;"
           return
             ($pre-msg, $post-msg)
         else $f($args, $kw-args)
      }
      return $theDecorator
    },
    
    $upper := function($args as array(*), $kw-args as map(*))
    {
      let $txt := $args[1]
        return upper-case($txt)
    }
    
    return 
      (
        $trace-decorator(true(), $upper)(["hello"], map{"$funcName" : "$upper"}),
        $trace-decorator(true(), $upper)(["reader"], map{"$funcName" : "$upper"}),
        "=======================================================================",
        $trace-decorator(false(), $upper)(["hello"], map{"$funcName" : "$upper"}),
        $trace-decorator(false(), $upper)(["reader"], map{"$funcName" : "$upper"})
       )

The result of evaluating the above XPath 3.1 expression is exactly what we wanted to get:

Calling function $upper with params: hello, Got result: HELLO

Calling function $upper with params: reader, Got result: READER

======================================================================= HELLO READER

So, it is possible to write and use decorators even in XPath 3.1 as above. Then why are XPath decorators as rare as the white peacock?

(Image: a white peacock. Photo via: aboutpetlife.com)

The answer is simple: just try to write even the simplest decorator in XPath 3.1 and you'll know how difficult and error-prone this is. This is why several programming languages provide special support for decorators:

  • Python: decorators are a standard feature of the language.

    • A standard syntax is provided to declare that a function is being decorated by another function. Composition of multiple decorators is supported.
    • There is a standard Python way of getting "any actual arguments" with which the function to be decorated (not known in advance) is called.
    • There is a standard Python way to call any function passing to it just an array (for its positional arguments) and a map (for its keyword arguments).
  • TypeScript:

    • A standard syntax is provided to declare that a function is being decorated by another function. Composition of multiple decorators is supported.
    • Decorators can be applied not only to methods but also to their parameters (and to classes, constructors, static and instance properties, accessors).
    • The actual parameters in calling the manipulated method are accessed in a standard way as a spread (the opposite of destructuring). The spread syntax is used both for getting the parameters and in providing them in the call to the manipulated method.
  • JavaScript: Almost the same as in TypeScript (above)

  • .NET/C#: C# Attributes, and in particular Dynamic Proxies on the Fly in the Castle project.

  • Java: A PerfectJpattern implementation
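
For comparison, the Python support described in the list above can be sketched as a tracing decorator. The names `trace_decorator`, `upper` and `messages` are illustrative. Unlike the XPath 3.1 version shown earlier, this sketch returns the function's result and sends the trace messages to a separate log.

```python
import functools

def trace_decorator(debug, log=print):
    """Return a decorator that reports calls and results when debug is true."""
    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            if not debug:
                return f(*args, **kwargs)
            # *args and **kwargs capture whatever arguments the caller passed.
            params = ", ".join([*map(str, args),
                                *(f"{k}: {v}" for k, v in kwargs.items())])
            log(f"Calling function {f.__name__} with params: {params}")
            result = f(*args, **kwargs)
            log(f"Got result: {result}")
            return result
        return wrapper
    return decorate

messages = []

@trace_decorator(True, log=messages.append)
def upper(text):
    return text.upper()

results = [upper("hello"), upper("reader")]
```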

Goal of this proposal

To provide standard XPath support for decorators, as seen in other languages (above):

  1. Provide syntax for specifying the decoration of a function:

Update rule:

[72] FullSignature ::= "function" "(" ParamList? ")" TypeDeclaration?

To:

[72] FullSignature ::= ("^" DecoratorReference)* "function" "(" ParamList? ")" TypeDeclaration?

And add this new rule:

[NN] DecoratorReference ::= VarRef ArgumentList? | FunctionCall

  2. When a decorated function is called, the XPath processor should create from the actual arguments of the call an array $args that holds all positionally-specified arguments in the call, and a map $kw-args that will hold the name - value pairs of all keyword-arguments in the call. Then the decorator will be called with these two arguments: ($args, $kw-args) in addition to any of its own positional arguments (if it has any).

    ^decorator-name $funcName

    is converted behind the scenes by the XPath engine to:

    let $funcName := decorator-name($funcName)
      return
      ...remaining code in the same scope
    

    This redefining of the inline-function name allows us to reference the result of the decoration using the same function item name ($funcName) in the remainder of the current scope.

  3. When any function is called just with two arguments: ($args, $kw-args) (such as from a decorator), the XPath processor must perform everything necessary in order to call the function in the way it expects to be called. For this purpose, a new overload of fn:apply() is defined (proposed separately here):

fn:apply($function as function(*), $array as array(*), $map as map(*)) as item()*

The result of the function is obtained by creating and invoking the same dynamic call that would be the result of a function-call to $function with (positional) arguments taken from the members of the supplied array $array and (keyword arguments) taken from $map.

The effect of calling fn:apply($f, [$a, $b, $c, ...], map{"k1" : v1, "k2" : v2, ...}) is the same as the effect of the dynamic function call resulting from $f($a, $b, $c, ...., $k1 = v1, $k2 = v2, ...). The function conversion rules are applied to the supplied arguments in the usual way.
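
Python's argument unpacking gives a compact model of this overload. The sketch is an analogy only: `apply` and `describe` are hypothetical names, and no function conversion rules are modelled.

```python
def apply(function, array, mapping):
    """Call function with positional arguments from array and keyword arguments from mapping."""
    return function(*array, **mapping)

def describe(name, greeting="Hello", punctuation="!"):
    return f"{greeting}, {name}{punctuation}"

# Same effect as the direct call describe("reader", greeting="Hi").
print(apply(describe, ["reader"], {"greeting": "Hi"}))  # Hi, reader!
```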


Example:

With this support added to the language, we simply write:

let $f := $trace-decorator(true()) ( upper-case#1 )
  return
    ("hello", "reader") ! $f()

which produces the same correct result:

Calling function upper-case with params: hello, Got result: HELLO

Calling function upper-case with params: reader, Got result: READER

And this:

let $f := $trace-decorator(false())  ( upper-case#1 )
  return
    ("hello", "reader") ! $f()

produces just the normal result of executing the original function, as the $debug argument is false() here:

HELLO READER

When we decorate an inline-function (say the $upper from the initial example), we can even simply write:

^$trace-decorator(true())  $upper,
.   .   .   .   .
  return
    ("hello", "reader") ! $upper()

to get again the correct result:

Calling function $upper with params: hello, Got result: HELLO

Calling function $upper with params: reader, Got result: READER

Do note:

  1. The decorator and the manipulated function are completely independent of each other and may be written long before / after each other and by different people who may not be aware of each other.
  2. They can reside in different code files.
  3. We can have a library of useful decorator functions and can append them to decorate any wanted function.
  4. As the updated Rule 72 above suggests, one can specify a chain of decorators manipulating a specific function. The result of the inner-most decorator is passed to the next-inner-most decorator, and so on, until the outer-most decorator is applied. Decorating is right-associative. The decorators in the chain know nothing about each other, are completely independent, and may be written by different authors at any time and reside in different, unrelated function libraries.
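
The right-associative chaining in point 4 matches how stacked decorators behave in Python, which can serve as a small illustration. The names `tag` and `value` are hypothetical, and the sketch says nothing about how an XPath processor would implement the chain.

```python
def tag(label):
    """Return a decorator that wraps the decorated function's result in label(...)."""
    def decorate(f):
        def wrapper(*args, **kwargs):
            return f"{label}({f(*args, **kwargs)})"
        return wrapper
    return decorate

# The decorator nearest the function is applied first, the top-most one last.
@tag("outer")
@tag("inner")
def value():
    return "f"

print(value())  # outer(inner(f))
```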