@qt4cg statuses in 2026

This page displays status updates about the QT4 CG project from 2026.

See also recent statuses.

QT4 CG meeting 153 draft minutes #minutes-02-17

17 Feb at 17:30:00 GMT

Draft minutes published.

Issue #2457 closed #closed-2457

17 Feb at 17:09:39 GMT

Improved use of fos:result

Issue #2456 closed #closed-2456

17 Feb at 17:07:37 GMT

Stylesheet handling of fos:result/@narrative

Issue #2234 closed #closed-2234

17 Feb at 17:05:56 GMT

Replace `a/get(XX)` with `a/?(XX)`

Issue #2427 closed #closed-2427

17 Feb at 17:04:31 GMT

Node construction in XPath

Issue #2446 closed #closed-2446

17 Feb at 17:04:30 GMT

2427 Add computed node constructors to XPath

Issue #2459 closed #closed-2459

17 Feb at 14:40:51 GMT

What are "invalid XML characters" in the XPath file read functions?

Issue #2385 closed #closed-2385

17 Feb at 14:31:48 GMT

The XML version of the XPath spec isn't the XML version of the spec, it's HTML

Pull request #2467 created #created-2467

17 Feb at 11:55:55 GMT
Harmonize the fn: and file: functions that read text

Close #2460

This PR harmonizes the functions fn:unparsed-text, fn:unparsed-text-lines, file:read-text, and file:read-text-lines with respect to handling non-permitted characters. Each function has an options parameter and that parameter may contain a fallback function to remap non-permitted characters.

This leaves unresolved the question of what to do about permitted characters not allowed in XML, but that’s orthogonal. Strings containing such characters might arise from any of these functions, but equally, might arise from other operations.

Issue #2466 created #created-2466

16 Feb at 23:26:55 GMT
format-number() precision

The specification of format-number() says:

If there are several such values that are numerically equal to the mantissa (bearing in mind that if the mantissa is an xs:double or xs:float, the comparison will be done by converting the decimal value back to an xs:double or xs:float), the one that is chosen should be one with the smallest possible number of digits not counting leading or trailing zeroes (whether significant or insignificant).

The parenthetical "bearing in mind" note needs updating, because comparison of a decimal to a double is no longer done by converting the decimal back to a double.

Background: XSLT test case format-number-044a, which I have extricated from the composite test format-number-044, formats the xs:double obtained as 1E100 div 3. This gives me a result with sixteen 3s on Java and fifteen 3s on C#. I am trying to work out which is correct, or at any rate which one should be produced according to the above rules.
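
To make the question concrete, here is a small Python sketch (not XPath, and not taken from the test suite). It searches for the shortest decimal string that converts back to the same double, then compares that string with the double in two ways: by converting back to a double, and by exact value. The helper name `shortest_round_trip` is hypothetical, and trying only the correctly rounded candidate at each precision is an assumption of this sketch, not the rule in the specification.

```python
from decimal import Decimal
from fractions import Fraction

def shortest_round_trip(x: float, max_digits: int = 17) -> str:
    """Return the shortest scientific-notation string that converts back to x."""
    for digits in range(1, max_digits + 1):
        candidate = format(x, f".{digits - 1}e")  # correctly rounded to `digits` digits
        if float(candidate) == x:
            return candidate
    return format(x, f".{max_digits - 1}e")

x = 1e100 / 3
candidate = shortest_round_trip(x)

# Comparison by converting the decimal string back to a double.
round_trip_equal = float(candidate) == x

# Exact comparison between the decimal value and the exact value of the double.
exact_equal = Fraction(Decimal(candidate)) == Fraction(x)

print(candidate, round_trip_equal, exact_equal)
```

For this value the round-trip comparison succeeds while the exact comparison does not, which is why the wording of the "bearing in mind" note affects which digit string counts as numerically equal.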

QT4 CG meeting 153 draft agenda #agenda-02-17

16 Feb at 12:40:00 GMT

Draft agenda published.

Issue #2465 created #created-2465

16 Feb at 11:25:39 GMT
Error description of FODC0006 should be more generic

The error FODC0006 can be raised by fn:parse-xml and fn:parse-xml-fragment. It is currently described as

err:FODC0006, String passed to fn:parse-xml is not a well-formed XML document.
Raised by [fn:parse-xml](https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-parse-xml)
if the supplied string is not a well-formed and namespace-well-formed XML document;
or if DTD validation is requested and the document is not valid against its DTD.

I propose to alter the description to

err:FODC0006, String cannot be parsed as XML.
Raised by [fn:parse-xml](https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-parse-xml) or [fn:parse-xml-fragment](https://qt4cg.org/specifications/xpath-functions-40/Overview.html#func-parse-xml-fragment)
if the supplied string is not a well-formed and namespace-well-formed XML document;
or if DTD validation is requested and the document is not valid against its DTD;
or if it was passed to parse-xml-fragment and is not a well-formed external general parsed entity, if it contains entity references other than references to predefined entities, or if a document that incorporates this well-formed parsed entity would not be namespace-well-formed.

Issue #2464 created #created-2464

16 Feb at 09:36:35 GMT
add method to use path()

The description of fn:path() says "Returns a path expression that can be used to select the supplied node relative to the root of its containing document", but gives no guidance as to how exactly to use such a path expression to select a node; one can use eval(), or could write an invisible XML grammar, but that'd be like saying “if and do-while can be used to write a program to solve chess problems”.

Easiest fix - remove the text saying people can use the result of path() to select nodes, since they can’t.

Slightly harder - add a path-to-node($root, $path) as node()? function.

I’d prefer the function (unless there is one and I missed it, in which case I’d prefer the obvious editorial change :-)), but reducing the reader’s expectations would be OK too.
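
As a rough illustration of what such a function would have to do, here is a Python sketch using ElementTree. It handles only element steps of the form `/Q{uri}local[n]`; the name `path_to_node`, the regular expression, and the decision to return `None` for anything unsupported are all assumptions of this sketch, not a proposed definition.

```python
import re
import xml.etree.ElementTree as ET

# One element step of the form /Q{uri}local[position].
STEP = re.compile(r"/Q\{([^}]*)\}([^\[\]/]+)\[(\d+)\]")

def path_to_node(root, path):
    """Resolve an element-only path string against an ElementTree root element.

    Hypothetical helper: returns the selected element, or None if nothing matches.
    """
    position = 0
    candidates = [root]  # the document element is the only element child of the document
    node = None
    for match in STEP.finditer(path):
        if match.start() != position:
            return None  # unsupported syntax between steps
        position = match.end()
        uri, local, index = match.group(1), match.group(2), int(match.group(3))
        tag = f"{{{uri}}}{local}" if uri else local  # ElementTree's {uri}local form
        same_name = [c for c in candidates if c.tag == tag]
        if not 1 <= index <= len(same_name):
            return None
        node = same_name[index - 1]
        candidates = list(node)  # element children of the selected node
    return node if position == len(path) else None

doc = ET.fromstring("<a><b/><c/><b><d xmlns='urn:x'/></b></a>")
found = path_to_node(doc, "/Q{}a[1]/Q{}b[2]/Q{urn:x}d[1]")
missing = path_to_node(doc, "/Q{}a[1]/Q{}b[3]")
```

Even this restricted version needs its own small parser, which supports the point that the path string is not directly usable without eval() or a dedicated function.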

Issue #2463 created #created-2463

16 Feb at 09:29:41 GMT
Add map:apply() or third argument to fn:apply() as a map of names to parameter values.

With keyword arguments being available, it would make sense to have a map:apply($function, $map), in which keys in the map are mapped to argument names of the function.

Admittedly since fn:apply() takes an array and is not called array:apply(), maybe an optional 3rd argument to fn:apply() would work better, with the same rules as for static function application.
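
For comparison, Python's call syntax already combines both forms, which may help picture the second option. In this sketch the function `apply` and the example function `format_name` are hypothetical; the list plays the role of the array argument and the dict plays the role of the proposed map of parameter names to values.

```python
def apply(function, positional=(), keywords=None):
    """Call `function` with positional arguments plus a map of named arguments."""
    return function(*positional, **(keywords or {}))

def format_name(first, last, separator=" "):
    return f"{first}{separator}{last}"

# One positional argument, two arguments supplied by name.
result = apply(format_name, ["Ada"], {"last": "Lovelace", "separator": ", "})
print(result)
```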

Issue #2462 created #created-2462

14 Feb at 10:43:34 GMT
Revert dynamic function calls on sequences

I would like to question the decision to allow function calls on sequences.

While I favored the change in the past, repeated user feedback indicates that the new behavior is confusing and complicates debugging. First, it legalizes cryptic code like ()(). Second, seemingly simple function calls like $add(1, 2) return an empty sequence if $add turns out to be empty.

I believe we should make type safety and readability a priority over convenience. If a short syntax is needed, one can still write $add ! .(1, 2).
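
The debugging concern can be illustrated with a Python model in which a sequence is a plain list. Both function names are hypothetical, and the model ignores details such as flattening of results; it only contrasts a call that silently accepts an empty sequence with one that insists on exactly one function.

```python
def call_lenient(functions, *args):
    """Apply every function in the sequence; an empty sequence gives an empty result."""
    return [f(*args) for f in functions]

def call_strict(functions, *args):
    """Require exactly one function, otherwise report a type error."""
    if len(functions) != 1:
        raise TypeError(f"expected exactly one function, got {len(functions)}")
    return functions[0](*args)

def add(a, b):
    return a + b

print(call_lenient([add], 1, 2))  # one function in the sequence
print(call_lenient([], 1, 2))     # empty sequence: no error, empty result
print(call_strict([add], 1, 2))
```

In the lenient model a variable that is unexpectedly empty produces no error at the call site, so the mistake surfaces later, if at all.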

Related: #2219 (can be closed if we revert dynamic function calls).

Issue #2461 created #created-2461

14 Feb at 10:33:05 GMT
Unparsed entities

Support for unparsed entities and notations is something of a minority interest, but recent correspondence with a Saxon user reminds me that there are applications that depend on them quite heavily. There is limited support for them in the data model and in XSLT, but none in XPath or XQuery.

There is no logical reason for the two functions unparsed-entity-uri() and unparsed-entity-public-id() to be XSLT-only, other than to save XQuery implementors the trouble of implementing them.

At the same time, the functions are incomplete and inadequate. For example, there is no way of obtaining a complete list of declared entities, and there is no way of getting information about the notations that they refer to.

Many XML parsers do not expose information about entities and notations, so anything we define should be capable of returning a result that indicates "information not available for this document (or for this implementation)".

I propose a new function unparsed-entities($doc) which returns a map along the lines:

{ "entities": {
     "e-name-1": {
         "system-id": ....
         "public-id": ....
         "notation": ....
     }, ....
  },
 "notations":{
    "n-name-1": "system-id", ...
  }
} 

and which is allowed to return an empty sequence if information about unparsed entities is not available (making it possible to provide a trivial fallback implementation).
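
To show that a parser which does expose this information can populate such a map, here is a Python sketch built on the standard library's expat bindings. The function name `unparsed_entities` and the exact map layout follow the outline above but are otherwise assumptions; the sketch takes the document as a string rather than a node.

```python
import xml.parsers.expat

def unparsed_entities(xml_text):
    """Collect unparsed entity and notation declarations into a nested dict."""
    result = {"entities": {}, "notations": {}}
    parser = xml.parsers.expat.ParserCreate()

    def on_entity(name, base, system_id, public_id, notation):
        result["entities"][name] = {
            "system-id": system_id,
            "public-id": public_id,
            "notation": notation,
        }

    def on_notation(name, base, system_id, public_id):
        result["notations"][name] = system_id

    parser.UnparsedEntityDeclHandler = on_entity
    parser.NotationDeclHandler = on_notation
    parser.Parse(xml_text, True)
    return result

sample = """<!DOCTYPE doc [
  <!NOTATION png SYSTEM "image/png">
  <!ENTITY logo SYSTEM "logo.png" NDATA png>
]>
<doc/>"""
info = unparsed_entities(sample)
print(info)
```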

Issue #2460 created #created-2460

13 Feb at 09:47:04 GMT
file:read-text and invalid XML characters

file:read-text() should be aligned with fn:unparsed-text() to remove the restriction regarding valid XML characters.

Incidentally, bin:decode-string() has never had such a restriction even in the EXPath 1.0 version.

Issue #2459 created #created-2459

12 Feb at 16:23:54 GMT
What are "invalid XML characters" in the XPath file read functions?

Are we applying XML 1.0 rules or XML 1.1 rules? Does the user get to decide? How?
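
The practical difference between the two rule sets is small but real, as this Python sketch shows. The ranges are the Char productions of XML 1.0 and XML 1.1; the function names and the `version` parameter are hypothetical and say nothing about how a user would make the choice in XPath.

```python
def is_xml_char(code_point, version="1.0"):
    """Test a code point against the Char production of XML 1.0 or XML 1.1."""
    if version == "1.0":
        return (
            code_point in (0x9, 0xA, 0xD)
            or 0x20 <= code_point <= 0xD7FF
            or 0xE000 <= code_point <= 0xFFFD
            or 0x10000 <= code_point <= 0x10FFFF
        )
    if version == "1.1":
        return (
            0x1 <= code_point <= 0xD7FF
            or 0xE000 <= code_point <= 0xFFFD
            or 0x10000 <= code_point <= 0x10FFFF
        )
    raise ValueError(f"unknown XML version: {version}")

def invalid_chars(text, version="1.0"):
    """Return the characters of `text` that the chosen XML version does not allow."""
    return [ch for ch in text if not is_xml_char(ord(ch), version)]

print(invalid_chars("a\x01b", "1.0"))
print(invalid_chars("a\x01b", "1.1"))
```

Most C0 control characters are rejected under the 1.0 rules and accepted under the 1.1 rules, while U+0000 is rejected by both.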

Issue #2458 created #created-2458

11 Feb at 08:19:25 GMT
NodeTests: Unify jnode(X) and get(X)

I can currently write $map/jnode("key") or $map/get("key"), and both have the same meaning.

Both imply the child axis: I can also write $map/child::jnode("key") or $map/child::get("key")

There are differences:

  • jnode also allows a content type to be specified
  • get allows the key to be an arbitrary expression, while jnode requires a Constant
  • jnode allows an NCName-valued key to be written without quotes
  • get allows multiple keys to be selected
  • jnode allows all keys to be selected
  • get also works with XNodes

I propose that we unify these constructs.

To do this we distinguish jnode as an item type from jnode as a selector.

When used as a selector, the first argument of jnode should accept a KeySpecifier rather than a Constant. KeySpecifier is the subset of Expression that we allow after the lookup operator ?. (We should extend it to generalise Literal to Constant) A KeySpecifier allows an arbitrary expression in parentheses; we can debate what focus should be used to evaluate it.

This means, for example, that $array/get($i) becomes $array/jnode($i)

We could apply the same treatment to element() and attribute(), allowing $doc//element($N).

The second optional argument of jnode(), element(), and attribute() is unaffected.

QT4 CG meeting 152 draft minutes #minutes-02-10

10 Feb at 17:55:00 GMT

Draft minutes published.

Issue #2437 closed #closed-2437

10 Feb at 17:00:00 GMT

SimpleNodeTest: TypeTest → RegularItemType?

Issue #2434 closed #closed-2434

10 Feb at 17:00:00 GMT

`fn:has-children`: buggy examples?

Issue #2441 closed #closed-2441

10 Feb at 16:59:59 GMT

2434 Fix inconsistencies with GNode tests in axis steps

Issue #2445 closed #closed-2445

10 Feb at 16:56:51 GMT

fn:element-to-map - ignore `xsi:type` and similar attributes

Issue #2449 closed #closed-2449

10 Feb at 16:56:50 GMT

2445 Add rules for xsi namespace elements in element-to-map

Issue #2453 closed #closed-2453

10 Feb at 16:54:32 GMT

XSLT Patterns: the "child-or-top" adjustment

Issue #2444 closed #closed-2444

10 Feb at 16:54:32 GMT

XSLT Patterns for matching JNodes

Issue #2451 closed #closed-2451

10 Feb at 16:54:31 GMT

2444 Make match="*" and match="N" match element nodes only

Issue #2450 closed #closed-2450

10 Feb at 16:51:51 GMT

JNode types: matching root JNodes

Issue #2452 closed #closed-2452

10 Feb at 16:51:50 GMT

2450 Add jnode((), *) to match root JNodes

Issue #2435 closed #closed-2435

10 Feb at 16:50:51 GMT

Incorrect namespace prefixes in EXPath Binary example

Issue #2439 closed #closed-2439

10 Feb at 16:50:50 GMT

Fix prefix on bin:int-octets example function

Issue #2436 closed #closed-2436

10 Feb at 16:50:04 GMT

`jnode` type: arguments (spec vs. tests)

Issue #2432 closed #closed-2432

10 Feb at 16:49:33 GMT

Constructor Functions: conversions

Issue #2440 closed #closed-2440

10 Feb at 16:49:32 GMT

2432 Clarify effect of coercion on constructor functions

Pull request #2457 created #created-2457

10 Feb at 15:53:17 GMT
Improved use of fos:result

Changes the main function catalog to make improved use of the fos:result element. Specifically:

  • Uses fos:error-result for error examples
  • Makes greater use of explicit results rather than narrative results where possible
  • Uses fos:result narrative="true" to get auto-checking of examples having no definitive result, and improved rendition.

Pull request #2456 created #created-2456

10 Feb at 13:21:16 GMT
Stylesheet handling of fos:result/@narrative

Following a schema change that allows the function catalog to contain code examples annotated <fos:result narrative="true">, this PR makes stylesheet changes allowing such examples to be rendered.

  • In generating the specification documents, narrative results are simply rendered as prose, without containing <code> tags
  • In generating the QT4 tests, the generated test case ensures that the example can be successfully compiled and run, but the test always succeeds so long as the result is not a static or dynamic error.

Issue #2399 closed #closed-2399

10 Feb at 11:47:08 GMT

Canonical JSON Serialization: edge cases

Issue #2418 closed #closed-2418

10 Feb at 11:47:06 GMT

2399b Add rules and advice for JSON output of special numerics

Issue #2455 created #created-2455

09 Feb at 15:12:44 GMT
`file:copy`: creating targets

The current rules of the file:copy function say:

Copies a file or a directory given a source and a target path/URI. The following rules apply if $source points to a file:

  1. if $target does not exist, it will be created.

I think we should refine this rule and prevent this function from creating arbitrary directory structures if simple files are copied.
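
One possible reading of the refinement, sketched in Python: the target file may be created, but only inside a directory that already exists. The function name `copy_file` and the choice of exception are assumptions of this sketch and do not reflect the file module's error codes.

```python
import shutil
from pathlib import Path

def copy_file(source, target):
    """Copy a single file, refusing to create missing parent directories."""
    source, target = Path(source), Path(target)
    if not target.parent.is_dir():
        raise FileNotFoundError(f"parent directory of {target} does not exist")
    shutil.copyfile(source, target)
    return target
```

Under this rule a call such as `copy_file("a.txt", "x/y/b.txt")` fails when `x/y` does not exist, instead of building the directory structure as a side effect.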

QT4 CG meeting 152 draft agenda #agenda-02-10

09 Feb at 12:50:00 GMT

Draft agenda published.

Issue #2454 created #created-2454

09 Feb at 10:01:05 GMT
Grammar: literals & constants, negative numbers

I think we could easily tweak the grammar by changing…

Constant     ::=  StringLiteral | ("-"? NumericLiteral) | QNameLiteral | ("true" "(" ")") | ("false" "(" ")")
Literal      ::=  NumericLiteral | StringLiteral | QNameLiteral

…to:

Constant     ::=  Literal | ("true" "(" ")") | ("false" "(" ")")
Literal      ::=  ("-"? NumericLiteral) | StringLiteral | QNameLiteral

As a result, a negative number would be returned by the parser as a literal instead of a unary expression (weird constructs like - - -2344 will still be possible), and the key specifier of a lookup expression could be a negative number.
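
A toy parser in Python illustrates the effect. It handles integers only, and the names `SIGNED_INTEGER` and `parse_unary` are hypothetical; the point is that a single leading minus sign is absorbed into the literal, while any further minus signs remain unary operators.

```python
import re

# Sketch of the proposed Literal: an optional minus sign, optional whitespace, digits.
SIGNED_INTEGER = re.compile(r"(-?)\s*(\d+)")

def parse_unary(text):
    """Parse a chain of minus signs followed by an integer into a small tree."""
    text = text.strip()
    literal = SIGNED_INTEGER.fullmatch(text)
    if literal:
        sign, digits = literal.groups()
        return ("literal", -int(digits) if sign else int(digits))
    if text.startswith("-"):
        # Any minus sign beyond the one attached to the literal stays a unary operator.
        return ("negate", parse_unary(text[1:]))
    raise ValueError(f"not a numeric expression: {text!r}")

print(parse_unary("-2344"))
print(parse_unary("- - -2344"))
```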

Issue #2453 created #created-2453

09 Feb at 00:20:18 GMT
XSLT Patterns: the "child-or-top" adjustment

This issue identifies a bug in XSLT 3.0 (retained in XSLT 4.0).

There is a special rule in XSLT 3.0 designed to ensure that the pattern match="a" will match an a element even if it is parentless. Without the special rule, it would not do so, because "a" expands to child::a which would otherwise only match an element that is a child of something.

The rule is written (in §5.5.3):

If any PathExprP in the Pattern is a RelativePathExprP, then the first StepExprP PS of this RelativePathExprP is adjusted to allow it to match a parentless element... If PS uses the child axis (explicitly or implicitly), and if the NodeTest in PS is not document-node() (optionally with arguments), then the axis in step PS is replaced by child-or-top, which is defined as follows. If the context node is a parentless element, comment, processing-instruction, or text node then the child-or-top axis selects the context node; otherwise it selects the children of the context node. It is a forwards axis whose principal node kind is element.

Now consider the pattern match="*/(b union c)". Clearly b is a PathExprP and a RelativePathExprP, so the adjustment applies to its first StepExprP, namely to b. So the pattern should expand to match="*/(child-or-top::b union child-or-top::c)", and a literal reading of the semantics of patterns, in conjunction with the semantics of path expressions, means that this should match a parentless b or c element, whereas the clear intention (and the Saxon implementation) is that it matches a b or c element only if it has an element node parent.

Similarly, consider the pattern match="b[b or c]". Again, the rules suggest that this should be adjusted to match="child-or-top::b[child-or-top::b or child-or-top::c]" which would technically match a parentless b element having no b or c child. This is clearly not intended.

Saxon in fact is not applying any syntactic adjustment to the pattern at all. Rather, it is evaluating the pattern steps from right to left, and if a sub pattern has no preceding "/" or "//" operator then it is assuming there are no constraints on the element's ancestry.
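
The right-to-left strategy can be pictured with a Python sketch over ElementTree. It is a simplified model, not Saxon's code: patterns are lists of element names joined by "/", predicates and "//" are not handled, and the names `matches` and `parent_of` are hypothetical.

```python
import xml.etree.ElementTree as ET

def matches(node, steps, parent_of):
    """Match a child-axis-only pattern such as ["*", "b"] (for "*/b") right to left."""
    current = node
    for name in reversed(steps):
        if current is None:
            return False  # the pattern demands an ancestor that does not exist
        if name != "*" and current.tag != name:
            return False
        current = parent_of.get(current)  # None for a parentless element
    return True  # nothing to the left of the first step: ancestry is unconstrained

tree = ET.fromstring("<a><b/></a>")
parent_of = {child: parent for parent in tree.iter() for child in parent}
inner_b = tree.find("b")
lone_b = ET.Element("b")  # a parentless element

print(matches(lone_b, ["b"], parent_of))
print(matches(lone_b, ["*", "b"], parent_of))
print(matches(inner_b, ["*", "b"], parent_of))
```

In this model the pattern "b" accepts a parentless b, while "*/b" requires an element parent, with no rewriting of the pattern into child-or-top steps.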

I can think of a couple of ways the bug might be fixed.

Firstly, we could try to define more precisely the StepExprP subexpressions that are subjected to this adjustment. It's basically any StepExprP that is not the right-hand operand of "/" or "//", and that is not an operand of a union, intersect, or except expression that is the right hand operand of "/" or "//", and is not contained within a predicate.

Alternatively, we could try to amend the "equivalent expression" rule. We say that a pattern P matches a node N if N has an ancestor-or-self node $A such that the path expression $A//(P) selects N. We could amend this rule to say that if N is parentless, then this rule is evaluated "as if" N had an imaginary parent.

Neither solution is very elegant.

Pull request #2452 created #created-2452

08 Feb at 13:19:37 GMT
2450 Add jnode((), *) to match root JNodes

Fix #2450

Pull request #2451 created #created-2451

07 Feb at 22:44:18 GMT
2444 Make match="*" and match="N" match element nodes only

Fix #2444

The effect of the change is that simple patterns like match="*" and match="order" will only match element nodes, they will no longer match JNodes.

This is motivated firstly by implementation experience: the current rules cause a performance regression compared with XSLT 3.0 because it's harder to do precise static type inferencing, and some streaming use cases are no longer streamable for the same reason.

However, I think there is also a usability benefit. These simple match patterns are instinctively understood by the entire XSLT user population, and extending their meaning so they match things unexpectedly could be a debugging nightmare. Consider the case of a mature stylesheet with hundreds of template rules designed to process XML elements, which is then extended with a new module to handle a JSON representation of the same data; it's very unlikely the user will actually want or intend the same template rules to process both, and if this is what they want, it's clearer to make this explicit by using a union pattern.

Issue #2450 created #created-2450

07 Feb at 22:33:21 GMT
JNode types: matching root JNodes

In the description of JNode types in 3.2.9, there is no explicit statement of what a type like jnode(*, map()) means. In particular, it isn't clear (except perhaps from studying examples) whether it matches a JNode whose selector property is absent (i.e. the root of a JTree). The examples suggest that it does.

For pattern matching in XSLT, it would be useful to have a pattern that ONLY matches the root of a JTree. Perhaps the syntax jnode(/, map()) might serve this purpose.

Pull request #2449 created #created-2449

07 Feb at 21:29:26 GMT
2445 Add rules for xsi namespace elements in element-to-map

Fix #2445

Issue #2448 created #created-2448

07 Feb at 17:33:45 GMT
Change title of XPath specification

The title of the XPath specification is

XML Path Language (XPath) 4.0

No-one actually calls it "XML Path Language" -- and it's now a path language for JSON as well.

I propose we change the title to

XPath 4.0

Issue #2447 created #created-2447

07 Feb at 16:56:58 GMT
Drop string-literals for names in computed constructors

XQuery 3.1 allowed element foo { "bar" }.

And then we discovered syntax ambiguities, so we allowed element "foo" {"bar"}

And then we introduced QName literals, allowing element #foo {"bar"}

We then dropped string literals from the grammar, but there are incorrect examples that use it:

XQuery 4.0 allows the node name to be written in quotation marks (for example, element "book" {}

in 4.12.3

Pull request #2446 created #created-2446

07 Feb at 13:17:51 GMT
2427 Add computed node constructors to XPath

Fix #2427

Issue #2445 created #created-2445

06 Feb at 23:12:23 GMT
fn:element-to-map - ignore `xsi:type` and similar attributes

Following discussion in #1948, test element-to-map-017 has been changed so it is treated as if the xsi:type attribute were not present. However, I can't see anything in the current spec to justify this behaviour.

I propose changing the spec to say that all attributes in the xsi namespace should be ignored (except to the extent that when the input is schema-validated, they may have affected the choice of a type annotation, which itself may affect the outcome). This means (a) the attribute itself is not included in the result of the conversion, and (b) if all the attributes of an element are in this namespace, the element is treated as having no attributes.
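
Points (a) and (b) amount to filtering the attribute set before conversion, as in this Python sketch using ElementTree. The helper name `significant_attributes` is hypothetical, and the sketch does not model schema validation or type annotations.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def significant_attributes(element):
    """Return the element's attributes, leaving out any in the xsi namespace."""
    prefix = "{" + XSI + "}"
    return {k: v for k, v in element.attrib.items() if not k.startswith(prefix)}

only_xsi = ET.fromstring(
    f'<price xmlns:xsi="{XSI}" xsi:type="xs:decimal">12.50</price>'
)
mixed = ET.fromstring(
    f'<price xmlns:xsi="{XSI}" xsi:nil="false" currency="EUR">12.50</price>'
)

print(significant_attributes(only_xsi))  # treated as having no attributes
print(significant_attributes(mixed))
```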

Issue #2444 created #created-2444

04 Feb at 22:00:05 GMT
XSLT Patterns for matching JNodes

A pattern such as match="order" currently matches both an element named "order" and a JNode whose selector is "order".

This makes it much more difficult to do type inferencing on the body of the template rule, and it greatly complicates streamability analysis. While usability rightly takes priority over implementation concerns, I don't think this design has usability benefits either. I think in practice users will know whether their template rules are intended to match XNodes or JNodes, and using the same pattern syntax for both is more confusing than helpful. We already offer the syntax `match="jnode(order)"` to match JNodes, and I think that is clearer.

Note that the semantics of match="order" already depart from the semantics of the equivalent XPath expression child::order because the pattern will match a parentless element. So we only have to adapt the current magic rule:

If PS uses the child axis (explicitly or implicitly), and if the NodeTest in PS is not document-node() (optionally with arguments), then the axis in step PS is replaced by child-or-top, which is defined as follows. If the context node is a parentless element, comment, processing instruction, or text node then the child-or-top axis selects the context node; otherwise it selects the children of the context node. It is a forwards axis whose principal node kind is element.

so that the child-or-top axis only selects XNodes.

Issue #2443 created #created-2443

04 Feb at 13:51:31 GMT
Naming: JNodes, selectors and contents

I am playing around with the new JNode functions, and my impression is that the resulting code does not look very catchy. It seems way too technical to me. For creating a map with element names and contents, we can write:

map:build($xnodes, name#1, data#1)

Similar code for JNodes would be:

map:build($jnodes, jnode-selector#1, jnode-content#1)

I wonder whether we really need to introduce so many completely new terms for rather straightforward concepts, instead of borrowing existing terminology. What about renaming the terms “Selector” to “JKey” and “content” to “JValue”? This would pretty much resemble the map terminology (even if we also use it for arrays), and it might help to understand that these terms are specific to JNodes.

map:build($jnodes, jkey#1, jvalue#1)

Issue #2177 closed #closed-2177

04 Feb at 10:29:29 GMT

F+O: improve cross-referencing between functions

Issue #2404 closed #closed-2404

04 Feb at 10:29:28 GMT

2403 Enhancements to fos.xsd

Issue #2442 closed #closed-2442

04 Feb at 10:29:26 GMT

MK: 2403 enhancements to fos xsd

Pull request #2442 created #created-2442

04 Feb at 10:29:02 GMT
MK: 2403 enhancements to fos xsd

This PR is #2404 with the addition of the compiled fos.scm.

Close #2404 Close #2403 Close #2177

The CG agreed to merge this PR at meeting 151

Pull request #2441 created #created-2441

03 Feb at 21:14:10 GMT
2434 Fix inconsistencies with GNode tests in axis steps

Fix #2434 Fix #2437

  1. The axis step child::gnode() should be allowed.
  2. The axis step self::array(*) is used in an example but is invalid syntax and makes no sense
  3. Functions with an argument of type gnode() that accept the context item as a default should allow the context item to be any gnode, not just an XNode.

Issue #2422 closed #closed-2422

03 Feb at 18:57:33 GMT

XSLT: drop 3.11 Embedded Stylesheet Modules

Pull request #2440 created #created-2440

03 Feb at 18:54:57 GMT
2432 Clarify effect of coercion on constructor functions

Fix #2432

Pull request #2439 created #created-2439

03 Feb at 18:37:50 GMT
Fix prefix on bin:int-octets example function

Fix #2435

QT4 CG meeting 151 draft minutes #minutes-02-03

03 Feb at 17:45:00 GMT

Draft minutes published.

Issue #2407 closed #closed-2407

03 Feb at 17:36:27 GMT

`fn:type-of`: function vs fn

Issue #2409 closed #closed-2409

03 Feb at 17:36:26 GMT

2407 Change function to fn in type-of output

Issue #2398 closed #closed-2398

03 Feb at 17:34:24 GMT

fn:highest documentation in F&O spec not up to date?

Issue #2410 closed #closed-2410

03 Feb at 17:34:23 GMT

2398 Fix fn:highest to match fn:lowest

Issue #2406 closed #closed-2406

03 Feb at 17:32:17 GMT

Rounding dates/times and durations

Issue #2416 closed #closed-2416

03 Feb at 17:32:16 GMT

2406 Add fn:parts-of-dateTime and fn:build-dateTime functions

Issue #2365 closed #closed-2365

03 Feb at 17:30:14 GMT

Record types: extensible and non-extensible pairs

Issue #1484 closed #closed-1484

03 Feb at 17:30:13 GMT

Functions that expect a record type should make it extensible

Issue #2413 closed #closed-2413

03 Feb at 17:30:12 GMT

2365 Drop extensible record types

Issue #2428 closed #closed-2428

03 Feb at 17:28:08 GMT

2422 Drop XSLT section on embedded stylesheet modules

Issue #2421 closed #closed-2421

03 Feb at 17:26:06 GMT

XSLT edge case incompatibility with simplified stylesheet

Issue #2423 closed #closed-2423

03 Feb at 17:26:05 GMT

2421 document XSLT incompatibility with simplified stylesheets

Issue #2292 closed #closed-2292

03 Feb at 17:23:48 GMT

The XSLT document() function

Issue #2419 closed #closed-2419

03 Feb at 17:23:47 GMT

2292 XSLT document() function: options parameter

Issue #2397 closed #closed-2397

03 Feb at 17:12:27 GMT

Additions for "Functions Defined in XSLT" section in F&O spec

Issue #2411 closed #closed-2411

03 Feb at 17:12:26 GMT

2397 add to F&O list of functions defined in XSLT

Issue #2396 closed #closed-2396

03 Feb at 17:09:58 GMT

Missing "New in 4.0" labels for functions in F&O Spec

Issue #2395 closed #closed-2395

03 Feb at 17:09:58 GMT

The new fn:regex-groups function is not labelled "New in 4.0"

Issue #2412 closed #closed-2412

03 Feb at 17:09:56 GMT

2395 2396 Add missing "new in 4.0" entries

Issue #2438 closed #closed-2438

03 Feb at 17:09:16 GMT

Michaelhkay 2403 enhancements to fos xsd

Pull request #2438 created #created-2438

03 Feb at 17:08:48 GMT
Michaelhkay 2403 enhancements to fos xsd

This is MK's PR with the compiled form of the schema added.

The CG agreed to merge this PR at meeting 151.

Close #2404 Close #2403 Close #2177

Issue #2426 closed #closed-2426

03 Feb at 17:07:47 GMT

2408 editorial omnibus

Issue #2429 closed #closed-2429

03 Feb at 17:04:31 GMT

Feature/2026 01 28 draft review

Issue #1962 closed #closed-1962

03 Feb at 17:02:49 GMT

fn:map-to-element

Issue #2053 closed #closed-2053

03 Feb at 17:02:32 GMT

Add fn:collection-available

Issue #2430 closed #closed-2430

03 Feb at 17:00:27 GMT

Updates to schema for xslt

Issue #2437 created #created-2437

03 Feb at 13:15:31 GMT
SimpleNodeTest: TypeTest → RegularItemType?

With the current grammar…

AxisStep         ::=  (AbbreviatedStep | FullStep) Predicate*
AbbreviatedStep  ::=  ".." | ("@" NodeTest) | SimpleNodeTest
FullStep         ::=  Axis NodeTest
NodeTest         ::=  UnionNodeTest | SimpleNodeTest
SimpleNodeTest   ::=  TypeTest | Selector
TypeTest         ::=  NodeKindTest | JNodeType

…type tests in axis steps are limited to node() and its subtypes as well as jnode(). The test cases include tests for additional types like gnode() or array(*).

Maybe TypeTest should be replaced by the RegularItemType:

RegularItemType  ::=  AnyItemTest | NodeKindTest | GNodeType | JNodeType | MapType | ArrayType | RecordType | EnumerationType

Issue #2436 created #created-2436

03 Feb at 12:04:25 GMT
`jnode` type: arguments (spec vs. tests)

The current spec defines the following grammar for the jnode type:

JNodeType  ::=  "jnode" "(" (("*" | NCName | Constant) ("," ("*" | SequenceType))?)? ")"
Constant   ::=   StringLiteral | ("-"? NumericLiteral) | QNameLiteral | ("true" "(" ")") | ("false" "(" ")")

As far as I can judge, none of the current test cases seems to use this syntax. Instead, the tests I found expect a single sequence type argument, for example fn-jtree-006:

<test-case name="fn-jtree-006">
  <description> JNode applied to an array - type of result</description>
  <created by="Michael Kay" on="2025-06-16"/>
  <test>fn:jtree([1,2,3]) instance of jnode(array(xs:integer))</test>
  <result>
    <assert-true/>
  </result>
</test-case>

Is it the test suite or the spec that needs to be updated?

If this is still subject to discussion, my preference would be to disallow constants in the jnode syntax:

  • It is not clear to me how fn:jtree({ 'a': 1, 'b': 2 }) instance of jnode(a) can be interpreted.
  • Instance checks for linear hierarchies are generally more intuitive.
  • With the presence of get(), the jnode constants should be redundant.
  • We can only use atomic items in the node tests for which literals exist.

Issue #2435 created #created-2435

03 Feb at 11:14:19 GMT
Incorrect namespace prefixes in EXPath Binary example

In EXPath Binary Module 4.0 some example functions have been redefined in a different namespace prefix (which was bin:) to avoid suggesting they were part of the supported library.

However, in 2.2 Example – reading and writing variable length ASN.1 integers, the definitions of asn:int-octets() and asn:encode-ASN-integer() still contain references to the original prefix definition bin:int-octets(), which should be asn:int-octets().

Issue #2434 created #created-2434

03 Feb at 09:45:37 GMT
`fn:has-children`: buggy examples?

I believe that the recently added examples for fn:has-children need to be fixed (or it’s my brain that needs to be updated):

[1,2,3] => has-children()
[] => has-children()

The function signature expects gnode()? as input type, so I would expect both queries to return an error unless the arguments are explicitly wrapped into a JNode.

Issue #2433 created #created-2433

03 Feb at 06:38:29 GMT
`fn:jtree`: Identity

The rules for fn:jtree say:

If two maps or arrays M1 and M2 have the same function identity, as determined by the function-identity function, then jtree(M1) is jtree(M2) MUST return true: that is, the same JNode must be delivered for both.

Note: It is to some extent implementation-defined whether two maps or arrays have the same function identity. Processors SHOULD ensure as a minimum that when a variable $m is bound to a map or array, calling jtree($m) more than once (with the same variable reference) will deliver the same JNode each time.

Shouldn’t SHOULD be MUST? The argument will always have the same function identity if jtree($m) is called more than once. Maybe the note can also be dropped, as fn:jtree is not about the identity of maps and arrays, but the identity of JNodes.

QT4 CG meeting 151 draft agenda #agenda-02-03

02 Feb at 12:50:00 GMT

Draft agenda published.

Issue #2431 closed #closed-2431

02 Feb at 11:20:20 GMT

Patch grammar explorer

Issue #2432 created #created-2432

02 Feb at 10:50:35 GMT
Constructor Functions: conversions

The specification says in [22.1 Constructor functions for XML Schema built-in atomic types](https://qt4cg.org/specifications/xpath-functions-40/Overview.html#constructor-functions-for-xsd-types)…

If the value passed to a constructor is not in the lexical space of the datatype to be constructed, and cannot be converted to a value in the value space of the datatype under the rules in this specification, then an dynamic error is raised [err:FORG0001].

…but it is not clear which rules in the specification are meant.

Specifically, I think we should clarify whether queries like the following one are supposed to return an error or a duration:

xs:anyURI('P2000Y') => xs:yearMonthDuration()

Pull request #2431 created #created-2431

02 Feb at 10:09:11 GMT
Patch grammar explorer

On all pages: Improve the display of headlines with a slightly smaller font size.


On rule detail pages: The name of the grammar is now visible in the back button of the ribbon instead of the H1, which only displays the rule name.


On the right hand side:

  • Fix complement character class display.
  • Sequence and choice items are now indented if they do not fit on one line.
  • Occurrence indicators always stick to the item they belong to.
  • Use spans to group items and mark any character that is displayed.
  • Dashes in character ranges and pipes in choices are now also recognized as part of EBNF syntax.
  • Literals are never wrapped onto the next line.

Pull request #2430 created #created-2430

31 Jan at 22:19:21 GMT
Updates to schema for xslt

Updates the schema for XSLT 4.0:

  • Adds canonical to xsl:output and xsl:result-document
  • Adds xsl:package-location to content model of xsl:use-package.

Pull request #2429 created #created-2429

31 Jan at 16:12:30 GMT
Feature/2026 01 28 draft review

I'm suggesting two minor changes following my reading of the most recent draft specs.

Pull request #2428 created #created-2428

30 Jan at 18:29:42 GMT
2422 Drop XSLT section on embedded stylesheet modules

This doesn't actually abolish the feature, it just de-emphasises it. AFAIK, no-one actually uses it.

Issue #2427 created #created-2427

30 Jan at 15:38:24 GMT
Node construction in XPath

There was pushback on issue #573 which proposed a set of functions for constructing nodes, on the grounds that for XQuery users, this was unnecessary duplication.

A possible alternative is to add a subset of the XQuery syntax for node construction to XPath: specifically, computed node constructors, which are relatively free of hassles such as dependence on the namespace context, boundary space rules, etc.

Specifically we could add computed constructors:

ComputedConstructor ::= CompDocConstructor
  | CompElemConstructor
  | CompAttrConstructor
  | CompNamespaceConstructor
  | CompTextConstructor
  | CompCommentConstructor
  | CompPIConstructor
CompDocConstructor ::= "document" EnclosedExpr
CompElemConstructor ::= "element" CompNodeName EnclosedContentExpr
CompAttrConstructor ::= "attribute" CompNodeName EnclosedExpr
CompNamespaceConstructor ::= "namespace" CompNodeNCName EnclosedExpr
CompTextConstructor ::= "text" EnclosedExpr
CompCommentConstructor ::= "comment" EnclosedExpr
CompPIConstructor ::= "processing-instruction" CompNodeNCName EnclosedExpr

with the restriction that CompNodeName / CompNodeNCName are either expressions in curly braces, or use the new XQuery 4.0 form with a leading "#".

It is of course trivial to define a function library on top of this if someone wants the extra flexibility:

let $new-element := fn($name, $content) { element {$name} {$content} }

etc

(Incidentally, EnclosedContentExpr serves no useful purpose as it's identical to EnclosedExpr.)

We could common up the rules for "constructing simple content" and "constructing complex content" at the same time, putting them in XPath where both XQuery and XSLT can refer to them. I believe they are identical except for (a) error codes, (b) with duplicate attribute names, XQuery throws an error while XSLT takes the last.
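
The difference in (b) can be sketched as two policies applied to a list of attribute name/value pairs. This Python sketch is an illustration rather than spec text: the function names are hypothetical, and the exception is only a placeholder for the relevant error code.

```python
def attributes_xquery(pairs):
    """XQuery-style policy: a duplicate attribute name is an error."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate attribute name: {name}")
        result[name] = value
    return result


def attributes_xslt(pairs):
    """XSLT-style policy: the last attribute with a given name wins."""
    result = {}
    for name, value in pairs:
        result.pop(name, None)  # discard any earlier attribute of this name
        result[name] = value
    return result


pairs = [("id", "a"), ("class", "x"), ("id", "b")]
print(attributes_xslt(pairs))
```

A shared rule set could take the policy as a parameter, leaving everything else in the "constructing complex content" rules common to both languages.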

Pull request #2426 created #created-2426

30 Jan at 13:46:11 GMT
2408 editorial omnibus

Fixes nearly everything in #2408

Issue #2424 closed #closed-2424

30 Jan at 09:46:03 GMT

More Explorer tweaks

Issue #2425 created #created-2425

29 Jan at 21:58:24 GMT
Permanent diffs for PRs

Since we link to pull requests in the spec and in test cases, I wonder whether it would be possible to publish a permanent diff showing the effect of each PR?

Essentially, the idea would be to take the HTML diff as we currently publish it, and reduce it to those sections of the specs that actually contain changes.

I would find this very useful, for example, when a PR has been accepted but is still marked with "Tests needed" - it's not easy at present to see retrospectively what tests might be required. It would also be useful, of course, when implementing the PR. But I think that all readers of the specs might find this beneficial.

Pull request #2424 created #created-2424

29 Jan at 21:48:11 GMT
More Explorer tweaks

Building on the awesome work from @ndw I just tweaked the grammar explorer layout a little more

  • layout works better on bigger and smaller screens
  • consistent navigation between all screens with back button always on the navigation at the top
  • consistent sizes, paddings, colors set by CSS variables
  • output as html5 which fixes small issues with whitespace in inline elements
  • additional, minor layout improvements
[Screenshots: updated grammar explorer layout, with a before/after comparison]

Pull request #2423 created #created-2423

29 Jan at 21:28:25 GMT
2421 document XSLT incompatibility with simplified stylesheets

Fix #2421

Issue #2422 created #created-2422

29 Jan at 21:11:24 GMT
XSLT: drop 3.11 Embedded Stylesheet Modules

XSLT Section 3.11 describes embedded stylesheet modules - a stylesheet rooted at an element node which is not the outermost element of a document. There are no real conformance requirements associated with this feature and it isn't widely used. I propose we drop the section, while retaining the statement in §3.5 that a stylesheet module can be "all or part" of an XML document.

Issue #2421 created #created-2421

29 Jan at 15:43:23 GMT
XSLT edge case incompatibility with simplified stylesheet

Simplified stylesheets have changed so that the implicit template rule now does match="." rather than match="/".

This creates a theoretical incompatibility when

(a) the stylesheet is invoked supplying a node other than a document node as the input. It will now execute the (only) template rule; previously it would execute the built-in template for the node kind.

(b) the simplified stylesheet module is included/imported into another stylesheet. This is a highly unlikely scenario, but it is tested by test case include-0601.

The incompatibility should be documented.

Issue #2420 closed #closed-2420

29 Jan at 13:01:11 GMT

Explorer tweaks

Pull request #2420 created #created-2420

29 Jan at 13:01:04 GMT
Explorer tweaks

h/t @line-o

Plus a few other tweaks.

Pull request #2419 created #created-2419

28 Jan at 16:21:44 GMT
2292 XSLT document() function: options parameter

Fix #2292

Issue #2414 closed #closed-2414

28 Jan at 15:26:39 GMT

Diff markup issues

Pull request #2418 created #created-2418

28 Jan at 15:20:07 GMT
2399b Add rules and advice for JSON output of special numerics

Fix #2399

Issue #2417 closed #closed-2417

28 Jan at 15:18:13 GMT

2399 Add rules/advice for JSON output of special xs:double values

Pull request #2417 created #created-2417

28 Jan at 15:15:30 GMT
2399 Add rules/advice for JSON output of special xs:double values

Fix #2399

Pull request #2416 created #created-2416

28 Jan at 12:43:47 GMT
2406 Add fn:parts-of-dateTime and fn:build-dateTime functions

Fix #2406

Issue #2415 closed #closed-2415

28 Jan at 12:26:40 GMT

Publish the grammar explorer pages

Pull request #2415 created #created-2415

28 Jan at 12:26:11 GMT

Publish the grammar explorer pages

Issue #2414 created #created-2414

28 Jan at 09:36:48 GMT
Diff markup issues

There seem to be two consistent errors in the diff markup that appears in PRs on the dashboard:

  1. When an inline <code> element is modified, the diff version shows the old code as deleted (red background), but does not show the new code.

For example:

Image
  2. When a grammar entry is modified, the text gets duplicated.

For example:

Image

Pull request #2413 created #created-2413

28 Jan at 08:44:02 GMT
2365 Drop extensible record types

This PR drops the concept of extensible record types, replacing it with a rule that coercion to a record type drops any map entries that are not defined by the record type. In effect this means that a record type used when declaring a function parameter is implicitly extensible.

The benefits of the proposal are:

  • It simplifies the spec, especially rules on type subsumption and on generation of implicit constructor functions
  • It avoids the need to declare pairs of record types, one extensible and one not.
  • It avoids all the awkward decisions about whether record types used in core functions should be extensible or not.

The rules for type patterns in XSLT are changed to invoke coercion.

Fix #1484 Fix #2365
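
The coercion rule described in this PR can be pictured with a short Python sketch. It is a simplified model, not the specified algorithm: `coerce_to_record` is a hypothetical name, maps are modelled as dicts, and type checking of field values and handling of optional fields are left out.

```python
def coerce_to_record(value, field_names):
    """Keep only the map entries that the record type declares."""
    return {key: item for key, item in value.items() if key in field_names}


point = {"x": 1, "y": 2, "label": "origin"}
print(coerce_to_record(point, {"x", "y"}))  # {'x': 1, 'y': 2}
```

Because the extra `label` entry is dropped silently, a function parameter declared with the record type accepts the larger map, which is the sense in which the type is implicitly extensible.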

Pull request #2412 created #created-2412

28 Jan at 00:11:10 GMT
2395 2396 Add missing "new in 4.0" entries

Fix #2395 Fix #2396

Pull request #2411 created #created-2411

27 Jan at 23:45:16 GMT
2397 add to F&O list of functions defined in XSLT

Fix #2397

Pull request #2410 created #created-2410

27 Jan at 23:16:36 GMT
2398 Fix fn:highest to match fn:lowest

Fix #2398

Pull request #2409 created #created-2409

27 Jan at 21:24:02 GMT
2407 Change function to fn in type-of output

Fix #2407

Issue #2195 closed #closed-2195

27 Jan at 17:23:33 GMT

Editorial notes (incremental)

Issue #2408 created #created-2408

27 Jan at 17:22:26 GMT
Editorial notes (incremental)

This issue summarizes the unresolved comments from #2195:

  • [x] Oxford/serial comma should be used consistently at numerous places. Candidates:
    • sine, cosine and tangent
    • durations, dates and times
    • year, month, day, hour, minute, second and timezone
    • The month, day, hour and minute components
    • the tokens w, W and Ww
    • rules for overflow, underflow and approximation
  • [ ] The comma may need to be removed at other places:
    • This function is context-independent, and focus-independent.
    • A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.
  • [x] fn:distinct-values: $coll → $collation
  • [x] “If … is an empty sequence” vs. “If … is the empty sequence” (which one do we prefer?)
    • MHK: I can't say I have a strong preference, but pedantically, the is probably more accurate.
  • [x] Section 2.1.3 Values of the XPath 4.0 spec includes a change note highlighting that the terms XNode and JNode have been introduced but the section text only mentions XNode; there's no reference to JNode in that section other than in the change note itself.
  • [ ] In the XPath 4.0 spec, the publoc URL appears to be incorrect (/spec/header/publoc/loc)
  • [x] A number of examples in the function catalog should be annotated with spec="XQuery" so that the corresponding test cases are marked as inapplicable to XPath. Specifically:
  fo-test-fn-count-001
  fo-test-fn-deep-equal-005
  fo-test-fn-every-010
  fo-test-fn-function-annotations-002
  fo-test-fn-function-annotations-003
  fo-test-fn-hash-009
  fo-test-fn-hash-010
  fo-test-fn-serialize-004
  fo-test-fn-sort-with-005
  • [x] The serialization spec, in the section on the serialization parameter document, makes it rather hard to discover that the namespace prefix output is bound to the URI http://www.w3.org/2010/xslt-xquery-serialization
  • [x] F+O The definition of fn:compare refers to a function fn:months-from-dateTime, this should be fn:month-from-dateTime
  • [x] F+O The description of unparsed-text refers to the available text resources component of the dynamic context which has been dropped.
  • [x] The serialization spec for Adaptive serialization of function items gives an example **fn:exists#1** is serialized as **function fn:exists#1** but the word function does not actually appear in the result.
  • [x] fn:parse-html needs to define an error code for use when $encoding is an unknown or invalid encoding.
  • [ ] Consistent rendition for term definitions. Most of the specs output definitions as [Definition: here is the definition]. F+O uses [Definition] here is the definition. XSLT puts the keyword "Definition" in small caps.
  • [x] The function catalog contains entries for functions such as op:gMonthDay-equal that are no longer referenced.
  • [x] Serialization, HTML5, Processing Instructions: Dashes in the name must be escaped as well. Example: <?a---b c---d?> should be serialized as <!--?a- - -b c- - -d?-->.
  • [x] UTF-16le → UTF-16LE, UTF-16be → UTF-16BE (caused by #2239)
  • [ ] fn:unparsed-text: Reference bin:infer-encoding, drop redundant rules
  • [x] op:divide-dayTimeDuration-by-dayTimeDuration: An example could be simplified by using seconds(1)

Issue #2407 created #created-2407

27 Jan at 17:19:14 GMT
`fn:type-of`: function vs fn

As we use the fn alias in (almost) all XQFO signatures, the fn:type-of function could return it as well.

Issue #2355 closed #closed-2355

27 Jan at 17:06:34 GMT

bin:infer-encoding error conditions

Issue #2362 closed #closed-2362

27 Jan at 17:06:32 GMT

2355 bin:infer-encoding: further alignments

QT4 CG meeting 150 draft minutes #minutes-01-27

27 Jan at 17:00:00 GMT

Draft minutes published.

Issue #2361 closed #closed-2361

27 Jan at 16:57:25 GMT

Encoding parameters: upper/lower case, normalization

Issue #2394 closed #closed-2394

27 Jan at 16:57:24 GMT

2361 Use upper case for encoding names; comparisons are case-blind

Issue #2349 closed #closed-2349

27 Jan at 16:55:13 GMT

Revert `array:join`

Issue #2363 closed #closed-2363

27 Jan at 16:55:12 GMT

2349 Revert array:join

Issue #2378 closed #closed-2378

27 Jan at 16:53:09 GMT

HTML indenting

Issue #2391 closed #closed-2391

27 Jan at 16:53:08 GMT

2378 HTML indenting: clarify the definition of inline elements

Issue #1944 closed #closed-1944

27 Jan at 16:52:50 GMT

Try/Catch/Finally - order of evaluation

Issue #2127 closed #closed-2127

27 Jan at 16:52:40 GMT

JNodes: Include atomic items

Issue #2159 closed #closed-2159

27 Jan at 16:52:36 GMT

JNodes: Learning from JSONiq?

Issue #2351 closed #closed-2351

27 Jan at 16:52:31 GMT

Current Drafts: What will we keep, what may be dropped?

Issue #2354 closed #closed-2354

27 Jan at 16:52:27 GMT

`fn:append`

Issue #2360 closed #closed-2360

27 Jan at 16:52:17 GMT

fn:root() vs. absolute path expressions

Issue #2384 closed #closed-2384

27 Jan at 16:50:46 GMT

`fn:xsd-validator` - attribute nodes

Issue #2392 closed #closed-2392

27 Jan at 16:50:44 GMT

2384 Clarify that fn:xsd-validator can validate attributes

Issue #2406 created #created-2406

27 Jan at 12:28:26 GMT
Rounding dates/times and durations

The precision returned for dates, times, and durations is implementation-defined, and it has changed between Saxon releases. This leads one of our users to point out that there is no easy way to request a reduced precision (e.g. milliseconds) in order to ensure interoperability. The simplest approach we can offer seems to be current-dateTime() => format-dateTime("....") => xs:dateTime() which is pretty cumbersome and inefficient.

Rather than providing specific functions for rounding dates, times, and durations, the most versatile solution to this might be to provide functions that reduce a dateTime or duration to a record containing the numeric values of the components, allowing these to be manipulated as numbers, with a further function to reconstruct the dateTime or duration from the record: rather like the parse-uri()/build-uri() pair. Something like:

parts(current-dateTime()) ! map:put(., 'seconds', round(?seconds, 3)) ! build-dateTime()
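
The decompose, adjust, rebuild pattern suggested here can be tried out in Python. This is a sketch under assumptions: `parts`, `build_date_time`, and `round_seconds` are hypothetical names, the record layout is invented for the example, and timezones are ignored.

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP


def parts(dt):
    """Decompose a datetime into numeric components (seconds as Decimal)."""
    seconds = Decimal(dt.second) + Decimal(dt.microsecond) / Decimal(1_000_000)
    return {"year": dt.year, "month": dt.month, "day": dt.day,
            "hours": dt.hour, "minutes": dt.minute, "seconds": seconds}


def build_date_time(p):
    """Rebuild a datetime; a seconds value of 60 or more carries over."""
    base = datetime(p["year"], p["month"], p["day"], p["hours"], p["minutes"])
    whole = int(p["seconds"])
    micros = int((p["seconds"] - whole) * 1_000_000)
    return base + timedelta(seconds=whole, microseconds=micros)


def round_seconds(dt, digits=3):
    """Round the seconds component to the given number of decimal places."""
    p = parts(dt)
    quantum = Decimal(1).scaleb(-digits)  # 0.001 when digits is 3
    p["seconds"] = p["seconds"].quantize(quantum, rounding=ROUND_HALF_UP)
    return build_date_time(p)


print(round_seconds(datetime(2026, 1, 27, 12, 28, 26, 123456)))
```

One design point the sketch makes visible: rounding can push the seconds to 60, so the rebuilding function has to accept out-of-range components and carry them, rather than reject them.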

Issue #2405 closed #closed-2405

27 Jan at 10:15:44 GMT

The published XML for XPath and XQuery is incorrect

Pull request #2405 created #created-2405

27 Jan at 10:15:36 GMT
The published XML for XPath and XQuery is incorrect

It’s not the specification XML, it’s the pre-fixed-up HTML as XML. h/t to @martian-a for noticing first!

Pull request #2404 created #created-2404

27 Jan at 00:08:07 GMT
2403 Enhancements to fos.xsd

Schema enhancements to the function catalog for:

  • Issue #2403 - allow non-testable results for examples to be labelled narrative="true"
  • Issue #2177 - allow a fos:see-also element to make links to related functions

Note that this PR is purely an enabler: it does not include changes to the function catalog to exploit these features, nor stylesheet enhancements to render them.

QT4 CG meeting 150 draft agenda #agenda-01-27

26 Jan at 13:40:00 GMT

Draft agenda published.

Issue #2402 closed #closed-2402

23 Jan at 12:16:22 GMT

This PR should fail to build

Issue #2403 created #created-2403

23 Jan at 12:00:34 GMT
Testable examples in the file spec

In PR #2401 Norm introduced a temporary fix needed because the function catalog in the EXPath file spec doesn't conform to the fos.xsd schema.

I think we can fix this without a schema or stylesheet change by using existing mechanisms illustrated by this example from fn:collation:

             <fos:test>
               <fos:expression>collation({ 'lang': 'de', 'strength': 'primary' })</fos:expression>
               <fos:result>"http://www.w3.org/2013/collation/UCA?lang=de;strength=primary"</fos:result>
               <fos:test-assertion>
                  <result xmlns="http://www.w3.org/2010/09/qt-fots-catalog">
                     <any-of>
                        <assert-string-value>http://www.w3.org/2013/collation/UCA?lang=de;strength=primary</assert-string-value>
                        <assert-string-value>http://www.w3.org/2013/collation/UCA?strength=primary;lang=de;</assert-string-value>
                     </any-of>
                  </result>
               </fos:test-assertion>
               <fos:postamble>The order of query parameters may vary.</fos:postamble>
            </fos:test>

The fos:result (or fos:error-result) element must always be present, and will always be rendered in the spec as the expected result. If the example will not always deliver this result, then <fos:test-assertion> can appear to give the result as it will appear in the generated test case.

But I suggest we add another attribute <fos:result narrative="true"/> to indicate that the result is given as explanatory prose, not as a testable XPath expression. This would allow another fn:collation example

         <fos:example>
            <p>The expression <code>collation({ 'lang': default-language() })</code>
               returns a collation suitable for the default language in the
               dynamic context.</p>
         </fos:example>

to be rewritten as

         <fos:example>
            <fos:test>
                <fos:expression>collation({ 'lang': default-language() })</fos:expression>
                <fos:result narrative="true">A collation suitable for the default language in the
               dynamic context.</fos:result>
                <fos:test-assertion>
                   <result xmlns="http://www.w3.org/2010/09/qt-fots-catalog"><assert>true()</assert></result>
                </fos:test-assertion>
           </fos:test>
         </fos:example>

which will (a) make it easier to fit the results into the tabular presentation of examples, and (b) cause a test case to be generated which will ensure that the example is syntactically valid.

(Or we could leave out <fos:test-assertion> in this example. If the supplied fos:result has narrative="true" and there is no test assertion, the generated test case can assume <assert>true()</assert>)
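
The defaulting rule suggested in the last paragraph could look roughly like this in a test generator. The Python sketch below is hypothetical: the real generator is a stylesheet, `assertion_for` is an invented name, and the namespace URI bound to the fos prefix is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the fos prefix; adjust if the catalog differs.
FOS = "http://www.w3.org/xpath-functions/spec/namespace"


def assertion_for(test):
    """Pick the assertion for the test case generated from a fos:test."""
    explicit = test.find(f"{{{FOS}}}test-assertion")
    if explicit is not None:
        return ("explicit", explicit)   # an explicit assertion always wins
    result = test.find(f"{{{FOS}}}result")
    if result is not None and result.get("narrative") == "true":
        return ("assert", "true()")     # narrative result: only check it runs
    return ("result", result.text if result is not None else None)


example = ET.fromstring(
    f'<fos:test xmlns:fos="{FOS}">'
    "<fos:expression>collation({ 'lang': default-language() })</fos:expression>"
    '<fos:result narrative="true">A collation suitable for the default language.</fos:result>'
    "</fos:test>"
)
print(assertion_for(example))  # ('assert', 'true()')
```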

Pull request #2402 created #created-2402

23 Jan at 11:41:22 GMT
This PR should fail to build

We're never going to merge this, it's just a CI test.

Issue #2379 closed #closed-2379

23 Jan at 11:39:50 GMT

Use exported schema to validate function catalogs

Issue #1948 closed #closed-1948

22 Jan at 16:37:05 GMT

fn:element-to-map: Tests

Issue #2401 closed #closed-2401

22 Jan at 11:57:06 GMT

Stopgap fix to get the status quo drafts built

Pull request #2401 created #created-2401

22 Jan at 11:56:58 GMT
Stopgap fix to get the status quo drafts built

The content model of fos:test requires an fos:result or fos:error-result. For the EXPath File module, we have to work out what those should be or change the markup or change the schema.

In the short term, I’ve made bogus fos:results of FIXME:

Issue #2400 closed #closed-2400

22 Jan at 10:55:22 GMT

Irrelevant whitespace change to nudge CI

Pull request #2400 created #created-2400

22 Jan at 10:55:13 GMT

Irrelevant whitespace change to nudge CI

Issue #2399 created #created-2399

22 Jan at 10:45:49 GMT
Canonical JSON Serialization: edge cases

RFC 8785 says:

Note: Since Not a Number (NaN) and Infinity are not permitted in JSON, occurrences of NaN or Infinity MUST cause a compliant JCS implementation to terminate with an appropriate error.

We have just decided to treat these cases more liberally, but I think we should continue to raise an error if canonical serialization is requested.

In our serialization spec, we also say:

Implementations may serialize an xs:double value using any lexical representation of a JSON number defined in [RFC 7159], but it is recommended to use the same representation as when the canonical parameter is true.

We may need to exclude the edge cases from this recommendation.

If we keep the recommendation, we may need to fix the test case Serialization-json-11, which expects -0 instead of 0 (what is returned for RFC8785).
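
The edge cases discussed here can be isolated in a few lines of Python. This is a sketch of the edge cases only, with a hypothetical function name; it does not implement the general number formatting of RFC 8785.

```python
import math


def canonical_edge_case(x):
    """Handle the edge cases of canonical JSON number output.

    Returns "0" for positive and negative zero, raises for NaN and the
    infinities, and returns None when the general algorithm applies.
    """
    if math.isnan(x) or math.isinf(x):
        raise ValueError("NaN and Infinity are not permitted in canonical JSON")
    if x == 0:
        return "0"  # -0.0 == 0 is true, so negative zero also yields "0"
    return None


print(canonical_edge_case(-0.0))  # 0
```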

Issue #2398 created #created-2398

21 Jan at 18:09:52 GMT
fn:highest documentation in F&O spec not up to date?

The description for the new fn:highest function does not align with the description for the fn:lowest function, contrary to what I'd expect. In particular, the rules section makes no mention of the $key argument, so I suspect it is not up to date.

Issue #2397 created #created-2397

21 Jan at 18:01:30 GMT
Additions for "Functions Defined in XSLT" section in F&O spec

The new XSLT 4.0 functions current-merge-key-array and regex-groups are missing from the "Functions Defined in XSLT" section.

Also I believe the function unparsed-text-available should be added saying "Originally XSLT 2.0; then XPath 3.0 and later".

Issue #2396 created #created-2396

21 Jan at 17:55:18 GMT
Missing "New in 4.0" labels for functions in F&O Spec

Please add changes entries to say "New in 4.0" for the functions: function-identity and jnode-content.

Also I assume the first change entry for function-annotations should actually say "New in 4.0" (rather than be a duplicate of the second change entry).

Issue #2395 created #created-2395

21 Jan at 17:49:22 GMT
The new fn:regex-groups function is not labelled "New in 4.0"

Please add a changes entry in the spec for the new XSLT 4.0 regex-groups function.

Pull request #2394 created #created-2394

21 Jan at 11:54:11 GMT
2361 Use upper case for encoding names; comparisons are case-blind

Standardizes on upper case for encoding names, and mentions that comparisons are case-blind.

Fix #2361
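
A case-blind comparison with upper-case normalization can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes encoding names consist of ASCII characters only.

```python
def normalize_encoding(name):
    """Return the preferred upper-case form of an encoding name."""
    return name.upper()


def same_encoding(a, b):
    """Compare two encoding names without regard to case."""
    return normalize_encoding(a) == normalize_encoding(b)


print(normalize_encoding("utf-16le"), same_encoding("utf-8", "UTF-8"))
```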

Issue #2393 created #created-2393

21 Jan at 09:08:15 GMT
Keep or drop `array:members` and `array:of-members`?

Adopted from #2351:

We have recently dropped map:pairs and map:of-pairs. With array:members, the members of arrays are returned as single-entry maps, which may confuse users. Thus, for the sake of reducing redundant functionality, do we want to keep array:members and array:of-members, or rather promote the use of for member $m and array:split/array:join instead?

If we keep the functions, we should add a dedicated record type for record(value).
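
For readers unfamiliar with the pair, the following Python sketch models what the two functions do. It is a simplified model with hypothetical names: an array is a list, each member is itself a list standing in for a sequence, and the single-entry map uses the key "value" as in record(value).

```python
def array_members(array):
    """Wrap each member in a single-entry map with the key 'value'."""
    return [{"value": member} for member in array]


def array_of_members(members):
    """Inverse operation: rebuild the array from the single-entry maps."""
    return [entry["value"] for entry in members]


array = [[1, 2], [], ["a"]]  # three members; the second is an empty sequence
wrapped = array_members(array)
print(wrapped)
```

The wrapping is what keeps member boundaries intact when the result is processed as a flat sequence, and it is also the part that the issue suggests may confuse users.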

Pull request #2392 created #created-2392

21 Jan at 08:52:57 GMT
2384 Clarify that fn:xsd-validator can validate attributes

Fix #2384

Pull request #2391 created #created-2391

21 Jan at 08:44:32 GMT
2378 HTML indenting: clarify the definition of inline elements

Fix #2378

Issue #2390 created #created-2390

20 Jan at 22:44:42 GMT
methods and inheritance

Don’t panic, I am not suggesting a large change :)

But I would like to suggest a small change to the semantics of method calls, with the goal of third parties being able to build something much larger.

Today, we have,

A method call combines accessing a map M to look up an entry whose value is a function item F, and calling the function item F supplying the map M as the implicit value of the first argument.

I’d like to add,

If there is no such function, but there is in the map a key fn:fallback whose value is a function, then that function is called with the map, the function name, and the arity of the desired function as arguments.

In this way one could write a function that looked for an "isa" entry in the map whose value was a sequence of "class" maps, and find the function.

It’s limited in that there is no possibility of polymorphic functions, but we do not have those elsewhere in the language.

One practical benefit is that you can have a map with all your functions in it, and “instance maps” then do not need to have, say, 40 entries for all the methods that can be called. In an application in which a map gets updated a million times (I do have one of those), adding 40 extra entries to copy is a significant burden, even though of course it’s encapsulated in a single function.
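
The proposed lookup can be modelled in Python with dicts standing in for maps. This is a sketch under stated assumptions: `call_method` and `isa_fallback` are hypothetical names, the fallback is assumed to return the function that is then called with the map as its first argument, and the arity is assumed to count that implicit argument.

```python
FALLBACK = "fn:fallback"


def call_method(m, name, *args):
    """Call the method `name` on map `m`, consulting the fallback if needed."""
    f = m.get(name)
    if not callable(f):
        fallback = m.get(FALLBACK)
        if not callable(fallback):
            raise KeyError(name)
        # Assumption: arity includes the implicit map argument.
        f = fallback(m, name, len(args) + 1)
    return f(m, *args)


def isa_fallback(m, name, arity):
    """Search the 'class' maps listed under 'isa' for the method."""
    for cls in m.get("isa", ()):
        f = cls.get(name)
        if callable(f):
            return f
    raise KeyError(name)


shape = {"describe": lambda self: f"shape with area {self['area']}"}
square = {"isa": [shape], "area": 4, FALLBACK: isa_fallback}
print(call_method(square, "describe"))  # shape with area 4
```

The instance map `square` carries only its data, the `isa` entry, and the fallback, so copying it on update does not copy the method entries held by `shape`.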

Issue #2344 closed #closed-2344

20 Jan at 18:54:27 GMT

HTML Serialization: Processing Instructions

Issue #2372 closed #closed-2372

20 Jan at 18:54:26 GMT

2344 Change rendition of PIs in HTML5

QT4 CG meeting 149 draft minutes #minutes-01-20

20 Jan at 17:30:00 GMT

Draft minutes published.

Issue #2359 closed #closed-2359

20 Jan at 17:13:35 GMT

Implicit conversion to JNodes with absolute path expressions

Issue #2373 closed #closed-2373

20 Jan at 17:13:34 GMT

2359 No conversion to JNode in absolute paths

Issue #2337 closed #closed-2337

20 Jan at 17:11:23 GMT

XSLT xsl:mode/@typed attribute

Issue #2376 closed #closed-2376

20 Jan at 17:11:21 GMT

2337 Extend xsl:mode/@typed to handle JNodes etc

Issue #2387 closed #closed-2387

20 Jan at 17:07:23 GMT

641 NaN/Infinity in JSON

Issue #2088 closed #closed-2088

20 Jan at 17:05:15 GMT

File Module: Feedback, Observations

Issue #2364 closed #closed-2364

20 Jan at 17:05:14 GMT

2088 File Module: Feedback, Observations

Issue #2185 closed #closed-2185

20 Jan at 17:03:31 GMT

Request for an `fn:xproc` function

Issue #2383 closed #closed-2383

20 Jan at 17:02:56 GMT

Attempt to resolve action QT4CG-148-01

Issue #2389 created #created-2389

20 Jan at 15:44:14 GMT
Adaptive Serialization: more freedom?

The adaptive serialization method was introduced “for the purposes of debugging query results”. For our processor, it soon turned out that it does not satisfy the requirements of our users, which is why we have introduced a custom debugging method.

I wonder what others think: Shouldn’t we relax several of the rules and let the implementation decide what to output? We haven’t defined what the output of fn:trace needs to look like either.

Some examples:

  • The output of doubles often causes confusion. If parsed JSON is output, small integers will be output in exponential notation. For example, parse-json('{ "A" : 20 }') needs to be output as { "A": 2.0e1 }.
  • xs:date("2001-01-01") is output as xs:date("2001-01-01"), while xs:token('x') is output as "x".
  • fn() { 1 } is output as (anonymous-function)#0, whereas an implementation could prefer to use the output of fn:function-identity (see #2388), or output the original query string (if available), reproduce a string representation of the function body, etc.

I will be glad to create a PR.

Issue #2388 created #created-2388

20 Jan at 15:28:34 GMT
Adaptive Serialization: function items

The serialization spec defines rules for creating a string representation for function items. Now that we have `fn:function-identity`, we should replace the rules and use this string instead.

QT4 CG meeting 149 draft agenda #agenda-01-20

19 Jan at 13:40:00 GMT

Draft agenda published.

Issue #2386 closed #closed-2386

19 Jan at 13:11:48 GMT

Add namespace declaration to environment for generated tests

Pull request #2387 created #created-2387

16 Jan at 01:03:46 GMT
641 NaN/Infinity in JSON

Addresses part of issue #641

In the JSON serialization method, NaN is output as null, and infinity is output as ±1e9999.

The parse-json function adds recommendations on how to achieve round-tripping of these values.
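
The output rule, and the round-tripping question it raises, can be tried out in Python. The sketch is illustrative only: `json_number` is a hypothetical helper, ordinary values are formatted with Python's own `repr`, and the parsing behaviour shown is that of Python's `json` module, not of parse-json.

```python
import json
import math


def json_number(x):
    """Serialize a double, mapping the special values as described."""
    if math.isnan(x):
        return "null"
    if math.isinf(x):
        return "1e9999" if x > 0 else "-1e9999"
    return repr(x)


values = (1.5, float("inf"), float("-inf"), float("nan"))
text = "[" + ", ".join(json_number(v) for v in values) + "]"
print(text)              # [1.5, 1e9999, -1e9999, null]
print(json.loads(text))  # [1.5, inf, -inf, None]
```

In this parser the out-of-range literal overflows to infinity, so the infinities survive a round trip, while NaN comes back as a null value and needs separate treatment.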

Pull request #2386 created #created-2386

15 Jan at 17:55:32 GMT
Add namespace declaration to environment for generated tests

Changes the stylesheet for generating keyword and function signature tests so that the namespace prefix "output" is explicitly declared in the test environment. This prefix is used in one of the tests and it needs to be declared if the test is to work in XPath.

Issue #2385 created #created-2385

15 Jan at 13:59:58 GMT
The XML version of the XPath spec isn't the XML version of the spec, it's HTML

Probably XQuery too. I have no idea why.

Issue #2384 created #created-2384

15 Jan at 11:09:44 GMT
`fn:xsd-validator` - attribute nodes

The type signature of the xsd-validator function suggests that it can be used to validate attribute nodes (as well as documents and elements), and the prose description concurs with this. However the function summary says "can be invoked to validate a document or element node against this schema."

Test case xsd-validator-092 expects an attribute node to be rejected.

Also the Notes in the specification say

The validation process is explained in more detail in the XQuery ([[XQuery 4.0: An XML Query Language]] section [4.25 Validate Expressions] and XSLT ([[XSL Transformations (XSLT) Version 4.0]] section [25.4 Validation]

but the detailed description has since been moved to F&O 17.2.4.

Note that XSLT has always allowed validation of free-standing attribute nodes, but the validate expression in XQuery allows only document and element nodes.

Pull request #2383 created #created-2383

14 Jan at 11:41:24 GMT
Attempt to resolve action QT4CG-148-01

Per #2315:

  1. Added ‘at-risk’ changes to fn:insert-separator, array:members, and array:of-members
  2. Added ‘at-risk’ changes to the XPath/XQuery section on map and array filtering
  3. Added a note about what ‘at risk’ means to the status sections

Issue #2382 closed #closed-2382

14 Jan at 11:34:53 GMT

Tool changes for action QT4CG-148-01

Pull request #2382 created #created-2382

14 Jan at 11:34:42 GMT
Tool changes for action QT4CG-148-01

These should have no effect without additional commits, but they have to be merged into main in order to have a visible effect on my subsequent PR.

Issue #573 closed #closed-573

13 Jan at 23:08:46 GMT

Node construction functions

Issue #2124 closed #closed-2124

13 Jan at 23:07:16 GMT

573 Functions to Construct Trees

QT4 CG meeting 148 draft minutes #minutes-01-13

13 Jan at 17:30:00 GMT

Draft minutes published.

Issue #2357 closed #closed-2357

13 Jan at 17:19:48 GMT

element() vs element(*) in function signatures

Issue #2358 closed #closed-2358

13 Jan at 17:19:45 GMT

2357 Standardize on element() rather than element(*)

Issue #2367 closed #closed-2367

13 Jan at 17:17:43 GMT

Documentation for new main-module attribute of xsl:stylesheet

Issue #2366 closed #closed-2366

13 Jan at 17:17:43 GMT

json-lines attribute for xsl:output and xsl:result-document in XSLT spec

Issue #2356 closed #closed-2356

13 Jan at 17:17:42 GMT

Clarification on scope of variables in xsl:for-each-group/(@split-when|@merge-when)

Issue #2368 closed #closed-2368

13 Jan at 17:17:40 GMT

2367 Misc XSLT editorial fixes

Issue #2369 closed #closed-2369

13 Jan at 17:16:05 GMT

F+O section 11 is empty

Issue #2371 closed #closed-2371

13 Jan at 17:16:04 GMT

2369 Add content for F&O section 11 (Processing binary values)

Issue #2375 closed #closed-2375

13 Jan at 17:13:58 GMT

2195 Editorial Omnibus

Issue #1591 closed #closed-1591

13 Jan at 17:12:08 GMT

Implausible filter expressions

Issue #1934 closed #closed-1934

13 Jan at 17:12:00 GMT

Supporting RELAX NG validation

Issue #2377 closed #closed-2377

13 Jan at 17:11:36 GMT

2195 F+O Editorial Corrections

Issue #2381 created #created-2381

13 Jan at 13:37:43 GMT
Add facility to serialize binary values as url-safe base64 encoded strings

In XQuery 3.1 there are two XDM types to represent binary values: xs:base64Binary and xs:hexBinary. The current draft adds binary literals as an additional option.

Thus, it is possible to base64 encode any value by casting an xs:base64Binary to an xs:string.

xs:base64Binary("+w==") => xs:string()

There is no standard way to serialize those binary values to the URL safe variant of that encoding described in section 5 of RFC 4648.

The simplest workaround is replacing the unsafe characters of the alphabet (+ and /) and dropping the padding at the end with

xs:base64Binary("+w==") => xs:string() => translate("+/=", "-_")

This of course will only work for relatively small binary values. For processors to offer an efficient way of doing this, I see several options.

  1. adding new type xs:base64BinaryUrlSafe whose string representation uses the adapted alphabet with - and _ and does not add padding at the end
  2. a new function in fn namespace fn:encode-base64-url-safe($data as (xs:string | xs:base64Binary | xs:hexBinary)) as xs:string
  3. a new function in bin namespace bin:encode-base64-url-safe($data as (xs:string | xs:base64Binary | xs:hexBinary)) as xs:string
  4. add an output option that will serialize all binary values to base64 url-safe when cast to strings
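
To make the intended output concrete, here is a small Python sketch of the two encodings for the byte used in the examples above. Python is used purely for illustration and is not part of the proposal; the helper name encode_base64_url_safe is hypothetical and simply mirrors option 2.

```python
import base64

def encode_base64_url_safe(data: bytes) -> str:
    """Encode bytes with the RFC 4648 section 5 alphabet and drop the '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

standard = base64.b64encode(b"\xfb").decode("ascii")  # standard alphabet, padded
url_safe = encode_base64_url_safe(b"\xfb")            # '-' and '_' alphabet, unpadded
```

Because the encoder writes the URL-safe alphabet directly, no second pass over the string is needed, which is the efficiency concern raised above.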

Addendum

I am also wondering why binary values cannot be created from numeric literals. Especially now that we have the binary notation for integer literals and the xs:integer type is unbounded this would be a perfectly fine literal notation to create binary values from. At least as suitable as string literals that are currently allowed.

xs:hexBinary(0xfb) and xs:base64Binary(0b11111111)

Issue #2380 created #created-2380

13 Jan at 01:22:29 GMT
Use Case for Generators: News Feeds Aggregation Using Generators

In response to: QT4CG-147-02: NW to chase up DN and LQ about follow-up to the generator discussion


Use Case: News Feeds Aggregation Using Generators

Contents

  • Actors
  • Goals
  • Functional Requirements
  • Constraints / Assumptions / Preconditions
  • Proposed High-Level Solution
  • Known Approaches That Are Problematic
  • Benefits of the Generators Approach
  • End-to-End Flow
    • Brief Description of the Core Processes in the Pipeline
    • Notes on the Process Pipeline
  • Why This Fits the Generator Datatype Extremely Well
  • Alternative Flows
    • Alternative Flow-1: A Feed Temporarily Stops Producing New Items
    • Alternative Flow-2: Partial Consumption of the Pipeline
    • Alternative Flow-3: Editor Inserts or Reorders Items
  • Exception Flows
    • Exception Flow-1: Feed Unreachable or Network Failure
    • Exception Flow-2: Malformed Feed Data
    • Exception Flow-3: Resource Exhaustion Risk
  • Postconditions
  • References

The Problem

Modern RSS/JSON aggregators must process hundreds of continuously updating feeds without excessive memory usage or latency, while supporting filtering, merging, and prioritization in real time.


Actors

  • End-User
  • Editor
  • Administrator
  • System components (internal processes acting as secondary actors)
  • External services (RSS providers, APIs, social signals)

Goals

  • End-User
    “As a user, I want to get the latest, up-to-the-minute news from many important sources. I want each brief news item to be presented with a link to more detailed information from the original source.”

  • Editor
    “As an editor, I want to be alerted to any change in the aggregated news-stream, as it happens continuously, and to have powerful ways of inserting, reordering, appending, prepending or deleting one or more news-items.”

  • Administrator
    “As an administrator, I want to start, stop, or restart the system, manage the configured feeds, and monitor operational health and error conditions.”


Functional Requirements

  • Consume RSS / Atom / JSON-LD feeds incrementally
  • Filter items by topic or sensitivity
  • Merge multiple feeds chronologically
  • Produce continuously updated summaries

Constraints / Assumptions / Preconditions

Assumptions

  • Feeds may be large or unbounded
  • Items arrive over time

Constraint

  • Memory usage must remain bounded

Preconditions

  • At least one news feed is configured
  • Feeds are RSS, Atom, or JSON-LD and timestamped
  • Items within a feed are presented in reverse-chronological order
  • Each item contains a content link or, optionally, inline content
  • Items may belong to multiple categories

Proposed High-Level Solution

Each feed is modeled as a generator producing yield values lazily.
The ordered set of values produced by successive, demand-driven calls to move-next() is called the yield of the generator.

A generator’s yield may be finite or infinite, and may be empty for a given generator instance without implying exhaustion of the underlying data source.
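
As a rough analogy only, a Python generator shows the same demand-driven behaviour, with next() standing in for move-next(). The proposed generator datatype has its own API, so the function name and sample data below are assumptions made for illustration.

```python
from typing import Iterator, List

def feed_generator(batches: List[List[str]]) -> Iterator[str]:
    """Yield items one at a time; an empty batch yields nothing but does not end the feed."""
    for batch in batches:      # each batch stands for one polling cycle
        for item in batch:
            yield item

# Hypothetical polling results: the second cycle returned no new items.
gen = feed_generator([["a", "b"], [], ["c"]])
first = next(gen)    # demand-driven: only "a" has been produced so far
rest = list(gen)     # drain the remaining yield
```

The empty second batch contributes nothing to the yield, yet later items still arrive, which matches the point that an empty yield need not imply exhaustion of the source.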

Known Approaches That Are Problematic

These approaches require full materialization in memory:

  • Eager sequences (XPath)
  • DOM-style loading
  • Materialized feeds

Benefits of the Generators Approach

  • Bounded memory usage
  • Low latency
  • Composability
  • Deterministic control of evaluation

End-to-End Flow

+-------------------------------+
| 1. Feed Fetching              |
| Input:  external providers    |
| Output: G_rawItems            |
+---------------+---------------+
                |
+---------------v---------------+
| 2. Normalization              |
| Input:  G_rawItems            |
| Output: G_normalizedItems     |
+---------------+---------------+
                |
+---------------v---------------+
| 3. Filtering                  |  <-- unwanted content removed
| Input:  G_normalizedItems     |
| Output: G_filteredItems       |
+---------------+---------------+
                |
+---------------v---------------+
| 4. Topic Classification       |
| Input:  G_filteredItems       |
| Output: G_classifiedItems     |
+---------------+---------------+
                |
+---------------v---------------+
| 5. Clustering                 |
| Input:  G_classifiedItems     |
| Output: G_clusteredItems      |
+---------------+---------------+
                |
+---------------v---------------+
| 6. Ranking                    |
| Input:  G_clusteredItems      |
| Output: G_rankedItems         |
+---------------+---------------+
                |
+---------------v---------------+
| 7. Summary Page Generation    |
| Input:  G_rankedItems         |
| Output: G_summaryPageItems,   |
|         HTML                  |
+---------------+---------------+
                |
+---------------v---------------+
| 8. Detail Page Generation     |
| Input:  G_summaryPageItems    |
| Output: HTML Detail Pages     |
+-------------------------------+

Remarks

  1. The participating generator instances are named using the convention G_{name}.
  2. Every stage except the final one produces a new generator.
  3. Every stage except the very first uses a generator as its input.
  4. Arrow semantics: the output generator of one stage is the input for the next stage.
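
The chaining described in these remarks can be sketched with Python generators. This is a minimal illustration of the first three stages only; the stage functions, field names, and sample data are hypothetical stand-ins, not the proposed generator API.

```python
from itertools import count, islice
from typing import Dict, Iterator

def fetch() -> Iterator[Dict]:
    """Stage 1 stand-in: an unbounded source of raw items (hypothetical data)."""
    for i in count(1):
        yield {"id": i, "title": f" Story {i} ", "topic": "tech" if i % 2 else "gossip"}

def normalize(raw: Iterator[Dict]) -> Iterator[Dict]:
    """Stage 2 stand-in: trim titles into a uniform shape."""
    for item in raw:
        yield {**item, "title": item["title"].strip()}

def keep_allowed(items: Iterator[Dict], blocked: set) -> Iterator[Dict]:
    """Stage 3 stand-in: drop items whose topic is blocked."""
    return (item for item in items if item["topic"] not in blocked)

# Each stage takes a generator and returns a new one; nothing runs until consumed.
g_raw = fetch()
g_normalized = normalize(g_raw)
g_filtered = keep_allowed(g_normalized, {"gossip"})
front_page = list(islice(g_filtered, 3))   # take(3) over an infinite source
```

Only as many raw items are fetched as are needed to produce three allowed ones, even though the source never ends.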

Brief Description of the Core Processes in the Pipeline

Process 1 — Feed Fetching & Acquisition

Goal:
Continuously pull RSS / Atom / JSON-LD feeds from CNN, Fox, NBC, BBC, etc.

Includes:

  • Periodic polling (e.g., every 5 minutes)
  • Detection of new items (GUID, URL hash, published timestamps)
  • N-way merging to ensure the resulting yield is sorted in reverse-chronological order
  • Basic sanity validation (e.g., XML schema validity)

Output:
A generator whose yield values are raw feed items (XML / JSON documents) → input to Process 2.
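
The N-way merge can be sketched in Python with heapq.merge, which consumes its inputs lazily. The sketch assumes each feed is already sorted newest-first, as stated in the preconditions; the field name "published" and the sample items are hypothetical.

```python
import heapq
from typing import Dict, Iterable, Iterator

def merge_feeds(*feeds: Iterable[Dict]) -> Iterator[Dict]:
    """Lazily merge feeds that are each already sorted newest-first."""
    return heapq.merge(*feeds, key=lambda item: item["published"], reverse=True)

# Hypothetical feeds; "published" is an ISO 8601 UTC string, which sorts chronologically.
feed_a = [{"id": "a2", "published": "2026-01-13T10:00:00Z"},
          {"id": "a1", "published": "2026-01-13T08:00:00Z"}]
feed_b = [{"id": "b2", "published": "2026-01-13T09:00:00Z"},
          {"id": "b1", "published": "2026-01-13T07:00:00Z"}]

merged_ids = [item["id"] for item in merge_feeds(iter(feed_a), iter(feed_b))]
```

The merge holds only one pending item per feed at a time, so memory grows with the number of feeds rather than with their length.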


Process 2 — Parsing & Normalization

Goal:
Convert heterogeneous raw feed items into a uniform internal format.

Normalized fields include:

  • Title
  • Description / Summary
  • Full text (if available)
  • URL
  • Publication time (converted to UTC)
  • Source
  • Images, categories, tags
  • Named entities (optional NLP-based enrichment)

Output:
A generator yielding clean, normalized NewsItem documents → input to Process 3.


Process 3 — Content Filtering & Exclusion Rules

Goal:
Remove unwanted items early using configurable rule sets.

Examples:

  • Blocked topics: politics, celebrity gossip, violence, etc.
  • Blocked entities: Donald Trump, Joe Biden, Kanye West, etc.
  • Blocked publishers (optional)
  • Expiration rules:
    • Tech news stale after 48 hours
    • Breaking news stale after 6 hours

Techniques:

  • Keyword filtering
  • Named Entity Recognition (NER)
  • Sensitive-topic classifiers (ML-based)
  • Freshness scoring

Output:
A generator yielding allowed, filtered NewsItem documents → input to Process 4.
Rejected items are stored separately for auditing.
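
A minimal Python sketch of such a filter follows, covering blocked topics and expiration rules only. The rule values mirror the examples above, while the field names and the in-memory audit list are assumptions; a real system would more likely write rejected items to persistent storage.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Iterator, List

# Hypothetical rule set mirroring the examples above.
BLOCKED_TOPICS = {"politics", "celebrity gossip"}
MAX_AGE = {"tech": timedelta(hours=48), "breaking": timedelta(hours=6)}

def filter_items(items: Iterable[Dict], now: datetime, rejected: List[Dict]) -> Iterator[Dict]:
    """Yield allowed items lazily; append rejected ones to an audit list."""
    for item in items:
        limit = MAX_AGE.get(item["topic"])
        stale = limit is not None and now - item["published"] > limit
        if item["topic"] in BLOCKED_TOPICS or stale:
            rejected.append(item)
        else:
            yield item

now = datetime(2026, 1, 13, 12, 0, tzinfo=timezone.utc)
items = [
    {"id": 1, "topic": "tech", "published": now - timedelta(hours=1)},
    {"id": 2, "topic": "politics", "published": now - timedelta(hours=1)},
    {"id": 3, "topic": "breaking", "published": now - timedelta(hours=7)},
]
audit: List[Dict] = []
allowed_ids = [item["id"] for item in filter_items(items, now, audit)]
```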


Process 4 — Topic Classification

Goal:
Assign each item to one or more topics.

Example topics:

  • Politics
  • World
  • Tech
  • Health
  • Sports
  • Business
  • Disasters / Urgent events
  • Crime / Safety
  • Entertainment

Approaches:

  • Fine-tuned BERT classifier (preferred)
  • TF-IDF + SVM (simpler)
  • Feed-provided category tags (fallback)

Output:
A generator yielding categorized NewsItem documents → input to Process 5.


Process 5 — Similarity Analysis & Clustering

Goal:
Group news items from different sources describing the same event.

Techniques:

  • Semantic vector embeddings (e.g., SBERT, Ada embeddings)
  • Cosine similarity
  • Hierarchical clustering or DBSCAN

Produces:

  • Clusters of highly similar articles
  • A primary (best) representative per cluster

Output:
A generator yielding clusters of related articles → input to Process 6.

Note:
To better match streaming behavior, clustering may operate within bounded windows (e.g., sliding windows) while still consuming the input generator.
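
The bounded-window idea can be sketched in Python as follows. Note that this is a simple greedy assignment against the most recent cluster representatives, not hierarchical clustering or DBSCAN; the function names, threshold, and toy embeddings are all assumptions.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cluster_windowed(items: Iterable[Tuple[str, Sequence[float]]],
                     window: int, threshold: float) -> Iterator[Tuple[str, str]]:
    """Yield (item id, representative id), comparing only with the last `window` representatives."""
    recent: Deque[Tuple[str, Sequence[float]]] = deque(maxlen=window)
    for item_id, vec in items:
        match = next((rid for rid, rvec in recent if cosine(vec, rvec) >= threshold), None)
        if match is None:
            recent.append((item_id, vec))   # item starts a new cluster
            match = item_id
        yield item_id, match

# Hypothetical 2-D embeddings; real embeddings would come from a model such as SBERT.
stream = [("a", (1.0, 0.0)), ("b", (0.9, 0.1)), ("c", (0.0, 1.0))]
assignments = list(cluster_windowed(stream, window=2, threshold=0.9))
```

Because the deque has a fixed maximum length, memory stays bounded however long the input generator runs.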


Process 6 — Ranking, Urgency, and Freshness Scoring

Goal:
Prioritize which news appears on the Summary Page.

Computed scores:

  • Freshness score (more recent → higher)
  • Urgency score (disasters, crises, violence)
  • Coverage score (number of sources reporting)
  • Engagement score (optional: social signals)

Weighted formula:

FinalScore = a*Urgency + b*Freshness + c*Coverage + d*EditorRules

Items with the highest scores per topic are selected.

This stage does not require a full total ordering; instead a partial ordering (e.g., top-K per topic) preserves bounded memory.
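
A top-K-per-topic selection can be sketched in Python with one small min-heap per topic. The weights below are hypothetical (the formula above leaves a, b, c, d unspecified), as are the field names.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical weights a, b, c, d; the source does not specify values.
WEIGHTS = {"urgency": 0.4, "freshness": 0.3, "coverage": 0.2, "editor": 0.1}

def final_score(item: Dict) -> float:
    """FinalScore = a*Urgency + b*Freshness + c*Coverage + d*EditorRules."""
    return sum(WEIGHTS[name] * item[name] for name in WEIGHTS)

def top_k_per_topic(items: Iterable[Dict], k: int) -> Dict[str, List[str]]:
    """Keep at most k items per topic in a min-heap, so memory is O(topics * k)."""
    heaps: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for item in items:
        entry = (final_score(item), item["id"])
        heap = heaps[item["topic"]]
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heappushpop(heap, entry)   # evicts the current lowest score
    return {topic: [i for _, i in sorted(heap, reverse=True)] for topic, heap in heaps.items()}

items = [
    {"id": "x", "topic": "tech", "urgency": 1, "freshness": 1, "coverage": 1, "editor": 1},
    {"id": "y", "topic": "tech", "urgency": 0, "freshness": 1, "coverage": 0, "editor": 0},
    {"id": "z", "topic": "tech", "urgency": 1, "freshness": 0, "coverage": 1, "editor": 0},
]
ranked = top_k_per_topic(items, 2)
```

This function returns only after its input ends, so in a streaming setting it would be applied to a finite window or prefix of the input generator.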

Editor-driven operations (insert, remove, reorder) are modeled as generator transformations applied downstream of ranking.

Output:
A generator yielding ranked clusters → input to Process 7.


Process 7 — Summary Page Generation

This stage consumes the input generator and produces finite views intended for presentation.

Goal:
Build a continuously updated Summary Page (“Front Page”) containing:

  • Top events per topic
  • Short summaries
  • Links to primary articles
  • “Read similar news” (cluster siblings)
  • Source icons
  • Timestamp of most recent update

The page auto-refreshes and always reflects the newest items.


Process 8 — Detailed Pages & Cross-Links

This stage consumes its input generator and produces finite presentation views.

For each cluster:

  • Canonical article (primary representative)
  • Related articles across sources
  • Timeline of developments
  • Additional metadata (images, entities, tags)

Cross-links include:

  • “More like this…”
  • “Earlier developments…”
  • “Follow-up stories…”

Notes on the Process Pipeline

  • Feed Fetching typically wraps one or more data providers
    → produces G_rawItems lazily (RSS, JSON APIs, DB cursors, web services)
  • Every stage is expressible as:
    • for-each, filter, append, prepend, insert-at, remove-where, concat, or fold, etc., producing a new generator derived from the previous one
  • No stage requires full materialization unless explicitly demanded
    (e.g., to-array, bounded sort, pagination)
  • Infinite generators are valid until stage 6; stages 7–8 typically consume finite prefixes (take(n))

Why This Fits the Generator Datatype Extremely Well

  • The pipeline is a composition of generator transformers
  • Each box maps almost 1-to-1 to generator operations
  • External data providers integrate naturally at Stage 1
  • Sorting can be introduced in different ways:
    • External merge-sort over generators
    • Bounded-window ranking
    • Top-K lazy ranking (e.g., using heaps)

Alternative Flows

Alternative Flow 1 — Feed Temporarily Stops Producing New Items

Condition:
A feed is reachable but has no new items since the last polling cycle.

Flow:

  1. The feed generator advances (move-next()).
  2. The data provider returns no new items.
  3. The feed-generator instance yields no items during this interval.
  4. Downstream generators remain operational.
  5. If all feeds are empty, no new items are added downstream.

Result:
The pipeline continues uninterrupted; no special handling is required.


Alternative Flow 2 — Partial Consumption of the Pipeline

Condition:
Only a finite prefix of the stream is required (e.g., top N items).

Flow:

  1. Downstream consumers apply take(N).
  2. Upstream generators are evaluated only as needed.
  3. Remaining potential yield values are never materialized.

Result:
Latency and memory usage remain bounded. The pipeline supports early termination naturally.
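
The effect of take(N) on upstream evaluation can be observed in a short Python sketch, where itertools.islice plays the role of take. The logging list and function name are illustrative assumptions.

```python
from itertools import count, islice
from typing import Iterator, List

pulled: List[int] = []   # records which upstream values were actually produced

def upstream() -> Iterator[int]:
    """An unbounded source that logs every value it produces."""
    for n in count(1):
        pulled.append(n)
        yield n

doubled = (n * 2 for n in upstream())   # a downstream transformation
top_three = list(islice(doubled, 3))    # the Python counterpart of take(3)
```

After this runs, the log shows that the unbounded source was asked for exactly three values.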


Alternative Flow 3 — Editor Inserts or Reorders Items

Condition:
An editor manually modifies the aggregated stream.

Flow:

  1. Editor operations are applied as generator transformations
    (append, prepend, insert-at, remove-at, remove-where).
  2. A new generator with the modified yield is produced.
  3. Downstream stages consume it transparently.

Result:
Editorial control integrates seamlessly without breaking the pipeline.
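
A few of these editor operations can be sketched as lazy Python transformations. The function names follow the operations listed above, but their signatures are assumptions; in particular, insert_at uses a zero-based index here, which may differ from the eventual generator API.

```python
from itertools import chain, islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

def prepend(items: Iterable[T], value: T) -> Iterator[T]:
    """Yield `value` first, then everything from `items`."""
    return chain([value], items)

def insert_at(items: Iterable[T], index: int, value: T) -> Iterator[T]:
    """Yield `value` at zero-based position `index` (or at the end if the input is shorter)."""
    it = iter(items)
    return chain(islice(it, index), [value], it)

def remove_where(items: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    """Drop every item for which `predicate` is true."""
    return (item for item in items if not predicate(item))

ranked = iter(["story-1", "story-2", "story-3"])
edited = remove_where(insert_at(prepend(ranked, "editor-pick"), 2, "correction"),
                      lambda s: s == "story-2")
result = list(edited)
```

Each operation returns a new iterator over the old one, so the ranked stream itself is never copied or materialized.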


Exception Flows

Exception Flow 1 — Feed Unreachable or Network Failure

Condition:
A feed cannot be reached during polling.

Flow:

  1. The data provider reports an error or timeout.
  2. The next instance of the feed generator yields no items during this polling interval.
  3. The error is logged for monitoring.
  4. A retry policy (e.g., exponential backoff) is applied.

Result:
The system continues operating with remaining feeds.
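
This flow can be sketched in Python as follows. The delay parameters and helper names are hypothetical, and the sketch only computes the retry delays rather than sleeping.

```python
from typing import Callable, Iterator, List

def backoff_delays(base: float, factor: float, cap: float) -> Iterator[float]:
    """Yield an unbounded, capped exponential sequence of retry delays in seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor

def poll_once(fetch: Callable[[], List[str]], errors: List[str]) -> List[str]:
    """Return the feed's new items, or an empty list (and a logged error) on failure."""
    try:
        return fetch()
    except OSError as exc:          # TimeoutError and ConnectionError derive from OSError
        errors.append(str(exc))
        return []

def unreachable() -> List[str]:
    raise TimeoutError("feed timed out")   # hypothetical failing provider

errors: List[str] = []
items = poll_once(unreachable, errors)
delays = backoff_delays(base=1.0, factor=2.0, cap=10.0)
first_delays = [next(delays) for _ in range(5)]
```

The failed poll contributes an empty result instead of raising, so a merge over the remaining feeds is unaffected.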


Exception Flow 2 — Malformed Feed Data

Condition:
A feed item is malformed (invalid XML/JSON or schema validation problems, e.g. missing required fields).

Flow:

  1. The normalization stage detects the issue.
  2. The item is discarded or quarantined.
  3. Processing continues with subsequent items.

Result:
Malformed data does not propagate downstream.
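
For JSON input, the quarantine step might look like the following Python sketch. The required-field list is a hypothetical schema, and XML feeds would need an equivalent check with an XML parser.

```python
import json
from typing import Dict, Iterable, Iterator, List

REQUIRED_FIELDS = ("title", "url", "published")   # hypothetical schema

def normalize(raw_items: Iterable[str], quarantine: List[str]) -> Iterator[Dict]:
    """Yield parsed items; quarantine anything unparsable or missing required fields."""
    for raw in raw_items:
        try:
            item = json.loads(raw)
        except json.JSONDecodeError:
            quarantine.append(raw)
            continue
        if not isinstance(item, dict) or any(f not in item for f in REQUIRED_FIELDS):
            quarantine.append(raw)
            continue
        yield item

raw_feed = [
    '{"title": "Ok", "url": "https://example.org/1", "published": "2026-01-13T10:00:00Z"}',
    '{"title": "No URL", "published": "2026-01-13T09:00:00Z"}',
    '{not json',
]
quarantined: List[str] = []
good_titles = [item["title"] for item in normalize(raw_feed, quarantined)]
```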


Exception Flow 3 — Resource Exhaustion Risk

Condition:
A downstream operation risks exceeding memory limits.

Flow:

  1. Bounded strategies (windowing, top-K selection) are applied.
  2. Full materialization is avoided.
  3. If needed, the operation degrades gracefully (e.g., reduced clustering depth).

Result:
System stability is preserved under load.


Postconditions

Upon successful execution:

Functional Outcomes

  • End users see an up-to-date Summary Page.
  • Each summary item links to a Detailed Page.
  • Editors can intervene using generator operations.
  • Administrators retain full system control.

Technical Guarantees

  • Memory usage remains bounded.
  • Latency is minimized through lazy evaluation.
  • Full materialization occurs only when explicitly requested.

System State

  • All generators remain composable.
  • Generator composition remains valid after alternative and exceptional flows.
  • Empty generators correctly represent exhaustion of a generator instance, not of the underlying data source.
  • Infinite yields are supported up to stages that require finiteness.

References

  1. RSS 2.0 Specification
    https://www.rssboard.org/rss-specification

  2. Atom Publishing Protocol (RFC 5023)
    https://www.rfc-editor.org/rfc/rfc5023

  3. JSON-LD Specification
    https://json-ld.org/spec/

  4. TF-IDF
    “Understanding TF-IDF (Term Frequency-Inverse Document Frequency)”
    https://www.geeksforgeeks.org/machine-learning/understanding-tf-idf-term-frequency-inverse-document-frequency/

  5. TF-IDF + SVM
    “Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT?”
    https://arxiv.org/html/2411.12703v1

  6. Sentence-BERT (SBERT)
    Reimers, N. & Gurevych, I., 2019
    https://arxiv.org/abs/1908.10084

  7. Fine-tuned BERT
    “Fine-tuning a BERT model”
    https://www.tensorflow.org/tfmodels/nlp/fine_tune_bert

  8. Ada Embeddings (OpenAI)
    Radford et al., 2021
    https://arxiv.org/abs/2103.00020

  9. Cosine Similarity
    https://en.wikipedia.org/wiki/Cosine_similarity

  10. Hierarchical Clustering
    https://en.wikipedia.org/wiki/Hierarchical_clustering

  11. DBSCAN
    Ester et al., 1996
    https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf

Pull request #2379 created #created-2379

12 Jan at 15:03:12 GMT
Use exported schema to validate function catalogs

This PR adds an exported SCM version of the fos.xsd schema with an embedded license. Updates to the build script use it to validate all function-catalog.xml files.

Pro: validation

Con: any change to fos.xsd also has to be accompanied by an update to the exported schema, which probably only Mike or I can do.

Issue #2378 created #created-2378

12 Jan at 12:07:34 GMT
HTML indenting

The spec says (in both 3.1 and 4.0):

The inline elements are those included in the %inline category of any of the HTML 4.01 DTDs or those elements defined to be phrasing elements in HTML5

This could be read as defining one set of inline elements for version="4.01" and a different set of inline elements for version="5.0", or it could be read as indicating that an element is an inline element if it satisfies either of these two conditions.

Issue #2370 closed #closed-2370

12 Jan at 11:00:25 GMT

Add character-maps to the allowed context dependencies

Issue #2374 closed #closed-2374

12 Jan at 10:53:45 GMT

Markup: Allow empty nt elements

Pull request #2377 created #created-2377

12 Jan at 10:51:37 GMT
2195 F+O Editorial Corrections

F+O: editorial corrections to issues identified in #2195 (mainly corrections to examples), plus completion of missing change metadata.

Pull request #2376 created #created-2376

09 Jan at 15:02:59 GMT
2337 Extend xsl:mode/@typed to handle JNodes etc

Fix #2337

Pull request #2375 created #created-2375

09 Jan at 13:24:21 GMT
2195 Editorial Omnibus

Fixes a number of problems from issue #2195.

Pull request #2374 created #created-2374

09 Jan at 13:15:06 GMT
Markup: Allow empty nt elements

Stylesheet change to allow empty NT elements, bringing them into line with other referencing elements such as termref and xnt. The markup <nt def="AxisStep"/> is treated as equivalent to <nt def="AxisStep">AxisStep</nt>. This removes a common source of error which tends to result in missing text in the spec rather than in any kind of build error.

Pull request #2373 created #created-2373

09 Jan at 10:30:31 GMT
2359 No conversion to JNode in absolute paths

Fix #2359

Pull request #2372 created #created-2372

09 Jan at 10:21:37 GMT
2344 Change rendition of PIs in HTML5

Fix #2344

Pull request #2371 created #created-2371

09 Jan at 09:50:54 GMT
2369 Add content for F&O section 11 (Processing binary values)

Fix #2369

Pull request #2370 created #created-2370

09 Jan at 09:20:56 GMT
Add character-maps to the allowed context dependencies

Currently function-catalog.xml is invalid against the schema fos.xsd. This PR updates the schema to allow "character-maps" in the enumeration of allowed context dependencies.

(Note, this should probably cause the build to fail. The problem was only spotted when using Oxygen to query the function catalog.)

Issue #2369 created #created-2369

09 Jan at 00:40:50 GMT
F+O section 11 is empty

F+O section 11, Processing Binary Values, is currently empty

Has something gone wrong, or should we delete the section?

Pull request #2368 created #created-2368

08 Jan at 23:08:29 GMT
2367 Misc XSLT editorial fixes

Most of the changes here are to bring the change log entries up to date. Also:

Fix #2356 Fix #2366 Fix #2367

Issue #2367 created #created-2367

08 Jan at 18:04:28 GMT
Documentation for new main-module attribute of xsl:stylesheet

In the XSLT 4.0 spec, 3.6 Stylesheet Element says:

The optional main-module attribute is purely documentary. By including this attribute in every stylesheet module of a package, an XSLT editing tool may be enabled to locate the top-level module of the relevant package [...]

But what does "top-level module" mean? Should this say "principal stylesheet module" instead? I can see that top-level package is defined, but not "top-level module", so I'm confused.

Issue #2366 created #created-2366

08 Jan at 17:44:37 GMT
json-lines attribute for xsl:output and xsl:result-document in XSLT spec

The new serialization parameter json-lines is documented at 26.2 Serialization parameters. But please add a "changes" entry for the new attribute json-lines in the sections 25.1 Creating Secondary Results and 26.1 The xsl:output declaration in the XSLT 4.0 spec. This is currently missing.

(Note that there is already a changes entry in the Serialization spec at 3 Serialization Parameters.)

Issue #2365 created #created-2365

07 Jan at 21:41:50 GMT
Record types: extensible and non-extensible pairs

It is often useful in a function signature for an argument type to be an extensible record type (so additional fields are allowed, which the function can ignore, saving the need to check for their presence) while the return type is non-extensible (giving better static type checking for lookup expressions, for example).

Currently this requires two separate named record types to be declared, differing only in that one of them is extensible and the other not. This duplication is clearly undesirable.

One solution to this might be to have a single non-extensible definition of the name of the record type, with some way of indicating at the point where the record type is used that extensions are allowed.

For example (probably not viable syntax as written):

fn:element-to-map-plan(
       $input as element()*   
) as fn:element-to-map-conversion-plan

fn:element-to-map(
       $node as element(),
       $plan as extensible fn:element-to-map-conversion-plan
) as map(*)

Perhaps the syntax extensible(fn:element-to-map-conversion-plan) would work.

Pull request #2364 created #created-2364

07 Jan at 15:38:32 GMT
2088 File Module: Feedback, Observations

Closes #2088

Issue #2250 closed #closed-2250

07 Jan at 14:55:18 GMT

Function to detect/infer the string encoding from a binary

Issue #2092 closed #closed-2092

07 Jan at 14:53:17 GMT

Drop map:pair, map:of-pairs, map:pairs, array:members, array:of-members

Issue #2194 closed #closed-2194

07 Jan at 14:52:51 GMT

fn:transform sandbox=yes option

Pull request #2363 created #created-2363

07 Jan at 14:46:11 GMT
2349 Revert array:join

Closes #2349

Pull request #2362 created #created-2362

07 Jan at 13:57:19 GMT
2355 bin:infer-encoding: further alignments

Closes #2355

Issue #2361 created #created-2361

07 Jan at 12:49:03 GMT
Encoding parameters: upper/lower case, normalization

The serializer spec says…

Serializers are required to support values of UTF-8 and UTF-16

…whereas the XQFO spec mentions utf-8 as default value for fn:serialize. Similarly, only the UTF lower-case variants are listed for fn:unparsed-text, and there may be other places.

I think we should mention the upper-case variants everywhere, and add notes that upper/lower case is ignored when processing the encoding string.

Issue #2360 created #created-2360

07 Jan at 11:48:43 GMT
fn:root() vs. absolute path expressions

Is there a particular reason why the absolute slash / is defined in as complicated a way as…

self::gnode()/(fn:root(.) treat as (document-node()|jnode()))/PP

…and wouldn’t it be helpful to simplify it and get rid of the treat as expression?

self::gnode()/fn:root(.)/PP

In many cases, the document node does not exist or is not really needed, and it would allow users to use the slash for nodes that would otherwise need to be wrapped into document nodes, for example:

let $as := analyze-string('abc', 'b')
return $as/fn:match[/fn:non-match]

Issue #2359 created #created-2359

06 Jan at 21:27:05 GMT
Implicit conversion to JNodes with absolute path expressions

Section 4.7.1 discusses absolute path expressions.

The first part of the section concerns leading "/", and includes the note:

If the context value includes a map or array, it is not converted implicitly to a JNode; rather, a type error occurs.

The second part concerns leading "//", and includes the statement:

Any map or array that is present in the context value is first coerced to a JNode by applying the [fn:jtree] function.

It might be inferred that "/" doesn't do this conversion, but "//" does. However, this certainly isn't stated explicitly, and there would be no logical reason for treating the two cases differently.

We should either do the conversion for both cases, or for neither.

I'm inclined to do it for neither. Partly because an implicit conversion wouldn't do any upwards navigation to a different "root" node, as users might expect; partly because doing the conversion reduces the type information available to the compiler.

QT4 CG meeting 147 draft minutes #minutes-01-06

06 Jan at 17:15:00 GMT

Draft minutes published.

Issue #407 closed #closed-407

06 Jan at 16:58:54 GMT

XSLT-specific context properties used in function items

Issue #2274 closed #closed-2274

06 Jan at 16:58:53 GMT

407 Function items capturing XSLT context components

Issue #1011 closed #closed-1011

06 Jan at 16:55:55 GMT

fn:transform() improvements

Issue #2348 closed #closed-2348

06 Jan at 16:55:54 GMT

1011 fn transform improvements

Issue #2339 closed #closed-2339

06 Jan at 16:52:33 GMT

Default priority of match="element(A|B)"

Issue #2335 closed #closed-2335

06 Jan at 16:52:32 GMT

Make `jnode()` like `element()`

Issue #2334 closed #closed-2334

06 Jan at 16:52:32 GMT

XSLT: Parenthesized subexpressions within Patterns

Issue #2297 closed #closed-2297

06 Jan at 16:52:32 GMT

XSLT pattern ambiguities with typed matches

Issue #2336 closed #closed-2336

06 Jan at 16:52:31 GMT

2334 Revise XSLT pattern syntax and semantics

Issue #2048 closed #closed-2048

06 Jan at 16:51:41 GMT

Untrusted execution, and security more generally

QT4 CG meeting 148 draft agenda #agenda-01-13

06 Jan at 11:30:00 GMT

Draft agenda published.

Pull request #2358 created #created-2358

05 Jan at 16:26:16 GMT
2357 Standardize on element() rather than element(*)

Fix #2357

Issue #2357 created #created-2357

05 Jan at 16:12:28 GMT
element() vs element(*) in function signatures

We use element(*) and element() interchangeably in function signatures. I propose we standardise on the simpler form, element().

Ditto attribute().

Issue #2356 created #created-2356

05 Jan at 15:34:47 GMT
Clarification on scope of variables in xsl:for-each-group/(@split-when|@merge-when)

A user experimenting with xsl:for-each-group/@split-when with my 4->3 source-code transformer inferred that the variable $group was available within the sequence constructor of the grouping instruction.

(Unfortunately due to an error in my transformer code $group was within scope in the sequence constructor, though with the wrong value ;-) - this has since been corrected.)

A close and detailed reading of the spec shows that $group and $next are implicitly in scope only for the evaluation of the @split-when expression. Might I suggest adding a small note emphasising this? Similar clarification may be worthwhile for @merge-when too.

QT4 CG meeting 147 draft agenda #agenda-01-06

05 Jan at 12:00:00 GMT

Draft agenda published.