QT4 CG Meeting 142 Minutes 2025-11-18


Minutes
Summary of new and continuing actions [0/4]
1. Administrivia
2. Technical agenda
3. Any other business

Minutes

Approved at meeting 143 on 25 November 2025.

Summary of new and continuing actions `[0/4]`

[ ] QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json
[ ] QT4CG-141-01: MK to follow up on a comment by JWL on #2269
[ ] QT4CG-142-01: MK to review the “Captured Groups within Lookahead” example.
[ ] QT4CG-142-02: MK to add explanatory note about the difference between typed and untyped values in string-length

1. Administrivia

1.1. Roll call `[9/10]`

Regrets: EP

[X] David J Birnbaum (DB)
[X] Reece Dunn (RD)
[X] Christian Grün (CG)
[X] Joel Kalvesmaki (JK)
[X] Michael Kay (MK)
[X] Juri Leino (JLO)
[X] John Lumley (JWL)
[X] Wendell Piez (WP)
[ ] Ed Porter (EP)
[X] Norm Tovey-Walsh (NW) Scribe. Chair.

1.2. Accept the agenda

Proposal: Accept the agenda.

Accepted.

1.3. Approve minutes of the previous meeting

Proposal: Accept the minutes of the previous meeting.

Accepted.

1.4. Next meeting

The next meeting is planned for 25 November 2025.

Regrets: EP, JLO

1.5. Review of open action items `[0/2]`

[ ] QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json
[ ] QT4CG-141-01: MK to follow up on a comment by JWL on #2269

1.6. Review of open pull requests and issues

This section summarizes all of the issues and pull requests that need to be resolved before we can finish. See Technical Agenda below for the focus of this meeting.

1.6.1. Blocked

The following PRs are open but have merge conflicts or comments which suggest they aren’t ready for action.

PR #2256: 2216 All atomic types become ordered
PR #2247: Deferred Evaluation in XPath - the f:generator record
PR #2160: 2073 data model changes for JNodes and Sequences
PR #2124: 573 Functions to Construct Trees
PR #2071: 77c deep update
PR #2019: 1776: XSLT template rules for maps and array

1.6.2. Merge without discussion

The following PRs are editorial, small, or otherwise appeared to be uncontroversial when the agenda was prepared. The chairs propose that these can be merged without discussion. If you think discussion is necessary, please say so.

PR #2293: Updated RELAX NG grammar for XSLT 4.0 stylesheets
PR #2290: Updated schema for XSLT 4.0 stylesheets

Proposal: accept without discussion.

Accepted.

1.6.3. Close without action

It has been proposed that the following issues be closed without action. If you think discussion is necessary, please say so.

Issue #2252: Dynamic XPath Evaluation the functional way
Issue #1618: Adaptive serialization: doubles
JWL: Discussion of #2252 just sort of tailed off…
MK: Yes, I thought we should drop it not because it isn’t doable, but to limit our ambitions.
JLO: I think it would be good to have such a function because almost all XQuery implementations I know already have that function.
CG: I have concerns that it may be too implementation-specific as it revolves around optimizing code. I think there were two proposals in this discussion.
- One was a compiled instance that you could reuse,
- The other was to compile a string into a function item.
MK: Let’s leave #2252 open for the moment then.
CG: I think there have been other discussions about evaluating XPath
- This is more about optimizing
MK: This is primarily about the functionality.

Proposal: close #1618 with no further action.

Accepted.

2. Technical agenda

2.1. PR #2246: 2233 Expand xsl:analyze-string; introduce fn:regex-groups()

See PR #2246

MK: Now that regular expressions can match a zero length string, you can find all the word boundaries in an input string. The fn:regex-group function turns out to be somewhat inadequate; it returns the string but not where it found it.
- … This proposes a fn:regex-groups function that returns the string and the matches.
- … There are now “string segments” that are a combination of a substring and its position in the input.
- … We now operate on a non-overlapping sequence of segments.
- … Within a matching substring, you can access the groups.
- … This distinguishes between matches on zero length strings and failure to match
- … The fn:regex-group function is defined in terms of the new function, for backwards compatibility.
JWL: I’d be interested in a comparison between this and the fn:analyze-string function that produces the same sort of output.
- … Is there something to be said here about where one might be preferable?
MK: I think we enhanced the fn:analyze-string function to account for zero-length string matches.
- … But I’m not sure it does it quite as well.
JWL: In one case you get a map and in another you get elements. Might be worth saying something here.
JK: I think this is really nice. The definition says that the string segments are non-overlapping. What about look-ahead groups?
MK: The groups can overlap, but the matched segments don’t.
JK: In the example, there are two cyan colored ones. In the first one, I don’t understand the select.
MK: Yes…that has clearly gone wrong somewhere.

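The points above about zero-length matches, positions, and groups can be sketched in Python rather than XPath. This uses Python's `re` module, not the proposed fn:regex-groups function; the `segments` helper and its dictionary keys are names invented for illustration. It shows matches reported together with their positions, a group that matched the empty string versus a group that did not take part in the match, and a group captured inside a lookahead that lies outside the matched segment.

```python
import re

def segments(pattern, text):
    """Return one record per match: matched string, start offset, and groups."""
    result = []
    for m in re.finditer(pattern, text):
        result.append({
            "value": m.group(0),
            "start": m.start(),
            # None marks a group that did not participate; "" is a zero-length match.
            "groups": list(m.groups()),
        })
    return result

# Word boundaries are zero-length matches, so only the position carries information.
boundaries = [s["start"] for s in segments(r"\b", "one two")]
print(boundaries)  # [0, 3, 4, 7]

# Group 1 matches the empty string; group 2 sits on the branch that was not taken.
empty_vs_unmatched = segments(r"(x*)y|(z)", "y")[0]["groups"]
print(empty_vs_unmatched)  # ['', None]

# A group captured inside a lookahead can extend beyond the matched segment.
m = re.search(r"a(?=(bc))", "abc")
print(m.span(0), m.span(1))  # (0, 1) (1, 3)
```
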
ACTION QT4CG-142-01: MK to review the “Captured Groups within Lookahead” example.

Proposal: Accept this PR.

Accepted.

2.2. PR #2295: 2294 Clarify semantics of `element(N, xs:anyType)`

See PR #2295

MK: This is purely editorial and it’s a bug fix, because anyType is a supertype of untyped.
- And an attempt to clarify a few things.
JLO: What is the test of xs:anyType (without the question mark)?
MK: That matches anything that hasn’t been nilled.
JLO: What would xs:untyped? mean?
MK: It’s allowed by the grammar but it doesn’t affect the meaning because an untyped thing can never be nilled.

Proposal: Accept this PR.

Accepted.

2.3. PR #2289: 2195 (partial) Editorial notes (incremental)

See PR #2289

MK: In Safari the arrows look horrible. The arrows are very different.
CG: I’ll take another look at the arrows.
CG: But there are some other changes:
- There are some changes to the summary of changes text.
MK: That’s all fine.
CG: Some of the examples in the binary module didn’t work for me so I made a few changes.
- … Mostly it’s about changing formatting.
CG: I think the code snippet for bin:shift is still buggy, but I can try to fix it.
JWL: If the arrows are sufficiently variable across browsers, should we make them images?

2.4. PR #2286: 2279 fn:string-length#1, fn:normalize-space#1: accept xs:anyAtomicType

See PR #2286

CG: There has been a lot of discussion. Liam noted that string-length() and string-length(.) do different things.
- … Without an argument, the context item is “stringified”.
- … But with an argument, it can only take strings.
CG: But because of typed nodes, this is not as easy as I thought.
CG: We could make the item type xs:anyAtomicType instead of xs:string.
- … With this change, string length and normalize space would be similar to other functions that accept the context item as the first item.
JLO: I’m in favor. I think there’s a slight chance of misalignment because . could be an array. Don’t they get atomized?
- … But xs:anyAtomicType doesn’t allow that.
CG: Arrays are going to be atomized. If that produces more than one string, then you’ll get an error. I don’t think that changes.
JLO: I can pass that in even with xs:anyAtomicType?
MK: Yes, I think so.
JLO: Why is that in string length?
MK: That’s for typed nodes. If you pass in a name surrounded by spaces, the current spec (since XPath 2.0) says that the string length is the length of the string value of the node not the typed value. Those can be different.
- … We’re retaining that compatibility.

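The difference MK describes between the string value and the typed value can be pictured with a toy model in Python. This is not an XDM implementation: the `TypedNode` class and the `name_node` helper are hypothetical, and the whitespace-collapsing rule is an assumption modelled on a type such as xs:Name.

```python
from dataclasses import dataclass

@dataclass
class TypedNode:
    """Toy stand-in for a schema-typed node (hypothetical, for illustration only)."""
    string_value: str  # the text as it appears in the document
    typed_value: str   # the value after the type's whitespace rule is applied

def name_node(text: str) -> TypedNode:
    # Assumption: the node's type collapses whitespace, as xs:Name does.
    return TypedNode(string_value=text, typed_value=" ".join(text.split()))

node = name_node("  abc  ")
print(len(node.string_value))  # 7, the length of the string value
print(len(node.typed_value))   # 3, the length of the typed value
```
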
Proposal: Accept this PR.

Accepted.

ACTION QT4CG-142-02: MK to add explanatory note about the difference between typed and untyped values in string-length

2.5. PR #2285: 2198 Add pi-for-cdata parameter

See PR #2285

MK: This is a PR that just changes serialization, before going through all the other specs.
MK: This adds a new PI for CDATA section parameter that names a PI.
- … The requirement came from someone who wanted to generate CDATA sections only where they’re needed.
- … I generalized that requirement to generate an arbitrary CDATA section anywhere a PI can occur.
NW: But putting data in a PI causes it to be lost by most down-stream processing.
- … I strongly object.
JK: I don’t understand the name of the option. It sounds like a campaign slogan.
MK: It was intended to be an abbreviation for the name of a processing instruction that you use for inserting CDATA.
CG: When I first saw this issue, I had other use cases in mind.
- … People want to have all text that uses special characters encoded.
- … I’d prefer a solution that lets you do that dynamically.
- … So maybe we have two use cases here, one where I want text explicitly escaped and another where I want that done more automatically.
WP: This is cool in that it addresses a real requirement. I share NW’s hesitation. I think there are a couple of other possibilities. NW’s suggestion of a preceding PI would work. The other thing I wonder is, maybe there’s a way to flag an element with an attribute in a namespace.
MK: You could maybe do something like we did for disable output escaping.
WP: I think there are a couple of options. I think NW’s concerns are well founded.
MK: Let’s try that one, putting an attribute on xsl:text that says CDATA section.
RD: It probably makes sense to have that be an XPath expression so you can call a function that makes that decision.
MK: Interesting.
JWL: RD’s suggestion would involve evaluating the function against the result.
RD: Yes, you’d get the string from the xsl:text or xsl:value-of and then pass that to the XPath expression.
JWL: The default value would be true() but you could put in a function.
MK: I think I can run with this.

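The idea raised at the end of this discussion, a per-text decision about CDATA that defaults to true but can be a function of the string, can be sketched in Python. This is only an illustration of the idea, not the design MK will propose: `serialize_text`, `use_cdata`, and `needs_escaping` are hypothetical names, and the predicate shown is just one possible choice.

```python
from xml.sax.saxutils import escape

def serialize_text(text, use_cdata=lambda s: True):
    """Serialize text as a CDATA section when use_cdata(text) is true, else escape it."""
    if use_cdata(text):
        # "]]>" cannot appear inside a CDATA section, so split it across two sections.
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"
    return escape(text)

def needs_escaping(s):
    # Example predicate: use CDATA only where ordinary escaping would be required.
    return any(c in s for c in "<&")

print(serialize_text("a < b", needs_escaping))  # <![CDATA[a < b]]>
print(serialize_text("plain", needs_escaping))  # plain
print(serialize_text("plain"))                  # <![CDATA[plain]]>
```
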
2.6. PR #2282: 2278 Add function bin:infer-encoding; simplify bin:decode-string

See PR #2282

MK: I was becoming unhappy with the complexity of bin:decode-string. The interaction of skipping byte order marks and specifying offsets was getting very confusing.
- … I wondered if we could get a cleaner design?
MK: What I’ve proposed is a function bin:infer-encoding that returns an encoding and an offset where the real data starts.
- … So it might return “UTF-8” for the encoding and offset 3 if there’s a BOM.
- … In particular it addresses the case where the binary is embedded in another stream.
- … It simplifies bin:decode-string by saying it’s UTF-8 if you don’t specify it.
CG: I think the changes are technically clean, but I have some concerns that where we started from was that we just had simple functions. One of my ideas was to try to make it easy for users who have maybe never heard of BOMs to be able to process data.
- … It’s nice, but I would have liked a simpler solution for those users.
MK: We have fn:json-doc and fn:csv-doc and those do try to work out the encoding for you.
- … But if you break it into steps, we now provide clean primitives.
CG: So you can write it to disk and read it back, but it’s probably easier to have a single function.
- … Web applications and things like RESTXQ where you don’t want to decode the data on the fly.
JWL: In bin:decode-string, my concern is that if I want to decode the string and I don’t know what it is, isn’t there an argument that the default behavior should be to do an implicit infer?
MK: I think you end up with non-intuitive behavior that way. Changing the offset from 0 to 1 changes behavior that has nothing to do with the offset.

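The two-step design MK describes, one function that reports an encoding and the offset where the real data starts and a decoder that does no sniffing of its own, can be sketched in Python. These are not the proposed functions: the names, signatures, BOM table, and UTF-8 fallback are assumptions made for illustration. Only the "UTF-8 with offset 3" case comes from the discussion.

```python
import codecs

# BOMs to check; the UTF-32 LE mark starts with the UTF-16 LE mark, so it goes first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

def infer_encoding(data: bytes, default: str = "UTF-8"):
    """Return (encoding, offset): a guessed encoding and where the real data starts."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0  # assumption: fall back to UTF-8 when no BOM is present

def decode_string(data: bytes, encoding: str = "UTF-8", offset: int = 0) -> str:
    """Decode from a byte offset; no BOM sniffing happens here."""
    return data[offset:].decode(encoding)

raw = codecs.BOM_UTF8 + "h\u00e9llo".encode("utf-8")
encoding, offset = infer_encoding(raw)
print(encoding, offset)  # UTF-8 3
decoded = decode_string(raw, encoding, offset)
print(decoded == "h\u00e9llo")  # True
```

Keeping the sniffing out of the decoder means that changing the offset never changes which encoding is used, which is the non-intuitive interaction MK wants to avoid.
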
Some discussion of what the defaults should be and how they should interact.

JWL: If you only pass in a string, the bin:decode-string function should basically call the infer function. And if you provide an offset, that’s an offset in the real data.
- … If I just want to decode a string and I don’t know anything about BOM, do I have to check?
MK: We could make it so that if there’s only one argument supplied it does that logic.
JWL: Does the offset I ask for in bin:decode-string include the chopped off BOM?
JLO: The binary module for me is always just a hex viewer. I want to be able to get each byte that I was given.
- … I can see the possibility that I don’t want an inferred offset.
- … I just want to see the raw data; or I really want to skip them but I don’t want to infer anything from them.
- … And I can still see CG’s argument that we’d like it to be simple.
CG: In principle, I agree with JWL. I think it would be nice for the function to have a default behavior to infer the encoding.
- … One other option would be to remove the offset and size. The question is when does it make sense to read data when you don’t know the encoding.
MK: Indeed, that problem is at the heart of this. Even if you have UTF-8 data, specifying a start and byte position doesn’t make much sense given that the characters vary in length.
- … You might have a message format that says how many octets each segment is. But…

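MK's point about byte positions in UTF-8 data can be seen in a few lines of Python. This is a minimal sketch: `try_decode` is a hypothetical helper, and the sample string is chosen only because it contains one two-byte character.

```python
text = "na\u00efve"            # the third character, U+00EF, takes two bytes in UTF-8
data = text.encode("utf-8")
print(len(text), len(data))    # 5 6

def try_decode(chunk: bytes):
    """Return the decoded string, or None if the bytes are not valid UTF-8."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError:
        return None

print(try_decode(data[:2]))    # na
print(try_decode(data[:3]))    # None: the cut falls inside the two-byte character
print(try_decode(data[3:]))    # None: the slice starts on a continuation byte
```
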
Out of time. We’ll return to this next week.

3. Any other business

JWL: We’re going to try to put the tutorial material on the QT4CG website.

General nods of approval.
