QT4 CG Meeting 142 Minutes 2025-11-18
Table of Contents
- Draft Minutes
- Summary of new and continuing actions [0/4]
- 1. Administrivia
- 2. Technical agenda
- 2.1. PR #2246: 2233 Expand xsl:analyze-string; introduce fn:regex-groups()
- 2.2. PR #2295: 2294 Clarify semantics of `element(N, xs:anyType)`
- 2.3. PR #2289: 2195 (partial) Editorial notes (incremental)
- 2.4. PR #2286: 2279 fn:string-length#1, fn:normalize-space#1: accept xs:anyAtomicType
- 2.5. PR #2285: 2198 Add pi-for-cdata parameter
- 2.6. PR #2282: 2278 Add function bin:infer-encoding; simplify bin:decode-string
- 3. Any other business
Draft Minutes
Summary of new and continuing actions [0/4]
- [ ] QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json
- [ ] QT4CG-141-01: MK to follow up on a comment by JWL on #2269
- [ ] QT4CG-142-01: MK to review the “Captured Groups within Lookahead” example.
- [ ] QT4CG-142-02: MK to add explanatory note about the difference between typed and untyped values in string-length
1. Administrivia
1.1. Roll call [9/10]
Regrets: EP
- [X] David J Birnbaum (DB)
- [X] Reece Dunn (RD)
- [X] Christian Grün (CG)
- [X] Joel Kalvesmaki (JK)
- [X] Michael Kay (MK)
- [X] Juri Leino (JLO)
- [X] John Lumley (JWL)
- [X] Wendell Piez (WP)
- [ ] Ed Porter (EP)
- [X] Norm Tovey-Walsh (NW) Scribe. Chair.
1.2. Accept the agenda
Proposal: Accept the agenda.
Accepted.
1.3. Approve minutes of the previous meeting
Proposal: Accept the minutes of the previous meeting.
Accepted.
1.4. Next meeting
The next meeting is planned for 25 November 2025.
Regrets: EP, JLO
1.5. Review of open action items [0/2]
- [ ] QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json
- [ ] QT4CG-141-01: MK to follow up on a comment by JWL on #2269
1.6. Review of open pull requests and issues
This section summarizes all of the issues and pull requests that need to be resolved before we can finish. See Technical Agenda below for the focus of this meeting.
1.6.1. Blocked
The following PRs are open but have merge conflicts or comments which suggest they aren’t ready for action.
1.6.2. Merge without discussion
The following PRs are editorial, small, or otherwise appeared to be uncontroversial when the agenda was prepared. The chairs propose that these can be merged without discussion. If you think discussion is necessary, please say so.
- PR #2293: Updated RELAX NG grammar for XSLT 4.0 stylesheets
- PR #2290: Updated schema for XSLT 4.0 stylesheets
Proposal: accept without discussion.
Accepted.
1.6.3. Close without action
It has been proposed that the following issues be closed without action. If you think discussion is necessary, please say so.
- Issue #2252: Dynamic XPath Evaluation the functional way
- Issue #1618: Adaptive serialization: doubles
- JWL: Discussion of #2252 just sort of tailed off…
- MK: Yes, I thought we should drop it not because it isn’t doable, but to limit our ambitions.
- JLO: I think it would be good to have such a function because almost all XQuery implementations I know already have that function.
- CG: I have concerns that it may be too implementation-specific as it revolves
around optimizing code. I think there were two proposals in this discussion.
- One was a compiled instance that you could reuse,
- The other was to compile a string into a function item.
- MK: Let’s leave #2252 open for the moment then.
- CG: I think there have been other discussions about evaluating XPath.
- This is more about optimizing.
- MK: This is primarily about the functionality.
Proposal: close #1618 with no further action.
Accepted.
2. Technical agenda
2.1. PR #2246: 2233 Expand xsl:analyze-string; introduce fn:regex-groups()
See PR #2246
- MK: Now that regular expressions can match a zero length string, you can find all the word boundaries in an input string. The `fn:regex-group` function turns out to be somewhat inadequate; it returns the string but not where it found it.
- … This proposes a `fn:regex-groups` function that returns the string and the matches.
- … There are now “string segments” that are a combination of a substring and its position in the input.
- … We now operate on a non-overlapping sequence of segments.
- … Within a matching substring, you can access the groups.
- … This distinguishes between matches on zero length strings and failure to match.
- … The `fn:regex-group` function is defined in terms of the new function, for backwards compatibility.
- JWL: I’d be interested in a comparison between this and the `fn:analyze-string` function that produces the same sort of output.
- … Is there something to be said here about where one might be preferable?
- MK: I think we enhanced the `fn:analyze-string` function to account for zero-length string matches.
- … But I’m not sure it does it quite as well.
- JWL: In one case you get a map and in another you get elements. Might be worth saying something here.
- JK: I think this is really nice. The definition says that the string segments are non-overlapping. What about look-ahead groups?
- MK: The groups can overlap, but the matched segments don’t.
- JK: In the example, there are two cyan colored ones. In the first one, I don’t understand the select.
- MK: Yes…that has clearly gone wrong somewhere.
ACTION QT4CG-142-01: MK to review the “Captured Groups within Lookahead” example.
Proposal: Accept this PR.
Accepted.
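The segment model MK describes can be sketched outside XPath. The Python below is only an illustration under stated assumptions: `Segment` and `segments` are hypothetical names, offsets are 0-based, and it uses Python's `re` dialect rather than the regex flavor or the `fn:regex-groups` signature from the PR.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A substring plus where it was found (hypothetical structure)."""
    value: str
    start: int                 # 0-based offset into the input (an assumption)
    matched: bool              # True for a match, False for the text in between
    groups: list = field(default_factory=list)  # captured groups; None = group did not match


def segments(pattern: str, text: str) -> list:
    """Split text into a non-overlapping sequence of matching and non-matching segments."""
    out = []
    pos = 0
    for m in re.finditer(pattern, text):
        if m.start() > pos:
            # Text between the previous match and this one.
            out.append(Segment(text[pos:m.start()], pos, False))
        # The match itself; it may be zero-length (e.g. a word boundary).
        out.append(Segment(m.group(0), m.start(), True, list(m.groups())))
        pos = m.end()
    if pos < len(text):
        # Trailing text after the last match.
        out.append(Segment(text[pos:], pos, False))
    return out
```

With the pattern `\b` and the input `ab cd`, the matched segments are all zero-length and sit at offsets 0, 2, 3 and 5. In this sketch a group that matched the empty string is reported as `""`, while a group that did not take part in the match is reported as `None`, which mirrors the distinction between zero-length matches and failure to match.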
2.2. PR #2295: 2294 Clarify semantics of `element(N, xs:anyType)`
See PR #2295
- MK: This is purely editorial and it’s a bug fix, because anyType is a supertype of untyped.
- And an attempt to clarify a few things.
- JLO: What is the test of `xs:anyType` (without the question mark)?
- MK: That matches anything that hasn’t been nilled.
- JLO: What would `xs:untyped?` mean?
- MK: It’s allowed by the grammar but it doesn’t affect the meaning because an untyped thing can never be nilled.
Proposal: Accept this PR.
Accepted.
2.3. PR #2289: 2195 (partial) Editorial notes (incremental)
See PR #2289
- MK: In Safari the arrows look horrible. The arrows are very different.
- CG: I’ll take another look at the arrows.
- CG: But there are some other changes:
- There are some changes to the summary of changes text.
- MK: That’s all fine.
- CG: Some of the examples in the binary module didn’t work for me so I made a
few changes.
- … Mostly it’s about changing formatting.
- CG: I think the code snippet for bin:shift is still buggy, but I can try to fix it.
- JWL: If the arrows are sufficiently variable across browsers, should we make them images?
2.4. PR #2286: 2279 fn:string-length#1, fn:normalize-space#1: accept xs:anyAtomicType
See PR #2286
- CG: There has been a lot of discussion. Liam noted that `string-length()` and `string-length(.)` do different things.
- … Without an argument, the context item is “stringified”.
- … But with an argument, it can only take strings.
- CG: But because of typed nodes, this is not as easy as I thought.
- CG: We could make the item type `xs:anyAtomicType` instead of `xs:string`.
- … With this change, string length and normalize space would be similar to other functions that accept the context item as the first item.
- JLO: I’m in favor. I think there’s a slight chance of misalignment because `.` could be an array. Don’t they get atomized?
- … But `xs:anyAtomicType` doesn’t allow that.
- CG: Arrays are going to be atomized. If that produces more than one string, then you’ll get an error. I don’t think that changes.
- JLO: I can pass that in even with `xs:anyAtomicType`?
- MK: Yes, I think so.
- JLO: Why is that in string length?
- MK: That’s for typed nodes. If you pass in a name surrounded by spaces, the
current spec (since XPath 2.0) says that the string length is the length of
the string value of the node not the typed value. Those can be different.
- … We’re retaining that compatibility.
Proposal: Accept this PR.
Accepted.
ACTION QT4CG-142-02: MK to add explanatory note about the difference between typed and untyped values in string-length
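MK's point about a name surrounded by spaces can be illustrated with a small Python model. This is a simplified sketch, not schema validation: `collapse` is a hypothetical helper that imitates XSD whitespace collapsing, and the assumption is a node whose type (for example `xs:NCName`) collapses whitespace in its typed value.

```python
import re


def collapse(value: str) -> str:
    """Imitate XSD 'collapse' whitespace handling (simplified model)."""
    # Step 1: tab, line feed and carriage return each become a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: runs of spaces shrink to one; leading and trailing spaces go.
    return re.sub(r" +", " ", replaced).strip(" ")


string_value = "  widget  "            # the text as written in the document
typed_value = collapse(string_value)   # what a typed node would expose, under the assumption above

print(len(string_value))  # 10: length of the string value
print(len(typed_value))   # 6: length of the typed value
```

Under the compatibility rule described above, the length reported for such a node follows the string value (10 here), not the typed value (6).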
2.5. PR #2285: 2198 Add pi-for-cdata parameter
See PR #2285
- MK: This is a PR that just changes serialization, before going through all the other specs.
- MK: This adds a new PI for CDATA section parameter that names a PI.
- … The requirement came from someone who wanted to generate CDATA sections only where they’re needed.
- … I generalized that requirement to generate an arbitrary CDATA section anywhere a PI can occur.
- NW: But putting data in a PI causes it to be lost by most down-stream processing.
- … I strongly object.
- JK: I don’t understand the name of the option. It sounds like a campaign slogan.
- MK: It was intended to be an abbreviation for the name of a processing instruction that you use for inserting CDATA.
- CG: When I first saw this issue, I had other use cases in mind.
- … People want to have all text that uses special characters encoded.
- … I’d prefer a solution that lets you do that dynamically.
- … So maybe we have two use cases here, one where I want text explicitly escaped and another where I want that done more automatically.
- WP: This is cool in that it addresses a real requirement. I share NW’s hesitation. I think there are a couple of other possibilities. NW’s suggestion of a preceding PI would work. The other thing I wonder is, maybe there’s a way to flag an element with an attribute in a namespace.
- MK: You could maybe do something like we did for disable output escaping.
- WP: I think there are a couple of options. I think NW’s concerns are well founded.
- MK: Let’s try that one, putting an attribute on `xsl:text` that says CDATA section.
- RD: It probably makes sense to have that be an XPath expression so you can call a function that makes that decision.
- MK: Interesting.
- JWL: RD’s suggestion would involve evaluating the function against the result.
- RD: Yes, you’d get the string from the `xsl:text` or `xsl:value-of` and then pass that to the XPath expression.
- JWL: The default value would be `true()` but you could put in a function.
- MK: I think I can run with this.
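The idea RD and JWL converge on, a per-text decision made by a caller-supplied function that defaults to "always", might look roughly like this. It is a Python sketch only: `serialize_text` and `use_cdata` are hypothetical names, and nothing here reflects the attribute or parameter syntax that will eventually be proposed.

```python
from xml.sax.saxutils import escape


def serialize_text(text: str, use_cdata=lambda s: True) -> str:
    """Serialize one text node, letting a predicate decide on CDATA (hypothetical API)."""
    if use_cdata(text):
        # "]]>" cannot appear inside a CDATA section, so split it across two sections.
        safe = text.replace("]]>", "]]]]><![CDATA[>")
        return "<![CDATA[" + safe + "]]>"
    # Otherwise fall back to ordinary escaping of &, < and >.
    return escape(text)


def needs_escaping(s: str) -> bool:
    """Example predicate: use CDATA only where escaping would otherwise be needed."""
    return "<" in s or "&" in s


print(serialize_text("a < b", needs_escaping))
print(serialize_text("plain", needs_escaping))
```

The predicate receives the string that the instruction produced, which matches RD's description of passing the result to the expression.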
2.6. PR #2282: 2278 Add function bin:infer-encoding; simplify bin:decode-string
See PR #2282
- MK: I was becoming unhappy with the complexity of `fn:decode-string`. The interaction of skipping byte order marks and specifying offsets was getting very confusing.
- … I wondered if we could get a cleaner design?
- MK: What I’ve proposed is a function `fn:infer-encoding` that returns an encoding and an offset where the real data starts.
- … So it might return “UTF-8” for the encoding and offset 3 if there’s a BOM.
- … In particular it addresses the case where the binary is embedded in another stream.
- … It simplifies `bin:decode-string` by saying it’s UTF-8 if you don’t specify it.
- CG: I think the changes are technically clean, but I have some concerns, because where we started from was that we just had simple functions. One of my ideas was to try to make it easy for users who have maybe never heard of BOMs to be able to process data.
- … It’s nice, but I would have liked a simpler solution for those users.
- MK: We have `fn:json-doc` and `fn:csv-doc` and those do try to work out the encoding for you.
- … But if you break it into steps, we now provide clean primitives.
- CG: So you can write it to disk and read it back, but it’s probably easier to have a single function.
- … Web applications and things like RESTXQ where you don’t want to decode the data on the fly.
- JWL: In `fn:decode-string`, my concern is that if I want to decode the string, if I don’t know what it is, isn’t there an argument saying that the default behavior should be to do an implicit infer?
- MK: I think you end up with non-intuitive behavior that way. Changing the offset from 0 to 1 would change behavior that has nothing to do with the offset.
Some discussion of what the defaults should be and how they should interact.
- JWL: If you only pass in a string, the `fn:decode-string` function should basically call the infer function. And if you provide an offset, that’s an offset in the real data.
- … If I just want to decode a string and I don’t know anything about BOM, do I have to check?
- MK: We could make it so that if there’s only one argument supplied it does that logic.
- JWL: Does the offset I ask for in `fn:decode-string` include the chopped off BOM?
- JLO: The binary module for me is always just a hex viewer. I want to be able to get each byte that I was given.
- … I can see the possibility that I don’t want an inferred offset.
- … I just want to see the raw data; or I really want to skip them but I don’t want to infer anything from them.
- … And I can still see CG’s argument that we’d like it to be simple.
- CG: In principle, I agree with JWL. I think it would be nice for the function
to have a default behavior to infer the encoding.
- … One other option would be to remove the offset and size. The question is when does it make sense to read data when you don’t know the encoding.
- MK: Indeed, that problem is at the heart of this. Even if you have UTF-8 data, specifying a start byte position doesn’t make much sense given that the characters vary in length.
- … You might have a message format that says how many octets each segment is. But…
Out of time. We’ll return to this next week.
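The two-step design discussed in this item (infer an encoding and a data offset first, then decode with explicit arguments) can be sketched in Python. This is an illustration under stated assumptions, not the specification of `bin:infer-encoding` or `bin:decode-string`: the function names and return shape are hypothetical, only UTF-8 and UTF-16 byte order marks are checked, and UTF-8 is assumed when no BOM is found.

```python
import codecs

# BOMs recognized by this sketch (UTF-32 is deliberately left out).
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def infer_encoding(data: bytes, default: str = "UTF-8"):
    """Return (encoding, offset), where offset is where the real data starts."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM found: fall back to the default and start at the first byte.
    return default, 0


def decode_string(data: bytes, encoding: str = "UTF-8", offset: int = 0) -> str:
    """Decode from a raw byte offset; no BOM handling happens here."""
    return data[offset:].decode(encoding)


payload = codecs.BOM_UTF8 + "héllo".encode("utf-8")
encoding, offset = infer_encoding(payload)
print(encoding, offset)                          # UTF-8 3
print(decode_string(payload, encoding, offset))  # héllo
```

Keeping the inference separate means `decode_string` never changes behavior depending on its offset, which is the property MK argues for. The convenience JWL and CG ask for would amount to a one-argument form that calls `infer_encoding` internally.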
3. Any other business
JWL: We’re going to try to put the tutorial material on the QT4CG website.
General nods of approval.