QT4 CG Meeting 170 Minutes 2026-06-30
Table of Contents
- Summary of new and continuing actions [0/9]
- Draft Minutes
- 1. Administrivia
- 2. Technical agenda
- 2.1. PR #2736: 2169 Reformulate `StringInterpolation` to remove the `` `{ `` and `` }` `` literal terminals
- 2.2. PR #2734: 2733 A step too far
- 2.3. PR #2731: 2709 Well-formed sub-documents (2nd attempt)
- 2.4. PR #2719: 1234 Serialization Parameters: Indentation, Whitespace, Newlines
- 2.5. PR #2717: 2660 fn:matching-segments: named capture groups
- 3. Any other business
Summary of new and continuing actions [0/9]
- [ ] QT4CG-143-02: MK to try to recover the ability to extract formal equivalences into tests
- [ ] QT4CG-167-01: DB to write a PR for #2641, comments in CSV
- [ ] QT4CG-167-02: MK to make a PR for #2591, grammar for step?lookup is invalid
- [ ] QT4CG-167-03: NW to make a PR for #2482, fallback on bin:decode-string
- [ ] QT4CG-167-05: MK to write a proposal to change #2393 so the functions return JNodes
- [ ] QT4CG-167-07: NW to review tests for interpolated strings with edge cases in mind
- [ ] QT4CG-167-08: MK to review the state of #1949 to see which items are still outstanding.
- [ ] QT4CG-167-09: NW to close all “nice to have” issues at the end of October if they haven’t progressed
- [ ] QT4CG-170-01: RD to draft a proposal that attempts to address the lexical issues differently.
Draft Minutes
1. Administrivia
1.1. Roll call [11/11]
- [X] David J Birnbaum (DB)
- [X] Reece Dunn (RD)
- [X] Christian Grün (CG)
- [X] Joel Kalvesmaki (JK)
- [X] Michael Kay (MK)
- [X] Juri Leino (JLO)
- [X] John Lumley (JWL)
- [X] Alan Painter (AP)
- [X] Wendell Piez (WP)
- [X] Bethan Tovey-Walsh (BTW)
- [X] Norm Tovey-Walsh (NW) Scribe. Chair.
1.2. Accept the agenda
Proposal: Accept the agenda.
Accepted.
1.3. Approve minutes of the previous meeting
Proposal: Accept the minutes of the previous meeting.
Accepted.
1.4. Next meeting
The next meeting is planned for 7 July.
No regrets heard.
1.5. Review of open action items [1/9]
- [ ] QT4CG-143-02: MK to try to recover the ability to extract formal equivalences into tests
- [ ] QT4CG-167-01: DB to write a PR for #2641, comments in CSV
- [ ] QT4CG-167-02: MK to make a PR for #2591, grammar for step?lookup is invalid
- [ ] QT4CG-167-03: NW to make a PR for #2482, fallback on bin:decode-string
- [X] QT4CG-167-04: NW to make a PR explaining load-xquery-module for PR #2464
  - Overtaken by events; MK did it!
- [ ] QT4CG-167-05: MK to write a proposal to change #2393 so the functions return JNodes
- [ ] QT4CG-167-07: NW to review tests for interpolated strings with edge cases in mind
- [ ] QT4CG-167-08: MK to review the state of #1949 to see which items are still outstanding.
- [ ] QT4CG-167-09: NW to close all “nice to have” issues at the end of October if they haven’t progressed
1.6. Review of open pull requests and issues
This section summarizes all of the issues and pull requests that need to be resolved before we can finish. See Technical Agenda below for the focus of this meeting.
1.6.1. Blocked
The following PRs are open but have merge conflicts or comments which suggest they aren’t ready for action.
- PR #2638: 2632-6: cross-cutting consistency (automatically generated & reviewed)
- PR #2637: 2632-5: refresh stale 4.0 content (automatically generated & reviewed)
- PR #2636: 2632-4: logic and semantics (automatically generated & reviewed)
- PR #2635: 2632-3: typos and grammar (automatically generated & reviewed)
- PR #2634: 2632-2: fix broken examples in expressions.xml (automatically generated & reviewed)
- PR #2633: 2632-1: fix critical bugs and DTD-validity issues (automatically generated & reviewed)
- PR #2594: 2389 Adaptive Serialization: more freedom
- PR #2350: 708 An alternative proposal for generators
- PR #2247: 716 Deferred Evaluation in XPath - the f:generator record
- PR #2160: 2073 data model changes for JNodes and Sequences
- PR #2071: 77c deep update
1.6.2. Merge without discussion
The following PRs are editorial, small, or otherwise appeared to be uncontroversial when the agenda was prepared. The chairs propose that these can be merged without discussion. If you think discussion is necessary, please say so.
- PR #2732: 2708 Obsolete references to maps being unordered
- PR #2711: 2536 EXPath modules: handling of default values
- PR #2701: 2677 Clarify effect of serialization-params on fn:transform
Proposal: merge without discussion.
Accepted.
1.6.3. Close without action
It has been proposed that the following issues be closed without action. If you think discussion is necessary, please say so.
- Issue #2464: add method to use path()
Proposal: close with no further action
Accepted.
1.6.4. Substantive PRs
The following substantive PRs were open when this agenda was prepared.
- PR #2736: 2169 Reformulate `StringInterpolation` to remove the `` `{ `` and `` }` `` literal terminals
- PR #2734: 2733 A step too far
- PR #2731: 2709 Well-formed sub-documents (2nd attempt)
- PR #2719: 1234 Serialization Parameters: Indentation, Whitespace, Newlines
- PR #2717: 2660 fn:matching-segments: named capture groups
- PR #2715: 2653 FLWOR, member/key/value clauses: allow sequences
- PR #2714: 2219 Generalize method calls to sequences
- PR #2713: 2257 Record declarations without namespace
- PR #2712: 2702 Dynamic node tests
- PR #2710: 2702 Dynamic selectors: focus-dependency
- PR #2707: 1962 fn:map-to-element
- PR #2706: 2704 change path() output for JNodes
- PR #2698: 2641 Support comments in csv
- PR #2696: 2695 Apply templates to maps, arrays, and JNodes
- PR #2649: 2647 descendants: recursion, filtering
- PR #2739: 2738 XSLT - tree terminology
2. Technical agenda
2.1. PR #2736: 2169 Reformulate `StringInterpolation` to remove the `` `{ `` and `` }` `` literal terminals
See PR #2736
- CG: I asked Gunther to create the PR.
- RD: I don’t think Gunther’s proposal is the correct one that resolves the underlying issue.
- … I think the underlying issue is in how we’re describing the tokenization rules.
- … Perhaps we should reformulate this by grouping the parser constructs by the logical construct they define
- … There are two grammar rules for comments, for example. Do something similar for string interpolation, etc.
- … Then pull in some of the logic from the old tokenizer draft. Describe it the way the HTML5 rules do.
- … If you find this token, then transition to this state, etc.
- … Then it would be clearer what characters are valid in which states.
- … I think all the XQuery parsers are using some sort of state-based parsing.
- MK: We need to keep the specification distinct from an implementation. HTML
has gone down the route of putting an implementation in the spec and I think
that’s a disaster.
- … We need to keep things more abstract.
- RD: What I’m proposing wouldn’t constrain implementations. An implementation could use a state based lexer, or they could drive it by the current parse context or…
- MK: The problem with the HTML approach is that bugs in the spec become bugs in the implementation.
- … If the state tables are wrong, you have to implement them incorrectly to be conformant.
- RD: I think what I’m suggesting is to group the rules and describe the lexical
states that enter and exit those groups.
- … I’m not suggesting a specific implementation, I’m proposing organizing things differently.
- … Something between what we have now and the HTML rules, but in an implementation-flexible way.
- … One of the difficulties is that because of the complexities of tokenizing and parsing XQuery and XPath, you need to document some of this otherwise it’s unclear what the behavior is.
- JLO: I do like the idea of splitting up the grammar differently. I don’t think
that it necessarily requires a particular implementation.
- … I fail to see the problem, but Gunther showed me.
- RD: I’m willing to experiment and try to see what this looks like.
- MK: It needs a concrete proposal.
- JLO: We all agree that there is a problem.
- MK: I’m not convinced that there’s a problem after we apply Gunther’s patch.
- JLO: `ws:explicit` isn’t a problem?
- MK: I expect that could use a bit more explanation of exactly what that means.
- RD: I’m not convinced because I find the wording of the note on “complex terminals” confusing, as I said in a comment on the PR.
- … That’s around the tokenization of the processing instruction. From the wording there, it seems to imply that if you’ve got a preprocessing literal inside the content of a string interpolation, then by the longest token rules, and because the tokenization rules aren’t context dependent with these changes, then that should be processed as a preprocessed literal rather than a string interpolation content.
- NW: We don’t seem to have consensus…
- MK: Are you sure? Gunther’s PR improves the situation.
Proposal: accept this PR.
Accepted.
ACTION QT4CG-170-01: RD to draft a proposal that attempts to address the lexical issues differently.
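As a rough illustration of the state-based description RD mentions ("if you find this token, then transition to this state"), here is a minimal Python sketch. It is not RD's proposal and not the grammar in PR #2736. It assumes the body of an XQuery 3.1 style string constructor, where `` `{ `` opens an interpolation and `` }` `` closes it. The names `State` and `tokenize` are hypothetical, and the expression state is treated as opaque text, which a real lexer would not do.

```python
from enum import Enum, auto


class State(Enum):
    CONTENT = auto()  # literal characters of the string constructor
    EXPR = auto()     # inside an interpolated expression (opaque here)


def tokenize(body: str) -> list[tuple[str, str]]:
    """Split a string-constructor body into (kind, text) tokens.

    Toy assumption: the only delimiters are "`{" (recognised in CONTENT)
    and "}`" (recognised in EXPR). Nothing else is a token boundary.
    """
    tokens: list[tuple[str, str]] = []
    state = State.CONTENT
    start = 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if state is State.CONTENT and pair == "`{":
            if i > start:
                tokens.append(("content", body[start:i]))
            tokens.append(("open", pair))
            state = State.EXPR          # transition: CONTENT -> EXPR
            i += 2
            start = i
        elif state is State.EXPR and pair == "}`":
            tokens.append(("expr", body[start:i]))
            tokens.append(("close", pair))
            state = State.CONTENT       # transition: EXPR -> CONTENT
            i += 2
            start = i
        else:
            i += 1
    if state is State.EXPR:
        raise ValueError("unterminated interpolation")
    if start < len(body):
        tokens.append(("content", body[start:]))
    return tokens


print(tokenize("Hello `{$name}`!"))
```

Because the `CONTENT` state only looks for `` `{ ``, text such as `(: not a comment :)` stays ordinary content in this sketch. That is the kind of "which characters are valid in which states" question the discussion is about.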
2.2. PR #2734: 2733 A step too far
See PR #2734
- MK: The proposal we accepted, to treat any step that yields an atomic value as
a selection, was going too far: it leads to some undesirable consequences.
- … Let’s look at the proposal and then discuss some of the alternatives.
MK walks through the PR.
- MK: If you’re in the context of JNodes, any step that returns atomic values was being interpreted as a selector. So you got a sort of double-evaluation if you put `jvalue()` after a slash.
- … What I propose here is that we should only treat `E2` in `E1/E2` as a selector if it’s a literal or variable reference.
- … All other expressions, whether they return atomic values or nodes, are treated as mapping expressions.
- … It’s not referentially transparent, unfortunately.
- … As soon as you put in an expression instead of a literal, it means something else.
- MK: One other alternative depends on deciding if an expression was context sensitive. But that’s hard to do.
One other possible formulation is in the comments on the PR.
- MK: We can decide if an expression is “navigational” or not, with some rules. And we do the rewrite for `E2` only if it isn’t navigational.
- … If it is navigational, but doesn’t return JNodes then it throws an error.
- … That means using `!` or `=!>` instead of `/` in some cases.
- MK: It’s still a bit arbitrary, but it seems like it gives better results.
- … Basically `/` has two meanings: it’s either navigational or a selector.
- WP: Am I right that this would expose all my bad habits and make me fix them?
- MK: No, but you won’t be able to carry your bad habits over into JNodes.
- … It’s fully backwards compatible with XNodes
- WP: With respect to JNodes it means we have a different classification rather than an anomaly.
- JLO: This is only if I want to output a literal, I have to use `!`
- MK: Yes. We’re also discouraging you from writing `…/function()` where that function computes a value rather than navigating.
Some discussion of paths ending with /string() or /data().
- JLO: If it really is about `string()` and `data()`, we should make that more prominent.
- RD: For an existing 3.1 expression using XML nodes, it works as is.
- WP: It seems to me that the underlying issue is a feature of the transparency: JNodes work like XNodes except when they don’t.
- CG: From a technical point of view, I definitely like the simpler proposal.
- … We already have similar situations like constant strings in predicates.
- … Regarding the two proposals: the existing one or navigation one. I think the navigational one will cause fewer surprises.
- … With the first proposal, you’ll get different results depending on whether you use single or double quotes or back ticks.
- MK: Okay, I’ll try to develop the navigational approach into a proposal.
2.3. PR #2731: 2709 Well-formed sub-documents (2nd attempt)
See PR #2731
This is purely editorial. MK observes that something seems to have gone wrong with the commit as it’s incomplete.
MK to update the PR so that it has all of the files.
2.4. PR #2719: 1234 Serialization Parameters: Indentation, Whitespace, Newlines
See PR #2719
- CG: There are a lot of implementation dependent ways to specify things like indentation width.
- … I thought it might be good to try to formalize this.
CG describes the new parameters: indent-unit, indent-attributes, and line-ending.
- JWL: Will you get line endings without indentation? I can see a case where you get some extraordinarily deep trees, you might want the indent-unit to be empty or a single character. I’d be happy for the algorithm to be adaptive.
- CG: Line endings will always be output.
- JWL: So if you don’t turn on indentation, you’ll still get all the tags on different lines?
- CG: No, then it would be everything on one line.
- JWL: That’s my argument for allowing the indent-unit to be the empty string.
- CG: I can change that.
- MK: Three comments:
- … Why is this only for the XML output method, rather than for XHTML and JSON etc. I think we’d want to apply it to all of them.
- … The proposal needs to cover all the wretched different ways to set serialization parameters in XSLT and XQuery.
- … From a usability perspective: how are we going to present these values in the attributes. Requiring explicit backslash t and backslash s might be the way to do it.
- … We’ve introduced those in the `char()` function. I’d suggest doing it that way.
- CG: I think that we could use the backslash forms in some of the cases.
- MK: We can limit it to what survives round tripping in XML.
- CG: I should add it to HTML and XHTML.
- JLO: I’m confused about the discussion of backslash t and U+0009? Is there an example?
- CG: I think it would be something like `\t\t` and `\r\n` (etc) for the values that are represented.
- JLO: And not entities? That’s confusing.
- MK: Entities are fine in XML, but what if you then get the serialization parameters from a JSON document, what do you write there?
- JLO: This would be a hard sell for me.
- MK: We could just use `t`, `s`, `r`, and `n`.
- JLO: What about using zero-width characters?
- MK: And they aren’t treated as whitespace by an XML parser.
- JLO: Ok, nevermind then.
- CG: We could allow three constants and then if the input isn’t one of those constants, we could use the input as a literal string.
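To make JWL's point about an empty indent unit concrete, here is a small Python sketch of an indenting serializer. It is an assumption-laden illustration, not the algorithm in PR #2719. The element model, the `serialize` and `decode` helpers, and the parameter names `indent_unit` and `line_ending` are hypothetical stand-ins for the `indent-unit` and `line-ending` parameters. The backslash spellings handled by `decode` are only the `\t`, `\r` and `\n` forms mentioned in the discussion.

```python
# Hypothetical element model: (name, [children]); no text or attributes.

# Assumed escape spellings, based on the "\t\t" and "\r\n" examples above.
ESCAPES = {"\\t": "\t", "\\r": "\r", "\\n": "\n"}


def decode(value: str) -> str:
    """Replace the assumed backslash spellings with real characters.

    Escaped backslashes are not handled; this is only a sketch.
    """
    for spelled, char in ESCAPES.items():
        value = value.replace(spelled, char)
    return value


def serialize(node, indent_unit="  ", line_ending="\n", depth=0) -> str:
    """Serialize an element tree, one tag per line, indented by depth."""
    name, children = node
    pad = indent_unit * depth
    if not children:
        return f"{pad}<{name}/>"
    inner = line_ending.join(
        serialize(child, indent_unit, line_ending, depth + 1)
        for child in children
    )
    return f"{pad}<{name}>{line_ending}{inner}{line_ending}{pad}</{name}>"


tree = ("doc", [("a", [("b", [])])])

# Tab indentation with CRLF line endings, both given in backslash form.
print(repr(serialize(tree, decode("\\t"), decode("\\r\\n"))))

# Empty indent unit: tags still land on separate lines, with no padding.
print(serialize(tree, indent_unit=""))
```

With an empty indent unit the output keeps its line breaks but adds no leading whitespace, which is the behaviour JWL asks for on very deep trees.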
2.5. PR #2717: 2660 fn:matching-segments: named capture groups
See PR #2717
CG introduces the PR.
- CG: The `fn:matching-segments` function returns numbered groups. The idea is that we should allow names as many other languages do.
CG jumps to the example that shows year, month, day instead of 1, 2, 3.
- CG: I’ve limited the names to ASCII literals. You get only the names if you use names. The names must be distinct.
- MK: I like it, apart from the fact that it’s more work.
- … I do think we need to carry it through into the replace function.
- CG: Yep. I’ve created an issue for that and will do it next.
- JK: In one of my comments, I observed that a named capturing group will not also have an integer index value.
- … If you have a named group but you also want to refer to it by number, what behavior do you expect?
- CG: My first proposal included the group number, but that’s redundant. You can also have groups that aren’t part of the match, so the numbers are not always consecutive.
- MK: I thought it was preferable not to have the redundancy, but I don’t feel strongly about it.
- JLO: I think that if a named capturing group can occur more than once,
you might want to find them by number.
- … I haven’t seen it in the wild, but it seems possible.
- CG: I can also check what other languages provide.
- WP: Are we also going to have named groups for `xsl:analyze-string`?
- CG: Yes, I did that in `xsl:analyze-string` as well.
- JK: The other question I asked, will there be any potential pitfalls if someone attempts to name the groups with integers.
- CG: The group names will be strings.
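For comparison with what another language provides, here is how Python's standard `re` module treats named groups. This only shows Python's behaviour. It says nothing about what `fn:matching-segments` will return, and the year/month/day pattern is made up to echo the example in the PR.

```python
import re

# Named groups echoing the year/month/day example.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

match = DATE.search("Meeting held on 2026-06-30.")
assert match is not None

by_name = match.groupdict()             # groups keyed by name
by_number = match.groups()              # the same groups, by position
name_to_number = dict(DATE.groupindex)  # each name also has a number

print(by_name)
print(by_number)
print(name_to_number)

# A group that does not take part in the match is reported as None,
# so the populated group numbers need not be consecutive.
either = re.compile(r"(?P<a>x)|(?P<b>y)").match("y")
print(either.groups())

# Python rejects a group name that starts with a digit.
try:
    re.compile(r"(?P<1st>\d+)")
    digit_name_ok = True
except re.error:
    digit_name_ok = False
print(digit_name_ok)
```

In Python a named group keeps its numeric index, so both access styles work. That is the redundancy CG and MK discuss avoiding, and the behaviour JK and JLO ask about.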
Some discussion of how lookup expressions work.
- CG: Also, at the moment, digits aren’t allowed at the beginning.
- BTW: I’m wondering about making group names ASCII letters and digits.
- … Does this disadvantage people who are working in non-Latin languages?
- WP: Probably not more than they are already disadvantaged.
Some discussion of ASCII vs any Unicode name character.
- WP: That’s why I try to use XML Names in this place.
- MK: Have we abandoned the idea that this should be implemented with third party libraries?
3. Any other business
None heard.