QT4 CG Meeting 139 Minutes 2025-10-21
Table of Contents
- Draft Minutes
- Summary of new and continuing actions [0/0]
- 1. Administrivia
- 2. Technical agenda
- 2.1. PR #2249: 2221 fn:unparsed-text: Encoding, BOM handling
- 2.2. PR #2248: 2148b XDM Recognize that Base URI property may be invalid
- 2.3. PR #2223: 2193 fn:parse-xml, fn:doc: Drop security options
- 2.4. PR #2205: 2190 Drop binary input for parse-csv and parse-json
- 2.5. PR #2251: 323 Add select attribute to xsl:text
- 3. Any other business
Draft Minutes
Summary of new and continuing actions [0/0]
None.
1. Administrivia
1.1. Roll call [9/11]
- [X] David J Birnbaum (DB)
- [X] Reece Dunn (RD)
- [X] Christian Grün (CG)
- [X] Joel Kalvesmaki (JK)
- [X] Michael Kay (MK)
- [X] Juri Leino (JLO)
- [X] John Lumley (JWL)
- [X] Wendell Piez (WP)
- [ ] Ed Porter (EP)
- [ ] Bethan Tovey-Walsh (BTW)
- [X] Norm Tovey-Walsh (NW) Scribe. Chair.
1.2. Accept the agenda
Proposal: Accept the agenda.
Accepted.
1.3. Approve minutes of the previous meeting
Proposal: Accept the minutes of the previous meeting.
Accepted.
1.4. Next meeting
The next meeting is planned for 28 October 2025.
Heads up: daylight saving time ends in the UK and Europe on 26 October 2025. It ends in the United States on 2 November 2025. Our meetings are scheduled on European civil time; consequently, our meeting of 28 October 2025 will occur one hour later than usual in the United States, at 12:00 EDT.
JK gives regrets.
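For anyone who wants to check the daylight saving arithmetic for their own calendar, here is a minimal sketch. It assumes the meeting starts at 16:00 London time; that start time is not stated in these minutes and is only inferred from the 12:00 EDT figure given for 28 October. The function name is illustrative.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")
NEW_YORK = ZoneInfo("America/New_York")

# Assumption: 16:00 London time, inferred from the 12:00 EDT figure
# quoted for 28 October 2025; the minutes do not state it directly.
MEETING_TIME = time(16, 0)


def new_york_start(day: date) -> datetime:
    """Return the meeting start on `day` as a New York local datetime."""
    london_start = datetime.combine(day, MEETING_TIME, tzinfo=LONDON)
    return london_start.astimezone(NEW_YORK)


# The week before the UK change, the week between the two changes,
# and the week after the US change.
for day in (date(2025, 10, 21), date(2025, 10, 28), date(2025, 11, 4)):
    print(day.isoformat(), new_york_start(day).strftime("%H:%M %Z"))
```

Under that assumption, only the 28 October meeting shifts for US attendees; the following week both regions are back in step.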
1.5. Review of open action items [/]
None
1.6. Review of open pull requests and issues
This section summarizes all of the issues and pull requests that need to be resolved before we can finish. See Technical Agenda below for the focus of this meeting.
1.6.1. Blocked
The following PRs are open but have merge conflicts or comments which suggest they aren’t ready for action.
- PR #2256: 2216 All atomic types become ordered
- PR #2222: 2217 bin:decode-string: Input encoding
- PR #2208: 675 (part) Update XSLT streamability rules
- PR #2160: 2073 data model changes for JNodes and Sequences
- PR #2124: 573 Functions to Construct Trees
- PR #2120: 2007 Revised design for xsl:array
- PR #2071: 77c deep update
- PR #2019: 1776: XSLT template rules for maps and array
1.6.2. Merge without discussion
The following PRs are editorial, small, or otherwise appeared to be uncontroversial when the agenda was prepared. The chairs propose that these can be merged without discussion. If you think discussion is necessary, please say so.
- PR #2255: 2254 Fix spelling of nevertheless
Proposed: accept without discussion
Accepted
1.6.3. Close without action
It has been proposed that the following issues be closed without action. If you think discussion is necessary, please say so.
- Issue #1885: Use the specification grammar markup to define the regular expression grammar in F&O
- Issue #1111: xsl:pipeline
- Issue #760: Serialize functions: consistency
Proposed: close without further action
Accepted
2. Technical agenda
2.1. PR #2249: 2221 fn:unparsed-text: Encoding, BOM handling
See PR #2249
CG introduces the PR.
- CG: There are some clarifications.
- … I added a discussion of how to determine the encoding.
- … The old rules weren’t complete.
- JLO: It sounds reasonable. Is it incompatible with the current version?
- CG: It’s compatible.
- NW: I’m surprised that the encoding specified isn’t unconditionally accepted.
- CG: It is. The first four rules only apply if the encoding is absent.
- NW: blush
Some discussion of server encodings.
- WP: Are they different very often?
- NW: I don’t think so, most of the world is using Unicode now.
- RD: According to the WHATWG encoding spec, all the old encodings (Big5, ShiftJIS, etc.) are marked as legacy. They’ve pretty much standardized on UTF-8 while also supporting UTF-16. That’s typically supported by all the major web browsers. See https://encoding.spec.whatwg.org
- MK: Does the new text make it clear whether the BOM is dropped or included?
- CG: It says “can be consumed”.
- MK: That says “can”, I don’t think that’s clear. There was a final sentence that said the BOM was discarded.
- CG: That’s because the BOM is only discarded in the first three cases.
- NW: I think we need an explicit statement that if a UTF encoding is determined and a BOM is detected, it is discarded.
Proposal: accept this PR.
Accepted.
NW will merge it after CG adds the clarification requested.
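The behaviour discussed above (an explicitly supplied encoding wins, otherwise the encoding is inferred, and a leading BOM is discarded once a UTF encoding has been determined) might be sketched as follows. This is an illustration only, not the rules in PR #2249: the function names, the UTF-8 fallback, and the restriction to UTF-8 and UTF-16 BOMs are all assumptions made for the example.

```python
import codecs
from typing import Optional

# BOM signatures checked by this sketch. UTF-32 is deliberately ignored
# (an assumption to keep the example short).
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def decode_text(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode `data`, preferring an explicit encoding over inference."""
    # Hypothetical precedence: explicit encoding, then BOM, then UTF-8.
    chosen = encoding or sniff_bom(data) or "utf-8"
    text = data.decode(chosen)
    # Once a UTF encoding has been determined, drop a leading BOM so that
    # it never appears in the resulting string.
    is_utf = codecs.lookup(chosen).name.startswith("utf")
    if is_utf and text.startswith("\ufeff"):
        text = text[1:]
    return text
```

For example, `decode_text(codecs.BOM_UTF8 + b"hello")` yields `"hello"` with no leading U+FEFF, whether or not `encoding="utf-8"` is also passed.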
2.2. PR #2248: 2148b XDM Recognize that Base URI property may be invalid
See PR #2248
MK introduces the changes proposed for the Data Model.
- MK: It adds 7.3 Base URI that explains that the base URI can change, can be absent, and that in practice you can’t rely on the value conforming to XML Base.
- … You may accept a DOM or some other tree model that was constructed with a value that wasn’t validated.
- … Pragmatically, we acknowledge a third state: present but invalid.
- MK: That sets the groundwork for saying that anything that relies on the base URI may raise a dynamic error in this case.
- CG: Thanks. I wondered what would happen if fn:base-uri is called or if you have a direct constructor?
- MK: If you’ve got an invalid base URI, it’s best to say that all bets are off.
- CG: So it’s implementation dependent in practice.
Proposal: accept this PR.
Accepted.
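The three states MK describes (absent, present and valid, present but invalid) could be modelled roughly as below. This is a sketch under stated assumptions: the validity test is a crude stand-in rather than the XML Base rules, and `BaseUriState`, `DynamicError`, `classify`, and `resolve` are hypothetical names that appear in neither the PR nor the data model.

```python
from enum import Enum
from typing import Optional
from urllib.parse import urljoin, urlsplit


class BaseUriState(Enum):
    ABSENT = "absent"
    VALID = "present and valid"
    INVALID = "present but invalid"


class DynamicError(Exception):
    """Hypothetical stand-in for a processor's dynamic error."""


def classify(base_uri: Optional[str]) -> BaseUriState:
    """Sort a base URI property value into one of the three states."""
    if base_uri is None:
        return BaseUriState.ABSENT
    # Stand-in validity test (an assumption, not the XML Base rules):
    # no whitespace, parseable, and carrying a scheme.
    if any(ch.isspace() for ch in base_uri):
        return BaseUriState.INVALID
    try:
        parts = urlsplit(base_uri)
    except ValueError:
        return BaseUriState.INVALID
    return BaseUriState.VALID if parts.scheme else BaseUriState.INVALID


def resolve(relative: str, base_uri: Optional[str]) -> Optional[str]:
    """Resolve `relative` against the base URI, failing if it is invalid."""
    state = classify(base_uri)
    if state is BaseUriState.INVALID:
        raise DynamicError(f"base URI is present but invalid: {base_uri!r}")
    if state is BaseUriState.ABSENT:
        return None  # nothing to resolve against
    return urljoin(base_uri, relative)
```

Raising an error is only one possible behaviour in the invalid case; as noted in the discussion, what actually happens is implementation dependent in practice.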
2.3. PR #2223: 2193 fn:parse-xml, fn:doc: Drop security options
See PR #2223
- CG: I asked to have these options added, but on further exploration, there are a lot of other limits in the JDK. So we would need lots of additional options.
- … We also discussed the possibility of specifying different trust levels, but we dropped that because it was limited to doc and parse-xml. It should be more global.
- CG: So if we’re going to do that, it makes sense to drop these options.
- RD: So are we thinking about something more like the number-format options?
- … In XQuery you can use declare-decimal-format and declare a name.
- … We could have a declare-security-options …
- MK: That’s a leap.
- CG: That could be one way. Some use cases I have in mind are that users can write their own query and we need to decide, before that step, what the user should be able to do. Those could be prefixes to the query.
- … There are use cases where option declarations would make sense.
- JLO: We both have global options; they are not in the spec, but we don’t need to have them in the spec unless you want to override them.
- CG: One difficult question is which options will be sensible at all? It depends a lot on the implementation: Java, JavaScript, C, etc. What can the implementor actually do?
- MK: I’ve got mixed views because I agree these options are inadequate. I’m slightly reluctant to get rid of the only thing we’ve got. The area is important to people and we do get questions about them. My instinct is to accept this proposal but recognize that we have to do something.
- JWL: If we think about this notion of doing a declaration, recall that the changes you can make at that level can only be more strict than the security policy above the query.
- WP: I think MK put his finger on the problem. As I see it, it’s a question of how the control interface can be exposed to the user in a useful way. But if it raises more questions than it answers, then it might be better to have something simple that’s easier to test.
- RD: There are different levels. With Java, there are various flags you can pass to the parser. In Python, there are various libraries and things with different options and things like parser hardening. It’s difficult to get a set of options that are platform/language agnostic.
- … Then there’s the point about where do you pass them. If you want to be really secure, you’d pass them as command line options.
- … Do we want to have the ability to report different exceptions when we have failed to parse entities?
- NW: I think there’s another dimension: does the user writing the query or stylesheet trust the user? Sometimes I want to impose the constraints at the point of the parse.
- JLO: There are different levels; like NW said, I might want to run a parser or something with limited options.
- CG: Look at #2034 that we eventually decided to close.
- JWL: There’s another PR talking about these levels; putting the possibility of individually restricting on calls. A very good example might be that I’ve got an XSLT stylesheet that uses xsl:evaluate; I need to do that, but I might want to limit specific calls to xsl:evaluate to limit file access, for example.
- MK: Is it worth looking at #2213?
CG displays PR #2213.
- MK: This attempts to provide a summary of the ways that you can access external resources.
- … Then it adds a trust level to the context. How much do you trust a particular piece of code?
- … You don’t really want to say allow external entities on one document and not another, what you want to say is that I trust the person requesting these documents.
- … The main level where you want to change the trust level is when you invoke new code: the environment, fn:transform, xsl:evaluate, etc.
- … I’ve tried to limit it to three levels.
- JLO: I think this goes in a good direction. The level names are somewhat arbitrary; I’d prefer things like “no external access”, or other labels that tell me more about the limitations.
- CG: What I can add is that I like the idea to have a global way to decide what to do. The challenge is to line this up with implementations. In BaseX, we have lots of levels of access to databases, admin permissions, etc. But it’s all BaseX specific. It’s not going to be possible to make this relevant for Saxon, for example. And how would we bring these levels together with our own security management?
- JK: I think from this conversation, what I’m getting is that we need this not just statically but also at runtime. What I want as a user is to make that interface pretty much the same in both cases.
Proposal: accept #2223.
Accepted.
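JWL's point that a declaration can only be more strict than the policy above it, combined with MK's idea of a trust level carried in the context and changed when new code is invoked, can be illustrated with a small sketch. The three level names here are invented for the example (the minutes say only that there are three levels), and `narrow` is a hypothetical helper, not anything proposed in #2213.

```python
from enum import IntEnum


class TrustLevel(IntEnum):
    # Hypothetical names; higher values mean more trust.
    UNTRUSTED = 0
    RESTRICTED = 1
    TRUSTED = 2


def narrow(inherited: TrustLevel, requested: TrustLevel) -> TrustLevel:
    """Return the effective trust level for newly invoked code.

    The result is never more trusted than the caller's own level, so a
    request for a higher level is silently capped at the inherited one.
    """
    return min(inherited, requested)


# A restricted caller cannot grant full trust to code it invokes...
print(narrow(TrustLevel.RESTRICTED, TrustLevel.TRUSTED).name)
# ...but a trusted caller can choose to run other code with less trust.
print(narrow(TrustLevel.TRUSTED, TrustLevel.UNTRUSTED).name)
```

Capping silently is a design choice made for the sketch; an implementation could equally raise an error when code asks for more trust than its caller has.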
2.4. PR #2205: 2190 Drop binary input for parse-csv and parse-json
See PR #2205
- CG: When we discussed this PR last time, we thought it might be best to wait until we’ve revised decode-string.
- NW: Fair point, let’s wait until we’ve done that.
2.5. PR #2251: 323 Add select attribute to xsl:text
See PR #2251
- MK: I resisted for a while, but I’ve been persuaded.
- MK: What I’ve tried to do is make xsl:text and xsl:value-of as similar as possible.
- … There are a fair few incidental mentions that are no longer relevant.
- … A new section, Creating Text Nodes, introduces both xsl:text and xsl:value-of
- … The only real difference is that whitespace text nodes are preserved in xsl:text
- … (There’s also a difference in 1.0 backwards compatibility mode)
- … Most of the rest of the text is from xsl:value-of
- JWL: Is there any merit in trying to make xsl:attribute similar as well?
- MK: Where they come together is in creating simple content.
- JK: I really like this. The biggest takeaway is that xsl:value-of is essentially deprecated.
- MK: We haven’t explicitly deprecated it, but I agree I wouldn’t recommend it anymore.
- WP: I also like the proposal. The one thing I’d throw in there is that in the first example we wouldn’t use three instructions like that in a row.
Some discussion of the distinctions with respect to whitespace.
- JLO: Does it make sense to have a separator on comments?
- MK: I don’t see a strong need to change comments just in the interest of orthogonality.
Proposal: Accept this PR.
Accepted.
3. Any other business
None heard.