QT4 CG Meeting 140 Minutes 2025-10-28
Table of Contents
- Draft Minutes
- Summary of new and continuing actions [0/2]
- 1. Administrivia
- 2. Technical agenda
- 2.1. PR #2120: 2007 Revised design for xsl:array
- 2.2. PR #2264: 2214 Separate streaming into its own spec
- 2.3. PR #2222: 2217 bin:decode-string: Input encoding
- 2.4. PR #2213: 2047 External resources and security
- 2.5. PR #2205: 2190 Drop binary input for parse-csv and parse-json
- 2.6. PR #2259: 938 Canonical serialization
- 3. Any other business
Draft Minutes
Summary of new and continuing actions [0/2]
- [ ] QT4CG-140-01: NW to make sure that the new spec is in the dashboard.
- [ ] QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json.
1. Administrivia
1.1. Roll call [8/11]
Regrets: JK
- [X] David J Birnbaum (DB)
- [ ] Reece Dunn (RD)
- [X] Christian Grün (CG)
- [ ] Joel Kalvesmaki (JK)
- [X] Michael Kay (MK)
- [X] Juri Leino (JLO)
- [X] John Lumley (JWL)
- [X] Wendell Piez (WP)
- [X] Ed Porter (EP)
- [ ] Bethan Tovey-Walsh (BTW)
- [X] Norm Tovey-Walsh (NW) Scribe. Chair.
1.2. Accept the agenda
Proposal: Accept the agenda.
Accepted.
1.3. Approve minutes of the previous meeting
Proposal: Accept the minutes of the previous meeting.
Accepted.
1.4. Next meeting
The next meeting is planned for 11 November 2025.
We will not meet on 4 November.
1.5. Review of open action items [/]
None
1.6. Group photograph for JWL/JLO presentation
JWL has asked if he might take a screenshot of us all to show at the tutorial he and JLO are giving at Declarative Amsterdam.
Done!
1.7. Review of open action items [0/0]
None recorded.
1.8. Review of open pull requests and issues
This section summarizes all of the issues and pull requests that need to be resolved before we can finish. See Technical Agenda below for the focus of this meeting.
1.8.1. Blocked
The following PRs are open but have merge conflicts or comments which suggest they aren’t ready for action.
1.8.2. Merge without discussion
The following PRs are editorial, small, or otherwise appeared to be uncontroversial when the agenda was prepared. The chairs propose that these can be merged without discussion. If you think discussion is necessary, please say so.
- PR #2262: 2258 Correct the statement that split is the inverse of join
Proposal: merge without discussion.
Accepted.
2. Technical agenda
2.1. PR #2120: 2007 Revised design for xsl:array
See PR #2120
JWL/JLO would like an update on the status of this PR as it’s reflected in their proposed tutorial about the 4.0 specifications at Declarative Amsterdam.
- MK: This is awaiting further work following the last review.
- … We have a spec reorg pending that makes it messy to work on it.
- … I’m not entirely sure what to do about it.
- JWL: I’m going to use it anyway on the basis of that PR.
- … The problem of making an array with sequences inside it isn’t trivial.
- MK: That’s why I’m stuck!
- JWL: I’ve got a projection of it into the 3.0 in my workbench.
- JWL: I’ll make it clear that it’s a work in progress.
2.2. PR #2264: 2214 Separate streaming into its own spec
See PR #2264
Reminder: see https://qt4cg.org/pr/streaming/ for the formatted “dashboard” view.
- MK: How do people want to review it?
- … The immediate motivation was to get streaming off the critical path if we need it to be.
- … I’m hoping that’s not the case, but it’s good to have options. I don’t think that much needs to be done with streaming.
- … I think both specs are easier to read.
- MK: There’s almost no new or deleted text other than notes and cross references to make the narrative make sense.
- … I ended up with a fairly substantial reorganization of the streaming specification.
- NW: I’m happy to accept that you’ve done a competent editorial job and we’ll see more of the details as we work through the respective drafts.
- JWL: I started reading the streaming one, it’s nice at the beginning that you have principles of streaming. There’s a narrative before you get to the real problems. The one thing I didn’t find, in the main spec, is there a concept at the top saying there are other bits, like serialization and streaming, that may impact you but aren’t here.
- MK: I tried to do that. There’s still a section in the XSLT concepts section about it.
- … There’s plenty of room for refining that sort of thing.
Proposal: accept this PR.
Accepted.
ACTION QT4CG-140-01: NW to make sure that the new spec is in the dashboard.
2.3. PR #2222: 2217 bin:decode-string: Input encoding
See PR #2222
- CG: The term “Unicode encoding” has been changed to “UTF encoding” in F&O.
- … Most of the changes are in the binary module.
- CG: Last week we talked about unparsed text; I basically adopted the decisions we made then.
- …
- NW: The binary module is zero based. Huh.
- JLO: So there is no way to get to the BOM?
- CG: Yes.
- JLO: Can’t you just say the encoding is different and then get the BOM?
- CG: If there’s a BOM at the zero position, it’s ignored. That’s what unparsed text does. Only if you have another offset then it’s not interpreted.
- MK: You can get the BOM with a different function that doesn’t decode it. If you just try to get the part of the string starting at 0, it’s only if you decode it that you can’t get the BOM.
Some agreement.
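The BOM behaviour described above can be illustrated with a small Python sketch. It is only an analogy: decode_string here is a hypothetical helper that handles UTF-8 only, and it does not reproduce the actual signature or semantics of bin:decode-string.

```python
import codecs


def decode_string(data: bytes, offset: int = 0) -> str:
    """Hypothetical helper: decode UTF-8 bytes starting at a zero-based offset.

    Assumption for this sketch: a BOM is skipped only when decoding starts
    at offset 0; at any other offset the same bytes are decoded as ordinary
    content.
    """
    chunk = data[offset:]
    if offset == 0 and chunk.startswith(codecs.BOM_UTF8):
        chunk = chunk[len(codecs.BOM_UTF8):]
    return chunk.decode("utf-8")


with_bom = codecs.BOM_UTF8 + "hi".encode("utf-8")

# Decoding from offset 0 drops the BOM.
decoded = decode_string(with_bom)

# Reading the raw bytes without decoding still exposes the BOM.
raw_prefix = with_bom[:3]

# At a non-zero offset the BOM bytes decode to the character U+FEFF.
shifted = decode_string(b"x" + with_bom, offset=1)
```

In this sketch the BOM is reachable in two ways: by slicing the bytes without decoding them, or by decoding from an offset other than zero.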
- CG: I also opened an issue for a function to infer the encoding, #2250.
- … My thinking is that the ICU library gives you a confidence for each encoding. So perhaps you could return multiple results with a confidence value. But that’s a different issue.
Proposal: Accept this PR.
Accepted.
2.4. PR #2213: 2047 External resources and security
See PR #2213
Based on the outcome of discussions about #2222, perhaps progress can be made here?
- MK: I’ve now expanded this based on feedback from CG and others.
- … We need to address the concerns somehow.
- … XQuery begins with a discussion of the subject, 2.3 External Resources and Security.
MK walks through the proposed new text.
- MK: In F&O, if we look at load-xquery-module, for example, we find it has a “trusted” option with the default set to false().
- … If we look at doc, we find a “trusted” option again with appropriate semantics.
- … That carries through to other functions.
- … If we look at
- CG: The file module could be added to the list.
- MK: Yes, we should say something more explicit about that.
- CG: If untrusted is false by default, then how many existing projects will now raise errors? Maybe it would be better to keep it true().
- MK: As long as we provide a way to override it at top level, that’s probably good enough.
- … But products are in charge of their own API design; you can have a way to change that.
- … But at the spec level, I think we should make it secure by default.
- CG: Then shouldn’t we make everything untrusted?
- MK: I think the question of whether the code at the top level is trusted is a question of API design.
- CG: Having two flags, one that controls functions and another for the top level, that could be confusing.
- … It probably depends on the use cases. Having everything trusted might be one use case, but loaded query modules will be untrusted.
- … It needs more thought.
- JWL: If we take the example where we’re going to run fn:transform, you could say I don’t want this thing executing any fn:doc functions. We could put that on the call to the transform. Is that the case?
- MK: Yes, you could say that call is not trusted.
- JWL: What about a case where I don’t want the code to be doing fn:doc() but I’m happy to get them?
- MK: You could do that by passing parameters.
- JWL: Yes, I guess.
- … Can you read your own trust level?
- MK: I’m not sure if you should be able to.
- JLO: I’m all for secure-by-default. Having a way to override that per-call makes sense.
- … Even though I don’t have access to the caller’s context, I can still be passed any values. That way I can send specific resources to the untrusted query.
Some discussion of the use cases for which that works; it doesn’t work if you want to crawl the web.
- JLO: What about having a sandbox that lets you access the filesystem or the web or something else?
- MK: This is intended to be abstract enough to support that kind of API.
- … In untrusted, you could, for example, say that access to the database is allowed but not access to the web.
- JLO: What about a function?
- MK: I think that gets too complicated.
- JLO: I guess I need to think about this some more too.
Some discussion of implementation defined mechanisms that might be used.
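The secure-by-default idea, with a per-call override and resources handed to untrusted code as parameters, might look roughly like the following Python sketch. Everything in it (Context, fetch_doc, run_query, the in-memory RESOURCES table) is hypothetical and chosen for illustration; it does not reflect the text of the PR or any product's API.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional

# Stand-in for external resources; a real processor would go to the
# filesystem, a database, or the web.
RESOURCES = {"a.xml": "<a/>"}


@dataclass(frozen=True)
class Context:
    trusted: bool = False  # assumption: secure by default


def fetch_doc(uri: str, ctx: Context) -> str:
    """Hypothetical analogue of a resource-fetching function."""
    if not ctx.trusted:
        raise PermissionError(f"untrusted code may not fetch {uri!r}")
    return RESOURCES[uri]


Query = Callable[[Context, Mapping[str, str]], str]


def run_query(query: Query, *, trusted: bool = False,
              params: Optional[Mapping[str, str]] = None) -> str:
    """Run a query; the caller decides per call whether it is trusted."""
    return query(Context(trusted=trusted), params or {})


def fetching_query(ctx: Context, params: Mapping[str, str]) -> str:
    # Tries to reach an external resource on its own.
    return fetch_doc("a.xml", ctx)


def parameter_query(ctx: Context, params: Mapping[str, str]) -> str:
    # Uses only what the caller chose to pass in.
    return params["doc"]
```

Here run_query(fetching_query) raises PermissionError, run_query(fetching_query, trusted=True) succeeds, and run_query(parameter_query, params={"doc": "<b/>"}) works without trust because the caller supplied the resource.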
- WP: I like the direction. It is a hard problem. The biggest issue I have so far is that false() doesn’t mean “false” here; it means “false except when the system says true”. I wonder if there could be an in-between setting, “as configured”, that means use configurations. That frees up false() to mean actually false.
Some discussion of the previous three-valued approach.
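The difference between a two-valued flag that the system can override and a three-valued setting with an explicit “as configured” value can be sketched in Python as follows. The names (Trust, two_valued, three_valued) are hypothetical, and the treatment of the TRUE value is an assumption of this sketch rather than anything decided in the meeting.

```python
from enum import Enum


class Trust(Enum):
    TRUE = "true"
    FALSE = "false"
    AS_CONFIGURED = "as configured"


def two_valued(requested: bool, system_allows: bool) -> bool:
    """Two-valued reading: false() means "false unless the system says true"."""
    return requested or system_allows


def three_valued(requested: Trust, system_allows: bool) -> bool:
    """Three-valued reading: FALSE always means false."""
    if requested is Trust.FALSE:
        return False
    if requested is Trust.AS_CONFIGURED:
        return system_allows
    # Assumption: TRUE is granted unconditionally in this sketch; a real
    # system might still let the configuration veto it.
    return True
```

With system_allows set to True, two_valued(False, True) still grants access, whereas three_valued(Trust.FALSE, True) does not.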
- WP: As a user, if I set the flag to false() I can’t prohibit what the system allows. The false() value doesn’t mean false.
- WP: What about import and include?
- MK: Yes, I think they have to be controlled the same way.
- WP: That’s one of the concerns. There’s also the use case where a user is using a library and there’s in-between code. It’s not just Saxon, it’s the library I’m using that uses Saxon. Are they doing the right thing? Generally, not.
- WP: You want implementation level control, but I think the user needs to be aware of that.
- CG: Have you checked html-doc?
- MK: I think I chickened out on that. It’s just doing unparsed binary and passing it to the parse HTML function.
- … But I’m not sure the HTML parsers give us any control anyway.
Some discussion of HTML security issues; usually related to rendering.
We’ll continue the discussion in two weeks.
2.5. PR #2205: 2190 Drop binary input for parse-csv and parse-json
See PR #2205
Based on the outcome of discussions about #2222, perhaps progress can be made here?
- MK: If you have CSV and JSON, you can now use decode-string to parse it and pass it to those functions. There’s no reason to combine decoding and parsing because they’re unrelated, unlike XML.
- CG: I still have a slight preference to make it easier for users, but I understand that it makes sense to keep the functionality separated.
- JLO: I think this separation is much better.
- … I think parse-csv and parse-json shouldn’t do any decoding, but the csv-doc and json-doc still need to do it.
- MK: Those are composite functions.
- JWL: I wonder if it’s worth putting a note on parse-csv and parse-json to tell users a technique for dealing with binary input.
ACTION QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json.
MK displays the changes.
Proposal: accept this PR.
Accepted.
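The separation of decoding from parsing discussed under this item can be illustrated with a short Python sketch using only the standard library. It is an analogy, not the XPath functions themselves: the two helper names are hypothetical, and Python's bytes.decode stands in for the decoding step.

```python
import csv
import io
import json


def parse_json_from_bytes(data: bytes, encoding: str = "utf-8"):
    """Hypothetical composite: decode first, then parse."""
    text = data.decode(encoding)  # step 1: bytes to string
    return json.loads(text)       # step 2: string to data


def parse_csv_from_bytes(data: bytes, encoding: str = "utf-8"):
    """Same two steps for CSV; the parser itself only ever sees a string."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


json_value = parse_json_from_bytes('{"a": 1}'.encode("utf-16"), "utf-16")
csv_rows = parse_csv_from_bytes(b"a,b\r\n1,2\r\n")
```

The parsers take strings only; anything that starts from binary input is a composition of the two steps, which matches the description of csv-doc and json-doc as composite functions.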
2.6. PR #2259: 938 Canonical serialization
See PR #2259
Skip until JK is present.
3. Any other business
None heard.