QT4 CG Meeting 140 Minutes 2025-10-28

Draft Minutes

Summary of new and continuing actions [0/2]

  • [ ] QT4CG-140-01: NW to make sure that the new spec is in the dashboard.
  • [ ] QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json.

1. Administrivia

1.1. Roll call [8/11]

Regrets: JK

  • [X] David J Birnbaum (DB)
  • [ ] Reece Dunn (RD)
  • [X] Christian Grün (CG)
  • [ ] Joel Kalvesmaki (JK)
  • [X] Michael Kay (MK)
  • [X] Juri Leino (JLO)
  • [X] John Lumley (JWL)
  • [X] Wendell Piez (WP)
  • [X] Ed Porter (EP)
  • [ ] Bethan Tovey-Walsh (BTW)
  • [X] Norm Tovey-Walsh (NW) Scribe. Chair.

1.2. Accept the agenda

Proposal: Accept the agenda.

Accepted.

1.3. Approve minutes of the previous meeting

Proposal: Accept the minutes of the previous meeting.

Accepted.

1.4. Next meeting

The next meeting is planned for 11 November 2025.

We will not meet on 4 November.

1.5. Review of open action items [0/0]

None

1.6. Group photograph for JWL/JLO presentation

JWL has asked if he might take a screenshot of us all to show at the tutorial he and JLO are giving at Declarative Amsterdam.

Done!

1.7. Review of open action items [0/0]

None recorded.

1.8. Review of open pull requests and issues

This section summarizes all of the issues and pull requests that need to be resolved before we can finish. See Technical Agenda below for the focus of this meeting.

1.8.1. Blocked

The following PRs are open but have merge conflicts or comments which suggest they aren’t ready for action.

  • PR #2256: 2216 All atomic types become ordered
  • PR #2160: 2073 data model changes for JNodes and Sequences
  • PR #2124: 573 Functions to Construct Trees
  • PR #2120: 2007 Revised design for xsl:array
  • PR #2071: 77c deep update
  • PR #2019: 1776: XSLT template rules for maps and array

1.8.2. Merge without discussion

The following PRs are editorial, small, or otherwise appeared to be uncontroversial when the agenda was prepared. The chairs propose that these can be merged without discussion. If you think discussion is necessary, please say so.

  • PR #2262: 2258 Correct the statement that split is the inverse of join

Proposal: merge without discussion.

Accepted.

2. Technical agenda

2.1. PR #2120: 2007 Revised design for xsl:array

See PR #2120

JWL/JLO would like an update on the status of this PR as it’s reflected in their proposed tutorial about the 4.0 specifications at Declarative Amsterdam.

  • MK: This is awaiting further work following the last review.
    • … We have a spec reorg pending that makes it messy to work on it.
    • … I’m not entirely sure what to do about it.
  • JWL: I’m going to use it anyway on the basis of that PR.
    • … The problem of making an array with sequences inside it isn’t trivial.
  • MK: That’s why I’m stuck!
  • JWL: I’ve got a projection of it into the 3.0 in my workbench.
  • JWL: I’ll make it clear that it’s a work in progress.

2.2. PR #2264: 2214 Separate streaming into its own spec

See PR #2264

Reminder: see https://qt4cg.org/pr/streaming/ for the formatted “dashboard” view.

  • MK: How do people want to review it?
    • … The immediate motivation was to get streaming off the critical path if we need it to be.
    • … I’m hoping that’s not the case, but it’s good to have options. I don’t think that much needs to be done with streaming.
    • … I think both specs are easier to read.
  • MK: There’s almost no new or deleted text other than notes and cross references to make the narrative make sense.
    • … I ended up with a fairly substantial reorganization of the streaming specification.
  • NW: I’m happy to accept that you’ve done a competent editorial job and we’ll see more of the details as we work through the respective drafts.
  • JWL: I started reading the streaming one, it’s nice at the beginning that you have principles of streaming. There’s a narrative before you get to the real problems. The one thing I didn’t find, in the main spec, is there a concept at the top saying there are other bits, like serialization and streaming, that may impact you but aren’t here.
  • MK: I tried to do that. There’s still a section in the XSLT concepts section about it.
    • … There’s plenty of room for refining that sort of thing.

Proposal: accept this PR.

Accepted.

ACTION QT4CG-140-01: NW to make sure that the new spec is in the dashboard.

2.3. PR #2222: 2217 bin:decode-string: Input encoding

See PR #2222

  • CG: The term “Unicode encoding” has been changed to “UTF encoding” in F&O.
    • … Most of the changes are in the binary module.
  • CG: Last week we talked about unparsed text; I basically adopted the decisions we made then.
  • NW: The binary module is zero based. Huh.
  • JLO: So there is no way to get to the BOM?
  • CG: Yes.
  • JLO: Can’t you just say the encoding is different and then get the BOM?
  • CG: If there’s a BOM at the zero position, it’s ignored. That’s what unparsed text does. Only if you use a different offset is it not interpreted as a BOM.
  • MK: You can get the BOM with a different function that doesn’t decode it. If you just try to get the part of the string starting at 0, it’s only if you decode it that you can’t get the BOM.

Some agreement.

  • CG: I also opened an issue for a function to infer the encoding, #2250.
    • … My thinking is that the ICU library gives you a confidence for each encoding. So perhaps you could return multiple results with a confidence value. But that’s a different issue.

Proposal: Accept this PR.

Accepted.
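
As a rough illustration of the BOM behaviour discussed above, here is a Python analogue. It is not the bin:decode-string function itself: the helper name decode_string, its parameters, and the fixed UTF-8 encoding are assumptions made for the sketch. It drops a BOM only when decoding starts at offset 0 (offsets are zero-based, as in the binary module), and leaves the same bytes in place as an ordinary U+FEFF character at any other offset.

```python
import codecs
from typing import Optional


def decode_string(data: bytes, offset: int = 0, size: Optional[int] = None) -> str:
    """Hypothetical helper: decode a UTF-8 slice of `data`.

    A BOM is dropped only when the slice starts at offset 0; elsewhere the
    same bytes decode to a plain U+FEFF character.
    """
    end = len(data) if size is None else offset + size
    chunk = data[offset:end]
    if offset == 0 and chunk.startswith(codecs.BOM_UTF8):
        chunk = chunk[len(codecs.BOM_UTF8):]
    return chunk.decode("utf-8")


# Sample input: a UTF-8 BOM followed by some text.
data = codecs.BOM_UTF8 + "héllo".encode("utf-8")

text_from_start = decode_string(data)            # BOM at offset 0 is ignored
raw_bom = data[:3]                               # BOM is still reachable without decoding
mid_stream = decode_string(b"ab" + codecs.BOM_UTF8 + b"cd", offset=2)
```

This mirrors MK's point: the BOM is unavailable only through decoding at offset 0. Slicing the raw bytes still returns it.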

2.4. PR #2213: 2047 External resources and security

See PR #2213

Based on the outcome of discussions about #2222, perhaps progress can be made here?

  • MK: I’ve now expanded this based on feedback from CG and others.
    • … We need to address the concerns somehow.
    • … XQuery begins with a discussion of the subject, 2.3 External Resources and Security.

MK walks through the proposed new text.

  • MK: In F&O, if we look at load-xquery-module, for example, we find it has a “trusted” option with the default set to false().
    • … If we look at doc, we find a “trusted” option again with appropriate semantics.
    • … That carries through to other functions.
  • CG: The file module could be added to the list.
  • MK: Yes, we should say something more explicit about that.
  • CG: If trusted is false() by default, then how many existing projects will now raise errors? Maybe it would be better to keep it true().
  • MK: As long as we provide a way to override it at top level, that’s probably good enough.
    • … But products are in charge of their own API design; you can have a way to change that.
    • … But at the spec level, I think we should make it secure by default.
  • CG: Then shouldn’t we make everything untrusted?
  • MK: I think the question of whether the code at the top level is trusted is a question of API design.
  • CG: Having two flags, one that controls functions and another for the top level, that could be confusing.
    • … It probably depends on the use cases. Having everything trusted might be one use case, but loaded query modules will be untrusted.
    • … It needs more thought.
  • JWL: If we take the example where we’re going to run fn:transform, you could say I don’t want this thing executing any fn:doc functions. We could put that on the call to the transform. Is that the case?
  • MK: Yes, you could say that call is not trusted.
  • JWL: What about a case where I don’t want the code to be doing fn:doc() but I’m happy to get them?
  • MK: You could do that by passing parameters.
  • JWL: Yes, I guess.
    • … Can you read your own trust level?
  • MK: I’m not sure if you should be able to.
  • JLO: I’m all for secure-by-default. Having a way to override that per-call makes sense.
    • … Even though I don’t have access to the caller’s context, I can still be passed any values. That way I can send specific resources to the untrusted query.

Some discussion of the use cases for which that works; it doesn’t work if you want to crawl the web.

  • JLO: What about having a sandbox that lets you access the filesystem or the web or something else?
  • MK: This is intended to be abstract enough to support that kind of API.
    • … In untrusted, you could for example, say that access to the database is allowed but not access to the web.
  • JLO: What about a function?
  • MK: I think that gets too complicated.
  • JLO: I guess I need to think about this some more too.

Some discussion of implementation defined mechanisms that might be used.

  • WP: I like the direction. It is a hard problem. The biggest issue I have so far is that false() doesn’t mean “false” here; it means “false except when the system says true”. I wonder if there could be an in-between setting, “as configured”, that means use the configuration. That frees up false() to mean actually false.

Some discussion of the previous three-valued approach.

  • WP: As a user, if I set the flag to false() I can’t prohibit what the system allows. The false() value doesn’t mean false.
  • WP: What about import and include?
  • MK: Yes, I think they have to be controlled the same way.
  • WP: That’s one of the concerns. There’s also the use case where a user is using a library and there’s in-between code. It’s not just Saxon, it’s the library I’m using that uses Saxon. Are they doing the right thing? Generally, not.
  • WP: You want implementation level control, but I think the user needs to be aware of that.
  • CG: Have you checked html-doc?
  • MK: I think I chickened out on that. It’s just doing unparsed binary and passing it to the parse HTML function.
    • … But I’m not sure the HTML parsers give us any control anyway.

Some discussion of HTML security issues; usually related to rendering.

We’ll continue the discussion in two weeks.
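
To make the secure-by-default idea from this discussion concrete, here is a minimal Python sketch. It does not reproduce the proposed text in PR #2213. Every name in it (Context, doc, call, stylesheet, RESOURCES) is hypothetical, and the rule that a callee is trusted only when a trusted caller explicitly asks for it is an assumption, not something the group decided.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Context:
    trusted: bool = False  # assumption: secure by default


# Stand-in for external documents; a real processor would hit disk or the web.
RESOURCES = {"catalog.xml": "<catalog/>"}


def doc(uri: str, ctx: Context) -> str:
    """Hypothetical analogue of a resource-reading function with a trust check."""
    if not ctx.trusted:
        raise PermissionError(f"untrusted code may not read {uri}")
    return RESOURCES[uri]


def call(fn: Callable[..., str], caller: Context, trusted: bool = False, **params) -> str:
    """Run `fn` in a new context; trust cannot exceed the caller's (assumed rule)."""
    callee = Context(trusted=caller.trusted and trusted)
    return fn(callee, **params)


def stylesheet(ctx: Context, source: Optional[str] = None) -> str:
    # Uses a document passed in as a parameter, else tries to fetch one itself.
    return source if source is not None else doc("catalog.xml", ctx)


top = Context(trusted=True)

# The trusted caller fetches the document and hands it to untrusted code.
passed_in = call(stylesheet, top, source=doc("catalog.xml", top))

# Per-call override: the caller explicitly marks this call as trusted.
overridden = call(stylesheet, top, trusted=True)

# Default: the callee is untrusted, so fetching on its own is refused.
try:
    call(stylesheet, top)
    refused = False
except PermissionError:
    refused = True
```

The first call corresponds to JLO's observation that untrusted code can still be handed specific resources as values. As noted in the discussion, that approach does not cover cases such as crawling the web.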

2.5. PR #2205: 2190 Drop binary input for parse-csv and parse-json

See PR #2205

Based on the outcome of discussions about #2222, perhaps progress can be made here?

  • MK: If you have CSV and JSON, you can now use decode-string to decode it and pass it to those functions. There’s no reason to combine decoding and parsing because they’re unrelated, unlike XML.
  • CG: I still have a slight preference to make it easier for users, but I understand that it makes sense to keep the functionality separated.
  • JLO: I think this separation is much better.
    • … I think parse-csv and parse-json shouldn’t do any decoding, but the csv-doc and json-doc still need to do it.
  • MK: Those are composite functions.
  • JWL: I wonder if it’s worth putting a note on parse-csv and parse-json to tell users a technique for dealing with binary input.

ACTION QT4CG-140-02: MK to add a note about dealing with binary in parse-csv and parse-json.

MK displays the changes.

Proposal: accept this PR.

Accepted.
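
The separation of decoding from parsing can be sketched with a Python analogue. This is not the XPath API: the decode helper, the sample data, and the chosen encodings are assumptions made for illustration, standing in for decode-string followed by parse-json or parse-csv.

```python
import csv
import io
import json


def decode(data: bytes, encoding: str) -> str:
    """Hypothetical stand-in for the decoding step: bytes in, string out."""
    return data.decode(encoding)


# Binary inputs in two different encodings (sample data).
json_bytes = '{"name": "QT4", "meeting": 140}'.encode("utf-16")
csv_bytes = "id,title\n2205,Drop binary input\n".encode("latin-1")

# Step 1 decodes; step 2 parses. The parsers only ever see strings.
parsed_json = json.loads(decode(json_bytes, "utf-16"))
parsed_csv = list(csv.reader(io.StringIO(decode(csv_bytes, "latin-1"))))
```

Keeping the two steps apart means the parsers need no encoding parameter, which is the design point MK and JLO make above.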

2.6. PR #2259: 938 Canonical serialization

See PR #2259

Skip until JK is present.

3. Any other business

None heard.