QT4 CG Meeting 017 Minutes 2023-01-10

Approved at meeting 018 on 17 January 2023.

Summary of new and continuing actions [0/17]

  • [ ] QT4CG-002-10: BTW to coordinate some ideas about improving diversity in the group
  • [ ] QT4CG-015-02: NW to improve the width of the diagrams, perhaps multiple views
  • [ ] QT4CG-015-03: NW to make sure the direction of the arrow is in the legend
  • [ ] QT4CG-015-04: NW to investigate if a dynamic presentation is practical
  • [ ] QT4CG-016-02: NW to add an ed-note indicating when it was approved.
  • [ ] QT4CG-016-03: RD to add a note clarifying “known character encoding”
  • [ ] QT4CG-016-04: RD to add a note clarifying the “*”/“*” html/version combination
  • [ ] QT4CG-016-05: RD to add a “todo” noting the dependency on keyword arguments
  • [ ] QT4CG-016-06: RD to reword the introduction to mapping to clarify who’s doing the mapping
  • [ ] QT4CG-016-07: NW to make an issue about the problems of document-uri uniqueness
  • [ ] QT4CG-016-08: RD to clarify how namespace comparisons are performed.
  • [ ] QT4CG-016-09: RD to add a note stating that the local name should always be lowercase
  • [ ] QT4CG-016-10: RD to consider how to clarify parsed entity parsing.
  • [ ] QT4CG-017-01: MK to clarify the character numbers in points 1 and 2 of fn:char
  • [ ] QT4CG-017-02: MK to change the order of points to 3, 4, 1, 2 in fn:char
  • [ ] QT4CG-017-03: SF to follow up on whether or not the browser has all the Unicode and emoji names.
  • [ ] QT4CG-017-04: MK to revise PR #284 to include an optional ‘else’

1. Administrivia

1.1. Roll call [10/14]

Regrets: BTW

  • [ ] Anthony (Tony) Bufort (AB)
  • [X] Reece Dunn (RD)
  • [X] Sasha Firsov (SF) [:15-]
  • [X] Christian Grün (CG)
  • [X] Joel Kalvesmaki (JK) [:12-]
  • [X] Michael Kay (MK)
  • [X] John Lumley (JL)
  • [X] Dimitre Novatchev (DN)
  • [X] Ed Porter (EP)
  • [ ] Liam Quin (LQ)
  • [ ] Adam Retter
  • [X] C. M. Sperberg-McQueen (MSM)
  • [ ] Bethan Tovey-Walsh (BTW)
  • [X] Norm Tovey-Walsh (NW). Scribe. Chair.

1.2. Accept the agenda

Happy New Year!

Proposal: Accept the agenda.

Accepted.

1.3. Approve minutes of the previous meeting

Proposal: Accept the minutes of the previous meeting.

Accepted.

1.4. Next meeting

The next meeting is scheduled for Tuesday, 17 January 2023.

No regrets heard.

1.5. Review of open action items [1/14]

  • [ ] QT4CG-002-10: BTW to coordinate some ideas about improving diversity in the group
  • [ ] QT4CG-015-02: NW to improve the width of the diagrams, perhaps multiple views
  • [ ] QT4CG-015-03: NW to make sure the direction of the arrow is in the legend
  • [ ] QT4CG-015-04: NW to investigate if a dynamic presentation is practical
  • [X] QT4CG-016-01: DN to provide prose with more of the details for #281 Completed, reopened #299
  • [ ] QT4CG-016-02: NW to add an ed-note indicating when it was approved.
  • [ ] QT4CG-016-03: RD to add a note clarifying “known character encoding”
  • [ ] QT4CG-016-04: RD to add a note clarifying the “*”/“*” html/version combination
  • [ ] QT4CG-016-05: RD to add a “todo” noting the dependency on keyword arguments
  • [ ] QT4CG-016-06: RD to reword the introduction to mapping to clarify who’s doing the mapping
  • [ ] QT4CG-016-07: NW to make an issue about the problems of document-uri uniqueness
  • [ ] QT4CG-016-08: RD to clarify how namespace comparisons are performed.
  • [ ] QT4CG-016-09: RD to add a note stating that the local name should always be lowercase
  • [ ] QT4CG-016-10: RD to consider how to clarify parsed entity parsing.

2. Technical Agenda

2.1. Issue #281, reopened as #299

We had some discussion of #281 previously, but no resolution. Discussion of this item is contingent on action QT4CG-016-01.

  • DN: I would like to thank MK and CG for very valuable feedback. I think we probably should not try to have the discussion here. There needs to be some more feedback before we can really discuss it.
    • … MK wanted to include partial evaluation of structured objects. It would be good to get some feedback from RD on this issue.
  • NW: Ok, we’ll leave this open for more feedback.

2.2. Review pull request #259: parse-html (issue #74)

See pull request #259

The proposal was reviewed in meeting 016. Discussion is expected to continue.

  • MK: I’d like to report on the test suite.
    • … Someone pointed me to the HTML5 test suite which was too big to use in practice. I took a sample of 1,300 test cases out of it, chosen so that they have different tag structure. I effectively did a majority vote on those between three supposed implementations of the HTML5 parsing algorithm: JSoup and Validator.nu in Java and AngleSharp in C#. After tweaking to set options on how they deal with comments and such, they deliver the same results in about 1,200 cases.
    • … I constructed the reference results from JSoup. Then I’ve got two implementations, one using Validator.nu and one using AngleSharp.
    • … Down to about 30 cases for each product that need to be resolved.
  • MSM: You said you know which one is right in some cases, does it seem possible to induce the products producing the wrong results to do the right thing?
  • MK: I’m using the browser to arbitrate. If I get different results, I assume the browser is right.
  • RD: In the cases where they differ, how much does that matter in terms of conformance?
  • MK: I think we have to live with the fact that there will be some variation across products. Remember that this test suite was designed to test edge cases.
  • RD: Are we going to ignore the edge cases?
  • MK: I’d recommend that we try to get the test suite to a point where there’s a single correct result for each test. If an implementation knows it fails a test, it can document that failure and exclude the test.
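
The comparison MK describes can be sketched roughly as follows. This is a minimal illustration, not the tooling MK used: the function names, the sample test IDs, and the serialized trees are all hypothetical, and it assumes each parser's output has already been serialized to a comparable string.

```python
from collections import Counter


def majority_vote(results):
    """Return the output most parsers agree on, or None without a strict majority.

    `results` maps a parser name to its serialized output for one test case.
    """
    tree, votes = Counter(results.values()).most_common(1)[0]
    return tree if votes * 2 > len(results) else None


def triage(test_cases):
    """Split test cases into unanimous, majority-only, and unresolved buckets."""
    buckets = {"unanimous": [], "majority": [], "unresolved": []}
    for test_id, results in test_cases.items():
        if len(set(results.values())) == 1:
            buckets["unanimous"].append(test_id)
        elif majority_vote(results) is not None:
            buckets["majority"].append(test_id)
        else:
            buckets["unresolved"].append(test_id)
    return buckets


# Hypothetical sample data; the parser names come from the minutes,
# the test IDs and serialized trees are made up for illustration.
SAMPLE = {
    "t001": {"JSoup": "<p>a</p>", "Validator.nu": "<p>a</p>", "AngleSharp": "<p>a</p>"},
    "t002": {"JSoup": "<b/>", "Validator.nu": "<b/>", "AngleSharp": "<b></b>"},
    "t003": {"JSoup": "<i/>", "Validator.nu": "<i></i>", "AngleSharp": "<i> </i>"},
}

if __name__ == "__main__":
    print(triage(SAMPLE))
```

Cases in the "majority" and "unresolved" buckets are the ones that would still need an arbiter, such as the browser check MK mentions.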

Some discussion of a “reference implementation”. The W3C doesn’t typically have a reference implementation; instead, it publishes test results. There’s a slight difficulty here because the HTML5 spec is moving, but that’s true of the Unicode spec as well, so we’ve learned to cope.

Proposal: Accept the PR.

Accepted.

2.3. Review pull request #261: fn:char (issue #121)

See pull request #261

  • MK reviews #261
    • … The backslash variant is limited to “n”, and “r”, and “t”
  • RD: A point on the missing bibliographic reference: my PR #259 adds a bibliographic reference for HTML5: The Living Standard.
  • MSM: In item 3 we allow implementations to recognize other names. I wonder if we want to allow an imaginary DTD aware processor to allow any general entity name.
  • RD: I don’t think we have a mechanism for bringing the DTD entities into scope.
  • SF: That means it would be a reference to implemented entities in the environment.
  • MSM: I don’t see a lot of support, so never mind. :-)
  • MK: I wanted to make it slightly extensible because HTML is a living standard.
  • RD: What about the Unicode names?
  • MK: I think the database of Unicode names is just too large.
  • RD: But if you need the Unicode regex classes, those have the Unicode names.
  • SF: What about entities defined in the document?
  • MK: This isn’t suggesting that you should get them from any DTD, it’s limited to the ones in the standard.
    • … If you’re in XSLT and you want to use entity references, then you don’t need this function to refer directly to entities in the stylesheet’s DTD.
  • MSM: In an XSLT document, or an XQuery, I don’t want entities that are declared in the document I’m working on to be in-scope. The entities that should be in scope are the ones in the stylesheet.
    • … If I’m going to use entity syntax (or something entity adjacent like this function) in an XPath expression or XQuery, then I don’t want to pick up entity names from entities that are declared for the document I’m processing. The scoping rules are wrong.
  • DN: Maybe I don’t understand, isn’t it possible for an entity to expand to multiple characters? And what about emojis? What is a character?
  • MK: That’s a good point. I think that the HTML5 ones are all single characters.
  • NW: Maybe today but what about tomorrow?
  • RD: I know there are several that are multiple UTF-16 code points. I can’t remember if any of them are multi-character.
  • NW: Is anything lost if we say this returns a string?
  • MK: No, I don’t think so. It raises the question of whether a single call should be able to name two characters (e.g., “\r\n”).
  • RD: I think “#xnnn” is potentially misleading because there can be up to six numbers.

ACTION QT4CG-017-01: MK to clarify the character numbers in points 1 and 2 of fn:char
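
The forms discussed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the text of PR #261: the function name `char_sketch` is hypothetical, the exact argument syntax is inferred from the discussion (a backslash form limited to n, r, and t; a “#x” form with up to six hexadecimal digits; an HTML5 entity name), and Python's `html.entities.html5` table stands in for the entity names in the HTML5 standard.

```python
import html.entities
import re

# The three backslash forms mentioned in the discussion.
BACKSLASH_FORMS = {"\\n": "\n", "\\r": "\r", "\\t": "\t"}


def char_sketch(spec):
    """Resolve a character description to a string (hypothetical semantics)."""
    if spec in BACKSLASH_FORMS:
        return BACKSLASH_FORMS[spec]

    # "#x" followed by one to six hexadecimal digits (assumed syntax).
    match = re.fullmatch(r"#x([0-9A-Fa-f]{1,6})", spec)
    if match:
        codepoint = int(match.group(1), 16)
        if codepoint > 0x10FFFF:
            raise ValueError(f"not a Unicode codepoint: {spec}")
        return chr(codepoint)

    # HTML5 entity name, written without the leading "&" and trailing ";".
    key = spec + ";"
    if key in html.entities.html5:
        return html.entities.html5[key]

    raise ValueError(f"unrecognized character description: {spec}")


# Some HTML5 entity names expand to more than one codepoint, which bears on
# the question of whether the result should be described as a string.
MULTI_CODEPOINT_NAMES = sorted(
    name.rstrip(";") for name, value in html.entities.html5.items() if len(value) > 1
)

if __name__ == "__main__":
    print(repr(char_sketch("amp")), repr(char_sketch("#x20")), repr(char_sketch("\\t")))
    print(len(MULTI_CODEPOINT_NAMES), "entity names expand to several codepoints")
```

Returning a string rather than a single character keeps the sketch well defined for the multi-codepoint names.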

Some discussion of how emoji fit in.

  • MSM: If we were to allow any Unicode names then we could use the Unicode names for those emoji, but my instinct is that people who are working with this function library who need to refer to an emoji will have a hex number for it.
  • NW: I have very mixed feelings about the Unicode names. I want them but I agree with MK about not shipping the whole database everywhere.
  • MSM: If we wanted to allow that we could rewrite rule 3 to allow more flexibility.
  • JK: When I first read this, I didn’t understand why I needed it from points 1 and 2. I think it would make sense to change the order so that points 3 and 4 come first.

ACTION QT4CG-017-02: MK to change the order of points to 3, 4, 1, 2 in fn:char

  • MSM: Did I hear correctly, SF, that browsers have all the Unicode and emoji names built in?
  • RD: I think they should do to the extent that they make use of various Unicode libraries like ICU.
  • MK: I don’t think there’s anything in ICU that gives you access to characters by name.

ACTION QT4CG-017-03: SF to follow up on whether or not the browser has all the Unicode and emoji names.
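
As a point of comparison only, some language runtimes do bundle the Unicode name database. A minimal sketch using Python's standard `unicodedata` module, which says nothing about what browsers or ICU provide:

```python
import unicodedata

# Look a character up by its Unicode name, then map it back to the name.
grinning = unicodedata.lookup("GRINNING FACE")
name = unicodedata.name(grinning)

if __name__ == "__main__":
    print(f"U+{ord(grinning):04X} {name}")
```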

Proposal: accept the PR

Accepted.

2.4. Review pull request #284: Grammar for if-then w/o else

See pull request #284

Some discussion of the background, ideas from CG, RD, DN, and MK at least.

  • MK reviews #284
    • … If you want to do a conditional without an else branch, you write an enclosed expression in curly braces.
  • CG: Thank you, MK, for the proposal. Could we allow ‘else’ after the closing curly brace?

Some concern that this re-introduces else-ambiguity. Looking at CG’s examples in #284 clarifies that ‘else’ attachment is, after all, unambiguous.
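
Why braces keep ‘else’ attachment unambiguous can be illustrated with a toy parser. This is a sketch of a hypothetical mini-grammar, not the grammar in PR #284: it assumes conditions and results are bare names and that the ‘else’ branch is also braced. It is not a substitute for a rigorous analysis of the real grammar.

```python
import re


def tokenize(text):
    """Split the input into names and the punctuation ( ) { }."""
    return re.findall(r"[A-Za-z_]\w*|[(){}]", text)


def expect(tokens, pos, want):
    """Check that tokens[pos] is `want` and return the next position."""
    if pos >= len(tokens) or tokens[pos] != want:
        raise SyntaxError(f"expected {want!r} at token {pos}")
    return pos + 1


def parse_name(tokens, pos):
    """Parse a bare name (anything that is not punctuation or a keyword)."""
    if pos >= len(tokens) or tokens[pos] in ("(", ")", "{", "}", "if", "else"):
        raise SyntaxError(f"expected a name at token {pos}")
    return tokens[pos], pos + 1


def parse_expr(tokens, pos):
    """Expr := Name | 'if' '(' Name ')' '{' Expr '}' ( 'else' '{' Expr '}' )?"""
    if pos >= len(tokens) or tokens[pos] != "if":
        return parse_name(tokens, pos)
    pos = expect(tokens, pos + 1, "(")
    cond, pos = parse_name(tokens, pos)
    pos = expect(tokens, pos, ")")
    pos = expect(tokens, pos, "{")
    then, pos = parse_expr(tokens, pos)
    pos = expect(tokens, pos, "}")
    otherwise = None  # an elseless 'if' has no second branch
    if pos < len(tokens) and tokens[pos] == "else":
        pos = expect(tokens, pos + 1, "{")
        otherwise, pos = parse_expr(tokens, pos)
        pos = expect(tokens, pos, "}")
    return ("if", cond, then, otherwise), pos


def parse(text):
    """Parse a complete expression into a nested tuple."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


if __name__ == "__main__":
    print(parse("if (a) { if (b) { x } } else { y }"))  # else belongs to the outer if
    print(parse("if (a) { if (b) { x } else { y } }"))  # else belongs to the inner if
```

In this toy grammar the position of the closing brace decides which ‘if’ an ‘else’ belongs to, so the parser never has to guess.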

NW expresses confusion about why we want to have ‘else’ when we started talking about how to avoid the ‘else’.

  • MSM: I think the reasoning for needing this is that I’ve got an elseless condition and I decide that I need to add an else. I wrote it with braces, so it’ll feel simpler if I don’t have to undo the braces and add the keyword. Are people who want elseless-ifs also going to want to do without the ‘else’ keyword?
  • RD: Being able to omit the ‘else’ is useful because you often end up with lots of else ().

Some discussion of whether or not there’s ambiguity in the grammar. We don’t have anyone doing rigorous analysis.

ACTION QT4CG-017-04: MK to revise PR #284 to include an optional ‘else’

  • SF: Perhaps we should try to compare our grammar to other languages common in our ecosystem like Java, Typescript, etc.

3. Any other business