@qt4cg statuses in 2026
This page displays status updates about the QT4 CG project from 2026.
See also recent statuses.
Issue #2424 closed #closed-2424
More Explorer tweaks
Issue #2425 created #created-2425
Permanent diffs for PRs
Since we link to pull requests in the spec and in test cases, I wonder whether it would be possible to publish a permanent diff showing the effect of each PR?
Essentially, the idea would be to take the HTML diff as we currently publish it, and reduce it to those sections of the specs that actually contain changes.
I would find this very useful, for example, when a PR has been accepted but is still marked with "Tests needed" - it's not easy at present to see retrospectively what tests might be required. It would also be useful, of course, when implementing the PR. But I think that all readers of the specs might find this beneficial.
Pull request #2424 created #created-2424
More Explorer tweaks
Building on the awesome work from @ndw I just tweaked the grammar explorer layout a little more
- layout works better on bigger and smaller screens
- consistent navigation between all screens with the back button always in the navigation at the top
- consistent sizes, paddings, colors set by CSS variables
- output as html5 which fixes small issues with whitespace in inline elements
- additional, minor layout improvements
Pull request #2423 created #created-2423
2421 document XSLT incompatibility with simplified stylesheets
Fix #2421
Issue #2422 created #created-2422
XSLT: drop 3.11 Embedded Stylesheet Modules
XSLT Section 3.11 describes embedded stylesheet modules - a stylesheet rooted at an element node which is not the outermost element of a document. There are no real conformance requirements associated with this feature and it isn't widely used. I propose we drop the section, while retaining the statement in §3.5 that a stylesheet module can be "all or part" of an XML document.
Issue #2421 created #created-2421
XSLT edge case incompatibility with simplified stylesheet
Simplified stylesheets have changed so that the implicit template rule now does match="." rather than match="/".
This creates a theoretical incompatibility when
(a) the stylesheet is invoked supplying a node other than a document node as the input. It will now execute the (only) template rule; previously it would execute the built-in template for the node kind
(b) the simplified stylesheet module is included/imported into another stylesheet. This is a highly unlikely scenario, but it is tested by test case include-0601.
The incompatibility should be documented.
Issue #2420 closed #closed-2420
Explorer tweaks
Pull request #2420 created #created-2420
Explorer tweaks
h/t @line-o
Plus a few other tweaks.
Pull request #2419 created #created-2419
2292 XSLT document() function: options parameter
Fix #2292
Issue #2414 closed #closed-2414
Diff markup issues
Pull request #2418 created #created-2418
2399b Add rules and advice for JSON output of special numerics
Fix #2399
Issue #2417 closed #closed-2417
2399 Add rules/advice for JSON output of special xs:double values
Pull request #2417 created #created-2417
2399 Add rules/advice for JSON output of special xs:double values
Fix #2399
Pull request #2416 created #created-2416
2406 Add fn:parts-of-dateTime and fn:build-dateTime functions
Fix #2406
Issue #2415 closed #closed-2415
Publish the grammar explorer pages
Pull request #2415 created #created-2415
Publish the grammar explorer pages
Issue #2414 created #created-2414
Diff markup issues
There seem to be two consistent errors in the diff markup that appears in PRs on the dashboard:
- When an inline <code> element is modified, the diff version shows the old code as deleted (red background), but does not show the new code.
For example:
- When a grammar entry is modified, the text gets duplicated.
For example:
Pull request #2413 created #created-2413
2365 Drop extensible record types
This PR drops the concept of extensible record types, replacing it with a rule that coercion to a record type drops any map entries that are not defined by the record type. In effect this means that a record type used when declaring a function parameter is implicitly extensible.
The benefits of the proposal are:
- It simplifies the spec, especially rules on type subsumption and on generation of implicit constructor functions
- It avoids the need to declare pairs of record types, one extensible and one not.
- It avoids all the awkward decisions about whether record types used in core functions should be extensible or not.
The rules for type patterns in XSLT are changed to invoke coercion.
Fix #1484 Fix #2365
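The coercion rule described in this PR can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `coerce_to_record`, the dict-based model of maps, and the treatment of a record type as a plain set of required field names are all assumptions made for illustration only.

```python
def coerce_to_record(value: dict, declared_fields: set) -> dict:
    """Keep only the entries whose keys are declared by the record type.

    Hypothetical model: a map is a dict and a record type is a set of
    required field names. Field value types and optional fields are ignored.
    """
    missing = declared_fields - value.keys()
    if missing:
        # Assumption: a map lacking a declared field cannot be coerced.
        raise TypeError(f"missing required fields: {sorted(missing)}")
    # Entries not defined by the record type are dropped, not rejected.
    return {k: v for k, v in value.items() if k in declared_fields}


point = {"x": 1, "y": 2, "label": "origin"}
print(coerce_to_record(point, {"x", "y"}))  # the extra "label" entry is dropped
```

Under this model a function parameter declared with the record type accepts maps carrying extra entries, which is the sense in which the type is implicitly extensible.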
Pull request #2412 created #created-2412
2395 2396 Add missing "new in 4.0" entries
Fix #2395 Fix #2396
Pull request #2411 created #created-2411
2397 add to F&O list of functions defined in XSLT
Fix #2397
Pull request #2410 created #created-2410
2398 Fix fn:highest to match fn:lowest
Fix #2398
Pull request #2409 created #created-2409
2407 Change function to fn in type-of output
Fix #2407
Issue #2195 closed #closed-2195
Editorial notes (incremental)
Issue #2408 created #created-2408
Editorial notes (incremental)
This issue summarizes the unresolved comments from #2195:
- [x] Oxford/serial comma should be used consistently at numerous places. Candidates:
- sine, cosine and tangent
- durations, dates and times
- year, month, day, hour, minute, second and timezone
- The month, day, hour and minute components
- the tokens w, W and Ww
- rules for overflow, underflow and approximation
- [ ] The comma may need to be removed at other places:
- This function is context-independent, and focus-independent.
- A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.
- [x] fn:distinct-values: $coll → $collation
- [x] “If … is an empty sequence” vs. “If … is the empty sequence” (which one do we prefer?)
- MHK: I can't say I have a strong preference, but pedantically, “the” is probably more accurate.
- [ ] Section 2.1.3 Values of the XPath 4.0 spec includes a change note highlighting that the terms XNode and JNode have been introduced, but the section text only mentions XNode; there's no reference to JNode in that section other than in the change note itself.
- [ ] In the XPath 4.0 spec, the publoc URL appears to be incorrect (/spec/header/publoc/loc)
- [x] A number of examples in the function catalog should be annotated with spec="XQuery" so that the corresponding test cases are marked as inapplicable to XPath. Specifically:
fo-test-fn-count-001
fo-test-fn-deep-equal-005
fo-test-fn-every-010
fo-test-fn-function-annotations-002
fo-test-fn-function-annotations-003
fo-test-fn-hash-009
fo-test-fn-hash-010
fo-test-fn-serialize-004
fo-test-fn-sort-with-005
- [ ] The serialization spec, in the section on the serialization parameter document, makes it rather hard to discover that the namespace prefix output is bound to the URI http://www.w3.org/2010/xslt-xquery-serialization
- [x] F+O The definition of fn:compare refers to a function fn:months-from-dateTime; this should be fn:month-from-dateTime
- [ ] F+O The description of unparsed-text refers to the available text resources component of the dynamic context, which has been dropped.
- [ ] The serialization spec for Adaptive serialization of function items gives an example: **fn:exists#1** is serialized as **function fn:exists#1**, but the word function does not actually appear in the result.
- [ ] fn:parse-html needs to define an error code for use when $encoding is an unknown or invalid encoding.
- [ ] Consistent rendition for term definitions. Most of the specs output definitions as [Definition: here is the definition]. F+O uses [Definition] here is the definition. XSLT puts the keyword "Definition" in small caps.
- [x] The function catalog contains entries for functions such as op:gMonthDay-equal that are no longer referenced.
- [ ] Serialization, HTML5, Processing Instructions: Dashes in the name must be escaped as well. Example: <?a---b c---d?> should be serialized as <!--?a- - -b c- - -d?-->.
- [ ] UTF-16le → UTF-16LE, UTF-16be → UTF-16BE (caused by #2239)
- [ ] fn:unparsed-text: Reference bin:infer-encoding, drop redundant rules
- [x] op:divide-dayTimeDuration-by-dayTimeDuration: An example could be simplified by using seconds(1)
Issue #2407 created #created-2407
`fn:type-of`: function vs fn
As we use the fn alias in (almost) all XQFO signatures, the fn:type-of function could return it as well.
Issue #2355 closed #closed-2355
bin:infer-encoding error conditions
Issue #2362 closed #closed-2362
2355 bin:infer-encoding: further alignments
QT4 CG meeting 150 draft minutes #minutes-01-27
Draft minutes published.
Issue #2361 closed #closed-2361
Encoding parameters: upper/lower case, normalization
Issue #2394 closed #closed-2394
2361 Use upper case for encoding names; comparisons are case-blind
Issue #2349 closed #closed-2349
Revert `array:join`
Issue #2363 closed #closed-2363
2349 Revert array:join
Issue #2378 closed #closed-2378
HTML indenting
Issue #2391 closed #closed-2391
2378 HTML indenting: clarify the definition of inline elements
Issue #1944 closed #closed-1944
Try/Catch/Finally - order of evaluation
Issue #2127 closed #closed-2127
JNodes: Include atomic items
Issue #2159 closed #closed-2159
JNodes: Learning from JSONiq?
Issue #2351 closed #closed-2351
Current Drafts: What will we keep, what may be dropped?
Issue #2354 closed #closed-2354
`fn:append`
Issue #2360 closed #closed-2360
fn:root() vs. absolute path expressions
Issue #2384 closed #closed-2384
`fn:xsd-validator` - attribute nodes
Issue #2392 closed #closed-2392
2384 Clarify that fn:xsd-validator can validate attributes
Issue #2406 created #created-2406
Rounding dates/times and durations
The precision returned for dates, times, and durations is implementation-defined, and it has changed between Saxon releases. This leads one of our users to point out that there is no easy way to request a reduced precision (e.g. milliseconds) in order to ensure interoperability. The simplest approach we can offer seems to be current-dateTime() => format-dateTime("....") => xs:dateTime() which is pretty cumbersome and inefficient.
Rather than providing specific functions for rounding dates, times, and durations, the most versatile solution to this might be to provide functions that reduce a dateTime or duration to a record containing the numeric values of the components, allowing these to be manipulated as numbers, with a further function to reconstruct the dateTime or duration from the record: rather like the parse-uri()/build-uri() pair. Something like:
parts(current-dateTime()) ! map:put(., 'seconds', round(?seconds, 3)) ! build-dateTime()
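The decompose, adjust, and rebuild idea can be sketched in Python. This is only an analogy under stated assumptions: `parts_of_datetime` and `build_datetime` are hypothetical helpers standing in for the proposed functions, and the component names in the dict are chosen for illustration rather than taken from any specification.

```python
from datetime import datetime, timedelta, timezone


def parts_of_datetime(dt: datetime) -> dict:
    """Decompose a datetime into numeric components (hypothetical helper)."""
    return {
        "year": dt.year, "month": dt.month, "day": dt.day,
        "hours": dt.hour, "minutes": dt.minute,
        # Seconds carry the fractional part so they can be rounded as a number.
        "seconds": dt.second + dt.microsecond / 1_000_000,
        "tzinfo": dt.tzinfo,
    }


def build_datetime(parts: dict) -> datetime:
    """Rebuild a datetime from its components (hypothetical helper)."""
    base = datetime(parts["year"], parts["month"], parts["day"],
                    parts["hours"], parts["minutes"], tzinfo=parts["tzinfo"])
    # Adding a timedelta lets a rounded value of 60.0 seconds carry into minutes.
    return base + timedelta(seconds=parts["seconds"])


now = datetime(2026, 1, 27, 10, 30, 12, 345678, tzinfo=timezone.utc)
parts = parts_of_datetime(now)
parts["seconds"] = round(parts["seconds"], 3)  # reduce to millisecond precision
print(build_datetime(parts))
```

Rebuilding through an offset rather than a seconds field avoids an invalid value when rounding produces exactly 60 seconds.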
Issue #2405 closed #closed-2405
The published XML for XPath and XQuery is incorrect
Pull request #2405 created #created-2405
The published XML for XPath and XQuery is incorrect
It’s not the specification XML, it’s the pre-fixed-up HTML as XML. h/t to @martian-a for noticing first!
Pull request #2404 created #created-2404
2403 Enhancements to fos.xsd
Schema enhancements to the function catalog for:
- Issue #2403: allow non-testable results for examples to be labelled narrative="true"
- Issue #2177: allow a fos:see-also element to make links to related functions
Note this PR is purely an enabler; it does not include changes to the function catalog to exploit these features, nor stylesheet enhancements to render them.
QT4 CG meeting 150 draft agenda #agenda-01-27
Draft agenda published.
Issue #2402 closed #closed-2402
This PR should fail to build
Issue #2403 created #created-2403
Testable examples in the file spec
In PR #2401 Norm introduced a temporary fix needed because the function catalog in the EXPath file spec doesn't conform to the fos.xsd schema.
I think we can fix this without a schema or stylesheet change by using existing mechanisms illustrated by this example from fn:collation:
<fos:test>
<fos:expression>collation({ 'lang': 'de', 'strength': 'primary' })</fos:expression>
<fos:result>"http://www.w3.org/2013/collation/UCA?lang=de;strength=primary"</fos:result>
<fos:test-assertion>
<result xmlns="http://www.w3.org/2010/09/qt-fots-catalog">
<any-of>
<assert-string-value>http://www.w3.org/2013/collation/UCA?lang=de;strength=primary</assert-string-value>
<assert-string-value>http://www.w3.org/2013/collation/UCA?strength=primary;lang=de;</assert-string-value>
</any-of>
</result>
</fos:test-assertion>
<fos:postamble>The order of query parameters may vary.</fos:postamble>
</fos:test>
The fos:result (or fos:error-result) element must always be present, and will always be rendered in the spec as the expected result. If the example will not always deliver this result, then <fos:test-assertion> can appear to give the result as it will appear in the generated test case.
But I suggest we add another attribute <fos:result narrative="true"/> to indicate that the result is given as explanatory prose, not as a testable XPath expression. This would allow another fn:collation example
<fos:example>
<p>The expression <code>collation({ 'lang': default-language() })</code>
returns a collation suitable for the default language in the
dynamic context.</p>
</fos:example>
to be rewritten as
<fos:example>
<fos:test>
<fos:expression>collation({ 'lang': default-language() })</fos:expression>
<fos:result narrative="true">A collation suitable for the default language in the
dynamic context.</fos:result>
<fos:test-assertion>
<result xmlns="http://www.w3.org/2010/09/qt-fots-catalog"><assert>true()</assert></result>
</fos:test-assertion>
</fos:test>
</fos:example>
which will (a) make it easier to fit the results into the tabular presentation of examples, and (b) cause a test case to be generated which will ensure that the example is syntactically valid.
(Or we could leave out <fos:test-assertion> in this example. If the supplied fos:result has narrative="true" and there is no test assertion, the generated test case can assume <assert>true()</assert>)
Pull request #2402 created #created-2402
This PR should fail to build
We're never going to merge this, it's just a CI test.
Issue #2379 closed #closed-2379
Use exported schema to validate function catalogs
Issue #1948 closed #closed-1948
fn:element-to-map: Tests
Issue #2401 closed #closed-2401
Stopgap fix to get the status quo drafts built
Pull request #2401 created #created-2401
Stopgap fix to get the status quo drafts built
The content model of fos:test requires an fos:result or fos:error-result. For the EXPath File module, we have to work out what those should be or change the markup or change the schema.
In the short term, I’ve made bogus fos:results of FIXME:
Issue #2400 closed #closed-2400
Irrelevant whitespace change to nudge CI
Pull request #2400 created #created-2400
Irrelevant whitespace change to nudge CI
Issue #2399 created #created-2399
Canonical JSON Serialization: edge cases
RFC 8785 says:
Note: Since Not a Number (NaN) and Infinity are not permitted in JSON, occurrences of NaN or Infinity MUST cause a compliant JCS implementation to terminate with an appropriate error.
We have just decided to treat these cases more liberally, but I think we should continue to raise an error if canonical serialization is requested.
In our serialization spec, we also say:
Implementations may serialize an xs:double value using any lexical representation of a JSON number defined in [RFC 7159], but it is recommended to use the same representation as when the canonical parameter is true.
We may need to exclude the edge cases from this recommendation.
If we keep the recommendation, we may need to fix the test case Serialization-json-11, which expects -0 instead of 0 (the value that RFC 8785 serialization returns).
Issue #2398 created #created-2398
fn:highest documentation in F&O spec not up to date?
The description for the new fn:highest function does not align with the description for the fn:lowest function; contrary to what I'd expect. In particular, the rules section makes no mention of the $key argument; so I suspect this is not up to date?
Issue #2397 created #created-2397
Additions for "Functions Defined in XSLT" section in F&O spec
The new XSLT 4.0 functions current-merge-key-array and regex-groups are missing from the "Functions Defined in XSLT" section.
Also I believe the function unparsed-text-available should be added saying "Originally XSLT 2.0; then XPath 3.0 and later".
Issue #2396 created #created-2396
Missing "New in 4.0" labels for functions in F&O Spec
Please add changes entries to say "New in 4.0" for the functions: function-identity and jnode-content.
Also I assume the first change entry for function-annotations should actually say "New in 4.0" (rather than be a duplicate of the second change entry).
Issue #2395 created #created-2395
The new fn:regex-groups function is not labelled "New in 4.0"
Please add a changes entry in the spec for the new XSLT 4.0 regex-groups function.
Pull request #2394 created #created-2394
2361 Use upper case for encoding names; comparisons are case-blind
Standardizes on upper case for encoding names, and mentions that comparisons are case-blind.
Fix #2361
Issue #2393 created #created-2393
Keep or drop `array:members` and `array:of-members`?
Adopted from #2351:
We have recently dropped map:pairs and map:of-pairs. With array:members, the members of arrays are returned as single-entry maps, which may confuse users. Thus, for the sake of reducing redundant functionality, do we want to keep array:members and array:of-members, or rather promote the use of for member $m and array:split/array:join instead?
If we keep the functions, we should add a dedicated record type for record(value).
Pull request #2392 created #created-2392
2384 Clarify that fn:xsd-validator can validate attributes
Fix #2384
Pull request #2391 created #created-2391
2378 HTML indenting: clarify the definition of inline elements
Fix #2378
Issue #2390 created #created-2390
methods and inheritance
Don’t panic, I am not suggesting a large change :)
But I would like to suggest a small change to the semantics of method calls, with the goal of third parties being able to build something much larger.
Today, we have,
A method call combines accessing a map M to look up an entry whose value is a function item F, and calling the function item F supplying the map M as the implicit value of the first argument.
I’d like to add,
If there is no such function, but there is in the map a key fn:fallback whose value is a function, then that function is called with the map, the function name, and the arity of the desired function as arguments.
In this way one could write a function that looked for an "isa" entry in the map whose value was a sequence of "class" maps, and find the function.
It’s limited in that there is no possibility of polymorphic functions, but we do not have those elsewhere in the language.
One practical benefit is that you can have a map with all your functions in it, and “instance maps” then do not need to have, say, 40 entries for all the methods that can be called. In an application in which a map gets updated a million times (I do have one of those), adding 40 extra entries to copy is a significant burden, even though of course it’s encapsulated in a single function.
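The suggested lookup rule can be modeled with Python dicts as a rough sketch. Several points are assumptions, since the proposal leaves them open: that the fallback returns the function to call, that the arity passed counts only the explicit arguments, and the helper names `call_method` and `isa_fallback`.

```python
FALLBACK = "fn:fallback"  # key name taken from the proposal


def call_method(m: dict, name: str, *args):
    """Call m[name] with m as the implicit first argument, else try the fallback."""
    entry = m.get(name)
    if callable(entry):
        return entry(m, *args)
    fallback = m.get(FALLBACK)
    if callable(fallback):
        # Assumption: the fallback resolves and returns the function to call.
        found = fallback(m, name, len(args))
        if callable(found):
            return found(m, *args)
    raise KeyError(f"no method {name!r}")


def isa_fallback(m: dict, name: str, arity: int):
    """Search the 'class' maps listed under "isa" for a matching function."""
    for cls in m.get("isa", ()):
        candidate = cls.get(name)
        if callable(candidate):
            return candidate
    return None


shape = {"describe": lambda self: f"shape with area {call_method(self, 'area')}"}
square = {
    "side": 3,
    "isa": [shape],
    "area": lambda self: self["side"] ** 2,
    FALLBACK: isa_fallback,
}
print(call_method(square, "describe"))  # describe is found via the "isa" chain
```

The instance map carries only its own data plus the single fallback entry; the shared methods live in the class map and are not copied on update.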
Issue #2344 closed #closed-2344
HTML Serialization: Processing Instructions
Issue #2372 closed #closed-2372
2344 Change rendition of PIs in HTML5
QT4 CG meeting 149 draft minutes #minutes-01-20
Draft minutes published.
Issue #2359 closed #closed-2359
Implicit conversion to JNodes with absolute path expressions
Issue #2373 closed #closed-2373
2359 No conversion to JNode in absolute paths
Issue #2337 closed #closed-2337
XSLT xsl:mode/@typed attribute
Issue #2376 closed #closed-2376
2337 Extend xsl:mode/@typed to handle JNodes etc
Issue #2387 closed #closed-2387
641 NaN/Infinity in JSON
Issue #2088 closed #closed-2088
File Module: Feedback, Observations
Issue #2364 closed #closed-2364
2088 File Module: Feedback, Observations
Issue #2185 closed #closed-2185
Request for an `fn:xproc` function
Issue #2383 closed #closed-2383
Attempt to resolve action QT4CG-148-01
Issue #2389 created #created-2389
Adaptive Serialization: more freedom?
The adaptive serialization method was introduced “for the purposes of debugging query results”. For our processor, it soon turned out that it does not satisfy the requirements of our users, which is why we have introduced a custom debugging method.
I wonder what others think: Shouldn’t we relax several of the rules and let the implementation decide what to output? We haven’t defined what the output of fn:trace needs to look like either.
Some examples:
- The output of doubles often causes confusion. If parsed JSON is output, small integers will be output in exponential notation. For example, parse-json('{ "A" : 20 }') needs to be output as { "A": 2.0e1 }.
- xs:date("2001-01-01") is output as xs:date("2001-01-01"), while xs:token('x') is output as "x".
- fn() { 1 } is output as (anonymous-function)#0, whereas an implementation could prefer to use the output of fn:function-identity (see #2388), or output the original query string (if available), reproduce a string representation of the function body, etc.
I will be glad to create a PR.
Issue #2388 created #created-2388
Adaptive Serialization: function items
The serialization spec defines rules for creating a string representation for function items. Now that we have `fn:function-identity`, we should replace the rules and use this string instead.
QT4 CG meeting 149 draft agenda #agenda-01-20
Draft agenda published.
Issue #2386 closed #closed-2386
Add namespace declaration to environment for generated tests
Pull request #2387 created #created-2387
641 NaN/Infinity in JSON
Addresses part of issue #641
In the JSON serialization method, NaN is output as null, and infinity is output as ±1e9999.
The parse-json function adds recommendations on how to achieve round-tripping of these values.
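The mapping of special values described in this PR can be sketched in Python. The function `to_json_number` is a hypothetical helper rather than part of any specification, and the round-trip shown relies on how Python's own `json` module parses an out-of-range exponent; other parsers may behave differently.

```python
import json
import math


def to_json_number(x: float) -> str:
    """Serialize a double, mapping NaN to null and infinity to ±1e9999."""
    if math.isnan(x):
        return "null"
    if math.isinf(x):
        return "1e9999" if x > 0 else "-1e9999"
    return json.dumps(x)


values = [1.5, float("nan"), float("inf"), float("-inf")]
print([to_json_number(v) for v in values])

# 1e9999 is syntactically valid JSON; Python's parser overflows it to infinity.
print(json.loads(to_json_number(float("inf"))))
```

Infinity survives a round trip through a parser that overflows to infinity, whereas NaN comes back as a JSON null and needs separate handling on the parsing side.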
Pull request #2386 created #created-2386
Add namespace declaration to environment for generated tests
Changes the stylesheet for generating keyword and function signature tests so that the namespace prefix "output" is explicitly declared in the test environment. This prefix is used in one of the tests and it needs to be declared if the test is to work in XPath.
Issue #2385 created #created-2385
The XML version of the XPath spec isn't the XML version of the spec, it's HTML
Probably XQuery too. I have no idea why.
Issue #2384 created #created-2384
`fn:xsd-validator` - attribute nodes
The type signature of the xsd-validator function suggests that it can be used to validate attribute nodes (as well as documents and elements), and the prose description concurs with this. However the function summary says "can be invoked to validate a document or element node against this schema."
Test case xsd-validator-092 expects an attribute node to be rejected.
Also the Notes in the specification say
The validation process is explained in more detail in the XQuery ([[XQuery 4.0: An XML Query Language]] section [4.25 Validate Expressions] and XSLT ([[XSL Transformations (XSLT) Version 4.0]] section [25.4 Validation]
but the detailed description has since been moved to F&O 17.2.4.
Note that XSLT has always allowed validation of free-standing attribute nodes, but the validate expression in XQuery allows only document and element nodes.
Pull request #2383 created #created-2383
Attempt to resolve action QT4CG-148-01
Per #2315:
- Added ‘at-risk’ changes to the fn:insert-separator, array:members, and array:of-members
- Added ‘at-risk’ changes to the XPath/XQuery section on map and array filtering
- Added a note about what ‘at risk’ means to the status sections
Issue #2382 closed #closed-2382
Tool changes for action QT4CG-148-01
Pull request #2382 created #created-2382
Tool changes for action QT4CG-148-01
These should have no effect without additional commits, but they have to be merged into main in order to have a visible effect on my subsequent PR.
Issue #573 closed #closed-573
Node construction functions
Issue #2124 closed #closed-2124
573 Functions to Construct Trees
QT4 CG meeting 148 draft minutes #minutes-01-13
Draft minutes published.
Issue #2357 closed #closed-2357
element() vs element(*) in function signatures
Issue #2358 closed #closed-2358
2357 Standardize on element() rather than element(*)
Issue #2367 closed #closed-2367
Documentation for new main-module attribute of xsl:stylesheet
Issue #2366 closed #closed-2366
json-lines attribute for xsl:output and xsl:result-document in XSLT spec
Issue #2356 closed #closed-2356
Clarification on scope of variables in xsl:for-each-group/(@split-when|@merge-when)
Issue #2368 closed #closed-2368
2367 Misc XSLT editorial fixes
Issue #2369 closed #closed-2369
F+O section 11 is empty
Issue #2371 closed #closed-2371
2369 Add content for F&O section 11 (Processing binary values)
Issue #2375 closed #closed-2375
2195 Editorial Omnibus
Issue #1591 closed #closed-1591
Implausible filter expressions
Issue #1934 closed #closed-1934
Supporting RELAX NG validation
Issue #2377 closed #closed-2377
2195 F+O Editorial Corrections
Issue #2381 created #created-2381
Add facility to serialize binary values as url-safe base64 encoded strings
In XQuery 3.1 there are two XDM types to represent binary values: xs:base64Binary and xs:hexBinary.
The current draft adds binary literals as an additional option.
Thus, it is possible to base64 encode any value by casting an xs:base64Binary to an xs:string.
xs:base64Binary("+w==") => xs:string()
There is no standard way to serialize those binary values to the URL safe variant of that encoding described in section 5 of RFC 4648.
The simplest workaround is replacing the unsafe characters of the alphabet (+ and /) and dropping the padding at the end with
xs:base64Binary("+w==") => translate("+/=", "-_")
This of course will only work for relatively small binary values. In order for processors to offer a performant and efficient way I see several options.
- adding a new type xs:base64BinaryUrlSafe whose string representation uses the adapted alphabet with - and _ and does not add padding at the end
- a new function in the fn namespace: fn:encode-base64-url-safe($data as (xs:string | xs:base64Binary | xs:hexBinary)) as xs:string
- a new function in the bin namespace: bin:encode-base64-url-safe($data as (xs:string | xs:base64Binary | xs:hexBinary)) as xs:string
- adding an output option that will serialize all binary values to base64 url-safe when cast to strings
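For comparison, the two approaches discussed above can be sketched in Python, whose standard library already offers the URL-safe alphabet from RFC 4648 section 5. The function names `encode_base64_url_safe` and `via_translate` are illustrative only and do not correspond to any proposed signature.

```python
import base64


def encode_base64_url_safe(data: bytes) -> str:
    """Encode with the URL-safe alphabet and drop the trailing padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def via_translate(data: bytes) -> str:
    """Mirror the translate() workaround: encode normally, then rewrite characters."""
    standard = base64.b64encode(data).decode("ascii")
    # Map + to -, / to _, and delete the = padding.
    return standard.translate(str.maketrans("+/", "-_", "="))


data = bytes([0xFB])
print(base64.b64encode(data).decode("ascii"))  # +w==
print(encode_base64_url_safe(data))            # -w
print(via_translate(data))                     # -w
```

The workaround builds the full standard encoding first and then rewrites it, while a dedicated encoder produces the target alphabet directly.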
Addendum
I am also wondering why binary values cannot be created from numeric literals. Especially now that we have the binary notation for integer literals and the xs:integer type is unbounded this would be a perfectly fine literal notation to create binary values from. At least as suitable as string literals that are currently allowed.
xs:hexBinary(0xfb) and xs:base64Binary(0b11111111)
Issue #2380 created #created-2380
Use Case for Generators: News Feeds Aggregation Using Generators
In response to:
QT4CG-147-02: NW to chase up DN and LQ about follow-up to the generator discussion
Use Case: News Feeds Aggregation Using Generators
Contents
Use Case: News Feeds Aggregation Using Generators
- Actors
- Goals
- Functional Requirements
- Constraints / Assumptions / Preconditions
- Proposed High-Level Solution
- Known Approaches that are Problematic
- Benefits of the Generators Approach
- End-to-End Flow
- Brief Description of the Core Processes in the Pipeline
- Notes on the Process Pipeline
- Why This Fits the Generator Datatype Extremely Well
- Alternative Flows
- Alternative Flow-1: A Feed Temporarily Stops Producing New Items
- Alternative Flow-2: Partial Consumption of the Pipeline
- Alternative Flow-3: Editor Inserts or Reorders Items
- Exception Flows
- Exception Flow-1: Feed Unreachable or Network Failure
- Exception Flow-2: Malformed Feed Data
- Exception Flow-3: Resource Exhaustion Risk
- Postconditions
- References
The Problem
Modern RSS/JSON aggregators must process hundreds of continuously updating feeds without excessive memory usage or latency, while supporting filtering, merging, and prioritization in real time.
Actors
- End-User
- Editor
- Administrator
- System components (internal processes acting as secondary actors)
- External services (RSS providers, APIs, social signals)
Goals
- End-User: “As a user, I want to get the latest, up-to-the-minute news from many important sources. I want each brief news item to be presented with a link to more detailed information from the original source.”
- Editor: “As an editor, I want to be alerted to any change in the aggregated news-stream, as it happens continuously, and to have powerful ways of inserting, reordering, appending, prepending or deleting one or more news-items.”
- Administrator: “As an administrator, I want to start, stop, or restart the system, manage the configured feeds, and monitor operational health and error conditions.”
Functional Requirements
- Consume RSS / Atom / JSON-LD feeds incrementally
- Filter items by topic or sensitivity
- Merge multiple feeds chronologically
- Produce continuously updated summaries
Constraints / Assumptions / Preconditions
Assumptions
- Feeds may be large or unbounded
- Items arrive over time
Constraint
- Memory usage must remain bounded
Preconditions
- At least one news feed is configured
- Feeds are RSS or JSON-LD and timestamped
- Items within a feed are presented in reverse-chronological order
- Each item contains a content link or, optionally, inline content
- Items may belong to multiple categories
Proposed High-Level Solution
Each feed is modeled as a generator producing yield values lazily.
The ordered set of values produced by successive, demand-driven calls to move-next() is called the yield of the generator.
A generator’s yield may be finite or infinite, and may be empty for a given generator instance without implying exhaustion of the underlying data source.
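As a rough analogy only, Python generators show the same demand-driven behaviour. The proposed generator datatype and its move-next() operation may differ in detail; `feed_items` and the item structure here are hypothetical.

```python
from itertools import count, islice


def feed_items(source_name: str):
    """Hypothetical unbounded feed: each item is produced only when requested."""
    for n in count(1):
        yield {"source": source_name, "id": n}


g = feed_items("example")
first = next(g)                   # one demand-driven step, analogous to move-next()
print(first)
print(list(islice(g, 2)))         # take two more items; the rest is never computed
```

Although the feed is conceptually infinite, only three items are ever materialized, which is what keeps memory usage bounded.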
Known Approaches That Are Problematic
These approaches require full materialization in memory:
- Eager sequences (XPath)
- DOM-style loading
- Materialized feeds
Benefits of the Generators Approach
- Bounded memory usage
- Low latency
- Composability
- Deterministic control of evaluation
End-to-End Flow
+-------------------------------+
| 1. Feed Fetching |
| Input: external providers |
| Output: G_rawItems |
+---------------+---------------+
                |
+---------------v---------------+
| 2. Normalization |
| Input: G_rawItems |
| Output: G_normalizedItems |
+---------------+---------------+
                |
+---------------v---------------+
| 3. Filtering | <-- unwanted content removed
| Input: G_normalizedItems |
| Output: G_filteredItems |
+---------------+---------------+
                |
+---------------v---------------+
| 4. Topic Classification |
| Input: G_filteredItems |
| Output: G_classifiedItems |
+---------------+---------------+
                |
+---------------v---------------+
| 5. Clustering |
| Input: G_classifiedItems |
| Output: G_clusteredItems |
+---------------+---------------+
                |
+---------------v---------------+
| 6. Ranking |
| Input: G_clusteredItems |
| Output: G_rankedItems |
+---------------+---------------+
                |
+---------------v---------------+
| 7. Summary Page Generation |
| Input: G_rankedItems |
| Output: G_summaryPageItems, |
| HTML |
+---------------+---------------+
                |
+---------------v---------------+
| 8. Detail Page Generation |
| Input: G_summaryPageItems |
| Output: HTML Detail Pages |
+-------------------------------+
Remarks
- The participating generator instances are named using the convention G_{name}.
- Every stage except the final one produces a new generator.
- Every stage except the very first uses a generator as its input.
- Arrow semantics: the output generator of one stage is the input for the next stage.
Brief Description of the Core Processes in the Pipeline
Process 1 — Feed Fetching & Acquisition
Goal:
Continuously pull RSS / Atom / JSON-LD feeds from CNN, Fox, NBC, BBC, etc.
Includes:
- Periodic polling (e.g., every 5 minutes)
- Detection of new items (GUID, URL hash, published timestamps)
- N-way merging to ensure the resulting yield is sorted in reverse-chronological order
- Basic sanity validation (e.g., XML schema validity)
Output:
A generator whose yield values are raw feed items (XML / JSON documents) → input to Process 2.
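As an illustration of the N-way merge and new-item detection, here is a minimal Python sketch. It assumes each feed is already sorted newest-first and that items are `(timestamp, guid)` pairs. Both are simplifications, and the names `merge_feeds` and `only_new` are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, Set, Tuple

FeedItem = Tuple[int, str]  # (published timestamp, GUID); assumed shape


def merge_feeds(*feeds: Iterable[FeedItem]) -> Iterator[FeedItem]:
    """Lazily merge feeds that are each sorted newest-first."""
    return heapq.merge(*feeds, key=lambda item: item[0], reverse=True)


def only_new(items: Iterable[FeedItem], seen_guids: Set[str]) -> Iterator[FeedItem]:
    """Yield only items whose GUID has not been seen before."""
    for timestamp, guid in items:
        if guid not in seen_guids:
            seen_guids.add(guid)
            yield timestamp, guid


feed_a = [(300, "a3"), (100, "a1")]
feed_b = [(250, "b2"), (100, "a1"), (50, "b0")]  # "a1" is syndicated twice
merged = list(only_new(merge_feeds(feed_a, feed_b), set()))
```

`heapq.merge` holds only one pending item per feed, so the merge stays within bounded memory regardless of feed length.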
Process 2 — Parsing & Normalization
Goal:
Convert heterogeneous raw feed items into a uniform internal format.
Normalized fields include:
- Title
- Description / Summary
- Full text (if available)
- URL
- Publication time (converted to UTC)
- Source
- Images, categories, tags
- Named entities (optional NLP-based enrichment)
Output:
A generator yielding clean, normalized NewsItem documents → input to Process 3.
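A normalization stage of this kind might look as follows in Python. The `NewsItem` fields shown are a subset of the list above, and the raw item keys (`title`, `link`, `pubDate`) assume an RSS-like input. Both are illustrative choices rather than a defined format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Iterable, Iterator


@dataclass(frozen=True)
class NewsItem:
    title: str
    url: str
    published_utc: datetime
    source: str


def normalize(raw_items: Iterable[dict], source: str) -> Iterator[NewsItem]:
    """Convert RSS-like dictionaries into uniform NewsItem records, one at a time."""
    for raw in raw_items:
        # RSS dates use the RFC 822 format; convert them to UTC.
        published = parsedate_to_datetime(raw["pubDate"]).astimezone(timezone.utc)
        yield NewsItem(
            title=raw["title"].strip(),
            url=raw["link"],
            published_utc=published,
            source=source,
        )


raw = [{
    "title": " Flood warning ",
    "link": "https://example.org/1",
    "pubDate": "Tue, 06 Jan 2026 10:00:00 +0200",
}]
items = list(normalize(raw, "example"))
```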
Process 3 — Content Filtering & Exclusion Rules
Goal:
Remove unwanted items early using configurable rule sets.
Examples:
- Blocked topics: politics, celebrity gossip, violence, etc.
- Blocked entities: Donald Trump, Joe Biden, Kanye West, etc.
- Blocked publishers (optional)
- Expiration rules:
- Tech news stale after 48 hours
- Breaking news stale after 6 hours
Techniques:
- Keyword filtering
- Named Entity Recognition (NER)
- Sensitive-topic classifiers (ML-based)
- Freshness scoring
Output:
A generator yielding allowed, filtered NewsItem documents → input to Process 4.
Rejected items are stored separately for auditing.
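The filtering stage, including the separate store for rejected items, can be sketched in Python as below. The rule sets, the item keys and the use of a feed-provided `category` for the expiration rules are assumptions made for the example. Only plain keyword filtering and age checks are shown, not NER or ML classifiers.

```python
from typing import Iterable, Iterator, List, Tuple

BLOCKED_KEYWORDS = ("gossip",)                 # hypothetical rule set
MAX_AGE_HOURS = {"tech": 48, "breaking": 6}    # expiration rules from the text


def filter_items(items: Iterable[dict],
                 rejected: List[Tuple[dict, str]]) -> Iterator[dict]:
    """Yield allowed items; record rejected ones with a reason for auditing."""
    for item in items:
        title = item["title"].lower()
        limit = MAX_AGE_HOURS.get(item["category"], float("inf"))
        if any(keyword in title for keyword in BLOCKED_KEYWORDS):
            rejected.append((item, "blocked keyword"))
        elif item["age_hours"] > limit:
            rejected.append((item, "stale"))
        else:
            yield item


incoming = [
    {"title": "New chip launched", "category": "tech", "age_hours": 10},
    {"title": "Old chip review", "category": "tech", "age_hours": 72},
    {"title": "Celebrity gossip roundup", "category": "entertainment", "age_hours": 1},
]
rejected: List[Tuple[dict, str]] = []
allowed = list(filter_items(incoming, rejected))
```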
Process 4 — Topic Classification
Goal:
Assign each item to one or more topics.
Example topics:
- Politics
- World
- Tech
- Health
- Sports
- Business
- Disasters / Urgent events
- Crime / Safety
- Entertainment
Approaches:
- Fine-tuned BERT classifier (preferred)
- TF-IDF + SVM (simpler)
- Feed-provided category tags (fallback)
Output:
A generator yielding categorized NewsItem documents → input to Process 5.
Process 5 — Similarity Analysis & Clustering
Goal:
Group news items from different sources describing the same event.
Techniques:
- Semantic vector embeddings (e.g., SBERT, Ada embeddings)
- Cosine similarity
- Hierarchical clustering or DBSCAN
Produces:
- Clusters of highly similar articles
- A primary (best) representative per cluster
Output:
A generator yielding clusters of related articles → input to Process 6.
Note:
To better match streaming behavior, clustering may operate within bounded windows (e.g., sliding windows) while still consuming the input generator.
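One way to picture bounded-window clustering is the greedy single-pass sketch below. It is a deliberately simple substitute for hierarchical clustering or DBSCAN: each item joins the first open cluster whose first member is similar enough, and the oldest cluster is emitted once more than `window` clusters are open. The tiny two-dimensional vectors stand in for real embeddings. The threshold, the window size and the function names are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, List, Sequence, Tuple

Item = Tuple[str, Sequence[float]]  # (title, embedding vector); assumed shape


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_windowed(items: Iterable[Item], window: int = 3,
                     threshold: float = 0.9) -> Iterator[List[Item]]:
    """Greedy clustering that keeps at most `window` clusters open at a time."""
    open_clusters: Deque[List[Item]] = deque()
    for item in items:
        for cluster in open_clusters:
            # Compare against the first member, used as the representative.
            if cosine(cluster[0][1], item[1]) >= threshold:
                cluster.append(item)
                break
        else:
            open_clusters.append([item])
            if len(open_clusters) > window:
                yield open_clusters.popleft()  # emit the oldest cluster
    yield from open_clusters  # flush whatever remains at the end of input


stream = [("a1", (1.0, 0.0)), ("b1", (0.0, 1.0)),
          ("a2", (0.99, 0.05)), ("c1", (-1.0, 0.0))]
clusters = [[title for title, _ in c] for c in cluster_windowed(stream, window=2)]
```

Clusters are emitted as soon as they leave the window, so an unbounded input never requires all items to be held at once.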
Process 6 — Ranking, Urgency, and Freshness Scoring
Goal:
Prioritize which news appears on the Summary Page.
Computed scores:
- Freshness score (more recent → higher)
- Urgency score (disasters, crises, violence)
- Coverage score (number of sources reporting)
- Engagement score (optional: social signals)
Weighted formula:
FinalScore = a*Urgency + b*Freshness + c*Coverage + d*EditorRules
Items with the highest scores per topic are selected.
This stage does not require a full total ordering; instead a partial ordering (e.g., top-K per topic) preserves bounded memory.
Editor-driven operations (insert, remove, reorder) are modeled as generator transformations applied downstream of ranking.
Output:
A generator yielding ranked clusters → input to Process 7.
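The weighted formula and the top-K-per-topic selection can be illustrated with a small heap-based Python sketch. The weight values for a, b, c and d, the item keys and the sample scores are invented for the example. The function consumes one finite batch (for instance one window) before yielding.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical values for the weights a, b, c, d in the formula above.
WEIGHTS = {"urgency": 0.4, "freshness": 0.3, "coverage": 0.2, "editor": 0.1}


def final_score(item: dict) -> float:
    return sum(weight * item[name] for name, weight in WEIGHTS.items())


def top_k_per_topic(items: Iterable[dict], k: int) -> Iterator[Tuple[str, str]]:
    """Keep only the k best items per topic, using one min-heap per topic."""
    heaps: Dict[str, List[tuple]] = defaultdict(list)
    for seq, item in enumerate(items):
        # (score, -seq) is unique, so the dict itself is never compared.
        entry = (final_score(item), -seq, item)
        heap = heaps[item["topic"]]
        if len(heap) < k:
            heapq.heappush(heap, entry)
        else:
            heapq.heappushpop(heap, entry)  # drop the current minimum
    for topic in sorted(heaps):
        best_first = sorted(heaps[topic], key=lambda e: e[:2], reverse=True)
        for _, _, item in best_first:
            yield topic, item["title"]


batch = [
    {"topic": "tech", "title": "t1", "urgency": 0.1, "freshness": 0.9, "coverage": 0.5, "editor": 0.0},
    {"topic": "tech", "title": "t2", "urgency": 0.2, "freshness": 0.2, "coverage": 0.2, "editor": 0.0},
    {"topic": "world", "title": "w1", "urgency": 0.9, "freshness": 0.8, "coverage": 0.9, "editor": 0.0},
    {"topic": "tech", "title": "t3", "urgency": 0.5, "freshness": 1.0, "coverage": 0.5, "editor": 1.0},
]
ranked = list(top_k_per_topic(batch, k=2))
```

Memory use is proportional to k times the number of topics, not to the number of items seen.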
Process 7 — Summary Page Generation
This stage consumes the input generator and produces finite views intended for presentation.
Goal:
Build a continuously updated Summary Page (“Front Page”) containing:
- Top events per topic
- Short summaries
- Links to primary articles
- “Read similar news” (cluster siblings)
- Source icons
- Timestamp of most recent update
The page auto-refreshes and always reflects the newest items.
Process 8 — Detailed Pages & Cross-Links
This stage consumes its input generator and produces finite presentation views.
For each cluster:
- Canonical article (primary representative)
- Related articles across sources
- Timeline of developments
- Additional metadata (images, entities, tags)
Cross-links include:
- “More like this…”
- “Earlier developments…”
- “Follow-up stories…”
Notes on the Process Pipeline
- Feed Fetching typically wraps one or more data providers → produces G_rawItems lazily (RSS, JSON APIs, DB cursors, web services)
- Every stage is expressible as: for-each, filter, append, prepend, insert-at, remove-where, concat, or fold, etc., producing a new generator derived from the previous one
- No stage requires full materialization unless explicitly demanded (e.g., to-array, bounded sort, pagination)
- Infinite generators are valid until stage 6; stages 7–8 typically consume finite prefixes (take(n))
Why This Fits the Generator Datatype Extremely Well
- The pipeline is a composition of generator transformers
- Each box maps almost 1-to-1 to generator operations
- External data providers integrate naturally at Stage 1
- Sorting can be introduced in different ways:
- External merge-sort over generators
- Bounded-window ranking
- Top-K lazy ranking – e.g. using heaps.
Alternative Flows
Alternative Flow 1 — Feed Temporarily Stops Producing New Items
Condition:
A feed is reachable but has no new items since the last polling cycle.
Flow:
- The feed generator advances (move-next()).
- The data provider returns no new items.
- The feed-generator instance yields no items during this interval.
- Downstream generators remain operational.
- If all feeds are empty, no new items are added downstream.
Result:
The pipeline continues uninterrupted; no special handling is required.
Alternative Flow 2 — Partial Consumption of the Pipeline
Condition:
Only a finite prefix of the stream is required (e.g., top N items).
Flow:
- Downstream consumers apply take(N).
- Upstream generators are evaluated only as needed.
- Remaining potential yield values are never materialized.
Result:
Latency and memory usage remain bounded. The pipeline supports early termination naturally.
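The early-termination behaviour can be demonstrated in Python, with `itertools.islice` playing the role of take(N). The `pulled` list exists only to record how many upstream items were actually produced. All names are illustrative.

```python
from itertools import count, islice
from typing import Iterator, List

pulled: List[int] = []  # records which upstream items were really produced


def raw_items() -> Iterator[str]:
    """An infinite upstream generator."""
    for n in count(1):
        pulled.append(n)
        yield f"item-{n}"


def normalized(items: Iterator[str]) -> Iterator[str]:
    for item in items:
        yield item.upper()


# Equivalent of take(3): only three upstream items are ever evaluated.
first_three = list(islice(normalized(raw_items()), 3))
```

After this runs, `pulled` contains exactly three entries, although the source is unbounded.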
Alternative Flow 3 — Editor Inserts or Reorders Items
Condition:
An editor manually modifies the aggregated stream.
Flow:
- Editor operations are applied as generator transformations (append, prepend, insert-at, remove-at, remove-where).
- A new generator with the modified yield is produced.
- Downstream stages consume it transparently.
Result:
Editorial control integrates seamlessly without breaking the pipeline.
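Editor operations of this kind could be modelled as small generator transformers, as in the Python sketch below. The function names mirror the operations listed above, but their signatures are guesses. In particular, `insert_at` uses 0-based Python positions here, which may differ from the proposal.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def prepend(items: Iterable[T], value: T) -> Iterator[T]:
    yield value
    yield from items


def insert_at(items: Iterable[T], index: int, value: T) -> Iterator[T]:
    """Insert value before the item at 0-based index, or at the end if too short."""
    position = -1
    for position, item in enumerate(items):
        if position == index:
            yield value
        yield item
    if index > position:
        yield value


def remove_where(items: Iterable[T], predicate: Callable[[T], bool]) -> Iterator[T]:
    return (item for item in items if not predicate(item))


ranked = iter(["a", "b", "c"])
edited = list(
    remove_where(
        insert_at(prepend(ranked, "pinned"), 2, "editor pick"),
        lambda title: title == "b",
    )
)
```

Each operation returns a new generator and leaves its input untouched until it is consumed, so downstream stages see an ordinary generator.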
Exception Flows
Exception Flow 1 — Feed Unreachable or Network Failure
Condition:
A feed cannot be reached during polling.
Flow:
- The data provider reports an error or timeout.
- The next instance of the feed generator yields no items during this polling interval.
- The error is logged for monitoring.
- A retry policy (e.g., exponential backoff) is applied.
Result:
The system continues operating with remaining feeds.
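A possible shape for this flow in Python is sketched below. The polling generator yields nothing for a failed cycle, logs the error, and takes the next delay from an exponential backoff schedule. No real network access or sleeping happens. The `fetch` callable, the delay parameters and the log format are assumptions.

```python
from itertools import islice
from typing import Callable, Iterator, List, Tuple


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> Iterator[float]:
    """Infinite schedule of retry delays: base, base*factor, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def poll(fetch: Callable[[], List[str]], cycles: int,
         log: List[Tuple[str, float]]) -> Iterator[str]:
    """Yield items from each polling cycle; a failed cycle yields nothing."""
    delays = backoff_delays()
    for _ in range(cycles):
        try:
            batch = fetch()
        except OSError as error:
            # Record the error and the delay a scheduler would wait before retrying.
            log.append((str(error), next(delays)))
            continue
        delays = backoff_delays()  # success resets the backoff schedule
        yield from batch


# Simulated provider: two successful cycles with two failures in between.
responses = iter([["a"], OSError("timeout"), OSError("timeout"), ["b"]])


def fake_fetch() -> List[str]:
    response = next(responses)
    if isinstance(response, Exception):
        raise response
    return response


log: List[Tuple[str, float]] = []
items = list(poll(fake_fetch, cycles=4, log=log))
```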
Exception Flow 2 — Malformed Feed Data
Condition:
A feed item is malformed (invalid XML/JSON or schema validation problems, e.g. missing required fields).
Flow:
- The normalization stage detects the issue.
- The item is discarded or quarantined.
- Processing continues with subsequent items.
Result:
Malformed data does not propagate downstream.
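For JSON feeds, the discard-or-quarantine step might be sketched in Python as follows. The set of required fields and the shape of the quarantine record are assumptions. XML feeds would need an analogous check with an XML parser.

```python
import json
from typing import Iterable, Iterator, List, Tuple

REQUIRED_FIELDS = ("title", "link")  # hypothetical minimal schema


def parse_items(raw_documents: Iterable[str],
                quarantine: List[Tuple[str, str]]) -> Iterator[dict]:
    """Yield well-formed items; quarantine malformed ones and keep going."""
    for raw in raw_documents:
        try:
            document = json.loads(raw)
        except json.JSONDecodeError as error:
            quarantine.append((raw, f"invalid JSON: {error.msg}"))
            continue
        if not isinstance(document, dict):
            quarantine.append((raw, "not a JSON object"))
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in document]
        if missing:
            quarantine.append((raw, "missing fields: " + ", ".join(missing)))
            continue
        yield document


raw_feed = [
    '{"title": "ok", "link": "https://example.org/1"}',
    '{"title": "no link"}',
    '{broken',
]
quarantine: List[Tuple[str, str]] = []
parsed = list(parse_items(raw_feed, quarantine))
```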
Exception Flow 3 — Resource Exhaustion Risk
Condition:
A downstream operation risks exceeding memory limits.
Flow:
- Bounded strategies (windowing, top-K selection) are applied.
- Full materialization is avoided.
- If needed, the operation degrades gracefully (e.g., reduced clustering depth).
Result:
System stability is preserved under load.
Postconditions
Upon successful execution:
Functional Outcomes
- End users see an up-to-date Summary Page.
- Each summary item links to a Detailed Page.
- Editors can intervene using generator operations.
- Administrators retain full system control.
Technical Guarantees
- Memory usage remains bounded.
- Latency is minimized through lazy evaluation.
- Full materialization occurs only when explicitly requested.
System State
- All generators remain composable.
- Generator composition remains valid after alternative and exceptional flows.
- Empty generators correctly represent exhaustion.
- Infinite yields are supported up to stages that require finiteness.
References
- RSS 2.0 Specification, https://www.rssboard.org/rss-specification
- Atom Publishing Protocol (RFC 5023), https://www.rfc-editor.org/rfc/rfc5023
- JSON-LD Specification, https://json-ld.org/spec/
- TF-IDF, “Understanding TF-IDF (Term Frequency-Inverse Document Frequency)”, https://www.geeksforgeeks.org/machine-learning/understanding-tf-idf-term-frequency-inverse-document-frequency/
- TF-IDF + SVM, “Strengthening Fake News Detection: Leveraging SVM and Sophisticated Text Vectorization Techniques. Defying BERT?”, https://arxiv.org/html/2411.12703v1
- Sentence-BERT (SBERT), Reimers, N. & Gurevych, I., 2019, https://arxiv.org/abs/1908.10084
- Fine-tuned BERT, “Fine-tuning a BERT model”, https://www.tensorflow.org/tfmodels/nlp/fine_tune_bert
- Ada Embeddings (OpenAI), Radford et al., 2021, https://arxiv.org/abs/2103.00020
- Cosine Similarity, https://en.wikipedia.org/wiki/Cosine_similarity
- Hierarchical Clustering, https://en.wikipedia.org/wiki/Hierarchical_clustering
- DBSCAN, Ester et al., 1996, https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf
Pull request #2379 created #created-2379
Use exported schema to validate function catalogs
This PR adds an exported SCM version of the fos.xsd schema with an embedded license. Updates to the build script use it to validate all function-catalog.xml files.
Pro: validation
Con: any change to the fos.xsd also has to be accompanied by an update to the exported schema, which probably only Mike or I can do.
Issue #2378 created #created-2378
HTML indenting
The spec says (in both 3.1 and 4.0):
The inline elements are those included in the %inline category of any of the HTML 4.01 DTDs or those elements defined to be phrasing elements in HTML5
This could be read as defining one set of inline elements for version="4.01" and a different set of inline elements for version="5.0", or it could be read as indicating that an element is an inline element if it satisfies either of these two conditions.
Issue #2370 closed #closed-2370
Add character-maps to the allowed context dependencies
Issue #2374 closed #closed-2374
Markup: Allow empty nt elements
Pull request #2377 created #created-2377
2195 F+O Editorial Corrections
F+O: editorial corrections to issues identified in #2195 (mainly corrections to examples), plus completion of missing change metadata.
Pull request #2376 created #created-2376
2337 Extend xsl:mode/@typed to handle JNodes etc
Fix #2337
Pull request #2375 created #created-2375
2195 Editorial Omnibus
Fixes a number of problems from issue #2195.
Pull request #2374 created #created-2374
Markup: Allow empty nt elements
Stylesheet change to allow empty NT elements, bringing them into line with other referencing elements such as termref and xnt. The markup <nt def="AxisStep"/> is treated as equivalent to <nt def="AxisStep">AxisStep</nt>. This removes a common source of error which tends to result in missing text in the spec rather than in any kind of build error.
Pull request #2373 created #created-2373
2359 No conversion to JNode in absolute paths
Fix #2359
Pull request #2372 created #created-2372
2344 Change rendition of PIs in HTML5
Fix #2344
Pull request #2371 created #created-2371
2369 Add content for F&O section 11 (Processing binary values)
Fix #2369
Pull request #2370 created #created-2370
Add character-maps to the allowed context dependencies
Currently function-catalog.xml is invalid against the schema fos.xsd. This PR updates the schema to allow "character-maps" in the enumeration of allowed context dependencies.
(Note, this should probably cause the build to fail. The problem was only spotted when using Oxygen to query the function catalog.)
Issue #2369 created #created-2369
F+O section 11 is empty
F+O section 11, Processing Binary Values, is currently empty
Has something gone wrong, or should we delete the section?
Pull request #2368 created #created-2368
2367 Misc XSLT editorial fixes
Most of the changes here are to bring the change log entries up to date. Also:
Fix #2356 Fix #2366 Fix #2367
Issue #2367 created #created-2367
Documentation for new main-module attribute of xsl:stylesheet
In the XSLT 4.0 spec, 3.6 Stylesheet Element says:
The optional main-module attribute is purely documentary. By including this attribute in every stylesheet module of a package, an XSLT editing tool may be enabled to locate the top-level module of the relevant package [...]
But what does "top-level module" mean? Should this say "principal stylesheet module" instead? I can see that top-level package is defined, but not "top-level module", so I'm confused.
Issue #2366 created #created-2366
json-lines attribute for xsl:output and xsl:result-document in XSLT spec
The new serialization parameter json-lines is documented at 26.2 Serialization parameters. But please add a "changes" entry for the new attribute json-lines in the sections 25.1 Creating Secondary Results and 26.1 The xsl:output declaration in the XSLT 4.0 spec. This is currently missing.
(Note that there is already a changes entry in the Serialization spec at 3 Serialization Parameters.)
Issue #2365 created #created-2365
Record types: extensible and non-extensible pairs
It is often useful in a function signature for an argument type to be an extensible record type (so additional fields are allowed, which the function can ignore, saving the need to check for their presence) while the return type is non-extensible (giving better static type checking for lookup expressions, for example).
Currently this requires two separate named record types to be declared, differing only in that one of them is extensible and the other not. This duplication is clearly undesirable.
One solution to this might be to have a single non-extensible definition of the name of the record type, with some way of indicating at the point where the record type is used that extensions are allowed.
For example (probably not viable syntax as written):
fn:element-to-map-plan(
$input as element()*
) as fn:element-to-map-conversion-plan
fn:element-to-map(
$node as element(),
$plan as extensible fn:element-to-map-conversion-plan
) as map(*)
Perhaps the syntax extensible(fn:element-to-map-conversion-plan) would work.
Pull request #2364 created #created-2364
2088 File Module: Feedback, Observations
Closes #2088
Issue #2250 closed #closed-2250
Function to detect/infer the string encoding from a binary
Issue #2092 closed #closed-2092
Drop map:pair, map:of-pairs, map:pairs, array:members, array:of-members
Issue #2194 closed #closed-2194
fn:transform sandbox=yes option
Pull request #2363 created #created-2363
2349 Revert array:join
Closes #2349
Pull request #2362 created #created-2362
2355 bin:infer-encoding: further alignments
Closes #2355
Issue #2361 created #created-2361
Encoding parameters: upper/lower case, normalization
The serializer spec says…
Serializers are required to support values of UTF-8 and UTF-16
…whereas the XQFO spec mentions utf-8 as default value for fn:serialize. Similarly, only the UTF lower-case variants are listed for fn:unparsed-text, and there may be other places.
I think we should mention the upper-case variants everywhere, and add notes that upper/lower case is ignored when processing the encoding string.
Issue #2360 created #created-2360
fn:root() vs. absolute path expressions
Is there a particular reason why the absolute slash / is defined as complicated as…
self::gnode()/(fn:root(.) treat as (document-node()|jnode()))/PP
…and wouldn’t it be helpful to simplify it and get rid of the treat as expression?
self::gnode()/fn:root(.)/PP
In many cases, the document node does not exist or is not really needed, and it would allow users to use the slash for nodes that would otherwise need to be wrapped into document nodes, for example:
let $as := analyze-string('abc', 'b')
return $as/fn:match[/fn:non-match]
Issue #2359 created #created-2359
Implicit conversion to JNodes with absolute path expressions
Section 4.7.1 discusses absolute path expressions.
The first part of the section concerns leading "/", and includes the note:
If the context value includes a map or array, it is not converted implicitly to a JNode; rather, a type error occurs.
The second part concerns leading "//", and includes the statement:
Any map or array that is present in the context value is first coerced to a JNode by applying the [fn:jtree] function.
It might be inferred that "/" doesn't do this conversion, but "//" does. However, this certainly isn't stated explicitly, and there would be no logical reason for treating the two cases differently.
We should either do the conversion for both cases, or for neither.
I'm inclined to do it for neither. Partly because an implicit conversion wouldn't do any upwards navigation to a different "root" node, as users might expect; partly because doing the conversion reduces the type information available to the compiler.
QT4 CG meeting 147 draft minutes #minutes-01-06
Draft minutes published.
Issue #407 closed #closed-407
XSLT-specific context properties used in function items
Issue #2274 closed #closed-2274
407 Function items capturing XSLT context components
Issue #1011 closed #closed-1011
fn:transform() improvements
Issue #2348 closed #closed-2348
1011 fn transform improvements
Issue #2339 closed #closed-2339
Default priority of match="element(A|B)"
Issue #2335 closed #closed-2335
Make `jnode()` like `element()`
Issue #2334 closed #closed-2334
XSLT: Parenthesized subexpressions within Patterns
Issue #2297 closed #closed-2297
XSLT pattern ambiguities with typed matches
Issue #2336 closed #closed-2336
2334 Revise XSLT pattern syntax and semantics
Issue #2048 closed #closed-2048
Untrusted execution, and security more generally
QT4 CG meeting 148 draft agenda #agenda-01-13
Draft agenda published.
Pull request #2358 created #created-2358
2357 Standardize on element() rather than element(*)
Fix #2357
Issue #2357 created #created-2357
element() vs element(*) in function signatures
We use element(*) and element() interchangeably in function signatures. I propose we standardise on the simpler form, element().
Ditto attribute().
Issue #2356 created #created-2356
Clarification on scope of variables in xsl:for-each-group/(@split-when|@merge-when)
A user experimenting with xsl:for-each-group/@split-when with my 4->3 source-code transformer inferred that the variable $group was available within the sequence constructor of the grouping instruction.
(Unfortunately due to an error in my transformer code $group was within scope in the sequence constructor, though with the wrong value ;-) - this has since been corrected.)
A close and detailed reading of the spec shows that $group and $next are implied as only in scope for the evaluation of the @split-when expression. Might I suggest that there is a small note emphasising this is the case? Similar clarification may be worthwhile for @merge-when too.
QT4 CG meeting 147 draft agenda #agenda-01-06
Draft agenda published.