XSLT and XQuery Serialization 4.0

1 Introduction

Changes in 4.0 ⬇

Use the arrows to browse significant changes since the 3.1 version of this specification.
Sections with significant changes are marked Δ in the table of contents.

This document defines serialization of the W3C XQuery and XPath Data Model 4.0 (XDM), which is the data model of at least [XML Path Language (XPath) 4.0], [XSL Transformations (XSLT) Version 4.0], and [XQuery 4.0: An XML Query Language], and any other specifications that reference it.

In this document, examples and material labeled as “Note” are provided for explanatory purposes and are not normative.

Serialization is the process of converting an instance of the [XQuery and XPath Data Model (XDM) 4.0] into a sequence of octets.

[Definition: The XDM value supplied as input to the serializer is referred to as the input value.] Some serialization methods apply only to certain types of input value.

Note:

Where serialization is used to process the result of an XQuery evaluation or an XSLT transformation, the input value of the serializer corresponds to the output from XQuery or XSLT.

[Definition: In general the output of the serializer will represent the items actually present in the input value, together with other items that are reachable from these, for example (in the case of nodes) their descendants. The complete set of items that are represented in the output of the serializer is referred to (without loss of generality) as the input tree.]

1.1 Terminology

Changes in 4.0 ⬇ ⬆

The term atomic value has been replaced by atomic item. [Issue 1337 2 August 2024]

In this specification, where they are rendered in small capitals, the words must, must not, should, should not, may, required, and recommended are to be interpreted as described in [RFC2119].

[Definition: As is indicated in 12 Conformance, conformance criteria for serialization are determined by other specifications that refer to this specification. A serializer is software that implements some or all of the requirements of this specification in accordance with such conformance criteria.] A serializer is not required to directly provide a programming interface that permits a user to set serialization parameters or to provide an input sequence for serialization. In this document, material labeled as "Note" and examples are provided for explanatory purposes and are not normative.

Certain aspects of serialization are described in this specification as implementation-defined or implementation-dependent.

[Definition: Implementation-defined indicates an aspect that may differ between serializers, but whose actual behavior must be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.]

[Definition: Implementation-dependent indicates an aspect that may differ between serializers, and whose actual behavior is not required to be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.]

[Definition: In some instances, the input tree cannot be successfully converted into a sequence of octets given the set of serialization parameter (3 Serialization Parameters) values specified. A serialization error is said to occur in such an instance.] In some cases, a serializer is required to raise such an error. What it means to raise a serialization error is determined by the relevant conformance criteria (12 Conformance) to which the serializer conforms. In other cases, there is an implementation-defined choice between raising a serialization error and performing a recovery action. Such a recovery action will allow a serializer to produce a sequence of octets that might not fully reflect the usual requirements of the parameter settings that are in effect.

[Definition: Where this specification indicates that two strings are to be compared without regard to case, the serializer must translate any characters in the range U+0041 (LATIN CAPITAL LETTER A, A) through U+005A (LATIN CAPITAL LETTER Z, Z) inclusive, to the corresponding lower-case letters in the range U+0061 (LATIN SMALL LETTER A, a) through U+007A (LATIN SMALL LETTER Z, z) only for the purposes of making the comparison. The comparison succeeds if the two strings are the same length and the code point of each character in the first string is equal to the code point of the character in the corresponding position in the second string.]
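The comparison defined above folds only the 26 ASCII capital letters, which differs from general-purpose case folding. The following Python fragment is a minimal sketch of that rule; the function name is hypothetical and is not part of this specification.

```python
# Translation table covering only U+0041..U+005A, as the definition requires.
_ASCII_UPPER_TO_LOWER = {cp: cp + 0x20 for cp in range(0x41, 0x5B)}


def equal_without_regard_to_case(a: str, b: str) -> bool:
    """Compare two strings after translating only A-Z to a-z."""
    # After translation, plain string equality checks both length and
    # position-by-position code point equality.
    return a.translate(_ASCII_UPPER_TO_LOWER) == b.translate(_ASCII_UPPER_TO_LOWER)
```

A library routine such as str.lower() would not be an exact substitute, because it also folds characters outside the A-Z range.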

Many terms used in this document are defined in the XPath specification [XML Path Language (XPath) 4.0] or the Data Model specification [XQuery and XPath Data Model (XDM) 4.0]. Particular attention is drawn to the following:

[Definition: The term atomization is defined in Section 2.5.3 Atomization^XP.]
[Definition: The term node is defined as part of [TITLE OF DM40 SPEC, TITLE OF Node SECTION]^DM40. There are seven kinds of nodes in the data model: document, element, attribute, text, namespace, processing instruction, and comment.]
[Definition: The term sequence is defined in Section 2 Basics^XP. A sequence is an ordered collection of zero or more items.]
[Definition: The term function item is defined in Section 8.1 Function Items^DM.]
[Definition: The term map item is defined in Section 8.2 Map Items^DM.]
[Definition: The term array item is defined in Section 8.3 Array Items^DM.]
[Definition: The term string is defined in Section 4.1.5 XML and XSD Versions^DM.]
[Definition: The term character is defined in Section 4.1.5 XML and XSD Versions^DM.]
[Definition: The term codepoint is defined in Section 4.1.5 XML and XSD Versions^DM.]
[Definition: The term string value is defined in Section 7.5.12 string-value Accessor^DM. Every node has a string value. For example, the string value of an element is the concatenation of the string values of all its descendant text nodes.]
[Definition: The term expanded QName is defined in Section 2 Basics^XP. An expanded QName consists of an optional namespace URI and a local name. An expanded QName also retains its original namespace prefix (if any), to facilitate casting the expanded QName into a string.]
[Definition: An expanded-QName whose namespace part is an empty sequence, or an element or attribute whose name expands to such an expanded-QName, is referred to as having a null namespace URI.]
[Definition: An element or attribute that does not have a null namespace URI is referred to as having a non-null namespace URI.]
[Definition: A space character, TAB character, CR character or NL character is referred to as a whitespace character.]

Where this specification indicates that an XSLT instruction is evaluated, the behavior is as specified by [XSL Transformations (XSLT) Version 4.0]. Where it indicates that an XQuery expression is evaluated, the behavior is as specified by [XQuery 4.0: An XML Query Language].

2 Sequence Normalization

The input value is a sequence. Prior to serializing a sequence using any of the output methods whose behavior is specified by this document (3 Serialization Parameters), with the exception of the JSON and Adaptive output methods, the serializer must first compute a normalized sequence for serialization; it is the normalized sequence that is actually serialized. [Definition: The purpose of sequence normalization is to create a sequence that can be serialized as a well-formed XML document or external general parsed entity, that also reflects the content of the input sequence to the extent possible.] [Definition: The result of the sequence normalization process is a result tree.]

The normalized sequence for serialization is constructed by applying all of the following rules in order, with the input value being input to the first step, and the sequence that results from any step being used as input to the subsequent step. For any implementation-defined output method, it is implementation-defined whether this sequence normalization process takes place. For the JSON and Adaptive output methods, sequence normalization must not take place.

Where the process of converting the input sequence to a normalized sequence indicates that a value must be cast to xs:string, that operation is defined in Section 23.1.2 Casting to xs:string^FO of [XQuery and XPath Functions and Operators 4.0]. Where a step in the sequence normalization process indicates that a node should be copied, the copy is performed in the same way as an XSLT xsl:copy-of instruction that has a validation attribute whose value is preserve and has a select attribute whose effective value is the node, as described in Section 11.9.2 Deep Copy^XT of [XSL Transformations (XSLT) Version 4.0], or equivalently in the same way as an XQuery content expression as described in Step 1e of Section 4.12.1.3 Content^XQ of [XQuery 4.0: An XML Query Language], where the construction mode is preserve. Let S₀ be the sequence that is input to serialization. The steps in computing the normalized sequence are:

Create a new sequence S₁ from S₀ as follows. For each item in S₀, if the item is a JNode, copy the ¶value property of the item; otherwise, copy the item itself.
Create a new sequence S₂ from S₁ as follows. For each item in S₁, if the item is an array, copy the results of passing the item into the function array:flatten(); otherwise, copy the item itself. If S₁ is empty, let S₂ consist of a zero-length string.
Create a new sequence S₃ from S₂ as follows. For each item in S₂, if the item is atomic, copy to S₃ only the lexical representation resulting from casting the item to an xs:string; otherwise, copy the item to S₃.
Create a new sequence S₄ from S₃ as follows. If the item-separator serialization parameter is present, then copy each item in S₃ to S₄, inserting between each pair of items a string whose value is equal to the value of the item-separator parameter. If the item-separator serialization parameter is not present, then first maximally group the items in S₃ into subsequences of xs:string items and non-xs:string items. For each group of items, if the group is a subsequence of non-xs:string items, copy the subsequence to S₄; if the group is a subsequence of xs:string items, copy to S₄ the results of passing to fn:string-join() the subsequence and a single space as the function’s two parameters.
Create a new sequence S₅ from S₄ as follows. For each item in S₄, if the item is a string, copy to S₅ a text node whose string value is equal to the string; otherwise, copy the item to S₅.
Create a new sequence S₆ from S₅ as follows. For each item in S₅, if the item is a document node, copy its children to S₆; otherwise, copy the item to S₆.
Create a new sequence S₇ from S₆ as follows. First, remove any text nodes with values of zero length from S₆, then maximally group the results into groups of text nodes and non-text nodes. For each group of items, if the group is a subsequence of text nodes, copy to S₇ a single text node whose value is equal to the concatenated values of the subsequence; if the group is a subsequence of non-text nodes, copy the subsequence of items to S₇. It is a serialization error [err:SENR0001] if any item in S₇ is an attribute node, a namespace node, or a function.
Create a new sequence S₈ from S₇ as follows. Let S₈ be a single document node. Copy sequence S₇ to the document node as its children.

S₈ is the normalized sequence.

The result tree rooted at the document node that is created by the final step of this sequence normalization process is the value to which the rules of the appropriate output method are applied. If the sequence normalization process results in a serialization error, the serializer must raise the error.
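The normalization steps above can be illustrated with a small Python sketch. It is not normative and rests on several assumptions: the classes Text, Element, Attribute and Document are a hypothetical toy model of XDM nodes, Python lists stand in for arrays, only strings, integers and booleans are treated as atomic items, the JNode step and the copying of nodes are omitted, and adjacent strings are joined with a single space when no item separator is supplied.

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Any, List, Optional


@dataclass
class Text:
    value: str


@dataclass
class Element:
    name: str
    children: List[Any] = field(default_factory=list)


@dataclass
class Attribute:
    name: str
    value: str


@dataclass
class Document:
    children: List[Any] = field(default_factory=list)


class SerializationError(Exception):
    """Carries an error code such as SENR0001."""


def _flatten(array):
    # Rough stand-in for array:flatten(): nested lists are expanded recursively.
    for member in array:
        if isinstance(member, list):
            yield from _flatten(member)
        else:
            yield member


def _cast_to_string(item):
    # Simplified cast to xs:string for the atomic types modeled here.
    if isinstance(item, bool):
        return "true" if item else "false"
    return str(item)


def normalize(seq: List[Any], item_separator: Optional[str] = None) -> Document:
    # Flatten arrays; an empty input becomes a single zero-length string.
    items = [m for it in seq for m in (_flatten(it) if isinstance(it, list) else [it])]
    if not seq:
        items = [""]
    # Replace atomic items by their string representation.
    items = [_cast_to_string(it) if isinstance(it, (str, int, bool)) else it for it in items]
    joined: List[Any] = []
    if item_separator is not None:
        # Separator present: insert it between every pair of items.
        for i, it in enumerate(items):
            if i:
                joined.append(item_separator)
            joined.append(it)
    else:
        # Separator absent: join each run of adjacent strings with a space (assumption).
        for is_string, group in groupby(items, key=lambda it: isinstance(it, str)):
            run = list(group)
            joined.extend([" ".join(run)] if is_string else run)
    # Strings become text nodes; document nodes are replaced by their children.
    nodes = [Text(it) if isinstance(it, str) else it for it in joined]
    nodes = [c for n in nodes for c in (n.children if isinstance(n, Document) else [n])]
    # Drop zero-length text nodes, then merge runs of adjacent text nodes.
    merged: List[Any] = []
    for n in nodes:
        if isinstance(n, Text) and n.value == "":
            continue
        if isinstance(n, Text) and merged and isinstance(merged[-1], Text):
            merged[-1] = Text(merged[-1].value + n.value)
        else:
            merged.append(n)
    # Attribute nodes and functions cannot be children of a document node.
    if any(isinstance(n, Attribute) or callable(n) for n in merged):
        raise SerializationError("SENR0001")
    return Document(merged)
```

In this model, normalize([1, "a", Element("e"), "b"]) yields a document whose children are a text node "1 a", the element e, and a text node "b".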

Note:

If the item-separator serialization parameter is absent, the sequence normalization process for a sequence $seq is equivalent to constructing a document node using the XSLT instruction:

<xsl:document>
  <xsl:copy-of select="$seq" validation="preserve"/>
</xsl:document>

or the XQuery expression:

declare construction preserve;

document { $seq }

If the item-separator serialization parameter is present, the sequence normalization process for a sequence $seq is equivalent to constructing a document node using the XSLT instruction:

<xsl:document>
  <xsl:for-each select="$seq">
    <xsl:sequence select="if (position() gt 1) 
                          then $sep 
                          else ()"/>

    <xsl:choose>
      <xsl:when test=". instance of node()">
        <xsl:sequence select="."/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:document>

or the XQuery expression:

declare construction preserve; 

document {
  for $item at $pos in $seq
  let $node := 
    if ($item instance of node()) then 
      $item 
    else 
      text { $item }
  return
    if ($pos eq 1) then
      $node
    else
      ($sep, $node)  
}

where the value of the sep variable is a string whose value is equal to the value of the item-separator serialization parameter.

This process results in a serialization error [err:SENR0001] if $seq contains functions, attribute nodes or namespace nodes.

5 XML Output Method

The XML output method serializes the normalized sequence as an XML entity that must satisfy the rules for either a well-formed XML document entity, a well-formed XML external general parsed entity, or both. A serialization error [err:SERE0003] results if the serializer is unable to satisfy those rules, except for content modified by the character expansion phase of serialization, as described in 4 Phases of Serialization. The effects of the character expansion phase could result in the serialized output being not well-formed, but will not result in a serialization error. If a serialization error results, the serializer must raise the error.

If the document node of the normalized sequence has a single element node child and no text node children, then the serialized output is a well-formed XML document entity, and the serialized output must conform to the appropriate version of the XML Namespaces Recommendation [XML Names] or [XML Names 1.1]. If the normalized sequence does not take this form, then the serialized output is a well-formed XML external general parsed entity, which, when referenced within a trivial XML document wrapper like this:

<?xml version="version"?>
<!DOCTYPE doc [
<!ENTITY e SYSTEM "entity-URI">
]>
<doc>&e;</doc>

where entity-URI is a URI for the entity, and the value of the version pseudo-attribute is the value of the version parameter, produces a document which must itself be a well-formed XML document conforming to the corresponding version of the XML Namespaces Recommendation [XML Names] or [XML Names 1.1].

[Definition: A reconstructed tree may be constructed by parsing the XML document and converting it into a document node as specified in [XQuery and XPath Data Model (XDM) 4.0].] The result of serialization must be such that the reconstructed tree is the same as the result tree except for the following permitted differences:

If the document was produced by adding a document wrapper, as described above, then it will contain an extra doc element as the document element.
The order of attribute and namespace nodes in the two trees may be different.
The following properties of corresponding nodes in the two trees may be different:
- the base-uri property of document nodes and element nodes;
- the document-uri and unparsed-entities properties of document nodes;
- the type-name and typed-value properties of element and attribute nodes;
- the nilled property of element nodes;
- the content property of text nodes, due to the effect of the indent and use-character-maps parameters.
The reconstructed tree may contain additional attributes and text nodes resulting from the expansion of default and fixed values in its DTD or schema; also, in the presence of a DTD, non-CDATA attributes may lose whitespace characters as a result of attribute value normalization.
The type annotations of the nodes in the two trees may be different. Type annotations in a result tree are discarded when the tree is serialized. Any new type annotations obtained by parsing the document will depend on whether the serialized XML document is assessed against a schema, and this may result in type annotations that are different from those in the original result tree.
Note:
In order to influence the type annotations in the tree that would result from processing a serialized XML document, the author of the XSLT stylesheet, XQuery expression or other process might wish to create the input tree so that it makes use of mechanisms provided by [XML Schema], such as xsi:type and xsi:schemaLocation attributes. The serialization process will not automatically create such attributes in the serialized document if those attributes were not part of the result tree that is to be serialized.
Similarly, it is possible that an element node in the input tree has the nilled property with the value true, but no xsi:nil attribute. The serialization process will not create such an attribute in the serialized document simply to reflect the value of the property. The value of the nilled property has no direct effect on the serialized result.
Additional namespace nodes may be present in the reconstructed tree if the serialization process did not undeclare one or more namespaces, as described in 5.1.7 XML Output Method: the undeclare-prefixes Parameter, and the input tree contained an element node with a namespace node that declared some prefix, but a child element of that node did not have any namespace node that declared the same prefix.
The result tree may contain namespace nodes that are not present in the reconstructed tree, as the process of creating an instance of the data model may ignore namespace declarations in some circumstances. See Section 7.4.2.3 Construction from an Infoset^DM and Section 7.4.2.4 Construction from a PSVI^DM of [XQuery and XPath Data Model (XDM) 4.0] for additional information.
If the indent parameter has the value true,
- additional text nodes consisting of whitespace characters may be present in the reconstructed tree; and
- text nodes in the result tree that contained only whitespace characters may correspond to text nodes in the reconstructed tree that contain additional whitespace characters that were not present in the result tree
See 5.1.3 XML Output Method: the indent and suppress-indentation Parameters for more information on the indent parameter.
Additional nodes may be present in the reconstructed tree due to the effect of character mapping in the character expansion phase, and the values of attribute nodes and text nodes in the reconstructed tree may be different from those in the result tree, due to the effects of URI expansion, character mapping and Unicode Normalization in the character expansion phase of serialization.
Note:
The use-character-maps parameter can cause arbitrary characters to be inserted into the serialized XML document in an unescaped form, including characters that would be considered to be part of XML markup. Such characters could result in arbitrary new element nodes, attribute nodes, and so on, in the reconstructed tree that results from processing the serialized XML document.

A consequence of this rule is that certain characters must be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically:

In text nodes, the characters U+000D (CARRIAGE RETURN), U+0085 (NEXT LINE, NEL), and U+2028 (LINE SEPARATOR) must be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents.
In attribute nodes, the characters U+000D (CARRIAGE RETURN), U+000A (NEWLINE), U+0009 (TAB), U+0085 (NEXT LINE, NEL), and U+2028 (LINE SEPARATOR) must be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents.
In both text nodes and attribute nodes, control characters U+0001 (SOH) through U+001F (IS1) and U+007F (DELETE) through U+009F (APC) (except U+0009 (TAB) , U+000A (NEWLINE) , and U+000D (CARRIAGE RETURN) , and U+0085 (NEXT LINE, NEL) ) must be output as character references.

For example, an attribute with the value "x" followed by "y" separated by a newline will result in the output "x&#xA;y" (or with any equivalent character reference). The XML output cannot be "x" followed by a literal newline followed by a "y" because after parsing, the attribute value would be "x y" as a consequence of the XML attribute normalization rules.
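The round-trip rules above can be sketched in Python as follows. This is an illustrative fragment only: the function name is hypothetical, hexadecimal character references are chosen arbitrarily among the permitted equivalents, and the escaping of markup characters such as < and & is deliberately left out because it is not the subject of these rules.

```python
# Characters that must always become character references, per node kind.
_TEXT_MUST_ESCAPE = {0x0D, 0x85, 0x2028}
_ATTR_MUST_ESCAPE = _TEXT_MUST_ESCAPE | {0x0A, 0x09}


def _is_escaped_control(cp: int) -> bool:
    # U+0001..U+001F and U+007F..U+009F, except TAB, NEWLINE, CR and NEL,
    # which are covered by the per-node-kind sets above.
    in_range = 0x01 <= cp <= 0x1F or 0x7F <= cp <= 0x9F
    return in_range and cp not in (0x09, 0x0A, 0x0D, 0x85)


def escape_for_round_trip(value: str, in_attribute: bool) -> str:
    """Replace characters that would not survive parsing by character references."""
    must_escape = _ATTR_MUST_ESCAPE if in_attribute else _TEXT_MUST_ESCAPE
    out = []
    for ch in value:
        cp = ord(ch)
        if cp in must_escape or _is_escaped_control(cp):
            out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)
```

For the attribute example above, escape_for_round_trip("x\ny", in_attribute=True) returns "x&#xA;y", whereas the same string in a text node is returned unchanged.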

Note:

XML 1.0 did not permit an XML processor to normalize U+0085 (NEXT LINE, NEL) or U+2028 (LINE SEPARATOR) characters to a U+000A (NEWLINE) character. However, if a document entity that specifies version 1.1 invokes an external general parsed entity with no text declaration or a text declaration that specifies version 1.0, the external parsed entity is processed according to the rules of XML 1.1. For this reason, U+0085 (NEXT LINE, NEL) and U+2028 (LINE SEPARATOR) characters in text and attribute nodes must always be escaped using character references, regardless of the value of the version parameter.

XML 1.0 permitted control characters in the range U+007F (DELETE) through U+009F (APC) to appear as literal characters in an XML document, but XML 1.1 requires such characters, other than U+0085 (NEXT LINE, NEL) , to be escaped as character references. An external general parsed entity with no text declaration or a text declaration that specifies a version pseudo-attribute with value 1.0 that is invoked by an XML 1.1 document entity must follow the rules of XML 1.1. Therefore, the non-whitespace control characters in the ranges U+0001 (SOH) through U+001F (IS1) and U+007F (DELETE) through U+009F (APC) must always be escaped, regardless of the value of the version parameter.

It is a serialization error [err:SEPM0004] to specify the doctype-system parameter, or to specify the standalone parameter with a value other than omit, if the input tree contains text nodes or multiple element nodes as children of the root node. The serializer must either raise the error, or recover by ignoring the request to output a document type declaration or standalone parameter.

9 JSON Output Method

Changes in 4.0 ⬇ ⬆

Added the escape-solidus parameter for JSON serialization. [Issue 530 PR 534 6 June 2023]
Added the json-lines parameter for JSON serialization. [Issue 1471 15 October 2024]
The serialization of maps retains the order of entries. [Issue 1651 PR 1703 14 January 2025]
A JNode is replaced by its ¶value property. [Issue 2025 PR 2031 29 May 2025]

The JSON output method serializes the input tree using the JSON syntax defined in [RFC 7159], or (if the json-lines parameter is set to true) the json-lines syntax defined at [JSON Lines]. Sequence normalization is not performed for this output method. The effect of the json-lines parameter is explained at 9.1 JSON Lines.

If json-lines is set to false, then:

If the input value is an empty sequence, it is serialized as the string null.
If the input value is a single item, it is serialized as described below.
If the input value is a sequence containing two or more items, a serialization error results [err:SERE0023].

An individual item is serialized as follows:

A JNode is serialized by serializing its ¶value property.
An array item in the input tree is serialized to a JSON array by outputting the serialized JSON value of each member within the array separated by delimiters according to the JSON array syntax, i.e. [member, member, ...]. Each member in the array is to be serialized by recursively applying the rules in this section.
A map item in the input tree is serialized to a JSON object by outputting, for each key/value pair, the string value of the key to a JSON string, followed by the serialized JSON value of the entry, separated by delimiters according to the JSON object syntax, i.e. {key:value, key:value, ...}. The key/value pairs in the serialized output retain the entry order^DM of entries in the map.
If any two keys of the map item have the same string value, serialization error [err:SERE0022] is raised, unless the allow-duplicate-names parameter has the value true.
A node in the input tree is serialized to a JSON string by outputting the result of serializing the node using the method specified by the json-node-output-method parameter. The node is serialized with the serialization parameter omit-xml-declaration set to true and with no other serialization parameters set.
An atomic item^XP in the input tree with a numeric type, that is, xs:float, xs:double or xs:decimal, or a type derived from one of these, is serialized to a JSON number. Implementations may serialize the numeric value using any lexical representation of a JSON number defined in [RFC 7159]. If the numeric value cannot be represented in the JSON grammar (such as Infinity or NaN), then the serializer must raise a serialization error [err:SERE0020].
An atomic item^XP in the input tree of type xs:boolean and value true is serialized to the JSON token true.
An atomic item^XP in the input tree of type xs:boolean and value false is serialized to the JSON token false.
An atomic item^XP of type xs:QName in the input tree whose namespace part is "http://www.w3.org/2005/xpath-functions" and whose local part is "null" is serialized to the JSON token null.
Note:
This rule is introduced in 4.0, along with an option in the fn:parse-json function to allow a user-defined representation of the JSON value null. While the default representation of null as an empty sequence is usable in many circumstances, an explicit representation of null as a recognizable item can make some operations on JSON-derived values easier.
Any other atomic item^XP in the input tree is serialized to a JSON string by outputting the result of applying the fn:string function to the item.
An empty sequence in the input tree is serialized to the JSON token null.
A sequence of length greater than one in the input tree will result in a serialization error [err:SERE0023].
Any item in the input tree of a type not specified in the above list will result in a serialization error [err:SERE0021].
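The item rules above, for the case where json-lines is false, can be sketched in Python. The mapping between XDM and Python values is an assumption made for illustration: a tuple models an XDM sequence, a list models an array, a dict models a map, and bool, int, float and str model atomic items. Nodes, JNodes and the fn:null QName are not modeled, and json.dumps is used as a stand-in for the JSON string procedure, so its escaping does not follow this specification exactly.

```python
import json
import math


class SerializationError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def _key_string(key) -> str:
    # Approximates the string value of a map key.
    if isinstance(key, bool):
        return "true" if key else "false"
    return str(key)


def serialize_json(value, allow_duplicate_names: bool = False) -> str:
    if isinstance(value, tuple):  # an XDM sequence
        if len(value) == 0:
            return "null"
        if len(value) > 1:
            raise SerializationError("SERE0023")
        value = value[0]
    if isinstance(value, bool):  # must be tested before int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        if isinstance(value, float) and not math.isfinite(value):
            raise SerializationError("SERE0020")  # INF and NaN have no JSON form
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):  # an array: serialize each member recursively
        members = (serialize_json(m, allow_duplicate_names) for m in value)
        return "[" + ",".join(members) + "]"
    if isinstance(value, dict):  # a map: dict order models entry order
        keys = [_key_string(k) for k in value]
        if len(set(keys)) != len(keys) and not allow_duplicate_names:
            raise SerializationError("SERE0022")
        entries = (
            json.dumps(k, ensure_ascii=False) + ":" + serialize_json(v, allow_duplicate_names)
            for k, v in zip(keys, value.values())
        )
        return "{" + ",".join(entries) + "}"
    raise SerializationError("SERE0021")
```

Under these assumptions, serialize_json(({"a": [1, True, ()]},)) produces {"a":[1,true,null]}.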

[Definition: Whenever a value is serialized to a JSON string, the following procedure is applied to the supplied string:

Any character in the string for which character mapping is defined (see 11 Character Maps) is substituted by the replacement string defined in the character map.
Any other character in the input string (but not a character produced by character mapping) is a candidate for Unicode Normalization if requested by the normalization-form parameter, and JSON escaping. JSON escaping replaces the characters quotation mark, backspace, form-feed, newline, carriage return, tab, or reverse solidus by the corresponding JSON escape sequences \", \b, \f, \n, \r, \t, or \\ respectively, and any other codepoint in the range 1-31 or 127-159 by an escape in the form \uHHHH where HHHH is the hexadecimal representation of the codepoint value. Escaping further replaces the solidus character (/) by the escape sequence \/ if the escape-solidus parameter is set to true, but not if it is set to false. Escaping is also applied to any characters that cannot be represented in the selected encoding.
The resulting string is enclosed in double quotation marks.

]
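The string procedure defined above can be sketched in Python. The function and parameter names are hypothetical, Unicode Normalization is omitted, and two details are assumptions of this sketch rather than requirements stated here: hexadecimal digits are written in upper case, and characters outside the Basic Multilingual Plane that cannot be encoded are written as a pair of surrogate escapes.

```python
from typing import Dict, Optional

_SHORT_ESCAPES = {
    '"': '\\"', "\b": "\\b", "\f": "\\f", "\n": "\\n",
    "\r": "\\r", "\t": "\\t", "\\": "\\\\",
}


def _encodable(ch: str, encoding: str) -> bool:
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def _u_escape(ch: str) -> str:
    # One \uHHHH escape per UTF-16 code unit (two for non-BMP characters).
    units = ch.encode("utf-16-be")
    return "".join(
        f"\\u{int.from_bytes(units[i:i + 2], 'big'):04X}" for i in range(0, len(units), 2)
    )


def json_string(
    s: str,
    escape_solidus: bool = False,
    encoding: str = "utf-8",
    character_map: Optional[Dict[str, str]] = None,
) -> str:
    character_map = character_map or {}
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in character_map:
            # Mapped characters are substituted and not escaped any further.
            out.append(character_map[ch])
        elif ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ch == "/" and escape_solidus:
            out.append("\\/")
        elif 1 <= cp <= 31 or 127 <= cp <= 159:
            out.append(f"\\u{cp:04X}")
        elif not _encodable(ch, encoding):
            out.append(_u_escape(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

Checking the character map first reflects the rule that replacement strings bypass JSON escaping, which means a character map can produce output that is not valid JSON.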

Finally, encoding, as controlled by the encoding parameter, converts the character stream produced by the preceding rules into an octet stream.

10 Adaptive Output Method

Changes in 4.0 ⬇ ⬆

The serialization of maps retains the order of entries. [Issue 1651 PR 1703 14 January 2025]
The output of QNames reflects the new syntax for QName literals. [Issue 2059 PR TODO 23 June 2025]
A JNode is replaced by its ¶value property. [Issue 2025 PR 2031 29 May 2025]

The Adaptive output method serializes the input tree into a human readable form for the purposes of debugging query results. The intention of this is to allow any input value to be serialized without raising a serialization error. Sequence normalization is not performed for this output method.

Each item in the supplied sequence is serialized individually as follows, with an occurrence of the chosen item-separator between successive items.

A JNode is serialized by serializing its ¶value property.
A document, element, text, comment, or processing instruction node is serialized using the XML output method described in 5 XML Output Method.
An attribute or namespace node is serialized as if it had a containing element node. For example an attribute node might be serialized as the string xsi:type="xs:integer"; a namespace node might be serialized as xmlns:sns="http://example.com/sample-namespace".
Note:
This may result in output of QNames containing prefixes whose binding is not displayed.

An atomic item^XP is serialized as follows:

An instance of xs:boolean is serialized as true() or false().
An instance of xs:string, xs:untypedAtomic or xs:anyURI is serialized by enclosing the value in double quotation marks and doubling any quotes within the value; or optionally by enclosing the value in apostrophes and doubling any apostrophes within the value. The resulting value is then serialized using the Text output method described in 8 Text Output Method.
Note:
The Text output method will apply character expansion and encoding rules to this string as specified by the serialization parameters.
An instance of xs:integer or xs:decimal is serialized by converting the value to a string using the fn:string function.

An instance of xs:double is serialized by applying the function format-number(?, '0.0##########################e0') using the following default decimal format properties:

Decimal format
Property name	Property value
`decimal-separator`	U+002E (FULL STOP, PERIOD, `.`)
`exponent-separator`	U+0065 (LATIN SMALL LETTER E, `e`)
`grouping-separator`	U+002C (COMMA, `,`)
`zero-digit`	U+0030 (DIGIT ZERO, `0`)
`digit`	U+0023 (NUMBER SIGN, `#`)
`infinity`	The string "INF"
`NaN`	The string "NaN"
`minus-sign`	U+002D (HYPHEN-MINUS, `-`)

An instance of xs:NOTATION is serialized as a URI-qualified name (that is, in the form Q{uri}local).
An instance of xs:QName is serialized with a # character, followed by:
- the local name if the name is in no namespace, or
- the URI-qualified name otherwise (Q{uri}local).
An atomic item of any other type is serialized using the syntax of a constructor function: xs:TYPE("VAL") where TYPE is the name of the primitive type, and VAL is the result of applying the fn:string() function. For example, xs:date("2015-07-17"). The resulting string is then serialized using the Text output method described in 8 Text Output Method.
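The rules for atomic items can be sketched in Python. The representation is an assumption made for illustration: each atomic item is passed as the name of its primitive type together with its string value, and QName and NOTATION values are passed as a (namespace URI, local name) pair. The xs:double rule is not implemented, and the subsequent Text output method processing is omitted.

```python
def adaptive_atomic(type_name: str, value) -> str:
    """Serialize one atomic item in the style of the Adaptive output method."""
    if type_name == "xs:boolean":
        return "true()" if value == "true" else "false()"
    if type_name in ("xs:string", "xs:untypedAtomic", "xs:anyURI"):
        # Enclose in double quotation marks, doubling any embedded ones.
        return '"' + value.replace('"', '""') + '"'
    if type_name in ("xs:integer", "xs:decimal"):
        return value
    if type_name in ("xs:QName", "xs:NOTATION"):
        uri, local = value
        uri_qualified = f"Q{{{uri}}}{local}"
        if type_name == "xs:NOTATION":
            return uri_qualified
        # QNames start with "#"; a no-namespace name shows only the local part.
        return "#" + (uri_qualified if uri else local)
    if type_name == "xs:double":
        raise NotImplementedError("format-number based rule not sketched here")
    # Any other type uses constructor-function syntax.
    return f'{type_name}("{value}")'
```

For example, adaptive_atomic("xs:date", "2015-07-17") returns xs:date("2015-07-17"), matching the example given above.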

An array item is serialized using the syntax of a SquareArrayConstructor^XP, that is as [member, member, ... ]. The members, which in general are sequences, are serialized in the form (item, item, ...) where the items are serialized by applying these rules recursively. The items are separated by commas (not by the item-separator character). The enclosing parentheses are optional if the sequence has length one.
Note:
The serializer should avoid outputting the parentheses if it is able to determine the length of the sequence before serializing the first item; but it is allowed to output parentheses around a singleton if this avoids buffering data in memory.
A map item is serialized using the syntax of a MapConstructor^XP without the optional map keyword, that is in the format {key:value, key:value, ...}. The key/value pairs in the serialized output retain the entry order^DM of entries in the map. The key is serialized by applying the rules for serializing an atomic item. The values are serialized in the same way as the members of an array (see above).
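The recursive shape of the array and map rules might be sketched as below. The helper names are hypothetical, the caller supplies `serialize_item` (the rule for a single item), and the choice to emit no whitespace after commas and colons is an assumption of this sketch rather than something stated here. A Python `dict` stands in for a map because it preserves insertion order.

```python
def serialize_sequence(items, serialize_item):
    """Serialize one array member or map value (a sequence of items)."""
    parts = [serialize_item(item) for item in items]
    if len(parts) == 1:
        # Parentheses are optional around a singleton; this sketch omits them.
        return parts[0]
    # Items are separated by commas, not by the item-separator.
    return "(" + ",".join(parts) + ")"


def serialize_array(members, serialize_item):
    """Serialize an array as [member,member,...]."""
    return "[" + ",".join(
        serialize_sequence(m, serialize_item) for m in members) + "]"


def serialize_map(entries, serialize_item):
    """Serialize a map as {key:value,...}, keeping the entry order."""
    pairs = [
        serialize_item(key) + ":" + serialize_sequence(value, serialize_item)
        for key, value in entries.items()
    ]
    return "{" + ",".join(pairs) + "}"
```

Passing Python's `str` as `serialize_item` is enough to see the structure for integers; a real serializer would pass a function implementing the full rules for atomic items, nodes, and nested arrays and maps.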
A function item is serialized to the representation name#A where name is a representation of the function name and A is the arity. If the function name is in one of the namespaces http://www.w3.org/2005/xpath-functions, http://www.w3.org/2005/xpath-functions/math, http://www.w3.org/2005/xpath-functions/map, http://www.w3.org/2005/xpath-functions/array or http://www.w3.org/2001/XMLSchema, then the name is output as a lexical QName using the conventional prefix fn, math, map, array, or xs as appropriate; if it is in any other namespace or in no namespace, then the name is output as a URI-qualified name (that is, Q{uri}local). If the function is anonymous, name is replaced by the string (anonymous-function).
Note:
The following examples illustrate this rule:
- fn:exists#1 is serialized as fn:exists#1
- Q{http://www.w3.org/2005/xpath-functions}exists#1 is serialized as fn:exists#1
- function($a) { $a } is serialized as (anonymous-function)#1
- math:pi#0 is serialized as math:pi#0
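The naming rule for function items can be sketched as follows. The function `serialize_function_item` and its parameters are hypothetical; only the namespace URIs, prefixes, and output forms are taken from the rule above.

```python
# Namespaces whose functions are written with a conventional prefix.
CONVENTIONAL_PREFIXES = {
    "http://www.w3.org/2005/xpath-functions": "fn",
    "http://www.w3.org/2005/xpath-functions/math": "math",
    "http://www.w3.org/2005/xpath-functions/map": "map",
    "http://www.w3.org/2005/xpath-functions/array": "array",
    "http://www.w3.org/2001/XMLSchema": "xs",
}


def serialize_function_item(arity, local=None, namespace_uri=None):
    """Return name#A for a function item; local=None means anonymous."""
    if local is None:
        name = "(anonymous-function)"
    elif namespace_uri in CONVENTIONAL_PREFIXES:
        # Lexical QName using the conventional prefix.
        name = CONVENTIONAL_PREFIXES[namespace_uri] + ":" + local
    else:
        # Any other namespace, or no namespace: URI-qualified name.
        name = "Q{" + (namespace_uri or "") + "}" + local
    return name + "#" + str(arity)
```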

Character maps are applied (a) when nodes are serialized using the XML output method, and (b) to any value represented as a string enclosed in quotation marks.

Optionally, in all the above constructs, characters whose visual representation is ambiguous (for example tab or non-breaking-space) may be represented in the form of an XML numeric character reference (for example &#x9; or &#xA0;).

Note:

In many cases the serialization of an item conforms to the syntax of an XQuery expression whose result is that item. There are exceptions, however. For example, the syntax will not be valid XQuery in the case of free-standing attribute or namespace nodes, or QName values, or anonymous functions; and where it is valid XQuery, the result of evaluating the expression will not necessarily be identical to the original: for example, the distinction between strings and untypedAtomic items is lost.

If any value cannot be output because doing so would cause a serialization error, the behavior is implementation-defined.

If the output is sent to a destination that allows hyperlinks to be included in the generated text, then the serializer may include implementation-dependent hyperlinks to provide additional information, for example:

to allow the type of atomic items^XP to be ascertained.
to allow the namespace binding of prefixes to be ascertained.
to provide further information about the cause of error indicators.

E Glossary (Non-Normative)

array item

The term array item is defined in Section 7.38.3 Array Items^DM.

atomize

The term atomization is defined in Section 2.5.3 Atomization^XP.

character

The term character is defined in Section 4.1.5 XML and XSD Versions^DM.

codepoint

The term codepoint is defined in Section 4.1.5 XML and XSD Versions^DM.

content

The term content has the same meaning as the term Content^XML defined in Section 3.1 Start-Tags, End-Tags, and Empty-Element Tags^XML of [XML10].

EMPTY

The following XHTML elements have an EMPTY content model: area, base, br, col, embed, hr, img, input, link, meta, basefont, frame, isindex, and param.

expanded QName

The term expanded QName is defined in Section 2 Basics^XP. An expanded QName consists of an optional namespace URI and a local name. An expanded QName also retains its original namespace prefix (if any), to facilitate casting the expanded QName into a string.

expected-empty

An element node is expected to be empty if it is recognized as an HTML element and:

With HTML5, the element is a void element.
Prior to HTML5, the content model is EMPTY.

function item

The term function item is defined in Section 7.18.1 Function Items^DM.

host language

A host language is another specification that includes, by reference, this specification and all of its requirements. A host language might be a programming language such as [XSL Transformations (XSLT) Version 4.0] or [XQuery 4.0: An XML Query Language], or it might be an application programming interface (API) intended to be used by programs written in some other high-level programming language. The use of the term language is not intended to preclude the possibility that this specification might be referenced outside the context of a programming language specification.

immediate content

The immediate content of an element is the part of the content of the element that is not also in the content of a child element of that element.

implementation-defined

Implementation-defined indicates an aspect that may differ between serializers, but whose actual behavior must be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.

implementation-dependent

Implementation-dependent indicates an aspect that may differ between serializers, and whose actual behavior is not required to be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.

input tree

In general the output of the serializer will represent the items actually present in the input value, together with other items that are reachable from these, for example (in the case of nodes) their descendants. The complete set of items that are represented in the output of the serializer is referred to (without loss of generality) as the input tree.

input value

The XDM value supplied as input to the serializer is referred to as the input value.

map item

The term map item is defined in Section 7.28.2 Map Items^DM.

MathML namespace

the MathML namespace, http://www.w3.org/1998/Math/MathML

node

The term node is defined as part of [TITLE OF DM40 SPEC, TITLE OF Node SECTION]^DM40. There are seven kinds of nodes in the data model: document, element, attribute, text, namespace, processing instruction, and comment.

non-null namespace URI

An element or attribute that does not have a null namespace URI is referred to as having a non-null namespace URI.

null namespace URI

An expanded-QName whose namespace part is an empty sequence, or an element or attribute whose name expands to such an expanded-QName, is referred to as having a null namespace URI.

Output declaration namespace

the Output declaration namespace, http://www.w3.org/2010/xslt-xquery-serialization

parameter document

An output:serialization-parameters element node used to hold the settings of serialization parameters is referred to as a parameter document.

prefix normalization

During prefix normalization, any element node in the input tree that is in one of the XHTML namespace, the SVG namespace or the MathML namespace has its name replaced by the local part of its name. Such an element node is given a default namespace node whose value is the element’s namespace URI. Any namespace node for any of those three namespaces that was previously present on any element node in the input tree is also removed, unless the prefix that that namespace node declared is used as the prefix on the name of an attribute on that element or an ancestor of that element.

prior to HTML5

The term prior to HTML5 is used in this specification to qualify rules that apply only when the effective version of the html-version serialization parameter is less than 5.0.

recognized as an HTML element

An element node is recognized as an HTML element by the XHTML output method if either of the following conditions is true:

the element node is in the XHTML namespace; or
With HTML5: the element has a null namespace URI and the local part of the name is equal to the name of an element defined by HTML5 [HTML5], making the comparison without regard to case.

reconstructed tree

A reconstructed tree may be constructed by parsing the XML document and converting it into a document node as specified in [XQuery and XPath Data Model (XDM) 4.0].

requested HTML version

The requested HTML version is the value of the html-version serialization parameter if present; otherwise the value of the version serialization parameter if present; otherwise 5.0.
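The fallback order in this definition can be sketched as follows. The function name and its keyword arguments are hypothetical stand-ins for the html-version and version serialization parameters, with `None` meaning that the parameter is absent.

```python
from decimal import Decimal


def requested_html_version(html_version=None, version=None):
    """Resolve the requested HTML version from two optional parameters."""
    if html_version is not None:
        # html-version takes precedence when present.
        return Decimal(html_version)
    if version is not None:
        # Otherwise fall back to the version parameter.
        return Decimal(version)
    # Neither parameter is present: the default is 5.0.
    return Decimal("5.0")
```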

result tree

The result of the sequence normalization process is a result tree.

sequence

The term sequence is defined in Section 2 Basics^XP. A sequence is an ordered collection of zero or more items.

sequence normalization

The purpose of sequence normalization is to create a sequence that can be serialized as a well-formed XML document or external general parsed entity, that also reflects the content of the input sequence to the extent possible.

serialization error

In some instances, the input tree cannot be successfully converted into a sequence of octets given the set of serialization parameter (3 Serialization Parameters) values specified. A serialization error is said to occur in such an instance.

serialized as an HTML element

An element node is serialized as an HTML element if

the expanded QName of the element has a null namespace URI, or
the requested HTML version is 5.0 or greater, and the element node is in the XHTML namespace.
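The two conditions above can be sketched as a single predicate. The function name and parameters are hypothetical; an absent or empty `namespace_uri` stands for a null namespace URI, and the requested HTML version is assumed to have been resolved already.

```python
from decimal import Decimal

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def is_serialized_as_html_element(namespace_uri, requested_version):
    """Decide whether an element node is serialized as an HTML element."""
    if not namespace_uri:
        # First condition: the element name has a null namespace URI.
        return True
    # Second condition: version 5.0 or greater and the XHTML namespace.
    return (Decimal(requested_version) >= Decimal("5.0")
            and namespace_uri == XHTML_NAMESPACE)
```

Under this sketch, an element in any other namespace returns false and would therefore be serialized as an XML Island.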

serializer

As is indicated in 12 Conformance, conformance criteria for serialization are determined by other specifications that refer to this specification. A serializer is software that implements some or all of the requirements of this specification in accordance with such conformance criteria.

string

The term string is defined in Section 4.1.5 XML and XSD Versions^DM.

string value

The term string value is defined in Section 6.7.127.5.12 string-value Accessor^DM. Every node has a string value. For example, the string value of an element is the concatenation of the string values of all its descendant text nodes.

SVG namespace

the SVG namespace, http://www.w3.org/2000/svg

to a JSON string

Whenever a value is serialized to a JSON string, the following procedure is applied to the supplied string:

Any character in the string for which character mapping is defined (see 11 Character Maps) is substituted by the replacement string defined in the character map.
Any other character in the input string (but not a character produced by character mapping) is a candidate for Unicode Normalization if requested by the normalization-form parameter, and JSON escaping. JSON escaping replaces the characters quotation mark, backspace, form-feed, newline, carriage return, tab, or reverse solidus by the corresponding JSON escape sequences \", \b, \f, \n, \r, \t, or \\ respectively, and any other codepoint in the range 1-31 or 127-159 by an escape in the form \uHHHH where HHHH is the hexadecimal representation of the codepoint value. Escaping further replaces the solidus character (/) by the escape sequence \/ if the escape-solidus parameter is set to true, but not if it is set to false. Escaping is also applied to any characters that cannot be represented in the selected encoding.
The resulting string is enclosed in double quotation marks.
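The procedure above might be sketched as follows. The function names and parameters are hypothetical. The sketch assumes upper-case hexadecimal digits in \uHHHH escapes and a default of true for escape-solidus, and it leaves out the escaping of characters that cannot be represented in the selected encoding.

```python
import unicodedata

# Characters that have a dedicated JSON escape sequence.
SHORT_ESCAPES = {
    '"': '\\"', '\b': '\\b', '\f': '\\f', '\n': '\\n',
    '\r': '\\r', '\t': '\\t', '\\': '\\\\',
}


def escape_run(run, escape_solidus):
    """Apply JSON escaping to a run of unmapped characters."""
    out = []
    for ch in run:
        cp = ord(ch)
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ch == "/" and escape_solidus:
            out.append("\\/")
        elif 1 <= cp <= 31 or 127 <= cp <= 159:
            out.append("\\u%04X" % cp)  # hex digit case is an assumption
        else:
            out.append(ch)
    return "".join(out)


def to_json_string(value, character_map=None, normalization_form=None,
                   escape_solidus=True):
    """Serialize a string to a JSON string, following the steps above."""
    character_map = character_map or {}
    pieces, run = [], []

    def flush():
        # Normalize and escape the pending run of unmapped characters.
        if run:
            text = "".join(run)
            if normalization_form:
                text = unicodedata.normalize(normalization_form, text)
            pieces.append(escape_run(text, escape_solidus))
            run.clear()

    for ch in value:
        if ch in character_map:
            flush()
            # Replacement text is emitted as-is, without escaping.
            pieces.append(character_map[ch])
        else:
            run.append(ch)
    flush()
    return '"' + "".join(pieces) + '"'
```

Note that text produced by a character map bypasses both normalization and escaping, so a map whose replacement strings contain quotation marks can yield output that is not well-formed JSON.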

Unicode Normalization

Unicode Normalization is the process of removing alternative representations of equivalent sequences from textual data, to convert the data into a form that can be binary-compared for equivalence, as specified in [UAX #15: Unicode Normalization Forms]. For specific recommendations for character normalization on the World Wide Web, see [Character Model for the World Wide Web 1.0: Normalization].

URI attribute values

The values of attributes listed in D List of URI Attributes are URI attribute values. Attributes are not considered to be URI attributes simply because they are namespace declaration attributes or have the type annotation xs:anyURI.

URI Escaping

URI escaping consists of the following three steps applied in sequence to the content of URI attribute values:

void

The void elements of HTML5 are area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track and wbr.

whitespace character

A space character, TAB character, CR character or NL character is referred to as a whitespace character.

with HTML5

The term with HTML5 is used in this specification to qualify rules that apply only when the effective version of the html-version serialization parameter is 5.0.

without regard to case

Where this specification indicates that two strings are to be compared without regard to case, the serializer must translate any characters in the range U+0041 (LATIN CAPITAL LETTER A, A) through U+005A (LATIN CAPITAL LETTER Z, Z) inclusive, to the corresponding lower-case letters in the range U+0061 (LATIN SMALL LETTER A, a) through U+007A (LATIN SMALL LETTER Z, z) only for the purposes of making the comparison. The comparison succeeds if the two strings are the same length and the code point of each character in the first string is equal to the code point of the character in the corresponding position in the second string.
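This comparison differs from general Unicode case folding because only the 26 ASCII capital letters are translated. A minimal sketch, using a hypothetical function name:

```python
# Map U+0041..U+005A (A-Z) to U+0061..U+007A (a-z); nothing else changes.
ASCII_LOWER = {cp: cp + 0x20 for cp in range(0x41, 0x5B)}


def equal_without_regard_to_case(first, second):
    """Compare two strings after translating only ASCII capitals."""
    # Equal strings necessarily have the same length and the same
    # code point at every position.
    return first.translate(ASCII_LOWER) == second.translate(ASCII_LOWER)
```

Characters outside the range A to Z, such as U+212A (KELVIN SIGN), are compared by code point and so do not match their ASCII look-alikes.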

XHTML namespace

the XHTML namespace, http://www.w3.org/1999/xhtml

XML Island

The portion of the serialized document representing the result of serializing an element that is not to be serialized as an HTML element is known as an XML Island.

XML namespace

the XML namespace, http://www.w3.org/XML/1998/namespace

G Change Log (Non-Normative)

This appendix lists changes made in version 4.0 of this specification.

The term atomic value has been replaced by atomic item.
See 1.1 Terminology
Added the json-lines parameter for JSON serialization.
See 3 Serialization Parameters
See 9 JSON Output Method
PR TODO
The output of QNames reflects the new syntax for QName literals.
See 10 Adaptive Output Method
PR 342
In the HTML and XHTML output methods, the rules for adding and replacing meta elements have been revised to take account of the new HTML5 syntax, for example <meta charset="utf-8">.
See 6 XHTML Output Method
See 7 HTML Output Method
PR 534
Added the escape-solidus parameter for JSON serialization.
See 3 Serialization Parameters
See 9 JSON Output Method
See 10.1 The Influence of Serialization Parameters upon the Adaptive Output Method
PR 1703
The serialization of maps retains the order of entries.
See 9 JSON Output Method
See 10 Adaptive Output Method
PR 1977
The default HTML version is now 5. This may result in changes to the serialized output in cases where no explicit HTML version is requested.
See 6 XHTML Output Method
See 7 HTML Output Method
PR 2031
A JNode is replaced by its ¶value property.
See 9 JSON Output Method
See 10 Adaptive Output Method

XSLT and XQuery Serialization 4.0

W3C Editor's Draft 218 February 2026
