This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document defines methods of serializing an instance of the data model defined in [XDM 4.0] into a sequence of octets, conforming to a variety of formats including XML, HTML, and JSON. Serialization is designed to be a component that can be used either on its own, or invoked from languages such as [XSLT 4.0], [XPath 4.0] or [XQuery 4.0].
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
This document is a working draft developed and maintained by a W3C Community Group, the XQuery and XSLT Extensions Community Group unofficially known as QT4CG (where "QT" denotes Query and Transformation). This draft is work in progress and should not be considered either stable or complete. Standard W3C copyright and patent conditions apply.
The community group welcomes comments on the specification. Comments are best submitted as issues on the group's GitHub repository.
The community group maintains two extensive test suites, one oriented to XQuery and XPath, the other to XSLT. These can be found at qt4tests and xslt40-test respectively. New tests, or suggestions for correcting existing tests, are welcome. The test suites include extensive metadata describing the conditions for applicability of each test case as well as the expected results. They do not include any test drivers for executing the tests: each implementation is expected to provide its own test driver.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Changes in 4.0 (next)
If a section of this specification has been updated since version 3.1, an overview of the changes is provided, along with links to navigate to the next or previous change.
Sections with significant changes are marked with a ✭ symbol in the table of contents.
This document defines methods of serializing the W3C XQuery and XPath Data Model 4.0 ([XDM 4.0]), that is, methods of representing instances of the data model as strings or octet sequences. This is the data model used by [XPath 4.0], [XSLT 4.0], and [XQuery 4.0], and any other specifications that reference it.
In this document, examples and material labeled as “Note” are provided for explanatory purposes and are not normative.
Serialization is the process of converting an instance of the [XDM 4.0] into a sequence of octets.
[Definition: The XDM value supplied as input to the serializer is referred to as the input value.] Some serialization methods apply only to certain types of input value.
Note:
Where serialization is used to process the result of an XQuery evaluation or an XSLT transformation, the input value of the serializer corresponds to the output from XQuery or XSLT.
[Definition: In general the output of the serializer will represent the items actually present in the input value, together with other items that are reachable from these, for example (in the case of nodes) their descendants. The complete set of items that are represented in the output of the serializer is referred to (without loss of generality) as the input tree.]
Changes in 4.0 (next | previous)
The term atomic value has been replaced by atomic item. [Issue 1337 2 August 2024]
In this specification, where they are rendered in small capitals, the words must, must not, should, should not, may, required, and recommended are to be interpreted as described in [RFC2119].
[Definition: As is indicated in 12 Conformance, conformance criteria for serialization are determined by other specifications that refer to this specification. A serializer is software that implements some or all of the requirements of this specification in accordance with such conformance criteria.] A serializer is not required to directly provide a programming interface that permits a user to set serialization parameters or to provide an input sequence for serialization. In this document, material labeled as "Note" and examples are provided for explanatory purposes and are not normative.
Certain aspects of serialization are described in this specification as implementation-defined or implementation-dependent.
[Definition: Implementation-defined indicates an aspect that may differ between serializers, but whose actual behavior must be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.]
[Definition: Implementation-dependent indicates an aspect that may differ between serializers, and whose actual behavior is not required to be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.]
[Definition: In some instances, the input tree cannot be successfully converted into a sequence of octets given the set of serialization parameter (3 Serialization Parameters) values specified. A serialization error is said to occur in such an instance.] In some cases, a serializer is required to raise such an error. What it means to raise a serialization error is determined by the relevant conformance criteria (12 Conformance) to which the serializer conforms. In other cases, there is an implementation-defined choice between raising a serialization error and performing a recovery action. Such a recovery action will allow a serializer to produce a sequence of octets that might not fully reflect the usual requirements of the parameter settings that are in effect.
[Definition: Where this specification indicates that two strings are to be compared without regard to case, the serializer must translate any characters in the range U+0041 (LATIN CAPITAL LETTER A, A) through U+005A (LATIN CAPITAL LETTER Z, Z) inclusive, to the corresponding lower-case letters in the range U+0061 (LATIN SMALL LETTER A, a) through U+007A (LATIN SMALL LETTER Z, z) only for the purposes of making the comparison. The comparison succeeds if the two strings are the same length and the code point of each character in the first string is equal to the code point of the character in the corresponding position in the second string.]
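The comparison defined above folds only the 26 ASCII capital letters, which differs from general Unicode case folding. The following Python sketch illustrates the rule; the function name is hypothetical and is not defined by this specification.

```python
def equals_ignoring_ascii_case(first: str, second: str) -> bool:
    """Compare two strings without regard to case, folding only A-Z."""

    def fold(text: str) -> str:
        # Translate only U+0041..U+005A to U+0061..U+007A; every other
        # character, including non-ASCII letters, is left untouched.
        return "".join(
            chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in text
        )

    # Python's == on str compares length and code points position by position.
    return fold(first) == fold(second)
```

Under this sketch `UTF-8` and `utf-8` compare equal, while `É` and `é` do not, because neither character lies in the ASCII range.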
Many terms used in this document are defined in the XPath specification [XPath 4.0] or the Data Model specification [XDM 4.0]. Particular attention is drawn to the following:
[Definition: The term atomization is defined in [XPath 4.0] section 2.6.3 Atomization.]
[Definition: The term node is defined in [XPath 4.0] section 2 Basics. There are seven kinds of nodes in the data model: document, element, attribute, text, namespace, processing instruction, and comment.]
[Definition: The term sequence is defined in [XPath 4.0] section 2 Basics. A sequence is an ordered collection of zero or more items.]
[Definition: The term function item is defined in [XDM 4.0] section 8.1 Function Items.]
[Definition: The term map item is defined in [XDM 4.0] section 8.2 Map Items.]
[Definition: The term array item is defined in [XDM 4.0] section 8.3 Array Items.]
[Definition: The term string is defined in [XDM 4.0] section 4.1.5 XML and XSD Versions.]
[Definition: The term character is defined in [XDM 4.0] section 4.1.5 XML and XSD Versions.]
[Definition: The term codepoint is defined in [XDM 4.0] section 4.1.5 XML and XSD Versions.]
[Definition: The term string value is defined in [XDM 4.0] section 7.6.12 string-value Accessor. Every node has a string value. For example, the string value of an element is the concatenation of the string values of all its descendant text nodes.]
[Definition: The term expanded QName is defined in [XPath 4.0] section 2 Basics. An expanded QName consists of an optional namespace URI and a local name. An expanded QName also retains its original namespace prefix (if any), to facilitate casting the expanded QName into a string.]
[Definition: An expanded-QName whose namespace part is the empty sequence, or an element or attribute whose name expands to such an expanded-QName, is referred to as having a null namespace URI.]
[Definition: An element or attribute that does not have a null namespace URI is referred to as having a non-null namespace URI.]
[Definition: A space character, TAB character, CR character or NL character is referred to as a whitespace character.]
Where this specification indicates that an XSLT instruction is evaluated, the behavior is as specified by [XSLT 4.0]. Where it indicates that an XQuery expression is evaluated, the behavior is as specified by [XQuery 4.0].
There are a number of parameters that influence how serialization is performed. Host languages may allow users to specify any or all of these parameters, but they are not required to be able to do so. However, the host language specification must specify how the values of all applicable parameters are to be determined.
Host languages may also define alternative representations of the values of serialization parameters. For example, both XSLT and XQuery allow the boolean values true and false to be written as 1/0 or yes/no. The $options map passed to the fn:serialize function, by contrast, requires an xs:boolean value.
It is a serialization error [err:SEPM0016] if a parameter value is invalid for the given parameter. It is the responsibility of the host language to specify how invalid values should be handled at the level of that language.
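As an illustration of a host language accepting alternative boolean spellings, the following Python sketch maps the lexical forms mentioned above to a boolean and reports [err:SEPM0016] for anything else. The `SerializationError` class and the function name are hypothetical, and the decision to ignore surrounding whitespace is an assumption of this sketch rather than a rule stated here.

```python
class SerializationError(Exception):
    """Hypothetical exception type carrying a serialization error code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


# Lexical forms that the text above says XSLT and XQuery accept for booleans.
_TRUE_FORMS = {"true", "1", "yes"}
_FALSE_FORMS = {"false", "0", "no"}


def parse_boolean_parameter(name: str, lexical: str) -> bool:
    """Convert a host-language boolean spelling to a Python bool."""
    token = lexical.strip()  # assumption: surrounding whitespace is ignored
    if token in _TRUE_FORMS:
        return True
    if token in _FALSE_FORMS:
        return False
    raise SerializationError(
        "SEPM0016", f"invalid value {lexical!r} for parameter {name}"
    )
```

A host language remains free to handle invalid values differently at its own level; raising the error directly is only one possible design.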
The following serialization parameters are defined:
| Serialization parameter name | Permitted values for parameter |
|---|---|
| allow-duplicate-names | A boolean value, true or false. This parameter indicates whether a map item serialized as a JSON object using the JSON output method is allowed to contain duplicate member names. If the value false is specified, a serialization error [err:SERE0022] may be raised under certain conditions. |
| byte-order-mark | A boolean value, true or false. This parameter indicates whether the serialized sequence of octets is to be preceded by a Byte Order Mark (See Section 5.1 of [Unicode Encoding]). The actual octet order used is implementation-dependent. If the encoding defines no Byte Order Mark, or if the Byte Order Mark is prohibited for the specific encoding or implementation environment, then this parameter is ignored. |
| canonical | A boolean value, true or false. |
| cdata-section-elements | A list of expanded QNames, possibly empty. |
| doctype-public | A string of PubidChar characters (as defined by XML). This parameter may be absent. |
| doctype-system | A string of Unicode characters that does not include both the characters U+0027 (APOSTROPHE, ') and U+0022 (QUOTATION MARK, "). This parameter may be absent. |
| encoding | A string of Unicode characters in the range U+0021 (EXCLAMATION MARK, !) through U+007E (TILDE, ~) (that is, printable ASCII characters); the value should be a charset registered with the Internet Assigned Numbers Authority [IANA], [RFC2978] or begin with the characters x- or X-. |
| escape-solidus | A boolean value, true or false. |
| escape-uri-attributes | A boolean value, true or false. |
| html-version | A decimal value. This parameter may be absent. |
| include-content-type | A boolean value, true or false. |
| indent | A boolean value, true or false. |
| item-separator | A string of Unicode characters. This parameter may be absent. |
| json-lines | A boolean value, true or false. |
| json-node-output-method | An expanded QName with a non-null namespace URI, or with a null namespace URI and a local name equal to one of xml, xhtml, html or text. If the namespace URI is non-null, the parameter specifies an implementation-defined output method. |
| media-type | A string of Unicode characters specifying the media type (MIME content type) [RFC2046]; the charset parameter of the media type must not be specified explicitly in the value of the media-type parameter. If the destination of the serialized output is annotated with a media type, this parameter may be used to provide such an annotation. For example, it may be used to set the media type in an HTTP header. |
| method | An expanded QName with a non-null namespace URI, or with a null namespace URI and a local name that must be equal to one of xml, xhtml, html, text, json, or adaptive, in which case, the output method specified must be used for serializing. If the namespace URI is non-null, the parameter specifies an implementation-defined output method; its behavior is not specified by this document. |
| normalization-form | One of the enumerated values NFC, NFD, NFKC, NFKD, fully-normalized or none, or an implementation-defined value of type NMTOKEN. |
| omit-xml-declaration | A boolean value, true or false. |
| standalone | Either a boolean value, true or false, or the value omit. |
| suppress-indentation | A list of expanded QNames, possibly empty. |
| undeclare-prefixes | A boolean value, true or false. |
| use-character-maps | A list of pairs, possibly empty, with each pair consisting of a single Unicode character and a string of Unicode characters. |
| version | A string of Unicode characters. This parameter may be absent. |
In those cases where they have no important effect on the content of the serialized result, details of the output methods defined by this specification are left unspecified and are regarded as implementation-dependent. Whether a serializer uses apostrophes or quotation marks to delimit attribute values in the XML output method is an example of such a detail.
The detailed semantics of each parameter will be described separately for each output method for which it is applicable. If the semantics of a parameter are not described for an output method, then it is not applicable to that output method.
Implementations may define additional serialization parameters, and may allow users to do so. For this purpose, the name of a serialization parameter is considered to be a QName; the parameters listed above are QNames whose expanded-QName has a null namespace URI, while any additional serialization parameters that are either implementation-defined or defined by the host language must have names that are namespace-qualified. Any such additional serialization parameters must not be in the namespace http://www.w3.org/2010/xslt-xquery-serialization. A host language may specify the means by which an implementation can define such an additional serialization parameter, and implementations may provide mechanisms by which users can define such an additional serialization parameter. If the serialization method is one of the six methods xml, html, xhtml, text, json, or adaptive then the additional serialization parameters may affect the output of the serializer to the extent (but only to the extent) that this specification leaves the output implementation-defined or implementation-dependent. For example, such parameters might control whether namespace declarations on an element are written before or after the attributes of the element, or they might define the number of space or tab characters to be inserted when the indent parameter is set to true; but they could not instruct the serializer to suppress the error that occurs when the HTML output method encounters characters that are not permitted (see error [err:SERE0014]).
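The naming constraints on additional parameters can be checked mechanically. The following Python sketch classifies a parameter name given as a namespace URI and local name; the function name, the use of `ValueError`, and the example parameter `line-width` are hypothetical and serve only as an illustration.

```python
SERIALIZATION_NS = "http://www.w3.org/2010/xslt-xquery-serialization"

# Local names of the parameters defined in the table above.
STANDARD_PARAMETERS = frozenset({
    "allow-duplicate-names", "byte-order-mark", "canonical",
    "cdata-section-elements", "doctype-public", "doctype-system",
    "encoding", "escape-solidus", "escape-uri-attributes", "html-version",
    "include-content-type", "indent", "item-separator", "json-lines",
    "json-node-output-method", "media-type", "method",
    "normalization-form", "omit-xml-declaration", "standalone",
    "suppress-indentation", "undeclare-prefixes", "use-character-maps",
    "version",
})


def classify_parameter_name(namespace_uri, local_name: str) -> str:
    """Return "standard" or "additional", or raise if the name is not allowed."""
    if not namespace_uri:
        # Names with a null namespace URI are reserved for this specification.
        if local_name in STANDARD_PARAMETERS:
            return "standard"
        raise ValueError("additional parameters must be namespace-qualified")
    if namespace_uri == SERIALIZATION_NS:
        raise ValueError("additional parameters must not use this namespace")
    return "additional"
```

For example, a hypothetical parameter `line-width` in the namespace `http://example.org/ext` would be classified as "additional", whereas the same local name with no namespace would be rejected.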
A host language may provide, by reference to this section, a mechanism by which the settings of serialization parameters are supplied in the form of an output:serialization-parameters element node.
Note:
The namespace prefix output is assumed to be bound to the namespace URI http://www.w3.org/2010/xslt-xquery-serialization. The document may use any namespace prefix or none.
[Definition: An output:serialization-parameters element node used to hold the settings of serialization parameters is referred to as a parameter document].
Note:
The use of the word document does not imply that the output:serialization-parameters element must be the outermost element of an XDM document, although this will often be the case.
The parameter document must be processed as if by the procedure described below.
With the exception of the use-character-maps parameter, the setting of each serialization parameter defined in this specification is equal to the result of evaluating the XQuery expression
document { . }
  /output:serialization-parameters
  /(validate lax {
      output:*[local-name() eq $param-name]
    })
  /data(@value)
or equivalently the XSLT instructions
<xsl:sequence>
  <xsl:variable name="validated-instance">
    <xsl:document validation="lax">
      <xsl:sequence select="
          self::output:serialization-parameters
            /output:*
              [local-name() eq $param-name]"/>
    </xsl:document>
  </xsl:variable>
  <xsl:sequence select="$validated-instance
                          /data(@value)"/>
</xsl:sequence>
with the parameter document as the context value, the param-name variable bound to a value of type xs:string equal to the local part of the name of the particular serialization parameter, and the other components of the dynamic context and static context as specified in the subsequent tables. If in any case evaluating this expression would yield an error, serialization error [err:SEPM0017] results.
If the result of evaluating this expression for a particular serialization parameter is the empty sequence, then
If the parameter is either cdata-section-elements or suppress-indentation, and the result of evaluating the XQuery expression
document { . }
  /output:serialization-parameters
  /(validate lax {
      output:*[local-name() eq $param-name]
    })
or equivalently the XSLT instructions
<xsl:sequence>
  <xsl:variable name="validated-instance">
    <xsl:document validation="lax">
      <xsl:sequence select="
          self::output:serialization-parameters
            /output:*
              [local-name() eq $param-name]"/>
    </xsl:document>
  </xsl:variable>
  <xsl:sequence select="$validated-instance"/>
</xsl:sequence>
with the same settings of the static context and dynamic context is not the empty sequence, the setting of the parameter is the empty list;
otherwise, the setting of the parameter is absent.
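The lookup described above can be approximated without an XQuery processor. The following Python sketch uses the standard `xml.etree.ElementTree` module; it performs no schema validation (so it cannot detect the conditions that lead to [err:SEPM0017]), returns the raw attribute string rather than a typed value, leaves QNames unresolved, and uses `None` to stand for an absent parameter. The function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"
LIST_VALUED = {"cdata-section-elements", "suppress-indentation"}
ABSENT = None  # stands in for an absent parameter setting


def parameter_setting(parameter_document: str, param_name: str):
    """Return the setting of one parameter, or ABSENT if it is not supplied."""
    root = ET.fromstring(parameter_document)
    if root.tag != f"{{{OUTPUT_NS}}}serialization-parameters":
        raise ValueError("not a parameter document")
    # Look only at direct children in the serialization namespace.
    element = root.find(f"{{{OUTPUT_NS}}}{param_name}")
    if element is None:
        return ABSENT
    value = element.get("value")
    if param_name in LIST_VALUED:
        # An element that is present but lists no names yields the empty
        # list rather than an absent parameter.
        return (value or "").split()
    return value if value is not None else ABSENT
```

With this sketch, a document containing `<output:cdata-section-elements value=""/>` gives the empty list for that parameter, while a parameter that has no element at all is reported as absent.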
The components of the static context used in evaluating the XQuery expressions or XSLT instructions are as defined in the following table.
| Static Context Component | XQuery or XSLT | Setting |
|---|---|---|
| XPath 1.0 compatibility mode | Both | false |
| Statically known namespaces | XQuery | The pair (output,http://www.w3.org/2010/xslt-xquery-serialization) |
| Statically known namespaces | XSLT | The pairs (output,http://www.w3.org/2010/xslt-xquery-serialization), (xsl,http://www.w3.org/1999/XSL/Transform) |
| Default element/type namespace | Both | "none" |
| Default function namespace | Both | http://www.w3.org/2005/xpath-functions |
| In-scope schema types, In-scope element declarations, Substitution groups, In-scope attribute declarations | Both | As defined by the schema for serialization parameters (B Schema for Serialization Parameters) and any additional implementation-defined in-scope schema components |
| In-scope variables | Both | {param-name} |
| Context value static type | Both | node() |
| Statically known function signatures | Both | {fn:data($arg as item()*) as xs:anyAtomicType*}, {fn:local-name($arg as node()?) as xs:string} |
| Statically known collations | Both | { (http://www.w3.org/2005/xpath-functions/collation/codepoint, The Unicode codepoint collation ) } |
| Default collation | Both | The Unicode codepoint collation |
| Construction mode | XQuery | strip |
| Ordering mode | XQuery | ordered |
| Default order for empty sequences | XQuery | least |
| Boundary space policy | XQuery | strip |
| Copy-namespaces mode | XQuery | (preserve,inherit) |
| Base URI | Both | Absent |
| Statically known documents | Both | None |
| Statically known collections | Both | None |
| Statically known default collection type | Both | node()* |
| Statically known decimal formats | Both | None |
| Set of named keys | XSLT | {} |
| Values of system properties | XSLT | None |
| Set of available instructions | XSLT | The set of all instructions defined by [XSLT 4.0] |
The remaining components of the dynamic context used in evaluating the XQuery expressions or XSLT instructions in the preceding table are as defined in the following table.
| Dynamic Context Component | XQuery or XSLT | Setting |
|---|---|---|
| Context position | Both | 1 |
| Context size | Both | 1 |
| Variable values | Both | The param-name variable has a value of type xs:string equal to the local part of the name of the serialization parameter under consideration |
| Function implementations | Both | The implementation of fn:data |
| Current dateTime | Both | Absent |
| Implicit timezone | Both | Absent |
| Available documents | Both | None |
| Available collections | Both | None |
| Default collection | Both | None |
| Current template rule | XSLT | Absent |
| Current mode | XSLT | The default mode |
| Current group | XSLT | Absent |
| Current grouping key | XSLT | Absent |
| Current captured substrings | XSLT | The empty sequence |
| Output state | XSLT | Temporary output state |
In the case of the use-character-maps parameter, the XQuery expression
document { . }
  /output:serialization-parameters
  / ( validate lax { output:use-character-maps } )
  /output:character-map[@character eq $char]
  /string(@map-string)
or equivalently the XSLT instructions
<xsl:sequence>
  <xsl:variable name="validated-instance">
    <xsl:document validation="lax">
      <xsl:sequence select="
          self::output:serialization-parameters
            /output:use-character-maps"/>
    </xsl:document>
  </xsl:variable>
  <xsl:sequence select="$validated-instance
                          /output:character-map
                            [@character eq $char]
                          /string(@map-string)"/>
</xsl:sequence>
is evaluated for each Unicode character that is permitted in an XML document. The dynamic context and static context used to evaluate the expression are as defined above, except that the component In-scope variables is the set {char} and the value of the variable "char" is a value of type xs:string of length one whose value is the Unicode character under consideration. If the result of evaluating the expression is not the empty sequence, the pair consisting of the Unicode character and the result of evaluating the expression is part of the list of pairs in the value of the use-character-maps parameter. It is a serialization error [err:SEPM0018] if the result of evaluating this expression for any character is a sequence of length greater than one.
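Rather than evaluating an expression once per Unicode character, an implementation could obtain the same list of pairs by walking the output:character-map elements once. The following Python sketch, based on the standard `xml.etree.ElementTree` module, takes that approach; it performs no schema validation, and the function name and the use of `ValueError` to signal [err:SEPM0018] are hypothetical.

```python
import xml.etree.ElementTree as ET

OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"


def character_map_pairs(parameter_document: str) -> dict:
    """Collect the (character, map-string) pairs of use-character-maps."""
    root = ET.fromstring(parameter_document)
    pairs = {}
    for maps in root.findall(f"{{{OUTPUT_NS}}}use-character-maps"):
        for entry in maps.findall(f"{{{OUTPUT_NS}}}character-map"):
            char = entry.get("character")
            if char is None:
                continue  # no character attribute, so no character matches
            if char in pairs:
                # Two mappings for one character correspond to a result
                # sequence of length greater than one.
                raise ValueError(
                    f"[err:SEPM0018] character {char!r} mapped more than once"
                )
            pairs[char] = entry.get("map-string", "")
    return pairs
```

A dictionary is used here because each character may appear at most once; an ordered list of pairs would serve equally well.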
Using the same settings of the components of the dynamic context and static context, serialization error [err:SEPM0019] results if the result of evaluating the following XQuery expression is not true
(document { . })/output:serialization-parameters
  /(count(distinct-values(*/node-name(.))) eq (count(*)))
or equivalently if the result of evaluating the following XSLT instructions is not true.
<xsl:sequence>
  <xsl:variable name="doc">
    <xsl:document>
      <xsl:sequence select="."/>
    </xsl:document>
  </xsl:variable>
  <xsl:sequence
      select="$doc/output:serialization-parameters
                /(count(distinct-values(
                          */node-name(.)))
                    eq (count(*)))"/>
</xsl:sequence>
The result of evaluating either will be false if the parameter document supplies a value for any particular serialization parameter more than once, or will be the empty sequence if the parameter document is not an element node whose local name is serialization-parameters and whose namespace URI is http://www.w3.org/2010/xslt-xquery-serialization.
Note:
A serializer or implementation of a host language does not need to be accompanied by an XQuery processor nor by a general-purpose schema validator in order to meet the requirements of this section. It merely needs to be capable of extracting values from an XDM instance that conforms to the schema for serialization parameters, while checking that the constraints implied by the schema and additional constraints implied by the XQuery validate expression or explicitly stated in this section are satisfied.
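As one illustration of meeting these requirements without an XQuery processor, the following Python sketch performs the check associated with [err:SEPM0019] using the standard `xml.etree.ElementTree` module. The function name and the use of `ValueError` are hypothetical, and no schema validation is attempted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"


def check_parameter_document(parameter_document: str) -> None:
    """Raise if the document would fail the distinct-names test."""
    root = ET.fromstring(parameter_document)
    if root.tag != f"{{{OUTPUT_NS}}}serialization-parameters":
        # The XQuery expression yields the empty sequence, which is not true.
        raise ValueError(
            "[err:SEPM0019] root is not output:serialization-parameters"
        )
    # ElementTree tags combine namespace URI and local name, so comparing
    # tags is equivalent to comparing node names.
    counts = Counter(child.tag for child in root)
    repeated = sorted(tag for tag, n in counts.items() if n > 1)
    if repeated:
        raise ValueError(
            f"[err:SEPM0019] parameter supplied more than once: {repeated}"
        )
```

The function returns normally when every child element name occurs exactly once.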
The host language may provide additional mechanisms for overriding the values of any serialization parameters specified through the mechanism defined in this section, as well as additional mechanisms for specifying the values of any serialization parameters whose values are absent after applying the mechanism defined in this section.
If the parameter document contains elements or attributes that are in a namespace other than http://www.w3.org/2010/xslt-xquery-serialization, the implementation may interpret them to specify the values of implementation-defined serialization parameters in an implementation-defined manner.
The following XML document, if parsed as a parameter document and processed using the mechanism described in this section, would specify the settings of the method, version and indent serialization parameters with the values xml, 1.0 and true, respectively.
<output:serialization-parameters
    xmlns:output
      = "http://www.w3.org/2010/xslt-xquery-serialization">
  <output:method value="xml"/>
  <output:version value="1.0"/>
  <output:indent value="yes"/>
</output:serialization-parameters>
The following document would specify the value of the cdata-section-elements serialization parameter with value equal to the pair of expanded QNames (http://example.org/book/chapter,heading) and (http://example.org/book,footnote).
<output:serialization-parameters
    xmlns:output
      = "http://www.w3.org/2010/xslt-xquery-serialization"
    xmlns:book="http://example.org/book"
    xmlns="http://example.org/book/chapter">
  <output:cdata-section-elements value="heading book:footnote"/>
</output:serialization-parameters>
The following document would specify the value of the method serialization parameter with the value html.
Notice that in this example, the default namespace declaration in scope has no effect on the interpretation of the setting of the method parameter.
<output:serialization-parameters
    xmlns:output
      = "http://www.w3.org/2010/xslt-xquery-serialization"
    xmlns="http://example.org/ext">
  <output:method value="html"/>
</output:serialization-parameters>
The following document would specify the value of the method serialization parameter with value equal to the expanded QName (http://example.org/ext, jsp), and the use-character-maps parameter with value equal to the list of pairs, («, <%), (», %>).
<output:serialization-parameters
    xmlns:output
      = "http://www.w3.org/2010/xslt-xquery-serialization"
    xmlns:ext="http://example.org/ext">
  <output:method value="ext:jsp"/>
  <output:use-character-maps>
    <output:character-map character="«" map-string="&lt;%"/>
    <output:character-map character="»" map-string="%>"/>
  </output:use-character-maps>
</output:serialization-parameters>
Changes in 4.0 (next | previous)
In the HTML and XHTML output methods, the rules for adding and replacing meta elements have been revised to take account of the new HTML5 syntax, for example <meta charset="UTF-8">. [Issue 318 PR 342 14 February 2023]
The default HTML version is now 5. This may result in changes to the serialized output in cases where no explicit HTML version is requested. [Issue 1889 PR 1977 2 May 2025]
The HTML output method serializes the input tree as HTML.
For example, the following XSL stylesheet generates HTML output:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="4.0"/>
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  ...
</xsl:stylesheet>
In the example, the version attribute of the xsl:output element indicates the version of the HTML Recommendation [HTML] to which the serialized result is to conform.
[Definition: The requested HTML version is the value of the html-version serialization parameter if present; otherwise the value of the version serialization parameter if present; otherwise 5.0.]
This document provides the normative definition of serialization for the HTML output method if the requested HTML version has the lexical form of a value of type decimal whose value is 1.0 or greater, but no greater than 5.0. For any other requested HTML version, the behavior is implementation-defined. In that case the implementation-defined behavior may supersede all other requirements of this recommendation.
An implementation is required to behave as specified in this document when the requested version is 5.0. If the requested version is greater than or equal to 1 but less than 5.0, then the processor may behave as if the requested version were 5.0.
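The rules for determining the requested HTML version, and for deciding whether this document defines the behavior normatively, can be sketched in Python as follows. The function names and the representation of the parameters as a dictionary are hypothetical; the regular expression is this sketch's approximation of the lexical form of a decimal value.

```python
import re
from decimal import Decimal

# Approximation of the xs:decimal lexical form: optional sign, digits,
# optional fraction, and no exponent.
_DECIMAL_FORM = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def requested_html_version(params: dict) -> str:
    """html-version if present, otherwise version if present, otherwise 5.0."""
    if params.get("html-version") is not None:
        return str(params["html-version"])
    if params.get("version") is not None:
        return params["version"]
    return "5.0"


def is_normatively_defined(requested: str) -> bool:
    """True if the requested version is a decimal from 1.0 to 5.0 inclusive."""
    token = requested.strip()
    if not _DECIMAL_FORM.fullmatch(token):
        return False  # behavior is implementation-defined
    return Decimal("1.0") <= Decimal(token) <= Decimal("5.0")
```

For any version for which `is_normatively_defined` returns false, the behavior is implementation-defined, so a caller would need to consult the documentation of the serializer in use.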
It is entirely the responsibility of the supplier of the input tree to ensure that it conforms to the relevant HTML specification. It is not an error if the input tree is invalid HTML. Equally, it is entirely under the control of the supplier of the input tree whether the output conforms to HTML. If the result tree is valid HTML, the serializer must serialize the result in a way that conforms with the requested HTML version.
Comment nodes are written as HTML comments, using the syntax <!--comment-->.
Prior to HTML5, the HTML output method must terminate processing instructions with > rather than ?>. It is a serialization error [err:SERE0015] to use the HTML output method when > appears within a processing instruction in the input tree.
With HTML5, a processing instruction must be written as an HTML comment, in the form <!--?target data?-->. Here target is the name of the processing instruction, and data is its string value; they are separated by a single space, but this is omitted in the case where data is zero-length. If either the name or the string value of the processing instruction contains two adjacent hyphens, they must be separated in the output by a single space. For example, the processing instruction <?xml-stylesheet type="text/xsl" href="render.xsl"?> is serialized as <!--?xml-stylesheet type="text/xsl" href="render.xsl"?-->.
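The two treatments of processing instructions can be sketched in Python as follows. The function name, the `html5` flag, and the use of `ValueError` to signal [err:SERE0015] are hypothetical; omitting the space before empty data in the pre-HTML5 form is an assumption of this sketch.

```python
import re


def serialize_processing_instruction(
    target: str, data: str, html5: bool = True
) -> str:
    """Serialize a processing instruction for the HTML output method."""
    if html5:
        def split_hyphens(text: str) -> str:
            # Insert a space after every hyphen that is directly followed
            # by another hyphen, so "--" becomes "- -".
            return re.sub(r"-(?=-)", "- ", text)

        body = split_hyphens(target)
        if data:
            body += " " + split_hyphens(data)
        return f"<!--?{body}?-->"
    if ">" in data:
        raise ValueError("[err:SERE0015] '>' within a processing instruction")
    # Before HTML5 the instruction ends with ">" rather than "?>".
    return f"<?{target} {data}>" if data else f"<?{target}>"
```

Calling the function with target `xml-stylesheet` and data `type="text/xsl" href="render.xsl"` reproduces the example given above.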
Changes in 4.0 (next | previous)
Added the escape-solidus parameter for JSON serialization. [Issue 530 PR 534 6 June 2023]
Added the json-lines parameter for JSON serialization. [Issue 1471 15 October 2024]
The serialization of maps retains the order of entries. [Issue 1651 PR 1703 14 January 2025]
A JNode is replaced by its ·content· property. [Issue 2025 PR 2031 29 May 2025]
JSON canonicalization is supported by the ·canonical· property. [Issue 938 PR TODO 20 October 2025]
The JSON output method now produces a fallback representation of NaN and infinity, rather than reporting an error for such values. [Issue 641 PR 2387 16 January 2026]
The JSON output method serializes the input tree using the JSON syntax defined in [RFC 7159], or (if the json-lines parameter is set to true) the json-lines syntax defined at [JSON Lines]. Sequence normalization is not performed for this output method. The effect of the json-lines parameter is explained at 9.2 JSON Lines.
If json-lines is set to false, then:
If the input value is the empty sequence, it is serialized as the string null.
If the input value is a single item, it is serialized as described below.
If the input value is a sequence containing two or more items, a serialization error results [err:SERE0023].
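The three cases above amount to a dispatch on the length of the input value. The following Python sketch shows that dispatch, with the serialization of an individual item supplied by the caller; the function name, the parameter `serialize_item`, and the use of `ValueError` to signal [err:SERE0023] are hypothetical.

```python
from typing import Callable, Sequence


def serialize_json_value(
    items: Sequence[object], serialize_item: Callable[[object], str]
) -> str:
    """Apply the top-level JSON rules that hold when json-lines is false."""
    if len(items) == 0:
        return "null"  # the empty sequence
    if len(items) == 1:
        return serialize_item(items[0])
    raise ValueError("[err:SERE0023] input value contains two or more items")
```

The sketch covers only the case where json-lines is false.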
An individual item is serialized as follows:
A JNode is serialized by serializing its ·content· property.
An array item in the input tree is serialized to a JSON array by outputting the serialized JSON value of each member within the array separated by delimiters according to the JSON array syntax, i.e. [member, member, ...]. Each member in the array is to be serialized by recursively applying the rules in this section.
A map item in the input tree is serialized to a JSON object by outputting, for each key/value pair, the string value of the key to a JSON string, followed by the serialized JSON value of the entry, separated by delimiters according to the JSON object syntax, i.e. {key:value, key:value, ...}. The key/value pairs in the serialized output retain the entry order of entries in the map, unless canonical is true, in which case map entries must be sorted according to the rules of [RFC8785].
Note:
These rules require sorting according to the UTF-16 representation of the string, which is not the same (in the presence of surrogate pairs) as sorting using the XPath codepoint collation.
If any two keys of the map item have the same string value, serialization error [err:SERE0022] is raised, unless the allow-duplicate-names parameter is true and the canonical parameter is false.
A node in the input tree is serialized to a JSON string by outputting the result of serializing the node using the method specified by the json-node-output-method parameter. If the value of the canonical parameter is false, the node is serialized with the serialization parameter omit-xml-declaration set to true and with no other serialization parameters set; otherwise (canonical is true), all parameters are inherited by the sub-serialization process.
An atomic valueXP of type xs:numeric is serialized to a JSON number.
If the canonical parameter is true, then the value is cast to type xs:double and the result is serialized according to the rules of [RFC8785] (and by extension [RFC7493]).
If the canonical parameter is false then:
A value of type xs:decimal (including xs:integer) is output in the format that results from casting the value to xs:string.
A value of type xs:float is first cast to type xs:double, and then output in the same way as an xs:double.
Implementations may serialize an xs:double value using any lexical representation of a JSON number defined in [RFC 7159], but it is recommended to use the same representation as when the canonical parameter is true.
The value NaN is serialized as the JSON token null.
The values positive and negative infinity are serialized as 1e9999 and -1e9999 respectively.
Note:
These values are permitted by the JSON grammar, but some JSON parsers may reject them. For interoperability, they should be avoided.
An atomic itemXP of type xs:boolean is serialized to the JSON token true or false.
An atomic itemXP of type xs:QName in the input tree whose namespace part is "http://www.w3.org/2005/xpath-functions" and whose local part is "null" is serialized to the JSON token null.
Note:
This rule is introduced in 4.0, along with an option in the fn:parse-json function to allow a user-defined representation of the JSON value null. While the default representation of null as the empty sequence is usable in many circumstances, an explicit representation of null as a recognizable item can make some operations on JSON-derived values easier.
Any other atomic valueXP in the input tree is serialized to a JSON string by outputting the result of applying the fn:string function to the item.
An empty sequence in the input tree is serialized to the JSON token null.
A sequence of length greater than one in the input tree will result in a serialization error [err:SERE0023].
Any item in the input tree of a type not specified in the above list will result in a serialization error [err:SERE0021].
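The numeric rules above, for the case where canonical is false, can be illustrated with a minimal Python sketch. It assumes that a Python int stands in for xs:integer and a Python float for xs:double; the function name json_number is hypothetical, and the use of repr for finite doubles is only one of the lexical forms that the rules permit.

```python
import math


def json_number(value):
    """Serialize a numeric value as a JSON token (sketch, canonical=false)."""
    if isinstance(value, bool):
        # Booleans are handled by a separate rule, not as numbers.
        raise TypeError("booleans are not numeric")
    if isinstance(value, int):
        # Integers are output as the result of casting to a string.
        return str(value)
    if isinstance(value, float):
        if math.isnan(value):
            # NaN falls back to the JSON token null.
            return "null"
        if math.isinf(value):
            # Infinities fall back to out-of-range JSON numbers.
            return "1e9999" if value > 0 else "-1e9999"
        # Any valid JSON number representation is allowed for doubles;
        # repr() is this sketch's choice, not a requirement.
        return repr(value)
    raise TypeError("not a numeric value in this sketch")
```

For example, json_number(float("-inf")) yields -1e9999, which conforms to the JSON grammar even though some parsers may reject it.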
[Definition: Whenever a value is serialized to a JSON string, the following procedure is applied to the supplied string:]
If the canonical parameter is true, then the string is output as specified in [RFC8785]. The string is first normalized if requested using the normalization-form parameter. The serialization parameters escape-solidus and use-character-maps are ignored.
Note:
The escaping rules are restated here for convenience:
If the Unicode value falls within the traditional ASCII control character range (U+0000 (NULL) through U+001F (IS1) ), it must be serialized using lowercase hexadecimal Unicode notation (\uhhhh) unless it is in the set of predefined JSON control characters U+0008 (BACKSPACE) , U+0009 (TAB) , U+000A (NEWLINE) , U+000C (FORM FEED) , or U+000D (CARRIAGE RETURN) , which must be serialized as \b, \t, \n, \f, and \r, respectively.
If the Unicode value is outside of the ASCII control character range, it must be serialized “as is” unless it is equivalent to U+005C (REVERSE SOLIDUS, BACKSLASH, \) or U+0022 (QUOTATION MARK, ") , which must be serialized as \\ and \", respectively.
Note that the C1 control characters (codepoints 127-159) are not included in this list.
Note:
In canonical JSON, the properties of an object are sorted by name, based on the UTF-16 representation of the names. This corresponds to the default sort order in languages such as JavaScript, Java, and C#, but in the presence of codepoints above 65535, it is not the same as the order produced by Unicode codepoint collation.
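The difference between the two orderings can be seen in a short Python sketch. The sample keys are invented for illustration, and the sketch assumes that comparing big-endian UTF-16 byte strings is an acceptable stand-in for comparing sequences of UTF-16 code units (the two orderings coincide for well-formed strings).

```python
def utf16_sort_key(name):
    """Return a key that orders strings by their UTF-16 code units."""
    # Big-endian bytes compare in the same order as the 16-bit code units.
    return name.encode("utf-16-be")


# Hypothetical map keys: U+FF5E, U+1F600 (needs a surrogate pair), and "a".
keys = ["\uff5e", "\U0001f600", "a"]

# Python compares str values by codepoint, like the codepoint collation.
by_codepoint = sorted(keys)

# Ordering by UTF-16 code units, as canonical JSON requires.
by_utf16 = sorted(keys, key=utf16_sort_key)
```

Here by_codepoint places U+FF5E before U+1F600, while by_utf16 places U+1F600 first, because its leading surrogate (0xD83D) is numerically smaller than 0xFF5E.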
Otherwise (when canonical is false):
Any character in the string for which character mapping is defined (see 11 Character Maps) is substituted by the replacement string defined in the character map.
Any other character in the input string (but not a character produced by character mapping) is a candidate for Unicode Normalization if requested by the normalization-form parameter, and JSON escaping. JSON escaping replaces the characters U+0022 (QUOTATION MARK, ") , U+0008 (BACKSPACE) , U+000C (FORM FEED) , U+000A (NEWLINE) , U+000D (CARRIAGE RETURN) , U+0009 (TAB) , or U+005C (REVERSE SOLIDUS, BACKSLASH, \) by the corresponding JSON escape sequences \", \b, \f, \n, \r, \t, or \\ respectively, and any other codepoint in the range 1-31 or 127-159 by an escape in the form \uHHHH where HHHH is the hexadecimal representation of the codepoint value. Escaping further replaces the solidus character (/) by the escape sequence \/ if the escape-solidus parameter is set to true, but not if it is set to false. Escaping is also applied to any characters that cannot be represented in the selected encoding.
The resulting string is enclosed in double quotation marks.
Finally, encoding, as controlled by the encoding parameter, converts the character stream produced by the preceding rules into an octet stream. The encoding parameter is ignored if canonical is true.
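The JSON escaping step for the non-canonical case can be sketched in Python as follows. This is an illustration only: it omits character maps, Unicode normalization, and the escaping of characters that the chosen encoding cannot represent. The function name json_string is hypothetical, and writing the \uHHHH digits in upper case is a choice made by this sketch.

```python
# The seven characters that have dedicated JSON escape sequences.
_SHORT_ESCAPES = {
    '"': '\\"',
    "\\": "\\\\",
    "\b": "\\b",
    "\f": "\\f",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def json_string(text, escape_solidus=False):
    """Escape text and enclose it in double quotation marks (sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ch == "/" and escape_solidus:
            # The solidus is escaped only when escape-solidus is true.
            out.append("\\/")
        elif 1 <= cp <= 31 or 127 <= cp <= 159:
            # Remaining control characters use the \uHHHH form.
            out.append("\\u%04X" % cp)
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

With these assumptions, json_string("a/b") leaves the solidus untouched, whereas json_string("a/b", escape_solidus=True) produces the escaped form.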
When the json-lines parameter is set to true, the serialized output is written in json-lines format, as defined in [JSON Lines].
If json-lines is set to true, then each item in the input value is serialized independently as a JSON text, and the resulting serializations are then concatenated using a single U+000A (NEWLINE) character as a separator.
If json-lines and indent are both set to true, then the serialization of each individual item may include added U+0020 (SPACE) and U+0009 (TAB) characters for formatting purposes, but it must not include U+000A (NEWLINE) or U+000D (CARRIAGE RETURN) characters.
The json-lines specification allows U+000D (CARRIAGE RETURN) characters to appear, and does not treat them as significant; however, they are likely to cause practical problems. For example processing such input using the XPath expression unparsed-text-lines($uri) ! parse-json() would fail. The serializer must not output any U+000D (CARRIAGE RETURN) characters, either at the end of a line or elsewhere.
The json-lines specification allows a terminating U+000A (NEWLINE) character after the last line. In the interests of interoperability, however, the serializer must not output such a terminator.
If the input value is the empty sequence, then it is serialized as a zero-length string (rather than as the string null).
The item-separator parameter has no effect.
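The json-lines rules above can be sketched in Python. Here json.dumps from the Python standard library merely stands in for the per-item JSON serializer, so details such as the spaces it adds after separators are not what this specification prescribes; the function name to_json_lines is hypothetical.

```python
import json


def to_json_lines(items):
    """Serialize each item independently and join with single newlines."""
    # json.dumps without an indent argument emits no newline or
    # carriage return characters within a serialized item.
    # str.join adds no terminator after the last line, and an empty
    # input yields a zero-length string rather than "null".
    return "\n".join(json.dumps(item) for item in items)
```

For example, to_json_lines([]) returns the zero-length string, and the output for a non-empty input never ends with a newline.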
Changes in 4.0 (next | previous)
The serialization of maps retains the order of entries. [Issue 1651 PR 1703 14 January 2025]
The output of QNames reflects the new syntax for QName literals. [Issue 2059 PR TODO 23 June 2025]
A JNode is represented as jnode(K: V) where K is its ·selector· property and V is its ·content· property. [Issues 2087 2186 PRs 2114 2226 22 June 2025]
The Adaptive output method serializes the input tree into a human-readable form for the purposes of debugging query results. The intention of this is to allow any input value to be serialized without raising a serialization error. Sequence normalization is not performed for this output method.
Each item in the supplied sequence is serialized individually as follows, with an occurrence of the chosen item-separator between successive items.
A JNode is serialized as follows:
If the ·selector· property is absent (that is, for a parentless JNode), in the format jtree(V) where V is the adaptive serialization of the value of its ·content· property.
For example, the JNode selected by the expression jtree({ "a": [4, 5] }) is serialized as jtree({"a":[4,5]}).
If the ·selector· property is present, then in the format jnode(K: V) where K is the adaptive serialization of the value of its ·selector· property and V is the adaptive serialization of the value of its ·content· property.
For example, the JNode selected by the expression jtree({ "a": [4, 5] })/a is serialized as jnode("a":[4,5]).
A document, element, text, comment, or processing instruction node is serialized using the XML output method described in 5 XML Output Method.
An attribute or namespace node is serialized as if it had a containing element node. For example an attribute node might be serialized as the string xsi:type="xs:integer"; a namespace node might be serialized as xmlns:sns="http://example.com/sample-namespace".
Note:
This may result in output of QNames containing prefixes whose binding is not displayed.
An atomic itemXP is serialized as follows:
An instance of xs:boolean is serialized as true() or false().
An instance of xs:string, xs:untypedAtomic or xs:anyURI is serialized by enclosing the value in double quotation marks and doubling any quotes within the value; or optionally by enclosing the value in apostrophes and doubling any apostrophes within the value. The resulting value is then serialized using the Text output method described in 8 Text Output Method.
Note:
The Text output method will apply character expansion and encoding rules to this string as specified by the serialization parameters.
An instance of xs:integer or xs:decimal is serialized by converting the value to a string using the fn:string function.
An instance of xs:double is serialized by applying the function format-number(?, '0.0##########################e0') using the following default decimal format properties:
| Property name | Property value |
|---|---|
| decimal-separator | U+002E (FULL STOP, PERIOD, .) |
| exponent-separator | U+0065 (LATIN SMALL LETTER E, e) |
| grouping-separator | U+002C (COMMA, ,) |
| zero-digit | U+0030 (DIGIT ZERO, 0) |
| digit | U+0023 (NUMBER SIGN, #) |
| infinity | The string "INF" |
| NaN | The string "NaN" |
| minus-sign | U+002D (HYPHEN-MINUS, -) |
An instance of xs:NOTATION is serialized as a URI-qualified name (that is, in the form Q{uri}local).
An instance of xs:QName is serialized with a # character, followed by:
the local name if the name is in no namespace, or
the URI-qualified name otherwise (Q{uri}local).
An atomic item of any other type is serialized using the syntax of a constructor function: xs:TYPE("VAL") where TYPE is the name of the primitive type, and VAL is the result of applying the fn:string() function. For example, xs:date("2015-07-17"). The resulting string is then serialized using the Text output method described in 8 Text Output Method.
An array item is serialized using the syntax of a SquareArrayConstructorXP, that is as [member, member, ... ]. The members, which in general are sequences, are serialized in the form (item, item, ...) where the items are serialized by applying these rules recursively. The items are separated by commas (not by the item-separator character). The enclosing parentheses are optional if the sequence has length one.
Note:
The serializer should avoid outputting the parentheses if it is able to determine the length of the sequence before serializing the first item; but it is allowed to output parentheses around a singleton if this avoids buffering data in memory.
A map item is serialized using the syntax of a MapConstructor (see [XPath 4.0]) without the optional map keyword, that is in the format {key:value, key:value, ...}. The key/value pairs in the serialized output retain the entry orderDM of entries in the map. The key is serialized by applying the rules for serializing an atomic item. The values are serialized in the same way as the members of an array (see above).
A function item is serialized to the representation name#A where name is a representation of the function name and A is the arity. If the function name is in one of the namespaces http://www.w3.org/2005/xpath-functions, http://www.w3.org/2005/xpath-functions/math, http://www.w3.org/2005/xpath-functions/map, http://www.w3.org/2005/xpath-functions/array or http://www.w3.org/2001/XMLSchema, then the name is output as a lexical QName using the conventional prefix fn, math, map, array, or xs as appropriate; if it is in any other namespace or in no namespace, then the name is output as a URI-qualified name (that is, Q{uri}local). If the function is anonymous, name is replaced by the string (anonymous-function).
Note:
The following examples illustrate this rule:
fn:exists#1 is serialized as fn:exists#1
Q{http://www.w3.org/2005/xpath-functions}exists#1 is serialized as fn:exists#1
function($a) { $a } is serialized as (anonymous-function)#1
math:pi#0 is serialized as math:pi#0
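The naming rule for function items can be sketched in Python. The sketch assumes a function is described by a namespace URI (or None for no namespace), a local name (or None for an anonymous function), and an arity; the names CONVENTIONAL_PREFIXES and function_item_repr are hypothetical.

```python
# Namespaces for which a conventional prefix is used.
CONVENTIONAL_PREFIXES = {
    "http://www.w3.org/2005/xpath-functions": "fn",
    "http://www.w3.org/2005/xpath-functions/math": "math",
    "http://www.w3.org/2005/xpath-functions/map": "map",
    "http://www.w3.org/2005/xpath-functions/array": "array",
    "http://www.w3.org/2001/XMLSchema": "xs",
}


def function_item_repr(namespace, local, arity):
    """Return the name#A representation of a function item (sketch)."""
    if local is None:
        # Anonymous functions have no name to display.
        name = "(anonymous-function)"
    elif namespace in CONVENTIONAL_PREFIXES:
        # Well-known namespaces use a lexical QName.
        name = CONVENTIONAL_PREFIXES[namespace] + ":" + local
    else:
        # Any other namespace, or no namespace, uses Q{uri}local.
        name = "Q{" + (namespace or "") + "}" + local
    return name + "#" + str(arity)
```

For instance, a function named g in the invented namespace http://example.com/f with arity 2 would be shown as Q{http://example.com/f}g#2.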
Character maps are applied (a) when nodes are serialized using the XML output method, and (b) to any value represented as a string enclosed in quotation marks.
Optionally, in all the above constructs, characters whose visual representation is ambiguous (for example tab or non-breaking-space) may be represented in the form of an XML numeric character reference (for example &#9; or &#xA0;).
Note:
In many cases the serialization of an item conforms to the syntax of an XQuery expression whose result is that item. There are exceptions, however. For example, the syntax will not be valid XQuery in the case of free-standing attribute or namespace nodes, or QName values, or anonymous functions; and where it is valid XQuery, the result of evaluating the expression will not necessarily be identical to the original: for example, the distinction between strings and untypedAtomic items is lost.
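A few of the atomic item rules for the Adaptive output method can be sketched in Python. The sketch assumes that a Python bool, int and str stand in for xs:boolean, xs:integer and the string-like types respectively, and it always chooses double quotation marks for strings; the function names are hypothetical. Because xs:string and xs:untypedAtomic would both be passed in as a Python str, the sketch also shows how the distinction between them is lost in the output.

```python
def adaptive_atomic(value):
    """Serialize a boolean, integer or string-like value (sketch)."""
    if isinstance(value, bool):
        # Booleans use the function-call syntax.
        return "true()" if value else "false()"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Enclose in double quotes, doubling any embedded quotes.
        return '"' + value.replace('"', '""') + '"'
    raise TypeError("type not covered by this sketch")


def adaptive_qname(namespace, local):
    """Serialize a QName as # followed by its name (sketch)."""
    if not namespace:
        # No namespace: just the local name.
        return "#" + local
    # Otherwise the URI-qualified form is used.
    return "#Q{" + namespace + "}" + local
```

For example, adaptive_atomic(True) gives true(), and adaptive_qname(None, "x") gives #x.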
If any value cannot be output because doing so would cause a serialization error, the behavior is implementation-defined.
If the output is sent to a destination that allows hyperlinks to be included in the generated text, then the serializer may include implementation-dependent hyperlinks to provide additional information, for example:
to allow the type of atomic itemsXP to be ascertained.
to allow the namespace binding of prefixes to be ascertained.
to provide further information about the cause of error indicators.
The term array item is defined in [XDM 4.0] section 8.3 Array Items.
The term atomization is defined in [XPath 4.0] section 2.6.3 Atomization.
The term character is defined in [XDM 4.0] section 4.1.5 XML and XSD Versions.
The term codepoint is defined in [XDM 4.0] section 4.1.5 XML and XSD Versions.
The term content has the same meaning as the term ContentXML defined in 3.1 Start-Tags, End-Tags, and Empty-Element TagsXML of [XML10].
The following XHTML elements have an EMPTY content model: area, base, br, col, embed, hr, img, input, link, meta, basefont, frame, isindex, and param.
The term expanded QName is defined in [XPath 4.0] section 2 Basics. An expanded QName consists of an optional namespace URI and a local name. An expanded QName also retains its original namespace prefix (if any), to facilitate casting the expanded QName into a string.
An element node is expected to be empty if it is recognized as an HTML element and:
With HTML5, the element is a void element.
Prior to HTML5, the content model is EMPTY.
The term function item is defined in [XDM 4.0] section 8.1 Function Items.
A host language is another specification that includes, by reference, this specification and all of its requirements. A host language might be a programming language such as [XSLT 4.0] or [XQuery 4.0], or it might be an application programming interface (API) intended to be used by programs written in some other high-level programming language. The use of the term language is not intended to preclude the possibility that this specification might be referenced outside the context of a programming language specification.
The immediate content of an element is the part of the content of the element that is not also in the content of a child element of that element.
Implementation-defined indicates an aspect that may differ between serializers, but whose actual behavior must be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.
Implementation-dependent indicates an aspect that may differ between serializers, and whose actual behavior is not required to be specified either by another specification that sets conformance criteria for serialization (see 12 Conformance) or in documentation that accompanies the serializer.
In general the output of the serializer will represent the items actually present in the input value, together with other items that are reachable from these, for example (in the case of nodes) their descendants. The complete set of items that are represented in the output of the serializer is referred to (without loss of generality) as the input tree.
The XDM value supplied as input to the serializer is referred to as the input value.
The term map item is defined in [XDM 4.0] section 8.2 Map Items.
the MathML namespace, http://www.w3.org/1998/Math/MathML.
The term node is defined in [XPath 4.0] section 2 Basics. There are seven kinds of nodes in the data model: document, element, attribute, text, namespace, processing instruction, and comment.
An element or attribute that does not have a null namespace URI is referred to as having a non-null namespace URI.
An expanded-QName whose namespace part is the empty sequence, or an element or attribute whose name expands to such an expanded-QName, is referred to as having a null namespace URI.
the Output declaration namespace, http://www.w3.org/2010/xslt-xquery-serialization
An output:serialization-parameters element node used to hold the settings of serialization parameters is referred to as a parameter document.
During prefix normalization, any element node in the input tree that is in one of the XHTML namespace, the SVG namespace or the MathML namespace has its name replaced by the local part of its name. Such an element node is given a default namespace node whose value is the element’s namespace URI. Any namespace node for any of those three namespaces that was previously present on any element node in the input tree is also removed, unless the prefix that that namespace node declared is used as the prefix on the name of an attribute on that element or an ancestor of that element.
The term prior to HTML5 is used in this specification to qualify rules that apply only when the effective version of the html-version serialization parameter is less than 5.0.
An element node is recognized as an HTML element by the XHTML output method if either of the following conditions is true:
the element node is in the XHTML namespace; or
With HTML5: the element has a null namespace URI and the local part of the name is equal to the name of an element defined by HTML5 [HTML5], making the comparison without regard to case.
A reconstructed tree may be constructed by parsing the XML document and converting it into a document node as specified in [XDM 4.0].
The requested HTML version is the value of the html-version serialization parameter if present; otherwise the value of the version serialization parameter if present; otherwise 5.0.
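The fallback order in this definition can be sketched in Python. The sketch assumes the serialization parameters are held in a dictionary keyed by parameter name, with absent parameters simply missing from the dictionary; the function name requested_html_version is hypothetical.

```python
from decimal import Decimal


def requested_html_version(params):
    """Return the requested HTML version as a Decimal (sketch)."""
    # html-version takes precedence over version when both are present.
    for name in ("html-version", "version"):
        if name in params:
            return Decimal(params[name])
    # Neither parameter is present: the default is 5.0.
    return Decimal("5.0")
```

For example, an empty dictionary yields 5.0, while a dictionary containing only version is resolved from that parameter.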
The result of the sequence normalization process is a result tree.
The term sequence is defined in [XPath 4.0] section 2 Basics. A sequence is an ordered collection of zero or more items.
The purpose of sequence normalization is to create a sequence that can be serialized as a well-formed XML document or external general parsed entity, that also reflects the content of the input sequence to the extent possible.
In some instances, the input tree cannot be successfully converted into a sequence of octets given the set of serialization parameter (3 Serialization Parameters) values specified. A serialization error is said to occur in such an instance.
An element node is serialized as an HTML element if
the expanded QName of the element has a null namespace URI, or
the requested HTML version is 5.0 or greater, and the element node is in the XHTML namespace.
As is indicated in 12 Conformance, conformance criteria for serialization are determined by other specifications that refer to this specification. A serializer is software that implements some or all of the requirements of this specification in accordance with such conformance criteria.
The term string is defined in [XDM 4.0] section 4.1.5 XML and XSD Versions.
The term string value is defined in [XDM 4.0] section 7.6.12 string-value Accessor. Every node has a string value. For example, the string value of an element is the concatenation of the string values of all its descendant text nodes.
the SVG namespace, http://www.w3.org/2000/svg
Whenever a value is serialized to a JSON string, the following procedure is applied to the supplied string:
Unicode Normalization is the process of removing alternative representations of equivalent sequences from textual data, to convert the data into a form that can be binary-compared for equivalence, as specified in [UAX #15: Unicode Normalization Forms]. For specific recommendations for character normalization on the World Wide Web, see [Character Model for the World Wide Web 1.0: Normalization].
The values of attributes listed in D List of URI Attributes are URI attribute values. Attributes are not considered to be URI attributes simply because they are namespace declaration attributes or have the type annotation xs:anyURI.
URI escaping consists of the following three steps applied in sequence to the content of URI attribute values:
The void elements of HTML5 are area, base, br, col, embed, hr, img, input, keygen, link, meta, param, source, track and wbr.
A space character, TAB character, CR character or NL character is referred to as a whitespace character.
The term with HTML5 is used in this specification to qualify rules that apply only when the effective version of the html-version serialization parameter is 5.0.
Where this specification indicates that two strings are to be compared without regard to case, the serializer must translate any characters in the range U+0041 (LATIN CAPITAL LETTER A, A) through U+005A (LATIN CAPITAL LETTER Z, Z) inclusive, to the corresponding lower-case letters in the range U+0061 (LATIN SMALL LETTER A, a) through U+007A (LATIN SMALL LETTER Z, z) only for the purposes of making the comparison. The comparison succeeds if the two strings are the same length and the code point of each character in the first string is equal to the code point of the character in the corresponding position in the second string.
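This comparison can be sketched in Python. The sketch translates only the 26 ASCII capital letters, as the definition requires, instead of relying on a general Unicode lower-casing operation; the names _ASCII_LOWER and equal_ignoring_ascii_case are hypothetical.

```python
# Map U+0041..U+005A (A-Z) to U+0061..U+007A (a-z); nothing else changes.
_ASCII_LOWER = {code: code + 32 for code in range(0x41, 0x5B)}


def equal_ignoring_ascii_case(left, right):
    """Compare two strings without regard to ASCII case (sketch)."""
    # After translation, equality means same length and same codepoints.
    return left.translate(_ASCII_LOWER) == right.translate(_ASCII_LOWER)
```

The restriction to ASCII matters: Python's str.lower() maps U+212A (KELVIN SIGN) to k, so a comparison based on it would treat that character as equal to the letter k, whereas equal_ignoring_ascii_case("\u212a", "k") returns False.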
the XHTML namespace, http://www.w3.org/1999/xhtml
The portion of the serialized document representing the result of serializing an element that is not to be serialized as an HTML element is known as an XML Island.
the XML namespace, http://www.w3.org/XML/1998/namespace