This draft contains only sections that have differences from the version that it modified.

W3C

XPath and XQuery Functions and Operators 4.0

W3C Editor's Draft 23 February 2026

This version:
https://qt4cg.org/specifications/xpath-functions-40/
Latest version of XPath and XQuery Functions and Operators 4.0:
https://qt4cg.org/specifications/xpath-functions-40/
Most recent Recommendation of XPath and XQuery Functions and Operators:
https://www.w3.org/TR/2017/REC-xpath-functions-31-20170321/
Editor:
Michael Kay, Saxonica <http://www.saxonica.com/>

Please check the errata for any errors or issues reported since publication.

See also translations.

This document is also available in these non-normative formats: Specification in XML format and XML function catalog.


Abstract

This document defines constructor functions, operators, and functions on the datatypes defined in [XML Schema Part 2: Datatypes Second Edition] and the datatypes defined in [XQuery and XPath Data Model (XDM) 3.1]. It also defines functions and operators on nodes and node sequences as defined in the [XQuery and XPath Data Model (XDM) 3.1]. These functions and operators are defined for use in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0] and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions/.

A summary of changes since version 3.1 is provided at H Changes since 3.1.

Status of this Document

This version of the specification is work in progress. It is produced by the QT4 Working Group, officially the W3C XSLT 4.0 Extensions Community Group. Individual functions specified in the document may be at different stages of review, reflected in their History notes. Comments are invited, in the form of GitHub issues at https://github.com/qt4cg/qtspecs.

Dedication

The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).


15 Parsing and serializing

These functions convert between the lexical representation of various file formats and their representation in the XPath and XQuery data model (XDM).

15.2 Functions on HTML Data

Changes in 4.0  

  1. A new function is available for processing input data in HTML format.   [Issues 74 850 1799 1889 1891 PRs 259 956 10 January 2023]

These functions convert the lexical representation of HTML into the XDM tree representation.

Function        Meaning
fn:parse-html   This function takes as input an HTML document, and returns the document node at the root of an XDM tree representing the parsed document.
fn:html-doc     Reads an external resource containing HTML, and returns the result of parsing the resource as HTML.

15.2.2 fn:parse-html

Changes in 4.0  

  1. New in 4.0  [Issues 74 850 1799 1889 1891 PRs 259 956 10 January 2023]

Summary

This function takes as input an HTML document, and returns the document node at the root of an XDM tree representing the parsed document.

Signature

fn:parse-html(
  $value as (xs:string | xs:hexBinary | xs:base64Binary)?,
  $options as map(*)? := {}
) as document-node(*:html)?

Properties

This function is nondeterministic, context-independent, and focus-independent.

Rules

If $value is the empty sequence the function returns the empty sequence.

In other cases, $value is expected to contain an HTML document supplied either as a string or a binary value.

The entries that may appear in the $options map are as follows:

record(
  encoding? as xs:string?,
  fail-on-error? as xs:boolean?,
  include-template-content? as xs:boolean?
)

Key    Value    Meaning

encoding?

The character encoding to use to decode a sequence of octets that represents an HTML document.

  • Type: xs:string?

fail-on-error?

Indicates whether the function should fail with a dynamic error if the input is not syntactically valid.

  • Type: xs:boolean?

  • Default: false()

  • false: Parsing errors should be handled as described in [HTML: Living Standard] section 13.2.2, Parse Errors.

  • true: A parsing error should result in the function failing with a dynamic error.

include-template-content?

Defines how to handle elements in the HTMLTemplateElement.content property.

If this option is true, the template element’s children are the children of the content property’s document fragment node.

If this option is false, the template element’s children are the empty sequence.

The default behaviour is implementation-defined.

Note:

This allows an implementation to support the behaviour defined in [HTML: Living Standard] section 4.12.3.1, Interaction of template elements with XSLT and XPath:

  1. This option would default to true for an XSLT processor operating on an HTML DOM constructed from an XHTML document.

  2. This option would default to false for an XPath processor using the [DOM: Living Standard] section 8, XPath APIs.

  • Type: xs:boolean?
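
As an illustration of the include-template-content option, the following sketch counts the element children of a template element under both settings. It assumes an XQuery 4.0 processor that supports fn:parse-html; the sample markup and variable names are hypothetical, and the wildcard name test *:template is used so that the example does not depend on a namespace declaration.

```xquery
(: Hypothetical input: a template element with one child paragraph. :)
let $src := "<template><p>inside</p></template>"
let $with    := parse-html($src, map { "include-template-content": true() })
let $without := parse-html($src, map { "include-template-content": false() })
return (
  (: Element children of template when its content is included :)
  count($with//*:template/*),
  (: Element children of template when its content is excluded :)
  count($without//*:template/*)
)
```

Under the rules above, the first count would be expected to be 1 and the second 0. Omitting the option leaves the choice to the implementation.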

The option parameter conventions apply.

If $value is not the empty sequence, an input byte stream is constructed as follows:

  1. If $value is an xs:string, then in principle no decoding is needed. Conceptually, however, the HTML parsing algorithm always starts by decoding an octet stream. The string is therefore first encoded using UTF-8, and the resulting octet stream is then passed to the HTML parser with a known definite encoding of UTF-8, as described in [HTML: Living Standard] section 13.2.3.1, Parsing with a known character encoding.

    If the first codepoint of the string is U+FEFF, this should be stripped, since it might otherwise lead to an incorrect encoding inference.

  2. If the type of $value is a sequence of octets (xs:hexBinary or xs:base64Binary) the encoding of the input byte stream is determined in a way consistent with [HTML: Living Standard] section 13.2.3.2, Determining the character encoding:

    1. The encoding key of $options is interpreted in step 2 of Determining the character encoding as the user instructing the user agent to override the document’s character encoding with the specified encoding.

    2. If the encoding key of $options is not specified, step 2 of Determining the character encoding is skipped.
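
The two cases above can be sketched as follows, assuming an XQuery 4.0 processor that supports fn:parse-html. The sample markup and variable names are illustrative only. The xs:hexBinary literal holds the octets of <p>café</p> with é encoded as the single octet E9, as in windows-1252.

```xquery
(: Case 1: string input. A leading U+FEFF (codepoint 65279) is present;
   the rules say it should be stripped before parsing. :)
let $from-string := parse-html(codepoints-to-string(65279) || "<p>café</p>")

(: Case 2: binary input. The encoding option acts as a user override
   when the character encoding is determined. :)
let $octets := xs:hexBinary("3C703E636166E93C2F703E")
let $from-binary := parse-html($octets, map { "encoding": "windows-1252" })

return (
  $from-string//*:p/string(),
  $from-binary//*:p/string()
)
```

If the implementation follows the rules above, both items would be expected to be the string café. Without the encoding option, the result for the binary input depends on the encoding that the sniffing algorithm infers.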

The resulting byte stream is then used to construct an XDM representation of the HTML document in a way that is equivalent to:

  1. Tokenizing the byte stream according to the HTML parsing algorithm as described in [HTML: Living Standard] section 13.2.5, Tokenization.

  2. Constructing an HTMLDocument object for HTML documents, or an XMLDocument for XML/XHTML documents as described in [HTML: Living Standard] section 13.2.6, Tree construction.

  3. Building an XDM representation of the HTMLDocument or XMLDocument according to the rules in 15.2.1 XDM Mapping from HTML DOM Nodes.
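
To make the result of these steps concrete, the sketch below inspects the tree returned for a small fragment. It assumes an XQuery 4.0 processor that supports fn:parse-html; wildcard name tests such as *:p are used so that the example does not depend on a namespace declaration.

```xquery
let $doc := parse-html("<p>Hello</p>")
return (
  (: Local name of the outermost element, then of its element children :)
  $doc/*/local-name(),
  $doc/*/*/local-name(),
  (: Namespace of the outermost element :)
  namespace-uri($doc/*),
  (: Text content of the paragraph :)
  $doc//*:p/string()
)
```

Given the equivalence stated in the Examples for parse-html("<p>Hello</p>"), the outermost element would be expected to be html in the XHTML namespace, with head and body children supplied by the HTML tree construction rules.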

The implementation should process any input HTML that adheres to the current practice of mainstream web browsers, as this evolves over time. Since this is defined by a “living standard” (see [HTML: Living Standard]), no specific version is prescribed. An implementation may define additional options to control aspects of the HTML parsing algorithm, including the selection of a specific HTML parsing library; it may also provide options to process alternative HTML versions or dialects.

The implementation should recognize and process XHTML (referred to in [HTML: Living Standard] as the XML concrete syntax of HTML).

The function is nondeterministic with respect to node identity: that is, if the function is called twice with the same arguments, it is implementation-dependent whether the same node is returned on both occasions.
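
A minimal sketch of what this means in practice, assuming an XQuery 4.0 processor that supports fn:parse-html (variable names are illustrative):

```xquery
let $src := "<p>Hello</p>"
let $first  := parse-html($src)
let $second := parse-html($src)
return (
  (: Node identity: implementation-dependent, may be true or false :)
  $first is $second,
  (: Structural comparison of the two results :)
  deep-equal($first, $second)
)
```

Code that relies on node identity should therefore bind the result to a variable once and reuse it, rather than calling the function repeatedly with the same arguments.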

Error Conditions

A dynamic error is raised [err:FODC0011] if the content of $value is not a well-formed HTML document.

A dynamic error is raised [err:FODC0012] if the method key of $options is not supported by the implementation.

A dynamic error is raised [err:FODC0012] if a key passed to $options, or the value of that key, is not supported by the implementation.
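
The following XQuery sketch shows one way a caller might handle these errors. It assumes an XQuery 4.0 processor that supports fn:parse-html and try/catch; the input string is hypothetical, and whether a given input is rejected depends on what the implementation's HTML parser reports as a parse error.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical input with no DOCTYPE and unclosed elements. :)
let $src := "<p><b>unterminated"
return
  try {
    parse-html($src, map { "fail-on-error": true() })
  } catch err:FODC0011 {
    (: Input rejected as not well-formed HTML :)
    "FODC0011: " || $err:description
  } catch err:FODC0012 {
    (: An option key or value is not supported :)
    "FODC0012: " || $err:description
  }
```

With fail-on-error left at its default of false, the same input would instead be repaired according to the HTML parsing algorithm.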

Notes

If the HTML parser accepts a string as input, an xs:string supplied as $value may be passed to it directly, rather than first being converted to a sequence of octets in an implementation-dependent encoding. In that case the HTML parser must not perform character encoding processing on the input; it must treat the HTML string as being in a known character encoding that matches the encoding of the string.

The WHATWG Encoding specification defines the ISO 8859-1 (latin1) and ASCII encodings as aliases of the windows-1252 encoding.
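
The practical effect of this aliasing can be sketched as follows, assuming an XQuery 4.0 processor whose fn:parse-html follows the WHATWG encoding labels. The xs:hexBinary literal is an illustrative input holding the octets of <p>, the single octet 0x80, and </p>.

```xquery
(: Octets: 3C 70 3E = <p>, 80 = the octet under test, 3C 2F 70 3E = </p> :)
let $octets := xs:hexBinary("3C703E803C2F703E")
let $doc := parse-html($octets, map { "encoding": "iso-8859-1" })
return string-to-codepoints($doc//*:p/string())
```

If the label iso-8859-1 is treated as an alias of windows-1252, the octet 0x80 would be expected to decode as U+20AC (codepoint 8364) rather than as the control character U+0080 (codepoint 128).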

Examples

The expression parse-html(()) returns ().

The expression parse-html("<p>Hello</p>") returns an XDM document node equivalent to the result of parsing the XML <html xmlns='http://www.w3.org/1999/xhtml'><head/><body><p>Hello</p></body></html>

The expression parse-html("<p>Hi</p>", method:="html") is equivalent to parse-html("<p>Hi</p>").

B Error codes

The error text provided with these errors is non-normative.
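
Because the error text is non-normative, code that handles these errors would normally test the error code rather than the message. A minimal XQuery sketch, assuming a processor that supports try/catch (XQuery 3.0 or later); the variable name $divisor is illustrative:

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

let $divisor := 0
return
  try {
    1 div $divisor
  } catch err:FOAR0001 {
    (: Match on the code; the description text may differ between processors :)
    "Caught " || local-name-from-QName($err:code)
  }
```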

err:FOAP0001, Wrong number of arguments.

Raised when fn:apply is called and the arity of the supplied function is not the same as the number of members in the supplied array.

err:FOAR0001, Division by zero.

This error is raised whenever an attempt is made to divide by zero.

err:FOAR0002, Numeric operation overflow/underflow.

This error is raised whenever numeric operations result in an overflow or underflow.

err:FOAY0001, Array index out of bounds.

This error is raised when an integer used to select a member of an array is outside the range of values for that array.

err:FOAY0002, Negative array length.

This error is raised when the $length argument to array:subarray is negative.

err:FOCA0001, Input value too large for decimal.

Raised when casting to xs:decimal if the supplied value exceeds the implementation-defined limits for the datatype.

err:FOCA0002, Invalid lexical value.

Raised by fn:resolve-QName and fn:QName when a supplied value does not have the lexical form of a QName or URI respectively; and when casting to decimal, if the supplied value is NaN or Infinity.

err:FOCA0003, Input value too large for integer.

Raised when casting to xs:integer if the supplied value exceeds the implementation-defined limits for the datatype.

err:FOCA0005, NaN supplied as float/double value.

Raised when multiplying or dividing a duration by a number, if the number supplied is NaN.

err:FOCA0006, String to be cast to decimal has too many digits of precision.

Raised when casting a string to xs:decimal if the string has more digits of precision than the implementation can represent (the implementation also has the option of rounding).

err:FOCH0001, Codepoint not valid.

Raised by fn:codepoints-to-string if the input contains an integer that is not the codepoint of a permitted character.

err:FOCH0002, Unsupported collation.

Raised by any function that uses a collation if the requested collation is not recognized.

err:FOCH0003, Unsupported normalization form.

Raised by fn:normalize-unicode if the requested normalization form is not supported by the implementation.

err:FOCH0004, Collation does not support collation units.

Raised by functions such as fn:contains if the requested collation does not operate on a character-by-character basis.

err:FOCH0005, Unrecognized or invalid character name.

Raised by fn:char if the supplied character name is not recognized, or if it represents a codepoint that is not a permitted character.

err:FOCV0001, CSV field quoting error.

Raised when parsing CSV input if a syntax error in the input CSV is found.

err:FOCV0002, Invalid CSV delimiter error.

Raised when parsing CSV input if the field-separator, record-separator, or quote-character option is set to an invalid value.

err:FOCV0003, Duplicate CSV delimiter error.

Raised when parsing CSV input if the same delimiter character is assigned to more than one role.

err:FOCV0004, Argument supplied is not a known column name.

Raised by the function from the get entry of csv-columns-record, if its $key argument is an xs:string and is not one of the known column names.

err:FODC0001, No context document.

Raised by fn:id, fn:idref, and fn:element-with-id if the node that identifies the tree to be searched is a node in a tree whose root is not a document node.

err:FODC0002, Error retrieving resource.

Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that either the supplied URI cannot be dereferenced to obtain a resource, or the resource that is returned is not parseable as XML.

err:FODC0003, Function not defined as deterministic.

Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that it is not possible to return a result that is guaranteed deterministic.

err:FODC0004, Invalid collection URI.

Raised by fn:collection and fn:uri-collection if the argument is not a valid xs:anyURI.

err:FODC0005, Invalid URI reference.

Raised (optionally) by fn:doc if the argument is not a valid xs:anyURI.

err:FODC0006, String passed to fn:parse-xml is not a well-formed XML document.

Raised by fn:parse-xml if the supplied string is not a well-formed and namespace-well-formed XML document; or if DTD validation is requested and the document is not valid against its DTD.

err:FODC0007, String passed to fn:parse-xml is not a DTD-valid XML document.

Raised by fn:parse-xml if DTD validation is requested and the supplied string has no DTD or is not valid against the DTD.

err:FODC0008, Invalid value for the xsd-validation option of fn:parse-xml.

Raised when the xsd-validation option to fn:parse-xml is supplied, and the value is not one of the permitted values; for example if the option type Q{U}NNN is used, and Q{U}NNN does not identify a type in the static context.

err:FODC0009, Processor is not schema-aware.

Raised when the xsd-validation option to fn:parse-xml is set to a value other than skip, if the processor is not schema-aware.

err:FODC0010, The processor does not support serialization.

Raised when fn:serialize is called and the processor does not support serialization, in cases where the host language makes serialization an optional feature.

err:FODC0011, String passed to fn:parse-html is not a well-formed HTML document.

Raised by fn:parse-html if the supplied string is not a well-formed HTML document.

err:FODC0012, Unsupported HTML parser option.

Raised by fn:parse-html if a key passed to $options, or its value, is not supported by the implementation.

err:FODC0013, No validating XML parser available.

Raised when the dtd-validation option to fn:parse-xml is set, if no validating XML parser is available. Note: it is recommended that all processors should support the dtd-validation option, but there may be environments (such as web browsers) where this is not practically feasible.

err:FODC0014, String passed to fn:parse-xml is not a schema-valid XML document.

Raised by fn:parse-xml if XSD validation is requested and the XML document represented by the supplied string is not valid against the relevant XSD schema.

err:FODC0015, Unable to compile schema for fn:xsd-validator.

Raised by fn:xsd-validator if it is not possible to assemble a valid and consistent schema.

err:FODF1280, Invalid decimal format name.

This error is raised if the decimal format name supplied to fn:format-number is not a valid QName, or if the prefix in the QName is undeclared, or if there is no decimal format in the static context with a matching name.

err:FODF1290, Invalid decimal format property.

This error is raised if a decimal format value supplied to fn:format-number is not valid for the associated property, or if the properties of the decimal format resulting from a supplied map do not have distinct values.

err:FODF1310, Invalid decimal format picture string.

This error is raised if the picture string supplied to fn:format-number or fn:format-integer has invalid syntax.

err:FODT0001, Overflow/underflow in date/time operation.

Raised when casting to date/time datatypes, or performing arithmetic with date/time values, if arithmetic overflow or underflow occurs.

err:FODT0002, Overflow/underflow in duration operation.

Raised when casting to duration datatypes, or performing arithmetic with duration values, if arithmetic overflow or underflow occurs.

err:FODT0003, Invalid timezone value.

Raised by adjust-date-to-timezone and related functions if the supplied timezone is invalid.

err:FODT0004, No timezone data available.

Raised by civil-timezone if no timezone data is available for the given date/time and place.

err:FOER0000, Unidentified error.

Error code used by fn:error when no other error code is provided.

err:FOFD1340, Invalid date/time formatting parameters.

This error is raised if the picture string or calendar supplied to fn:format-date, fn:format-time, or fn:format-dateTime has invalid syntax.

err:FOFD1350, Invalid date/time formatting component.

This error is raised if the picture string supplied to fn:format-date selects a component that is not present in a date, or if the picture string supplied to fn:format-time selects a component that is not present in a time.

err:FOHA0001, Invalid algorithm.

Raised by fn:hash if the effective value of the supplied algorithm is not one of the values supported by the implementation.

err:FOJS0001, JSON syntax error.

Raised by functions such as fn:json-doc, fn:parse-json or fn:json-to-xml if the string supplied as input does not conform to the JSON grammar (optionally with implementation-defined extensions).

err:FOJS0003, JSON duplicate keys.

Raised by functions such as map:merge, fn:json-doc, fn:parse-json or fn:json-to-xml if the input contains duplicate keys, when the chosen policy is to reject duplicates.

err:FOJS0004, JSON: not schema-aware.

Raised by fn:json-to-xml if validation is requested when the processor does not support schema validation or typed nodes.

err:FOJS0005, Invalid options.

Raised by functions such as map:merge, fn:parse-json, and fn:xml-to-json if the $options map contains an invalid entry.

err:FOJS0006, Invalid XML representation of JSON.

Raised by fn:xml-to-json if the XML input does not conform to the rules for the XML representation of JSON.

err:FOJS0007, Bad JSON escape sequence.

Raised by fn:xml-to-json if the XML input uses the attribute escaped="true" or escaped-key="true", and the corresponding string or key contains an invalid JSON escape sequence.

err:FOJS0008, Cannot convert element to map.

Raised by fn:element-to-map if the layout selected for converting elements of a given name is unsuitable for an element node with that name, or if the conversion plan explicitly defines the processing of a particular element as an error.

err:FONS0004, No namespace found for prefix.

Raised by fn:resolve-QName and analogous functions if a supplied QName has a prefix that has no binding to a namespace.

err:FONS0005, Base-uri not defined in the static context.

Raised by fn:resolve-uri if no base URI is available for resolving a relative URI.

err:FOPA0001, Origin node is not an ancestor of the target node.

Raised by fn:path if the node supplied in the origin option is not an ancestor of the $node whose relative path is required.

err:FOQM0001, Module URI is a zero-length string.

Raised by fn:load-xquery-module if the supplied module URI is zero-length.

err:FOQM0002, Module URI not found.

Raised by fn:load-xquery-module if no module can be found with the supplied module URI.

err:FOQM0003, Static error in dynamically loaded XQuery module.

Raised by fn:load-xquery-module if a static error (including a statically detected type error) is encountered when processing the library module.

err:FOQM0005, Parameter for dynamically loaded XQuery module has incorrect type.

Raised by fn:load-xquery-module if a value is supplied for the initial context item or for an external variable, and the value does not conform to the required type declared in the dynamically loaded module.

err:FOQM0006, No suitable XQuery processor available.

Raised by fn:load-xquery-module if no XQuery processor is available supporting the requested XQuery version (or if none is available at all).

err:FORG0001, Invalid value for cast/constructor.

A general-purpose error raised when casting, if a cast between two datatypes is allowed in principle, but the supplied value cannot be converted: for example when attempting to cast the string "nine" to an integer.

err:FORG0002, Invalid argument to fn:resolve-uri.

Raised when either argument to fn:resolve-uri is not a valid URI/IRI.

err:FORG0003, fn:zero-or-one called with a sequence containing more than one item.

Raised by fn:zero-or-one if the supplied value contains more than one item.

err:FORG0004, fn:one-or-more called with a sequence containing no items.

Raised by fn:one-or-more if the supplied value is an empty sequence.

err:FORG0005, fn:exactly-one called with a sequence containing zero or more than one item.

Raised by fn:exactly-one if the supplied value is not a singleton sequence.

err:FORG0006, Invalid argument type.

Raised by functions such as fn:max, fn:min, fn:avg, fn:sum if the supplied sequence contains values inappropriate to this function.

err:FORG0008, The two arguments to fn:dateTime have inconsistent timezones.

Raised by fn:dateTime if the two arguments both have timezones and the timezones are different.

err:FORG0009, Error in resolving a relative URI against a base URI in fn:resolve-uri.

A catch-all error for fn:resolve-uri, recognizing that the implementation can choose between a variety of algorithms and that some of these may fail for a variety of reasons.

err:FORG0010, Invalid date/time.

Raised when the input to fn:parse-ietf-date does not match the prescribed grammar, or when it represents an invalid date/time such as 31 February.

err:FORG0011, Invalid radix.

Raised when the radix supplied to fn:parse-integer is not in the range 2 to 36.

err:FORG0012, Invalid digits.

Raised when the digits in the string supplied to fn:parse-integer are not in the range appropriate to the chosen radix.
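
The distinction between err:FORG0011 and err:FORG0012 can be sketched as follows, assuming an XQuery 4.0 processor that supports fn:parse-integer with a radix argument. The input string "ff" and the list of radix values are illustrative.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

for $radix in (16, 2, 37)
return
  try {
    parse-integer("ff", $radix)
  } catch err:FORG0011 {
    (: Radix outside the range 2 to 36 :)
    "radix " || $radix || " is out of range"
  } catch err:FORG0012 {
    (: Digits not permitted in the chosen radix :)
    "digits not valid for radix " || $radix
  }
```

Under the descriptions above, radix 16 would be expected to yield an integer, radix 2 to raise err:FORG0012 (the digit f is not a binary digit), and radix 37 to raise err:FORG0011.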

err:FORX0001, Invalid regular expression flags.

Raised by regular expression functions such as fn:matches and fn:replace if the regular expression flags contain a character other than i, m, q, s, or x.

err:FORX0002, Invalid regular expression.

Raised by regular expression functions such as fn:matches and fn:replace if the regular expression is syntactically invalid.

err:FORX0003, Regular expression matches zero-length string.

Raised by functions such as fn:replace and fn:tokenize if the supplied regular expression is capable of matching a zero-length string.

err:FORX0003

(Error code unused?)

err:FORX0004, Invalid replacement string.

Raised by fn:replace to report errors in the replacement string.

err:FORX0005, Incompatible arguments for fn:replace.

Raised by fn:replace if both the $replacement and $action arguments are supplied.

err:FOTY0012, Argument to fn:data contains a node that does not have a typed value.

Raised by fn:data, or by implicit atomization, if applied to a node with no typed value, the main example being an element validated against a complex type that defines it to have element-only content.

err:FOTY0013, The argument to fn:data contains a function item.

Raised by fn:data, or by implicit atomization, if the sequence to be atomized contains a function item other than an array.

err:FOTY0014, The argument to fn:string is a function item.

Raised by fn:string, or by implicit string conversion, if the input sequence contains a function item.

err:FOUR0001, Invalid IPv6/IPvFuture authority.

A dynamic error is raised if the authority component of a URI contains an open square bracket but no corresponding close square bracket.

err:FOUT1170, Invalid URI reference.

Raised by fn:unparsed-text or fn:unparsed-text-lines if the $source argument contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the string representation of a resource.

err:FOUT1190, Cannot decode external resource.

Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is not a valid encoding name, if the processor does not support the specified encoding, if the string representation of the retrieved resource contains octets that cannot be decoded into Unicode characters using the specified encoding, or if the resulting characters are not permitted characters.

err:FOUT1200, Cannot infer encoding of external resource.

Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is absent and the processor cannot infer the encoding using external information and the encoding is not UTF-8.

err:FOXT0001, No suitable XSLT processor available.

A dynamic error is raised if no XSLT processor suitable for evaluating a call on fn:transform is available.

err:FOXT0002, Invalid parameters to XSLT transformation.

A dynamic error is raised if the parameters supplied to fn:transform are invalid, for example if two mutually exclusive parameters are supplied. If a suitable XSLT error code is available (for example in the case where the requested initial-template does not exist in the stylesheet), that error code should be used in preference.

err:FOXT0003, XSLT transformation failed.

A dynamic error is raised if an XSLT transformation invoked using fn:transform fails with a static or dynamic error. The XSLT error code is used if available; this error code provides a fallback when no XSLT error code is returned, for example because the processor is an XSLT 1.0 processor.

err:FOXT0004, XSLT transformation has been disabled.

A dynamic error is raised if the fn:transform function is invoked when XSLT transformation (or a specific transformation option) has been disabled for security or other reasons.

err:FOXT0006, XSLT output contains non-accepted characters.

A dynamic error is raised if the result of the fn:transform function contains characters available only in XML 1.1 and the calling processor cannot handle such characters.