Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: Specification in XML format and XML function catalog.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document defines constructor functions, operators, and functions on the datatypes defined in [XML Schema Part 2: Datatypes Second Edition] and the datatypes defined in [XQuery and XPath Data Model (XDM) 3.1]. It also defines functions and operators on nodes and node sequences as defined in the [XQuery and XPath Data Model (XDM) 3.1]. These functions and operators are defined for use in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language], [XSL Transformations (XSLT) Version 4.0], and other related XML standards. The signatures and summaries of functions defined in this document are available at http://www.w3.org/2005/xpath-functions/.
A summary of changes since version 3.1 is provided at H Changes since 3.1.
This version of the specification is work in progress. It is produced by the QT4 Working Group, officially the W3C XSLT 4.0 Extensions Community Group. Individual functions specified in the document may be at different stages of review, reflected in their History notes. Comments are invited, in the form of GitHub issues at https://github.com/qt4cg/qtspecs.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
These functions convert between the lexical representation and XPath and XQuery data model representation of various file formats.
These functions convert between the lexical representation of HTML and the XDM tree representation.
| Function | Meaning |
|---|---|
| fn:parse-html | This function takes as input an HTML document, and returns the document node at the root of an XDM tree representing the parsed document. |
| fn:html-doc | Reads an external resource containing HTML, and returns the result of parsing the resource as HTML. |
This function takes as input an HTML document, and returns the document node at the root of an XDM tree representing the parsed document.
fn:parse-html(
  $value as …,
  $options as … := {}
) as …
This function is nondeterministic, context-independent, and focus-independent.
If $value is the empty sequence the function returns the empty sequence.
In other cases, $value is expected to contain an HTML document supplied either as a string or a binary value.
The entries that may appear in the $options map are as follows:
record(
  encoding? as xs:string?,
  fail-on-error? as xs:boolean?,
  include-template-content? as xs:boolean?
)
| Key | Value | Meaning |
|---|---|---|
| encoding | | The character encoding to use to decode a sequence of octets that represents an HTML document. |
| fail-on-error | | Indicates whether the function should fail with a dynamic error if the input is not syntactically valid. |
| | false | Parsing errors should be handled as described in [HTML: Living Standard] section 13.2.2, Parse Errors. |
| | true | A parsing error should result in the function failing with a dynamic error. |
| include-template-content | | Defines how to handle elements in the … If this option is … If this option is … The default behaviour is implementation-defined. Note: This allows an implementation to support the behaviour defined in [HTML: Living Standard] section 4.12.3.1, Interaction of template elements with XSLT and XPath. |
The option parameter conventions apply.
If $value is not the empty sequence, an input byte stream is constructed as follows:
If $value is an xs:string, then in principle no decoding is needed. Conceptually, however, the HTML parsing algorithm always starts by decoding an octet stream. The string is therefore first encoded using UTF-8, and the resulting octet stream is then passed to the HTML parser with a known definite encoding of UTF-8, as described in [HTML: Living Standard] section 13.2.3.1, Parsing with a known character encoding.
If the first codepoint of the string is U+FEFF, this should be stripped, since it might otherwise lead to an incorrect encoding inference.
If the type of $value is a sequence of octets (xs:hexBinary or xs:base64Binary) the encoding of the input byte stream is determined in a way consistent with [HTML: Living Standard] section 13.2.3.2, Determining the character encoding:
The encoding key of $options is interpreted in step 2 of Determining the character encoding as the user instructing the user agent to override the document’s character encoding with the specified encoding.
If the encoding key of $options is not specified, step 2 of Determining the character encoding is skipped.
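The construction of the input byte stream described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the names `InputByteStream` and `build_input_byte_stream` are hypothetical, binary values are modelled as `bytes`, and the encoding determination itself (including any byte order mark sniffing that precedes the user override) is left to the HTML parser.

```python
from typing import NamedTuple, Optional, Union

BOM = "\ufeff"  # U+FEFF, stripped from string input before encoding


class InputByteStream(NamedTuple):
    """Hypothetical container for what is handed to the HTML parser."""
    octets: bytes
    # Definite encoding, set only for string input (always UTF-8).
    known_encoding: Optional[str]
    # Value of the encoding option, consulted by the parser as the
    # user override when it determines the character encoding.
    user_override: Optional[str]


def build_input_byte_stream(
    value: Union[str, bytes], encoding: Optional[str] = None
) -> InputByteStream:
    if isinstance(value, str):
        # String input: drop a leading U+FEFF, encode as UTF-8, and
        # declare UTF-8 as the known definite encoding.
        if value.startswith(BOM):
            value = value[1:]
        return InputByteStream(value.encode("utf-8"), "utf-8", None)
    # Binary input: pass the octets through unchanged. When no encoding
    # option is supplied, the override step is simply skipped.
    return InputByteStream(bytes(value), None, encoding)
```

In this sketch the `encoding` argument is ignored for string input, reflecting the rule that a string is always presented to the parser as UTF-8.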
The resulting byte stream is then used to construct an XDM representation of the HTML document in a way that is equivalent to:
Tokenizing the byte stream according to the HTML parsing algorithm as described in [HTML: Living Standard] section 13.2.5, Tokenization.
Constructing an HTMLDocument object for HTML documents, or an XMLDocument for XML/XHTML documents as described in [HTML: Living Standard] section 13.2.6, Tree construction.
Building an XDM representation of the HTMLDocument or XMLDocument according to the rules in 15.2.1 XDM Mapping from HTML DOM Nodes.
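The overall shape of these three steps (tokenize, construct a tree, map the tree to another data model) can be illustrated with the Python standard library. This is only a sketch under strong assumptions: `html.parser` is not a conformant implementation of the HTML parsing algorithm, it performs none of the error recovery required by [HTML: Living Standard], and `xml.etree.ElementTree` merely stands in for an XDM tree. The class name `TreeBuilder` and the function `parse` are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have an end tag in HTML syntax.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Consumes tokens from html.parser and builds an ElementTree."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.root = None
        self.stack = []  # open elements, innermost last

    def handle_starttag(self, tag, attrs):
        attrib = {name: (value or "") for name, value in attrs}
        if self.stack:
            node = ET.SubElement(self.stack[-1], tag, attrib)
        else:
            node = ET.Element(tag, attrib)
        if self.root is None:
            self.root = node
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if not self.stack:
            return  # text outside the root element is ignored in this sketch
        parent = self.stack[-1]
        if len(parent):
            # Text following a closed child belongs to that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def parse(octets: bytes, encoding: str = "utf-8"):
    builder = TreeBuilder()
    builder.feed(octets.decode(encoding))  # decode, then tokenize
    builder.close()
    return builder.root  # root element of the constructed tree
```

For example, `parse(b"<html><body><p>Hi<br>there</p></body></html>")` yields an `html` element whose `p` descendant has the text `Hi`, a `br` child, and the trailing text `there`.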
The implementation should process any input HTML that adheres to the current practice of mainstream web browsers, as this evolves over time. Since this is defined by a “living standard” (see [HTML: Living Standard]), no specific version is prescribed. An implementation may define additional options to control aspects of the HTML parsing algorithm, including the selection of a specific HTML parsing library; it may also provide options to process alternative HTML versions or dialects.
The implementation should recognize and process XHTML (referred to in [HTML: Living Standard] as the XML concrete syntax of HTML).
The function is nondeterministic with respect to node identity: that is, if the function is called twice with the same arguments, it is implementation-dependent whether the same node is returned on both occasions.
A dynamic error is raised [err:FODC0011] if the content of $value is not a well-formed HTML document.
A dynamic error is raised [err:FODC0012] if the method key of $options is not supported by the implementation.
A dynamic error is raised [err:FODC0012] if a key passed to $options, or the value of that key, is not supported by the implementation.
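The option check behind [err:FODC0012] might look like the following Python sketch. It assumes that "supported" means exactly the keys and value types of the record type given earlier, which is a simplification: an implementation may support additional options, and the option parameter conventions may classify some of these cases differently. The names `DynamicError`, `SUPPORTED_OPTIONS`, and `check_options` are hypothetical.

```python
from typing import Any, Mapping

# Keys and value types taken from the record type for $options.
# None models an empty sequence supplied as the option value.
SUPPORTED_OPTIONS = {
    "encoding": str,
    "fail-on-error": bool,
    "include-template-content": bool,
}


class DynamicError(Exception):
    """Hypothetical carrier for an error code and a non-normative message."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"[err:{code}] {message}")
        self.code = code


def check_options(options: Mapping[str, Any]) -> None:
    """Raise FODC0012 for any key or value this sketch does not support."""
    for key, value in options.items():
        expected = SUPPORTED_OPTIONS.get(key)
        if expected is None:
            raise DynamicError("FODC0012", f"unsupported option: {key}")
        if value is not None and not isinstance(value, expected):
            raise DynamicError(
                "FODC0012", f"unsupported value for {key}: {value!r}"
            )
```

Under these assumptions, `check_options({"encoding": "utf-8"})` returns normally, while `check_options({"method": "html"})` raises an error with the code FODC0012.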
If the HTML parser accepts a string as the input then that may be used directly when $value is an xs:string instead of converting the string to a sequence of octets in an implementation-dependent encoding. The HTML parser must not perform character encoding processing on that input, treating the HTML string as being in a known character encoding that matches the encoding of the string.
The WHATWG Encoding specification defines the ISO 8859-1 (latin1) and ASCII encodings as aliases of the windows-1252 encoding.
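The practical effect of this aliasing can be shown with a short Python sketch. The label table is deliberately partial (the WHATWG Encoding specification lists many more labels), and the names `resolve_label` and `decode` are hypothetical. Python's `cp1252` codec is used as an approximation of windows-1252; it rejects the five octets that have no assigned character, whereas the WHATWG definition maps them to control characters.

```python
# A few of the labels that the WHATWG Encoding specification maps to
# windows-1252. This table is illustrative, not complete.
WINDOWS_1252_LABELS = {"ascii", "us-ascii", "iso-8859-1", "latin1", "windows-1252"}


def resolve_label(label: str) -> str:
    """Map an encoding label to an encoding name (partial sketch)."""
    normalized = label.strip().lower()
    if normalized in WINDOWS_1252_LABELS:
        return "windows-1252"
    return normalized


def decode(octets: bytes, label: str) -> str:
    encoding = resolve_label(label)
    # Python's codec name for windows-1252 is "cp1252".
    codec = "cp1252" if encoding == "windows-1252" else encoding
    return octets.decode(codec)
```

The difference matters for octets in the range 0x80 to 0x9F: with this sketch, `decode(b"\x80", "latin1")` yields the euro sign U+20AC, whereas a strict ISO 8859-1 decoder would yield the control character U+0080.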
The error text provided with these errors is non-normative.
Raised when fn:apply is called and the arity of the supplied function is not the same as the number of members in the supplied array.
This error is raised whenever an attempt is made to divide by zero.
This error is raised whenever numeric operations result in an overflow or underflow.
This error is raised when an integer used to select a member of an array is outside the range of values for that array.
This error is raised when the $length argument to array:subarray is negative.
Raised when casting to xs:decimal if the supplied value exceeds the implementation-defined limits for the datatype.
Raised by fn:resolve-QName and fn:QName when a supplied value does not have the lexical form of a QName or URI respectively; and when casting to decimal, if the supplied value is NaN or Infinity.
Raised when casting to xs:integer if the supplied value exceeds the implementation-defined limits for the datatype.
Raised when multiplying or dividing a duration by a number, if the number supplied is NaN.
Raised when casting a string to xs:decimal if the string has more digits of precision than the implementation can represent (the implementation also has the option of rounding).
Raised by fn:codepoints-to-string if the input contains an integer that is not the codepoint of a permitted character.
Raised by any function that uses a collation if the requested collation is not recognized.
Raised by fn:normalize-unicode if the requested normalization form is not supported by the implementation.
Raised by functions such as fn:contains if the requested collation does not operate on a character-by-character basis.
Raised by fn:char if the supplied character name is not recognized, or if it represents a codepoint that is not a permitted character.
Raised when parsing CSV input if a syntax error in the input CSV is found.
Raised when parsing CSV input if the field-separator, record-separator, or quote-character option is set to an invalid value.
Raised when parsing CSV input if the same delimiter character is assigned to more than one role.
Raised by the function from the get entry of csv-columns-record, if its $key argument is an xs:string and is not one of the known column names.
Raised by fn:id, fn:idref, and fn:element-with-id if the node that identifies the tree to be searched is a node in a tree whose root is not a document node.
Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that either the supplied URI cannot be dereferenced to obtain a resource, or the resource that is returned is not parseable as XML.
Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that it is not possible to return a result that is guaranteed deterministic.
Raised by fn:collection and fn:uri-collection if the argument is not a valid xs:anyURI.
Raised (optionally) by fn:doc if the argument is not a valid xs:anyURI.
Raised by fn:parse-xml if the supplied string is not a well-formed and namespace-well-formed XML document; or if DTD validation is requested and the document is not valid against its DTD.
Raised by fn:parse-xml if DTD validation is requested and the supplied string has no DTD or is not valid against the DTD.
Raised when the xsd-validation option to fn:parse-xml is supplied, and the value is not one of the permitted values; for example if the option type Q{U}NNN is used, and Q{U}NNN does not identify a type in the static context.
Raised when the xsd-validation option to fn:parse-xml is set to a value other than skip, if the processor is not schema-aware.
Raised when fn:serialize is called and the processor does not support serialization, in cases where the host language makes serialization an optional feature.
Raised by fn:parse-html if the supplied string is not a well-formed HTML document.
Raised by fn:parse-html if a key passed to $options, or its value, is not supported by the implementation.
Raised when the dtd-validation option to fn:parse-xml is set, if no validating XML parser is available. Note: it is recommended that all processors should support the dtd-validation option, but there may be environments (such as web browsers) where this is not practically feasible.
Raised by fn:parse-xml if XSD validation is requested and the XML document represented by the supplied string is not valid against the relevant XSD schema.
Raised by fn:xsd-validator if it is not possible to assemble a valid and consistent schema.
This error is raised if the decimal format name supplied to fn:format-number is not a valid QName, or if the prefix in the QName is undeclared, or if there is no decimal format in the static context with a matching name.
This error is raised if a decimal format value supplied to fn:format-number is not valid for the associated property, or if the properties of the decimal format resulting from a supplied map do not have distinct values.
This error is raised if the picture string supplied to fn:format-number or fn:format-integer has invalid syntax.
Raised when casting to date/time datatypes, or performing arithmetic with date/time values, if arithmetic overflow or underflow occurs.
Raised when casting to duration datatypes, or performing arithmetic with duration values, if arithmetic overflow or underflow occurs.
Raised by fn:adjust-date-to-timezone and related functions if the supplied timezone is invalid.
Raised by fn:civil-timezone if no timezone data is available for the given date/time and place.
Error code used by fn:error when no other error code is provided.
This error is raised if the picture string or calendar supplied to fn:format-date, fn:format-time, or fn:format-dateTime has invalid syntax.
This error is raised if the picture string supplied to fn:format-date selects a component that is not present in a date, or if the picture string supplied to fn:format-time selects a component that is not present in a time.
Raised by fn:hash if the effective value of the supplied algorithm is not one of the values supported by the implementation.
Raised by functions such as fn:json-doc, fn:parse-json or fn:json-to-xml if the string supplied as input does not conform to the JSON grammar (optionally with implementation-defined extensions).
Raised by functions such as map:merge, fn:json-doc, fn:parse-json or fn:json-to-xml if the input contains duplicate keys, when the chosen policy is to reject duplicates.
Raised by fn:json-to-xml if validation is requested when the processor does not support schema validation or typed nodes.
Raised by functions such as map:merge, fn:parse-json, and fn:xml-to-json if the $options map contains an invalid entry.
Raised by fn:xml-to-json if the XML input does not conform to the rules for the XML representation of JSON.
Raised by fn:xml-to-json if the XML input uses the attribute escaped="true" or escaped-key="true", and the corresponding string or key contains an invalid JSON escape sequence.
Raised by fn:element-to-map if the layout selected for converting elements of a given name is unsuitable for an element node with that name, or if the conversion plan explicitly defines the processing of a particular element as an error.
Raised by fn:resolve-QName and analogous functions if a supplied QName has a prefix that has no binding to a namespace.
Raised by fn:resolve-uri if no base URI is available for resolving a relative URI.
Raised by fn:path if the node supplied in the origin option is not an ancestor of the $node whose relative path is required.
Raised by fn:load-xquery-module if the supplied module URI is zero-length.
Raised by fn:load-xquery-module if no module can be found with the supplied module URI.
Raised by fn:load-xquery-module if a static error (including a statically detected type error) is encountered when processing the library module.
Raised by fn:load-xquery-module if a value is supplied for the initial context item or for an external variable, and the value does not conform to the required type declared in the dynamically loaded module.
Raised by fn:load-xquery-module if no XQuery processor is available supporting the requested XQuery version (or if none is available at all).
A general-purpose error raised when casting, if a cast between two datatypes is allowed in principle, but the supplied value cannot be converted: for example when attempting to cast the string "nine" to an integer.
Raised when either argument to fn:resolve-uri is not a valid URI/IRI.
Raised by fn:zero-or-one if the supplied value contains more than one item.
Raised by fn:one-or-more if the supplied value is an empty sequence.
Raised by fn:exactly-one if the supplied value is not a singleton sequence.
Raised by functions such as fn:max, fn:min, fn:avg, fn:sum if the supplied sequence contains values inappropriate to this function.
Raised by fn:dateTime if the two arguments both have timezones and the timezones are different.
A catch-all error for fn:resolve-uri, recognizing that the implementation can choose between a variety of algorithms and that some of these may fail for a variety of reasons.
Raised when the input to fn:parse-ietf-date does not match the prescribed grammar, or when it represents an invalid date/time such as 31 February.
Raised when the radix supplied to fn:parse-integer is not in the range 2 to 36.
Raised when the digits in the string supplied to fn:parse-integer are not in the range appropriate to the chosen radix.
Raised by regular expression functions such as fn:matches and fn:replace if the regular expression flags contain a character other than i, m, q, s, or x.
Raised by regular expression functions such as fn:matches and fn:replace if the regular expression is syntactically invalid.
Raised by functions such as fn:replace and fn:tokenize if the supplied regular expression is capable of matching a zero-length string.
Raised by fn:replace to report errors in the replacement string.
Raised by fn:replace if both the $replacement and $action arguments are supplied.
Raised by fn:data, or by implicit atomization, if applied to a node with no typed value, the main example being an element validated against a complex type that defines it to have element-only content.
Raised by fn:data, or by implicit atomization, if the sequence to be atomized contains a function item other than an array.
Raised by fn:string, or by implicit string conversion, if the input sequence contains a function item.
A dynamic error is raised if the authority component of a URI contains an open square bracket but no corresponding close square bracket.
Raised by fn:unparsed-text or fn:unparsed-text-lines if the $source argument contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the string representation of a resource.
Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is not a valid encoding name, if the processor does not support the specified encoding, if the string representation of the retrieved resource contains octets that cannot be decoded into Unicode characters using the specified encoding, or if the resulting characters are not permitted characters.
Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is absent and the processor cannot infer the encoding using external information and the encoding is not UTF-8.
A dynamic error is raised if no XSLT processor suitable for evaluating a call on fn:transform is available.
A dynamic error is raised if the parameters supplied to fn:transform are invalid, for example if two mutually exclusive parameters are supplied. If a suitable XSLT error code is available (for example in the case where the requested initial-template does not exist in the stylesheet), that error code should be used in preference.
A dynamic error is raised if an XSLT transformation invoked using fn:transform fails with a static or dynamic error. The XSLT error code is used if available; this error code provides a fallback when no XSLT error code is returned, for example because the processor is an XSLT 1.0 processor.
A dynamic error is raised if the fn:transform function is invoked when XSLT transformation (or a specific transformation option) has been disabled for security or other reasons.
A dynamic error is raised if the result of the fn:transform function contains characters available only in XML 1.1 and the calling processor cannot handle such characters.
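Several of the conditions listed above are simple range or membership checks. As an illustration, the fn:parse-integer radix and digit conditions and the regular expression flags condition might be checked as in the following Python sketch. The sketch is not a full implementation of either function: it handles unsigned digit strings only, treats an empty digit string as an error (an assumption), and uses a hypothetical `DynamicError` type with descriptive messages rather than error codes.

```python
import string

# "0".."9" then "a".."z": the position of a character is its digit value.
DIGITS = string.digits + string.ascii_lowercase
ALLOWED_FLAGS = set("imqsx")


class DynamicError(Exception):
    """Hypothetical error type standing in for a coded dynamic error."""


def parse_integer(digits: str, radix: int = 10) -> int:
    """Parse an unsigned digit string in the given radix (simplified)."""
    if not 2 <= radix <= 36:
        raise DynamicError(f"radix {radix} is not in the range 2 to 36")
    if not digits:
        # Assumption of this sketch: an empty digit string is rejected.
        raise DynamicError("no digits supplied")
    result = 0
    for ch in digits.lower():
        value = DIGITS.find(ch)
        if value < 0 or value >= radix:
            raise DynamicError(f"digit {ch!r} is not valid in radix {radix}")
        result = result * radix + value
    return result


def check_flags(flags: str) -> None:
    """Reject any flag character other than i, m, q, s, or x."""
    invalid = set(flags) - ALLOWED_FLAGS
    if invalid:
        raise DynamicError(f"invalid flags: {''.join(sorted(invalid))}")
```

With these definitions, `parse_integer("ff", 16)` returns 255, while `parse_integer("12", 2)` and `check_flags("g")` both raise `DynamicError`.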