XPath and XQuery Functions and Operators 4.0

1 Introduction

Changes in 4.0

If a section of this specification has been updated since version 3.1, an overview of the changes is provided, along with links to navigate to the next or previous change.
Sections with significant changes are marked with a ✭ symbol in the table of contents. New functions are indicated by ✚.

The purpose of this document is to define functions and operators for inclusion in XPath 4.0, XQuery 4.0, and XSLT 4.0. The exact syntax used to call these functions and operators is specified in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0].

This document defines three classes of functions:

General purpose functions, available for direct use in user-written queries, stylesheets, and XPath expressions, whose arguments and results are values defined by the [XQuery and XPath Data Model (XDM) 4.0].
Constructor functions, used for creating instances of a datatype from values of (in general) a different datatype. These functions are also available for general use; they are named after the datatype that they return, and they always take a single argument.
Functions that specify the semantics of operators defined in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language]. These exist for specification purposes only, and are not intended for direct calling from user-written code.

[XML Schema Part 2: Datatypes Second Edition] defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines functions and operations on these datatypes as well as the other types (for example, nodes and sequences of nodes) defined in 2.7 Schema Information^DM31 of the [XQuery and XPath Data Model (XDM) 4.0]. These functions and operations are available for use in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and any other host language that chooses to reference them. In particular, they may be referenced in future versions of XSLT and related XML standards.

[XSD 1.1 Part 2] adds to the datatypes defined in [XML Schema Part 2: Datatypes Second Edition]. It introduces a new derived type xs:dateTimeStamp, and it incorporates as built-in types the two types xs:yearMonthDuration and xs:dayTimeDuration which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many aspects of the definitions of the existing datatypes: for example, it extends the value space of xs:double to allow both positive and negative zero, and extends the lexical space to allow +INF; it modifies the value space of xs:Name to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime values; and it allows any character string to appear as the value of an xs:anyURI item. Implementations of this specification may support either XSD 1.0 or XSD 1.1 or both.

In some cases, this specification references XSD for the semantics of operations such as the effect of matching using regular expressions, or conversion of atomic items to strings. In most such cases there is no intended technical difference between the XSD 1.0 and XSD 1.1 specifications, but the 1.1 version often provides clearer explanations and sometimes also corrects technical errors. In such cases this specification often chooses to reference the XSD 1.1 specification. This should not be taken as implying that it is necessary to invoke an XSD 1.1 processor.

References to specific sections of some of the above documents are indicated by cross-document links in this document. Each such link consists of a pointer to a specific section followed by a superscript specifying the linked document. The superscripts have the following meanings: XQ [XQuery 4.0: An XML Query Language], XT [XSL Transformations (XSLT) Version 4.0], XP [XML Path Language (XPath) 4.0], and DM [XQuery and XPath Data Model (XDM) 4.0].

1.2 Conformance

Changes in 4.0

Higher-order functions are no longer an optional feature. [Issue 205 PR 326 1 February 2023]

This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:

For all combinations of valid inputs to the function (both explicit arguments and implicit context dependencies), the result of the function meets the mandatory requirements of this specification.
For all invalid inputs to the function, the implementation raises (in some way appropriate to the calling environment) a dynamic error.
For a sequence of calls within the same execution scope, the requirements of this recommendation regarding the determinism of results are satisfied (see 1.9.5 Properties of functions).

Other recommendations (“host languages”) that reference this document may dictate:

Subsets or supersets of this set of functions to be available in particular environments;
Mechanisms for invoking functions, supplying arguments, initializing the static and dynamic context, receiving results, and handling errors;
A concrete realization of concepts such as execution scope;
Which versions of other specifications referenced herein (for example, XML, XSD, or Unicode) are to be used.

Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.

Note:

Adding such constraints in a host language, however, is discouraged because it makes it difficult to reuse implementations of the function library across host languages.

This specification allows flexibility in the choice of versions of specifications on which it depends:

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used.
It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1.
It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1.

Note:

The XML Schema 1.1 recommendation introduces one new concrete datatype: xs:dateTimeStamp; it also incorporates the types xs:dayTimeDuration, xs:yearMonthDuration, and xs:anyAtomicType which were previously defined in earlier versions of [XQuery and XPath Data Model (XDM) 4.0]. Furthermore, XSD 1.1 includes the option of supporting revised definitions of types such as xs:NCName based on the rules in XML 1.1 rather than 1.0.

The [XQuery and XPath Data Model (XDM) 4.0] allows flexibility in the repertoire of characters permitted during processing that goes beyond even what version of XML is supported. A processor may allow the user to construct nodes and atomic items that contain characters not allowed by any version of XML. [Definition: A permitted character is one within the repertoire accepted by the implementation.]

In this document, text labeled as an example or as a note is provided for explanatory purposes and is not normative.

1.7 Options

Changes in 4.0

Use of an option keyword that is not defined in the specification and is not known to the implementation now results in a dynamic error; previously it was ignored. [Issue 1019 PR 1059 26 March 2024]

As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps are a new datatype introduced in XPath 3.1.

For example, the function fn:xml-to-json has an options parameter allowing specification of whether the output is to be indented. A call might be written:

xml-to-json($input, { 'indent': true() })

[Definition: Functions that take an options parameter adopt common conventions on how the options are used. These are referred to as the option parameter conventions. These rules apply only to functions that explicitly refer to them.]

Where a function adopts the option parameter conventions, the following rules apply:

The value of the relevant argument must be a map. The entries in the map are referred to as options: the key of the entry is called the option name, and the associated value is the option value. Option names defined in this specification are always strings (single xs:string values). Option values may be of any type.
The type of the options parameter in the function signature is always given as map(*).
Although option names are described above as strings, the actual key may be any value that is the same key as the required string. For example, instances of xs:untypedAtomic or xs:anyURI are equally acceptable.
Note:
This means that the implementation of the function can check for the presence and value of particular options using the functions map:contains and/or map:get.
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace.
If an option is present whose key is not described in the specification, then a type error [err:XPTY0004]^XP must be raised unless either (a) the key is recognized by the implementation, or (b) the key is a value of type xs:QName with a non-absent namespace.
All entries in the options map are optional, and supplying the empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.
The ordering of the options map is immaterial.
For each named option, the function specification defines a required type for the option value. The value that is actually supplied in the map is converted to this required type using the coercion rules^XP. This will result in an error (typically [err:XPTY0004]^XP or [err:FORG0001]^FO) if conversion of the supplied value to the required type is not possible. A type error also occurs if this conversion delivers a coerced function whose invocation fails with a type error. A dynamic error occurs if the supplied value after conversion is not one of the permitted values for the option in question: the error codes for this error are defined in the specification of each function.
Note:
It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function-calling rules.
In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
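
The rules above can be modelled outside XPath. The following Python sketch is illustrative only and is not part of this specification: the names check_options, QName, OptionError, and to_boolean are hypothetical, and a single exception class stands in for the error codes mentioned above. It shows unknown keys being rejected unless they are recognized by the implementation or are namespaced QNames, and option values being coerced to their required types.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class QName:
    """Hypothetical stand-in for an xs:QName used as an option name."""
    namespace: Optional[str]
    local: str


class OptionError(Exception):
    """Stands in for the errors raised for unknown or ill-typed options."""


def to_boolean(value: Any) -> bool:
    """Coercion function for an option whose required type is a boolean."""
    if isinstance(value, bool):
        return value
    raise TypeError("expected a boolean")


def check_options(options: Dict[Any, Any],
                  defined: Dict[str, Callable[[Any], Any]],
                  vendor_keys: frozenset = frozenset()) -> Dict[Any, Any]:
    """Validate an options map; `defined` maps option names to coercions."""
    result = {}
    for key, value in options.items():
        if key in defined:
            try:
                result[key] = defined[key](value)  # convert to required type
            except (TypeError, ValueError) as exc:
                raise OptionError(f"cannot convert option {key!r}") from exc
        elif key in vendor_keys:
            result[key] = value  # (a) key recognized by the implementation
        elif isinstance(key, QName) and key.namespace is not None:
            result[key] = value  # (b) namespaced QName: no error is raised
        else:
            raise OptionError(f"unknown option {key!r}")
    return result
```

Under these assumptions, check_options({"indent": True}, {"indent": to_boolean}) returns the map unchanged, while a misspelled key such as "indnet" raises OptionError. An empty map is always accepted, reflecting the rule that every option is optional.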

1.9 Terminology

Changes in 4.0

The term atomic value has been replaced by atomic item. [Issue 1337 PR 1361 2 August 2024]

The terminology used to describe the functions and operators on types defined in [XML Schema Part 2: Datatypes Second Edition] is defined in the body of this specification. The terms defined in this section are used in building those definitions.

Note:

Following in the tradition of [XML Schema Part 2: Datatypes Second Edition], the terms type and datatype are used interchangeably.

1.9.1 Atomic items

The following definitions are adopted from [XQuery and XPath Data Model (XDM) 4.0].

[Definition: An atomic item is a pair (T, D) where T (the type annotation) is an atomic type, and D (the datum) is a point in the value space of T.]
[Definition: A primitive type is one of the 19 primitive atomic types defined in 3.2 Primitive datatypes^XS2 of [XML Schema Part 2: Datatypes Second Edition], or the type xs:untypedAtomic defined in [XQuery and XPath Data Model (XDM) 4.0].]
[Definition: The datum of an atomic item is a point in the value space of its type, which is also a point in the value space of the primitive type from which that type is derived.] There are 20 primitive atomic types (19 defined in XSD, plus xs:untypedAtomic), and these have non-overlapping value spaces, so each datum belongs to exactly one primitive atomic type.
[Definition: The type annotation of an atomic item is the most specific atomic type that it is an instance of (it is also an instance of every type from which that type is derived).]

Note:

The term value space is defined in [XSD 1.1 Part 2] as a set of values. The term datum is used here in preference to value, because value has a different meaning in this data model.

1.9.2 Strings, characters, and codepoints

This document uses the terms string, character, and codepoint with meanings that are normatively defined in [XQuery and XPath Data Model (XDM) 4.0], and which are paraphrased here for ease of reference:

[Definition: A character is an instance of the Char^XML production of [Extensible Markup Language (XML) 1.0 (Fifth Edition)].]

Note:

This definition excludes Unicode characters in the surrogate blocks as well as U+FFFE and U+FFFF, while including characters with codepoints greater than U+FFFF which some programming languages treat as two characters. The valid characters are defined by their codepoints, and include some whose codepoints have not been assigned by the Unicode consortium to any character.
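
As an informal illustration of the ranges described in this note, the following Python sketch tests whether a codepoint matches the Char production of XML 1.0. The function name is_xml_char is hypothetical, and the sketch does not account for any wider repertoire of permitted characters that an implementation may accept.

```python
def is_xml_char(codepoint: int) -> bool:
    """Return True if the codepoint matches the XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= codepoint <= 0xD7FF        # up to the start of the surrogates
        or 0xE000 <= codepoint <= 0xFFFD      # excludes U+FFFE and U+FFFF
        or 0x10000 <= codepoint <= 0x10FFFF   # supplementary planes
    )
```

The test works on codepoints only and does not consult the Unicode character database, so unassigned codepoints within these ranges are accepted, as the note describes.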

[Definition: A string is a sequence of zero or more characters, or equivalently, a value in the value space of the xs:string datatype.]

[Definition: A codepoint is an integer assigned to a character by the Unicode consortium, or reserved for future assignment to a character.]

Note:

The set of codepoints is thus wider than the set of characters.

This specification spells “codepoint” as one word; the Unicode specification spells it as “code point”. Equivalent terms found in other specifications are “character number” or “code position”. See [Character Model for the World Wide Web 1.0: Fundamentals].

Because these terms appear so frequently, they are hyperlinked to the definition only when there is a particular desire to draw the reader’s attention to the definition; the absence of a hyperlink does not mean that the term is being used in some other sense.

It is implementation-defined which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used.

This specification adopts the Unicode notation U+xxxx to refer to a codepoint by its hexadecimal value (always four to six hexadecimal digits). This is followed where appropriate by the official Unicode character name and its graphical representation: for example U+20AC (EURO SIGN, €).
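
The notation can be reproduced mechanically. The following Python sketch, in which describe_codepoint is a hypothetical helper name, formats a codepoint in this style using the character names in Python's unicodedata module (whose Unicode version depends on the Python release in use).

```python
import unicodedata


def describe_codepoint(codepoint: int) -> str:
    """Format a codepoint as U+xxxx followed by its name and glyph."""
    char = chr(codepoint)
    # Fall back to a placeholder for codepoints with no assigned name.
    name = unicodedata.name(char, "<unnamed>")
    # At least four hexadecimal digits, more when the value needs them.
    return f"U+{codepoint:04X} ({name}, {char})"
```

For example, describe_codepoint(0x20AC) returns "U+20AC (EURO SIGN, €)".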

Unless explicitly stated, the functions in this document do not ensure that any returned xs:string values are normalized in the sense of [Character Model for the World Wide Web 1.0: Fundamentals].

Note:

In functions that involve character counting such as fn:substring, fn:string-length and fn:translate, what is counted is the number of XML characters in the string (or equivalently, the number of Unicode codepoints). Some implementations may represent a codepoint above U+FFFF using two 16-bit values known as a surrogate pair. A surrogate pair counts as one character, not two.
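
The difference between counting characters and counting 16-bit code units can be seen in the following Python sketch. It is an illustration only: the helper names are hypothetical, and Python strings are used because they are indexed by codepoint.

```python
def count_characters(text: str) -> int:
    """Count characters as codepoints, which is what Python's len() reports."""
    return len(text)


def count_utf16_units(text: str) -> int:
    """Count the 16-bit code units a UTF-16 representation would need."""
    return len(text.encode("utf-16-le")) // 2


# U+1D11E lies above U+FFFF, so UTF-16 represents it as a surrogate pair.
sample = "a\U0001D11Eb"
```

Here count_characters(sample) is 3, which is the count the note above calls for, while count_utf16_units(sample) is 4.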

Wherever encoding names (such as UTF-8 and UTF-16) are used in this specification, they are compared without regard to case: the strings "UTF-8" and "utf-8" both refer to the same encoding.

1.9.3 Namespaces and URIs

This document uses the phrase “namespace URI” to identify the concept identified in [Namespaces in XML] as “namespace name”, and the phrase “local name” to identify the concept identified in [Namespaces in XML] as “local part”.

It also uses the term “expanded-QName” defined below.

[Definition: An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 4.0]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. Two expanded QNames are equal if the namespace URIs are the same (or both absent) and the local names are the same. The prefix plays no part in the comparison, but is used only if the expanded QName needs to be converted back to a string.]
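
The equality rule in this definition can be sketched in Python as follows. The class ExpandedQName is a hypothetical model, not a type defined by this specification; None represents an absent prefix or namespace URI.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ExpandedQName:
    namespace_uri: Optional[str]
    local_name: str
    # compare=False keeps the prefix out of both equality and hashing.
    prefix: Optional[str] = field(default=None, compare=False)

    def __str__(self) -> str:
        # The prefix matters only when converting back to a string.
        if self.prefix:
            return f"{self.prefix}:{self.local_name}"
        return self.local_name
```

Two instances with the same namespace URI and local name compare equal even if their prefixes differ, yet each retains its own prefix when converted to a string.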

The term URI is used as follows:

[Definition: Within this specification, the term URI refers to Uniform Resource Identifiers as defined in [RFC 3986] and extended in [RFC 3987] with a new name IRI. The term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI datatype as defined in [XML Schema Part 2: Datatypes Second Edition].]

Note:

This means, in practice, that where this specification requires a “URI Reference”, an IRI as defined in [RFC 3987] will be accepted, provided that other relevant specifications also permit an IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as “Base URI” that are defined or referenced across the whole family of XML specifications. Note also that the definition of xs:anyURI is a wider definition than the definition in [RFC 3987]; for example it does not require non-ASCII characters to be escaped.

1.9.4 Conformance terminology

In this specification:

The auxiliary verb must, when rendered in small capitals, indicates a precondition for conformance.
- When the sentence relates to an implementation of a function (for example "All implementations must recognize URIs of the form ...") then an implementation is not conformant unless it behaves as stated.
- When the sentence relates to the result of a function (for example "The result must have the same type as $arg") then the implementation is not conformant unless it delivers a result as stated.
- When the sentence relates to the arguments to a function (for example "The value of $arg must be a valid regular expression") then the implementation is not conformant unless it enforces the condition by raising a dynamic error whenever the condition is not satisfied.
The auxiliary verb may, when rendered in small capitals, indicates optional or discretionary behavior. The statement “An implementation may do X” implies that it is implementation-dependent whether or not it does X.
The auxiliary verb should, when rendered in small capitals, indicates desirable or recommended behavior. The statement “An implementation should do X” implies that it is desirable to do X, but implementations may choose to do otherwise if this is judged appropriate.

[Definition: Where behavior is described as implementation-defined, variations between processors are permitted, but a conformant implementation must document the choices it has made.]

[Definition: Where behavior is described as implementation-dependent, variations between processors are permitted, and conformant implementations are not required to document the choices they have made.]

Note:

Where this specification states that something is implementation-defined or implementation-dependent, it is open to host languages to place further constraints on the behavior.

1.9.5 Properties of functions

This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.

In this section the term function, unless otherwise specified, applies equally to function definitions^XP (which can be the target of a static function call) and function items^DM (which can be the target of a dynamic function call).

[Definition: An execution scope is a sequence of calls to the function library during which certain aspects of the state are required to remain invariant. For example, two calls to fn:current-dateTime within the same execution scope will return the same result. The execution scope is defined by the host language that invokes the function library.] In XSLT, for example, any two function calls executed during the same transformation are in the same execution scope (except that static expressions, such as those used in use-when attributes, are in a separate execution scope).
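
One way a host might realize this invariance is to capture the relevant state once per scope. The following Python sketch is purely illustrative and assumes a hypothetical ExecutionScope class; this specification does not prescribe any particular mechanism.

```python
from datetime import datetime, timezone


class ExecutionScope:
    """Hypothetical holder for state that stays fixed within one scope."""

    def __init__(self) -> None:
        # Captured once, when the scope is created.
        self._now = datetime.now(timezone.utc)

    def current_date_time(self) -> datetime:
        # Every call within the scope returns the captured instant.
        return self._now
```

Repeated calls to current_date_time on the same scope object return the same value, mirroring the behavior described for fn:current-dateTime.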

The following definition explains more precisely what it means for two function calls to return the same result:

[Definition: Two values $V1 and $V2 are defined to be identical if they contain the same number of items and the items are pairwise identical. Two items are identical if and only if one of the following conditions applies:]

Both items are atomic items, of precisely the same type, and the values are equal as defined using the eq operator, using the Unicode codepoint collation when comparing strings.
Both items are nodes, and represent the same node.
Both items are maps, both maps have the same number of entries, and for every entry E₁ in the first map there is an entry E₂ in the second map such that the keys of E₁ and E₂ are the same key, and the corresponding values V₁ and V₂ are identical.
Both items are arrays, both arrays have the same number of members, and the members are pairwise identical.
Both items are function items, neither item is a map or array, and the two function items have the same function identity. The concept of function identity is explained in [XQuery and XPath Data Model (XDM) 4.0] section 8.1 Function Items.
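
The conditions above are recursive, which the following Python sketch attempts to make explicit. It is a simplified model under stated assumptions: the classes Atomic, Node, XMap, and XArray are hypothetical, sequences are Python lists, atomic equality and the "same key" relation are approximated by Python equality, and function items other than maps and arrays are omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Atomic:
    type_name: str   # the type annotation, for example "xs:integer"
    datum: Any


class Node:
    """Nodes are compared by identity, not by content."""


@dataclass
class XMap:
    entries: Dict[Any, List[Any]]   # key -> value (a sequence)


@dataclass
class XArray:
    members: List[List[Any]]        # each member is a sequence


def items_identical(a: Any, b: Any) -> bool:
    if isinstance(a, Atomic) and isinstance(b, Atomic):
        # Precisely the same type, and equal values.
        return a.type_name == b.type_name and a.datum == b.datum
    if isinstance(a, Node) and isinstance(b, Node):
        return a is b
    if isinstance(a, XMap) and isinstance(b, XMap):
        return len(a.entries) == len(b.entries) and all(
            k in b.entries and values_identical(v, b.entries[k])
            for k, v in a.entries.items())
    if isinstance(a, XArray) and isinstance(b, XArray):
        return len(a.members) == len(b.members) and all(
            values_identical(x, y) for x, y in zip(a.members, b.members))
    return False


def values_identical(v1: List[Any], v2: List[Any]) -> bool:
    """Same number of items, and the items are pairwise identical."""
    return len(v1) == len(v2) and all(
        items_identical(x, y) for x, y in zip(v1, v2))
```

In this model, Atomic("xs:integer", 1) and Atomic("xs:decimal", 1) are not identical, because the rule for atomic items requires precisely the same type as well as equal values.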

Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.

[Definition: A function definition^XP may have the property of being context-dependent: the result of such a function depends on the values of properties in the static and dynamic evaluation context of the caller as well as on the actual supplied arguments (if any). A function definition may be context-dependent for some arities in its arity range, and context-independent for others: for example fn:name#0 is context-dependent while fn:name#1 is context-independent.]

[Definition: A function definition^XP that is not context-dependent is called context-independent.]

The main categories of context-dependent functions are:

Functions that explicitly deliver the value of a component of the static or dynamic context, for example fn:static-base-uri, fn:default-collation, fn:position, or fn:last.
Functions with an optional parameter whose default value is taken from the static or dynamic context of the caller, usually either the context value (for example, fn:node-name) or the default collation (for example, fn:index-of).
Functions that use the static context of the caller to expand or disambiguate the values of supplied arguments: for example fn:doc expands its first argument using the static base URI of the caller, and xs:QName expands its first argument using the in-scope namespaces of the caller.

[Definition: A function is focus-dependent if its result depends on the focus^XP31 (that is, the context item, position, or size) of the caller.]

[Definition: A function that is not focus-dependent is called focus-independent.]

Note:

Some functions depend on aspects of the dynamic context that remain invariant within an execution scope, such as the implicit timezone. Formally this is treated in the same way as any other context dependency, but internally, the implementation may be able to take advantage of the fact that the value is invariant.

Note:

User-defined functions in XQuery and XSLT may depend on the static context of the function definition (for example, the in-scope namespaces) and also in a limited way on the dynamic context (for example, the values of global variables). However, the only way they can depend on the static or dynamic context of the caller — which is what concerns us here — is by defining optional parameters whose default values are context-dependent.

Note:

Because the focus is a specific part of the dynamic context, all focus-dependent functions are also context-dependent. A context-dependent function, however, may be either focus-dependent or focus-independent.

A function definition that is context-dependent can be used as the target of a named function reference, can be partially applied, and can be found using fn:function-lookup. The principle in such cases is that the static context used for the function evaluation is taken from the static context of the named function reference, partial function application, or the call on fn:function-lookup; and the dynamic context for the function evaluation is taken from the dynamic context of the evaluation of the named function reference, partial function application, or the call of fn:function-lookup. These constructs all deliver a function item^DM having a captured context based on the static and dynamic context of the construct that created the function item. This captured context forms part of the closure of the function item.

The result of a dynamic call to a function item never depends on the static or dynamic context of the dynamic function call, only (where relevant) on the captured context held within the function item itself.

The fn:function-lookup function is a special case because it is potentially dependent on everything in the static and dynamic context. This is because the static and dynamic context of the call to fn:function-lookup form the captured context of the function item that fn:function-lookup returns.

[Definition: A function that is guaranteed to produce identical results from repeated calls within a single execution scope if the explicit and implicit arguments are identical is referred to as deterministic.]

[Definition: A function that is not deterministic is referred to as nondeterministic.]

All functions defined in this specification are deterministic unless otherwise stated. Exceptions include the following:

[Definition: Some functions (such as fn:in-scope-prefixes, fn:load-xquery-module, and fn:unordered) produce result sequences or result maps in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.]
Some functions (such as fn:analyze-string, fn:parse-xml, fn:parse-xml-fragment, fn:parse-html, and fn:json-to-xml) construct a tree of nodes to represent their results. There is no guarantee that repeated calls with the same arguments will return the same identical node (in the sense of the is operator). However, if non-identical nodes are returned, their content will be the same in the sense of the fn:deep-equal function. Such a function is said to be nondeterministic with respect to node identity.
Some functions (such as fn:doc and fn:collection) create new nodes by reading external documents. Such functions are guaranteed to be deterministic by default (some such functions have an option "stable":false() that makes them nondeterministic as a user option, and implementations may also provide configuration options to change the default).
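
Nondeterminism with respect to node identity, the second case listed above, has a rough analogue in most XML libraries. The following Python sketch uses the standard xml.etree.ElementTree module as an analogy only: object identity stands in for the is operator, and comparison of serialized content stands in, loosely, for fn:deep-equal.

```python
import xml.etree.ElementTree as ET

source = "<a><b>1</b></a>"

# Two parses of the same input build two separate trees.
first = ET.fromstring(source)
second = ET.fromstring(source)

same_identity = first is second                         # distinct objects
same_content = ET.tostring(first) == ET.tostring(second)  # equal content
```

In this analogy same_identity is False and same_content is True: the two results are different nodes with the same content.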

Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results.

[Definition: The function fn:concat is defined to be variadic: it accepts any number of arguments. No other function has this property.]

2 Processing sequences

A sequence is an ordered collection of zero or more items. An item is a node, an atomic item, or a function, such as a map or an array. The terms sequence and item are defined formally in [XQuery 4.0: An XML Query Language] and [XML Path Language (XPath) 4.0].

2.2 Comparison functions

The functions in this section perform comparisons between the items in one or more sequences.

Many of these functions require atomic items to be compared for equality.

[Definition: Two atomic items A and B are said to be contextually equal if the function call fn:compare(A, B) returns zero when evaluated with a specified or context-determined collation and implicit timezone.] If two values are not contextually equal, they are considered to be contextually unequal, even in the case when comparing them using fn:compare raises an error.

Note:

Except where explicitly stated otherwise, an appeal to contextual equality implies that NaN is treated as equal to NaN.
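
The definition and note above can be sketched in Python as follows. This is an illustration, not a definition: contextually_equal and compare_numbers are hypothetical names, compare_numbers handles numbers only and stands in for fn:compare, and the position at which it orders NaN relative to other values is an assumption of the sketch.

```python
import math


def compare_numbers(a: float, b: float) -> int:
    """Hypothetical three-way comparison in which NaN equals NaN."""
    for value in (a, b):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("values are not comparable")
    a_nan, b_nan = math.isnan(a), math.isnan(b)
    if a_nan or b_nan:
        if a_nan and b_nan:
            return 0
        return -1 if a_nan else 1   # assumption: NaN sorts before other values
    return (a > b) - (a < b)


def contextually_equal(a, b, compare=compare_numbers) -> bool:
    """True if compare returns zero; an error counts as 'unequal'."""
    try:
        return compare(a, b) == 0
    except Exception:
        return False
```

Under these assumptions, two NaN values are contextually equal, and a comparison that raises an error yields contextually unequal rather than propagating the error.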

Function	Meaning
`fn:atomic-equal`	Determines whether two atomic items are equal, under the rules used for comparing keys in a map.
`fn:compare`	Returns `-1`, `0`, or `1`, depending on whether the first value is less than, equal to, or greater than the second value.
`fn:contains-subsequence`	Determines whether one sequence contains another as a contiguous subsequence, using a supplied callback function to compare items.
`fn:deep-equal`	This function assesses whether two sequences are deep-equal to each other. To be deep-equal, they must contain items that are pairwise deep-equal; and for two items to be deep-equal, they must either be atomic items that compare equal, or nodes of the same kind, with the same name, whose children are deep-equal, or maps with matching entries, or arrays with matching members.
`fn:distinct-values`	Returns the values that appear in a sequence, with duplicates eliminated.
`fn:duplicate-values`	Returns the values that appear in a sequence more than once.
`fn:ends-with-subsequence`	Determines whether one sequence ends with another, using a supplied callback function to compare items.
`fn:index-of`	Returns a sequence of positive integers giving the positions within the sequence `$input` of items that are contextually equal to `$target`.
`fn:starts-with-subsequence`	Determines whether one sequence starts with another, using a supplied callback function to compare items.

4 Processing numerics

This section specifies arithmetic operators on the numeric datatypes defined in [XML Schema Part 2: Datatypes Second Edition].

4.7 Formatting numbers

This section defines a function for formatting decimal and floating point numbers.

Function	Meaning
`fn:format-number`	Returns a string containing a number formatted according to a given picture string and decimal format.

Note:

This function can be used to format any numeric quantity, including an integer. For integers, however, the fn:format-integer function offers additional possibilities. Note also that the picture strings used by the two functions are not 100% compatible, though they share some options in common.

4.7.1 Defining a decimal format

Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.

The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined.

Each decimal format provides a set of named properties.

Note:

A phrase such as "The minus-sign^XP31 character" is to be read as “the character assigned to the minus-sign^XP31 property in the relevant decimal format”.

[Definition: The decimal digit family of a decimal format is the sequence of ten digits with consecutive Unicode codepoints starting with the character that is the value of the zero-digit^XP31 property.]

[Definition: The optional digit character is the character that is the value of the digit^XP31 property.]

For any decimal format, the properties representing characters used in a picture string must have distinct values. These properties are decimal-separator^XP31, grouping-separator^XP31, exponent-separator^XP31, percent^XP31, per-mille^XP31, digit^XP31, and pattern-separator^XP31. Furthermore, none of these properties may be equal to any character in the decimal digit family.
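As an illustrative sketch only (written in Python, which is not part of this specification), the decimal digit family and the distinctness constraint can be modeled as follows. The function names and the dictionary representation of a decimal format are hypothetical, and the property values in `example_format` are assumed for the example rather than mandated here.

```python
def decimal_digit_family(zero_digit):
    """Ten characters with consecutive codepoints starting at zero_digit."""
    return [chr(ord(zero_digit) + i) for i in range(10)]


def has_distinct_picture_characters(fmt):
    """Check the constraint on the properties used in a picture string.

    fmt is a hypothetical dict mapping property names to single characters.
    """
    names = ["decimal-separator", "grouping-separator", "exponent-separator",
             "percent", "per-mille", "digit", "pattern-separator"]
    values = [fmt[name] for name in names]
    family = set(decimal_digit_family(fmt["zero-digit"]))
    # All seven values must differ, and none may be a member of the digit family.
    return len(set(values)) == len(values) and family.isdisjoint(values)


# Assumed example values, not mandated by this specification.
example_format = {
    "decimal-separator": ".", "grouping-separator": ",",
    "exponent-separator": "e", "percent": "%", "per-mille": "\u2030",
    "digit": "#", "pattern-separator": ";", "zero-digit": "0",
}
```

With a zero-digit of U+0660 (ARABIC-INDIC DIGIT ZERO), the same function yields the ten Arabic-Indic digits U+0660 to U+0669.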

4.7.3 Syntax of the picture string

Note:

This differs from the format-number function previously defined in XSLT 2.0 in that any digit can be used in the picture string to represent a mandatory digit: for example the picture strings "000", "001", and "999" are equivalent. The digits will all be from the same decimal digit family, specifically, the sequence of ten consecutive digits starting with the digit assigned to the zero-digit property. This change is to align format-number (which previously used "000") with format-dateTime (which used "001").

[Definition: The formatting of a number is controlled by a picture string. The picture string is a sequence of characters, in which the characters assigned to the properties decimal-separator^XP31, exponent-separator^XP31, grouping-separator^XP31, digit^XP31, and pattern-separator^XP31, and the members of the decimal digit family, are classified as active characters, and all other characters (including the values of the properties percent^XP31 and per-mille^XP31) are classified as passive characters.]

A dynamic error is raised [err:FODF1310] if the picture string does not conform to the following rules. Note that in these rules the words "preceded" and "followed" refer to characters anywhere in the string; they are not to be read as "immediately preceded" and "immediately followed".

A picture-string consists either of a sub-picture, or of two sub-pictures separated by the pattern-separator^XP31 character. A picture-string must not contain more than one instance of the pattern-separator^XP31 character. If the picture-string contains two sub-pictures, the first is used for positive and unsigned zero values and the second for negative values.
A sub-picture must not contain more than one instance of the decimal-separator^XP31 character.
A sub-picture must not contain more than one instance of the percent^XP31 or per-mille^XP31 characters, and it must not contain one of each.
The mantissa part of a sub-picture (defined below) must contain at least one character that is either an optional digit character or a member of the decimal digit family.
A sub-picture must not contain a passive character that is preceded by an active character and that is followed by another active character.
A sub-picture must not contain a grouping-separator^XP31 character that appears adjacent to a decimal-separator^XP31 character, or in the absence of a decimal-separator^XP31 character, at the end of the integer part.
A sub-picture must not contain two adjacent instances of the grouping-separator^XP31 character.
The integer part of a sub-picture (defined below) must not contain a member of the decimal digit family that is followed by an instance of the optional digit character. The fractional part of a sub-picture (defined below) must not contain an instance of the optional digit character that is followed by a member of the decimal digit family.
A character that matches the exponent-separator^XP31 property is treated as an exponent-separator-sign if it is both preceded and followed within the sub-picture by an active character. Otherwise, it is treated as a passive character. A sub-picture must not contain more than one character that is treated as an exponent-separator-sign.
A sub-picture that contains a percent^XP31 or per-mille^XP31 character must not contain a character treated as an exponent-separator-sign.
If a sub-picture contains a character treated as an exponent-separator-sign then this must be followed by one or more characters that are members of the decimal digit family, and it must not be followed by any active character that is not a member of the decimal digit family.
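A few of the rules above can be checked mechanically. The following Python sketch is illustrative only and covers just the rules that do not depend on the integer, fractional, or exponent parts. The function name `check_picture` and the property values bound to the constants are assumptions made for the example.

```python
# Assumed property values for the example; a real decimal format may differ.
DECIMAL_SEPARATOR = "."
GROUPING_SEPARATOR = ","
PERCENT = "%"
PER_MILLE = "\u2030"
PATTERN_SEPARATOR = ";"


def check_picture(picture):
    """Return descriptions of the violated rules (empty list if none found).

    Only a subset of the rules is checked, so an empty result does not
    prove that the picture string is valid.
    """
    problems = []
    sub_pictures = picture.split(PATTERN_SEPARATOR)
    if len(sub_pictures) > 2:
        problems.append("more than one pattern-separator")
    for sub in sub_pictures:
        if sub.count(DECIMAL_SEPARATOR) > 1:
            problems.append("more than one decimal-separator")
        if sub.count(PERCENT) + sub.count(PER_MILLE) > 1:
            problems.append("more than one percent or per-mille")
        if GROUPING_SEPARATOR * 2 in sub:
            problems.append("adjacent grouping-separators")
        if (GROUPING_SEPARATOR + DECIMAL_SEPARATOR in sub
                or DECIMAL_SEPARATOR + GROUPING_SEPARATOR in sub):
            problems.append("grouping-separator adjacent to decimal-separator")
    return problems
```

A processor would raise the dynamic error described above rather than return a list; the list form is used here only to make the individual rules visible.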

The mantissa part of the sub-picture is defined as the part that appears to the left of the exponent-separator-sign if there is one, or the entire sub-picture otherwise. The exponent part of the sub-picture is defined as the part that appears to the right of the exponent-separator-sign; if there is no exponent-separator-sign then the exponent part is absent.

The integer part of the sub-picture is defined as the part that appears to the left of the decimal-separator^XP31 character if there is one, or the entire mantissa part otherwise.

The fractional part of the sub-picture is defined as that part of the mantissa part that appears to the right of the decimal-separator^XP31 character if there is one, or the part that appears to the right of the rightmost active character otherwise. The fractional part may be zero-length.
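The definitions of the mantissa, exponent, integer, and fractional parts can be sketched in Python as follows. This is an informal model, not a normative algorithm: the function names are hypothetical, the property values are assumed, and the sketch assumes the sub-picture already satisfies the rules for picture strings (in particular, that at most one character is treated as an exponent-separator-sign).

```python
# Assumed property values for the example; a real decimal format may differ.
DECIMAL_SEPARATOR = "."
GROUPING_SEPARATOR = ","
EXPONENT_SEPARATOR = "e"
OPTIONAL_DIGIT = "#"
PATTERN_SEPARATOR = ";"
DIGIT_FAMILY = {chr(ord("0") + i) for i in range(10)}
ACTIVE = DIGIT_FAMILY | {DECIMAL_SEPARATOR, GROUPING_SEPARATOR,
                         EXPONENT_SEPARATOR, OPTIONAL_DIGIT, PATTERN_SEPARATOR}


def exponent_sign_index(sub):
    """Index of the character treated as exponent-separator-sign, or None."""
    for i, ch in enumerate(sub):
        if ch != EXPONENT_SEPARATOR:
            continue
        preceded = any(c in ACTIVE for c in sub[:i])
        followed = any(c in ACTIVE for c in sub[i + 1:])
        if preceded and followed:
            return i
    return None


def split_sub_picture(sub):
    """Split a sub-picture into the parts defined in this section."""
    i = exponent_sign_index(sub)
    mantissa = sub if i is None else sub[:i]
    exponent = None if i is None else sub[i + 1:]
    if DECIMAL_SEPARATOR in mantissa:
        integer, fractional = mantissa.split(DECIMAL_SEPARATOR, 1)
    else:
        integer = mantissa
        # Fractional part: whatever follows the rightmost active character.
        last_active = max((k for k, c in enumerate(mantissa) if c in ACTIVE),
                          default=-1)
        fractional = mantissa[last_active + 1:]
    return {"mantissa": mantissa, "exponent": exponent,
            "integer": integer, "fractional": fractional}
```

For example, under these assumptions the sub-picture `0.0e00` has mantissa part `0.0`, exponent part `00`, integer part `0`, and fractional part `0`.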

5 Processing strings

This section specifies functions and operators on the [XML Schema Part 2: Datatypes Second Edition] xs:string datatype and the datatypes derived from it.

5.3 Comparison of strings

Function	Meaning
`fn:codepoint-equal`	Returns `true` if two strings are equal, considered codepoint-by-codepoint.
`fn:collation`	Constructs a collation URI with requested properties.
`fn:collation-available`	Asks whether a collation URI is recognized by the implementation, and whether it has required properties.
`fn:collation-key`	Given a string value and a collation, generates an internal value called a collation key, with the property that the matching and ordering of collation keys reflects the matching and ordering of strings under the specified collation.
`fn:contains-token`	Determines whether or not any of the supplied strings, when tokenized at whitespace boundaries, contains the supplied token, under the rules of the supplied collation.

5.3.1 Collations

[Definition: A collation is an algorithm that determines, for any two given strings S₁ and S₂, whether S₁ is less than, equal to, or greater than S₂. In this specification, a collation is identified by an absolute URI.]

The [Character Model for the World Wide Web 1.0: Fundamentals] observes that different applications may require different comparison and ordering behaviors. Similarly, different users with different linguistic expectations may require different behaviors. Consequently, the collation must be taken into account when comparing strings.

Collations can indicate that two different codepoints are to be considered equal for comparison purposes (for example, “v” and “w” are considered equivalent in some Swedish collations). Strings can be compared codepoint-by-codepoint or in a linguistically appropriate manner.

Note:

Some sources, for example [UTS #10], use the term collation to refer more generically to a set of sorting rules that can be further parameterized or “tailored”. In this specification the term is always used for a specific algorithm in which all such parameters have defined values.

This specification defines some collation URIs that provide interoperable sorting behavior across applications. Other collation URIs are defined only partially (leaving some aspects implementation-defined). Implementations may define further collation URIs, or may allow users or third parties to define them.

The Unicode codepoint collation is available in every implementation. This collation sorts based on codepoint values. For further details see 5.3.3 The Unicode Codepoint Collation.

Collations may or may not perform Unicode normalization on strings before comparing them.

This specification allows a collation name to be provided as an argument to many string functions. Although collations are defined to be URIs, they are supplied as instances of xs:string.

The XQuery/XPath static context supplies a default collation for use when the collation argument is not specified (see 2.1.1 Static Context ^XP31). If the default collation is not specified by the user or the system, the default collation is the Unicode codepoint collation.

If the collation is specified using a relative URI reference, it is resolved relative to an implementation-defined base URI.

Note:

Previous versions of this specification stated that it must be resolved against the Static Base URI^XP, but this is not always operationally convenient. It is recommended that processors should provide a means of setting the base URI for resolving collation URIs independently of the Static Base URI^XP, though for backwards compatibility, the Static Base URI^XP or Executable Base URI^XP should be used as a default.

This specification does not define whether or not the collation URI is dereferenced. The collation URI may be an abstract identifier, or it may refer to an actual resource describing the collation. If it refers to a resource, this specification does not define the nature of that resource. One possible candidate is that the resource is a locale description expressed using the Locale Data Markup Language: see [UTS #35].

Note:

The ability to access external resources depends on whether the calling code is trusted^XP.

Note:

XML allows elements to specify the xml:lang attribute to indicate the language associated with the content of such an element. This specification does not use xml:lang to identify the default collation because using xml:lang does not produce desired effects when the two strings to be compared have different xml:lang values or when a string is multilingual.

5.3.3 The Unicode Codepoint Collation

[Definition: The collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint identifies a collation which must be recognized by every implementation: it is referred to as the Unicode codepoint collation (not to be confused with the Unicode collation algorithm).]

The Unicode codepoint collation does not perform any normalization on the supplied strings.

The collation is defined as follows. Each of the two strings is converted to a sequence of integers using the fn:string-to-codepoints function. These two sequences $A and $B are then compared as follows:

If both sequences are empty, the strings are equal.
If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.
If the first integer in $A is less than the first integer in $B, then the string corresponding to $A is less than the string corresponding to $B.
If the first integer in $A is greater than the first integer in $B, then the string corresponding to $A is greater than the string corresponding to $B.
Otherwise (the first pair of integers are equal), the result is obtained by applying the same rules recursively to fn:tail($A) and fn:tail($B).
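The comparison rules above can be transcribed almost directly into code. The following Python sketch is illustrative only; the function names are hypothetical, and a result of -1, 0, or 1 is an assumed convention for "less than", "equal", and "greater than". Python strings are sequences of Unicode codepoints, so `ord` plays the role of fn:string-to-codepoints here.

```python
def compare_codepoints(a, b):
    """Compare two lists of integers using the recursive rules above."""
    if not a and not b:
        return 0            # both sequences empty: equal
    if not a:
        return -1           # only $A is empty: $A is less
    if not b:
        return 1            # only $B is empty: $A is greater
    if a[0] < b[0]:
        return -1
    if a[0] > b[0]:
        return 1
    # First integers are equal: recurse on the tails.
    return compare_codepoints(a[1:], b[1:])


def codepoint_collation_compare(s1, s2):
    """Compare two strings under the Unicode codepoint collation."""
    return compare_codepoints([ord(c) for c in s1], [ord(c) for c in s2])
```

Under this model `"Z"` sorts before `"a"`, because codepoint 90 is less than codepoint 97.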

Note:

While the Unicode codepoint collation does not produce results suitable for quality publishing of printed indexes or directories, it is adequate for many purposes where a restricted alphabet is used, such as sorting of vehicle registrations.

Note:

The Unicode codepoint collation differs from the default sort order used in programming languages that sort strings based on UTF-16 code units, which may include surrogate pairs.
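The difference is visible only when characters outside the Basic Multilingual Plane are involved. The following Python sketch is an informal illustration; the helper names are hypothetical, and the two sample characters are chosen for the example. It sorts the same pair of one-character strings by codepoint and by UTF-16 code unit.

```python
def codepoints(s):
    """Codepoints of s as a list of integers."""
    return [ord(c) for c in s]


def utf16_units(s):
    """UTF-16 code units of s as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp = "\uff5e"        # U+FF5E: a single code unit, 0xFF5E
supp = "\U00010000"   # U+10000: the surrogate pair 0xD800 0xDC00

by_codepoint = sorted([supp, bmp], key=codepoints)   # U+FF5E first
by_utf16 = sorted([supp, bmp], key=utf16_units)      # U+10000 first
```

The two orders disagree because the leading surrogate 0xD800 is numerically smaller than 0xFF5E, even though the codepoint it helps to encode is larger.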

5.5 Functions based on substring matching

The functions described in this section examine a string $value to see whether it contains another string $substring as a substring. The result depends on whether $substring is a substring of $value, and if so, on the range of characters in $value which $substring matches.

When the Unicode codepoint collation is used, this simply involves determining whether $value contains a contiguous sequence of characters whose codepoints are the same, one for one, with the codepoints of the characters in $substring.

When any other collation is used, the rules are more complex.

All collations support the capability of deciding whether two strings are considered equal, and if not, which of the strings should be regarded as preceding the other. For functions such as fn:compare, this is all that is required. For other functions, such as fn:contains, the collation needs to support an additional property: it must be able to decompose the string into a sequence of collation units, each unit consisting of one or more characters, such that two strings can be compared by pairwise comparison of these units.

[Definition: The term collation unit as used in this specification is equivalent to the term collation element used in [UTS #10].]

The string Q is then considered to contain P as a substring if the sequence of collation units corresponding to P is a subsequence of the sequence of collation units corresponding to Q. The characters in P that match are the characters corresponding to these collation units.

This rule may occasionally lead to surprises. For example, consider a collation that treats "Jaeger" and "Jäger" as equal. It might do this by treating "ä" as representing two collation units, in which case the expression fn:contains("Jäger", "eg") will return true. Alternatively, a collation might treat "ae" as a single collation unit, in which case the expression fn:contains("Jaeger", "eg") will return false. The results of these functions thus depend strongly on the properties of the collation that is used.

In addition, collations may specify that some collation units should be ignored during matching. If hyphen is an ignored collation unit, then fn:contains("code-point", "codepoint") will be true, and fn:contains("codepoint", "-") will also be true.
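The examples above can be reproduced with two toy collations. The following Python sketch is illustrative only: both decomposition functions are hypothetical, handle only the characters needed for the examples, and do not correspond to any real collation. Matching is modeled as a search for a contiguous run of collation units.

```python
def units_expanding(s):
    """Toy collation 1: 'ä' yields two units ('a', 'e'); hyphen is ignored."""
    units = []
    for ch in s:
        if ch == "-":
            continue                 # ignored collation unit
        if ch == "ä":
            units += ["a", "e"]
        else:
            units.append(ch)
    return units


def units_contracting(s):
    """Toy collation 2: the pair 'ae' forms a single unit equal to 'ä'."""
    units, i = [], 0
    while i < len(s):
        if s[i:i + 2] == "ae":
            units.append("ä")
            i += 2
        else:
            units.append(s[i])
            i += 1
    return units


def contains(value, substring, units):
    """True if the units of substring occur contiguously in those of value."""
    q, p = units(value), units(substring)
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))
```

Both toy collations treat "Jaeger" and "Jäger" as equal, yet `contains("Jäger", "eg", units_expanding)` is true while `contains("Jaeger", "eg", units_contracting)` is false. With the hyphen ignored, `contains("codepoint", "-", units_expanding)` is true because the substring decomposes to an empty sequence of units.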

In the rules for the functions defined in this section, we use the following terms taken from [UTS #10]:

[Definition: The term match is used in the sense of definition DS2 from [UTS #10].]
[Definition: The term minimal match is used in the sense of definition DS4 from [UTS #10].]

In the definitions in [UTS #10], these rules involve a number of parameters. In the context of the functions defined in this section, these parameters are interpreted as follows:

C is the collation; that is, the value of the $collation argument if specified, otherwise the default collation.
P is the (candidate) substring, the value of the $substring argument to the function.
Q is the (candidate) containing string, the value of the $value argument to the function.
The boundary condition B is satisfied at the start and end of a string, and between any two characters that belong to different collation units (“collation elements” in the language of [UTS #10]). It is not satisfied between two characters that belong to the same collation unit.

It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property.

Function	Meaning
`fn:contains`	Returns `true` if the string `$value` contains `$substring` as a substring, taking collations into account.
`fn:starts-with`	Returns `true` if the string `$value` contains `$substring` as a leading substring, taking collations into account.
`fn:ends-with`	Returns `true` if the string `$value` contains `$substring` as a trailing substring, taking collations into account.
`fn:substring-before`	Returns the part of `$value` that precedes the first occurrence of `$substring`, taking collations into account.
`fn:substring-after`	Returns the part of `$value` that follows the first occurrence of `$substring`, taking collations into account.

6 Regular expressions

The functions described in this section make use of a regular expression syntax for pattern matching. The syntax and semantics of regular expressions are defined in this section.

6.1 Regular expression syntax

Changes in 4.0

Regular expressions can include comments (starting and ending with #) if the c flag is set. [Issue 999 PR 1022 20 February 2024]
Word boundaries can be matched. Lookahead and lookbehind assertions are supported. Assertions (including ^ and $) can no longer be followed by a quantifier. [Issues 998 1006 PR 1856]

The regular expression syntax used by these functions is defined in terms of the regular expression syntax specified in XSD 1.1 (see [XSD 1.1 Part 2]), which in turn is based on the established conventions of languages such as Perl. However, because XML Schema uses regular expressions only for validity checking, it omits some facilities that are widely used with other languages. XPath, therefore, extends the XML Schema regular expression syntax to reinstate some of these capabilities.

Note:

Implementers should consult [UTS #18] for information on using regular expression processing on Unicode characters.

The regular expression syntax and semantics are identical to those defined in [XSD 1.1 Part 2] with the additions described in the following subsections.

Note:

In [XSD 1.1 Part 2] there are no substantive technical changes to the syntax or semantics of regular expressions relative to [XML Schema Part 2: Datatypes Second Edition], but a number of errors and ambiguities have been resolved. For example, the rules for the interpretation of hyphens within square brackets in a regular expression have been clarified; and the semantics of regular expressions are no longer tied to a specific version of Unicode.

XSD 1.1 is therefore used as the specification baseline, even for processors that only support XSD 1.0.

6.1.1 Processing model for regular expressions

As well as extending the XSD 1.1 syntax for regular expressions, this specification also extends the processing model.

In XSD, a regular expression is defined to denote a set of strings, and the only functionality offered is to test whether a string matches a regular expression: that is, whether it is a member of the set of strings denoted by the regular expression.

In this specification, matching a string S against a regular expression delivers a more complex outcome.

First some terminology:

[Definition: A string of length N has N+1 character positions: one immediately before each character in the string, and one after the last character. In interfaces where character positions are exposed, they are numbered from 1 to N+1.]
[Definition: A segment of a string S is a sequence of zero or more contiguous characters starting at a given character position within S.] Segments of a string are uniquely identified by their start position and length. The sequence of characters making up a segment is referred to as the string value of the segment.
[Definition: The end position of a segment is the start position of the segment plus its length.]
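The terminology can be modeled with a small data structure. The following Python sketch is an informal illustration; the class name `Segment` and its field names are hypothetical and do not correspond to any interface defined by this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A segment of a string, identified by start position and length."""
    source: str
    start: int    # character position, numbered from 1
    length: int   # zero or more characters

    @property
    def end(self):
        """End position: the start position plus the length."""
        return self.start + self.length

    @property
    def string_value(self):
        """The characters making up the segment."""
        offset = self.start - 1   # convert to a 0-based Python index
        return self.source[offset:offset + self.length]


def character_positions(s):
    """The N+1 character positions of a string of length N."""
    return list(range(1, len(s) + 2))
```

For the string `aaax`, the segment with start position 1 and length 3 has end position 4 and string value `aaa`; the zero-length segment at position 4 has an empty string value.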

The operation of matching a string S against a regular expression delivers:

A set of matching segments. The string S as a whole is said to match the regular expression if the set of matching segments is non-empty.
For each matching segment M, a collection of captured groups. This is a mapping from positive integers to segments. The integer is called the group number, and corresponds to the ordinal sequence of opening parentheses of capturing subexpressions within the regular expression, as explained below. The corresponding segment is always a segment of S, but in the case of capturing expressions within lookahead assertions, it is not necessarily a segment of M.

The semantics of particular constructs in a regular expression are affected by a set of flags. The available flags and their effect are defined in 6.2 Flags.

The different functions available, such as fn:replace and fn:tokenize, are defined in terms of this outcome. For example:

The function fn:matches returns true if the set of matching segments is non-empty.
The function fn:replace replaces matching segments of the input string with a replacement string.
The function fn:tokenize returns the segments of the input string that appear between the matching segments.

In principle the set of segments that match a regular expression can be determined by enumerating all the segments of the input string and examining each one independently to establish whether it matches. In practice, however:

If several matching segments have the same starting position, then only one of them is returned. This is chosen as follows:
- In the case of a choice (operator "|") the first matching branch is chosen.
- In the case of a repetition with a greedy quantifier (for example "+" or "*") the longest matching segment is chosen.
- In the case of a repetition with a reluctant quantifier (for example "+?" or "*?") the shortest matching segment is chosen.
A matching segment is not included in the result if it overlaps an earlier matching segment: specifically, a segment with start position S₁ is excluded if there is a segment that has start position S₀ and length L₀, where S₀ < S₁ < S₀+L₀.

Note:

Two segments can be adjacent: that is, the start position of one segment can be equal to the end position of the previous segment. This is true even when the second segment is zero-length (the two segments are not considered to be overlapping, even though they have the same end position). This means, for example, that the regular expression a*(?=x) has two non-overlapping matches against the string aaax, one at position 1 and the other at position 4.
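The elimination of overlapping segments can be sketched as follows. This Python model is illustrative only: the function name is hypothetical, segments are represented as (start position, length) pairs, and the input is assumed to already contain at most one candidate per start position. The sample candidates are the segments that a*(?=x) matches at each position of the string aaax.

```python
def disjoint_segments(candidates):
    """Drop every segment that overlaps an earlier retained segment.

    candidates: (start, length) pairs with at most one pair per start.
    """
    kept = []
    for s1, l1 in sorted(candidates):
        # Excluded if some retained segment (s0, l0) has s0 < s1 < s0 + l0.
        overlaps = any(s0 < s1 < s0 + l0 for s0, l0 in kept)
        if not overlaps:
            kept.append((s1, l1))
    return kept


# Candidates for a*(?=x) against "aaax": "aaa", "aa", "a", and "".
candidates = [(1, 3), (2, 2), (3, 1), (4, 0)]
result = disjoint_segments(candidates)   # [(1, 3), (4, 0)]
```

The zero-length segment at position 4 survives because 4 is not strictly less than 1 + 3; it is adjacent to the first segment rather than overlapping it.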

[Definition: The disjoint matching segments obtained by applying a regular expression R to a string S in the presence of a set of flags F are the segments of S that match R (using flags F), after elimination of overlapping segments.]

The semantics of a regular expression are thus defined by stating which segments of an input string it matches, and what the captured groups corresponding to this match are. This is defined recursively for each construct that may appear within a regular expression, in terms of the outcome of applying its subexpressions.

For constructs defined in XSD 1.1 (branch, piece, NormalChar, charClass), XSD defines a set of strings denoted by the construct. The corresponding semantics for this specification are that the segments matched by such a construct are the segments whose string value is contained in this set.

For constructs added to the XSD 1.1 baseline by this specification, the semantics are defined in the sections that follow.

6.1.5 Captured groups

The regular expression syntax defined by [XML Schema Part 2: Datatypes Second Edition] allows a regular expression to contain parenthesized subexpressions, but attaches no special significance to them. Some operations associated with regular expressions (for example, back-references, and the fn:replace function) allow access to the parts of the input string that matched a parenthesized subexpression (called captured groups).

[Definition: A left parenthesis is recognized as a capturing left parenthesis provided it is not immediately followed by ? or * (see below), is not within a character group (square brackets), and is not escaped with a backslash. The sub-expression enclosed by a capturing left parenthesis and its matching right parenthesis is referred to as a capturing subexpression.]

More specifically, the capturing subexpression enclosed by the Nth capturing left parenthesis within the regular expression (determined by its character position in left-to-right order, and counting from one) is referred to as the Nth capturing subexpression.

For example, in the regular expression A(BC(?:D(EF(GH[()])))), the subexpression BC(?:D(EF(GH[()]))) is capturing subexpression 1, the subexpression EF(GH[()]) is capturing subexpression 2, and the subexpression GH[()] is capturing subexpression 3.
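The numbering rule can be sketched as a single left-to-right scan. The following Python function is illustrative only; its name is hypothetical, it assumes a well-formed regular expression with balanced parentheses, and it simplifies character groups by treating the first unescaped ] as the end of the group (so nested groups used in character class subtraction are not handled).

```python
def capturing_subexpressions(regex):
    """Return the capturing subexpressions of regex in group-number order."""
    found = []       # (index of the left parenthesis, enclosed text)
    stack = []       # open parentheses: (index, is_capturing)
    in_class = False
    i = 0
    while i < len(regex):
        ch = regex[i]
        if ch == "\\":
            i += 2                    # skip the escaped character
            continue
        if in_class:
            in_class = ch != "]"      # parentheses inside [...] are literal
        elif ch == "[":
            in_class = True
        elif ch == "(":
            capturing = regex[i + 1:i + 2] not in ("?", "*")
            stack.append((i, capturing))
        elif ch == ")":
            start, capturing = stack.pop()
            if capturing:
                found.append((start, regex[start + 1:i]))
        i += 1
    # Group numbers follow the position of the left parenthesis.
    return [text for _, text in sorted(found)]
```

Applied to A(BC(?:D(EF(GH[()])))), the function returns the three subexpressions BC(?:D(EF(GH[()]))), EF(GH[()]), and GH[()] in that order.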

When, in the course of evaluating a regular expression, a particular segment of the input matches a capturing subexpression, that segment becomes available as a captured group. The segment matched by the Nth capturing subexpression is referred to as the Nth captured group. By convention, the segment captured by the entire regular expression is treated as captured group 0 (zero).

When a capturing subexpression is matched more than once (because it is within a construct that allows repetition), then only the last substring that it matched will be captured. Note that this rule is not sufficient in all cases to ensure an unambiguous result, especially in cases where (a) the regular expression contains nested repeating constructs, and/or (b) the repeating construct matches a zero-length string. In such cases it is implementation-dependent which substring is captured. For example given the regular expression (a*)+ and the input string "aaaa", an implementation might legitimately capture either "aaaa" or a zero length string as the content of the captured subgroup.

Parentheses that are required to group terms within the regular expression, but which are not required for capturing of substrings, can be represented using the syntax (?:xxxx).

In the absence of back-references (see below), the presence of the optional ?: has no effect on the set of strings that match the regular expression, but causes the left parenthesis not to be counted by operations (such as fn:replace and back-references) that number the capturing sub-expressions within a regular expression.

9 Processing dates and times

This section defines operations on the [XML Schema Part 2: Datatypes Second Edition] date and time types.

See [Working With Timezones] for a disquisition on working with date and time values with and without timezones.

9.1 Date and time types

[Definition: The eight primitive types xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay are referred to collectively as the Gregorian types.]

This section describes operations on atomic items of these types.

Values of these types are modeled as comprising one or more of the seven components year, month, day, hour, minute, second, and timezone.

The only operations defined on xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, and xs:gDay values are equality comparison and component extraction. For other types, further operations are provided, including order comparisons, arithmetic, formatted display, and timezone adjustment.

9.9 Formatting dates and times

Function	Meaning
`fn:format-dateTime`	Returns a string containing an `xs:dateTime` value formatted for display.
`fn:format-date`	Returns a string containing an `xs:date` value formatted for display.
`fn:format-time`	Returns a string containing an `xs:time` value formatted for display.

Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar, language, and country. The functions are presented in their customary fashion, except for the rules and examples, which are described en bloc at 9.9.4 The date/time formatting functions and 9.9.5 Examples of date and time formatting.

9.9.4 The date/time formatting functions

The fn:format-dateTime, fn:format-date, and fn:format-time functions format $value as a string using the picture string specified by the $picture argument, the calendar specified by the $calendar argument, the language specified by the $language argument, and the country or other place name specified by the $place argument. The result of the function is the formatted string representation of the supplied xs:dateTime, xs:date, or xs:time value.

[Definition: The three functions fn:format-dateTime, fn:format-date, and fn:format-time are referred to collectively as the date formatting functions.]

If $value is the empty sequence, the function returns the empty sequence.

Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to the empty sequence.

For details of the $language, $calendar, and $place arguments, see 9.9.4.8 The language, calendar, and place arguments.

In general, the use of an invalid $picture, $language, $calendar, or $place argument results in a dynamic error [err:FOFD1340]. By contrast, use of an option in any of these arguments that is valid but not supported by the implementation is not an error, and in these cases the implementation is required to output the value in a fallback representation. More detailed rules are given below.

13 Processing function items

The functions included in this section operate on function items, that is, values referring to a function.

[Definition: Functions that accept functions among their arguments, or that return functions in their result, are described in this specification as higher-order functions.]

Note:

Some functions, such as fn:parse-json, allow the option of supplying a callback function, for example to define exception behavior. Where this is not essential to the use of the function, the function has not been classified as higher-order for this purpose; in applications where function items cannot be created, these particular options will not be available.

Function	Meaning
`fn:function-lookup`	Returns a function item having a given name and arity, if there is one.
`fn:function-name`	Returns the name of the function identified by a function item.
`fn:function-arity`	Returns the arity of the function identified by a function item.
`fn:function-identity`	Returns a string representing the identity of a function item.
`fn:function-annotations`	Returns the annotations of the function item.

14 Processing maps

Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.

A map is a kind of item.

[Definition: A map consists of a sequence of entries, also known as key-value pairs. Each entry comprises a key which is an arbitrary atomic item, and an arbitrary sequence called the associated value.]

[Definition: Within a map, no two entries have the same key. Two atomic items K1 and K2 are the same key for this purpose if the function call fn:atomic-equal($K1, $K2) returns true.]

It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).

Maps are immutable, and have no identity separate from their content. For example, the map:remove function returns a map that differs from the supplied map by the omission (typically) of one entry, but the supplied map is not changed by the operation. Two calls on map:remove with the same arguments return maps that are indistinguishable from each other; there is no way of asking whether these are “the same map”.

A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicType) as item()*. Calling the function has the same effect as calling the map:get function: the expression $map($key) returns the same result as map:get($map, $key). For example, if $books-by-isbn is a map whose keys are ISBNs and whose associated values are book elements, then the expression $books-by-isbn("0470192747") returns the book element with the given ISBN. The fact that a map is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.

14.2 Composing and Decomposing Maps

It is often useful to decompose a map into a sequence of entries, or key-value pairs (in which the key is an atomic item and the value is an arbitrary sequence). Subsequently it may be necessary to reconstruct a map from these components, typically after modification.

There are two conventional ways of representing a map as a sequence of key-value pairs, each with its own advantages and disadvantages. These are described below:

A map can be represented as a sequence of single-entry maps.
[Definition: A single-entry map is a map containing a single entry.]
It is possible to decompose any map into a sequence of single-entry maps, and to construct a map from a sequence of single-entry maps.
For example the map { "x": 1, "y": 2 } can be decomposed to the sequence ({ "x": 1 }, { "y": 2 }).
A map can be represented as a sequence of JNodes.
A JNode holds the map key in its ·selector· property and the corresponding value in its ·content· property.
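The single-entry-map representation can be sketched in Python, using dictionaries as stand-ins for maps. The helper names `entries` and `merge` are hypothetical and only imitate the roles of decomposition and recomposition. Handling of duplicate keys is not modelled, since duplicates cannot arise when the input comes from a single map. JNodes are not modelled.

```python
def entries(m):
    """Decompose a dict into a list of single-entry dicts."""
    return [{key: value} for key, value in m.items()]


def merge(single_entry_maps):
    """Rebuild one dict from a sequence of single-entry dicts.

    Assumes the keys are distinct; no duplicate-key policy is modelled.
    """
    result = {}
    for entry in single_entry_maps:
        result.update(entry)
    return result


parts = entries({"x": 1, "y": 2})
print(parts)         # a list of two single-entry dicts
print(merge(parts))  # the original content, reassembled
```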

The following table summarizes the way in which these two representations can be used to compose and decompose maps:

Operation	Single-Entry Maps	JNodes
Decompose a map	`map:entries($map)`	`$map/child::*`
Compose a map	`map:merge($entries)`	`map:build($jnodes, jnode-selector#1, jnode-content#1)`
Create a single entry	`map:entry($key, $value)`	`{$key : $value}/child::*`
Extract the key part of a single entry	`map:keys($entry)`	`jnode-selector($jnode)`
Extract the value part of a single entry	`map:items($entry)`	`jnode-content($jnode)`

It is also possible to decompose a map using:

The function map:for-each
The expression for key $k value $v in $map return ....

Example: Reordering the entries in a map

The examples below show several ways of constructing a map with the same entries as an input map, but with the entries sorted by key.

Using map:entries and map:merge:

map:entries($map) => sort-by({'key': map:keys#1}) => map:merge()

Using JNodes:

$map/* => sort-by({'key': jnode-selector#1}) => map:build(jnode-selector#1, jnode-content#1)

Using map:for-each:

map:merge( map:for-each($map, map:entry#2) => sort-by({'key': map:keys#1}) )

Using an XQuery FLWOR expression:

map:merge( for key $k value $v in $map order by $k return {$k : $v} )
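For comparison, the same decompose, sort, and recompose pattern can be written in Python. This is an illustrative sketch only: it relies on Python dictionaries preserving insertion order, and `reorder_by_key` is a hypothetical name.

```python
def reorder_by_key(m):
    """Return a new dict with the same entries as `m`, inserted in key order."""
    # Decompose into single-entry dicts.
    single_entries = [{key: value} for key, value in m.items()]
    # Sort each single-entry dict by its only key.
    single_entries.sort(key=lambda entry: next(iter(entry)))
    # Recompose; insertion order becomes the sorted key order.
    result = {}
    for entry in single_entries:
        result.update(entry)
    return result


source = {"b": 2, "a": 1, "c": 3}
ordered = reorder_by_key(source)
print(list(ordered.items()))
```

The input dictionary is left unchanged; only the new dictionary has its entries in key order.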

17 External resources and data formats

The functions in this section access resources external to a query or stylesheet, and convert between external file formats and their XPath and XQuery data model representations.

17.5 Functions on CSV Data

Changes in 4.0

New functions are available for processing input data in CSV (comma separated values) format. [Issue 413 PRs 533 719 834]

This section describes functions that parse CSV data.

[Definition: The term comma separated values or CSV refers to a wide variety of plain-text tabular data formats with fields and records separated by standard character delimiters (often, but not invariably, commas).]

A CSV is a 2-dimensional tabular data structure consisting of multiple rows (also known as records). Each row contains multiple fields. Fields occupying the same position in successive rows constitute a column. Columns are identified by position and optionally by name. Column names can be assigned within a CSV using an optional header row.

CSV has developed informally for decades, and many variations are found. This specification refers to [RFC 4180], which provides a standardized grammar. This specification extends the grammar defined in [RFC 4180] as follows:

This specification uses the term row where RFC 4180 uses record.
Line endings are normalized: specifically, the character sequences U+000D (CARRIAGE RETURN), or U+000D (CARRIAGE RETURN) followed by U+000A (NEWLINE), are converted to a single U+000A (NEWLINE) character. This applies whether or not the line ending appears within a quoted string, and whether or not U+000A (NEWLINE) is the chosen row delimiter.
Row delimiters other than newline are recognized.
Field delimiters other than U+002C (COMMA, ,) are recognized.
Quote characters other than U+0022 (QUOTATION MARK, ") are recognized.
Non-ASCII characters are recognized.
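The line-ending normalization and the configurable delimiters listed above can be approximated with Python's standard `csv` module. This is a sketch under stated assumptions, not an implementation of the functions in this section: `normalize_line_endings` and `parse_rows` are hypothetical names, and row delimiters other than newline are not modelled because the Python reader does not support them.

```python
import csv
import io


def normalize_line_endings(text):
    """Convert CR LF and lone CR to a single LF, inside or outside quotes."""
    # Replace the two-character sequence first so it yields one newline, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def parse_rows(text, field_delimiter=",", quote_character='"'):
    """Parse CSV text into a list of rows, each a list of strings."""
    normalized = normalize_line_endings(text)
    reader = csv.reader(
        io.StringIO(normalized),
        delimiter=field_delimiter,
        quotechar=quote_character,
    )
    return [row for row in reader]


# Semicolon-delimited input, single-quoted field containing a bare CR.
sample = "n;v\r\n1;'x\ry'\r\n"
print(parse_rows(sample, field_delimiter=";", quote_character="'"))
```

Normalizing before parsing means the carriage return inside the quoted field is also converted, which matches the statement that normalization applies whether or not the line ending appears within a quoted string.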

This specification defines a mapping from this extended grammar to constructs in the XDM model, and provides illustrative examples of how these constructs can be combined with other language features to process CSV data.

Function	Meaning
`fn:csv-to-arrays`	Parses CSV data supplied as a string, returning the results in the form of a sequence of arrays of strings.
`fn:parse-csv`	Parses CSV data, returning the results in the form of a record containing information about the names in the header, as well as the data itself.
`fn:csv-doc`	Reads an external resource containing CSV, and returns the results as a record containing information about the names in the header, as well as the data itself.
`fn:csv-to-xml`	Parses CSV data supplied as a string, returning the results as an XML document, as described by 17.5.9 Representing CSV data as XML.

The most basic function for parsing CSV is fn:csv-to-arrays, which recognizes the delimiters for rows and fields and returns a sequence of arrays, each corresponding to one row. The fields within each array are represented as instances of xs:string.

The other functions recognize column names, and make it easier to address individual fields using these names. The parse-csv function delivers this capability using XDM maps and functions, while the csv-to-xml function represents the information using XDM element nodes.
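The difference between positional access and access by column name can be sketched in Python. The names `csv_to_arrays` and `with_header`, and the dictionary keys `columns`, `rows`, and `get`, are hypothetical choices made for this illustration; they are not taken from the function definitions in this specification. Positions here are 0-based, following Python convention.

```python
import csv
import io


def csv_to_arrays(text):
    """Model of the basic parse: one list of strings per row."""
    return [row for row in csv.reader(io.StringIO(text))]


def with_header(text):
    """Treat the first row as a header and allow lookup by column name."""
    all_rows = csv_to_arrays(text)
    header, data = all_rows[0], all_rows[1:]
    position = {name: index for index, name in enumerate(header)}

    def get(row_number, column_name):
        # Translate the column name to a position, then index into the row.
        return data[row_number][position[column_name]]

    return {"columns": header, "rows": data, "get": get}


text = "name,city\nAlice,Paris\nBob,Oslo\n"
parsed = with_header(text)
print(csv_to_arrays(text)[2][1])  # positional access
print(parsed["get"](1, "city"))   # access by column name
```

Both calls retrieve the same field; the second does not depend on knowing where the `city` column sits in the row.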

G Implementation-defined features (Non-Normative)

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used. (See Conformance.)
It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1. (See Conformance.)
It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1. (See Conformance.)
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace. (See Options.)
It is implementation-defined which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used. (See Strings, characters, and codepoints.)
[Definition: Some functions (such as fn:in-scope-prefixes, fn:load-xquery-module, and fn:unordered) produce result sequences or result maps in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.] (See Properties of functions.)
Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results. (See Properties of functions.)
Implementations may provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value. See [ISO 10967]. (See Arithmetic operators on numeric values.)
For xs:decimal values, let N be the number of digits of precision supported by the implementation, and let M (M <= N) be the minimum limit on the number of digits required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication operations, the returned result should be accurate to N digits of precision, and for division and modulus operations, the returned result should be accurate to at least M digits of precision. The actual precision is implementation-defined. If the number of digits in the mathematical result exceeds the number of digits that the implementation retains for that operation, the result is truncated or rounded in an implementation-defined manner. (See Arithmetic operators on numeric values.)
The [IEEE 754-2019] specification also describes handling of two exception conditions called divideByZero and invalidOperation. The IEEE divideByZero exception is raised not only by a direct attempt to divide by zero, but also by operations such as log(0). The IEEE invalidOperation exception is raised by attempts to call a function with an argument that is outside the function’s domain (for example, sqrt(-1) or log(-1)). Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in which the operation returns a defined result, typically positive or negative infinity, or NaN. With this function library, these IEEE exceptions do not cause a dynamic error at the application level; rather they result in the relevant function or operator returning the defined non-error result. The underlying IEEE exception may be notified to the application or to the user by some implementation-defined warning condition, but the observable effect on an application using the functions and operators defined in this specification is simply to return the defined result (typically -INF, +INF, or NaN) with no error. (See Arithmetic operators on numeric values.)
The [IEEE 754-2019] specification distinguishes two NaN values: a quiet NaN and a signaling NaN. These two values are not distinguishable in the XDM model: the value spaces of xs:float and xs:double each include only a single NaN value. This does not prevent the implementation distinguishing them internally, and triggering different implementation-defined warning conditions, but such distinctions do not affect the observable behavior of an application using the functions and operators defined in this specification. (See Arithmetic operators on numeric values.)
The implementation may adopt a different algorithm provided that it is equivalent to this formulation in all cases where implementation-dependent or implementation-defined behavior does not affect the outcome, for example, the implementation-defined precision of the result of xs:decimal division. (See op:numeric-integer-divide.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:divide-decimals.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round-half-to-even.)
XSD 1.1 allows the string +INF as a representation of positive infinity; XSD 1.0 does not. It is implementation-defined whether XSD 1.1 is supported. (See fn:number.)
Any other format token, which indicates a numbering sequence in which that token represents the number 1 (one) (but see the note below). It is implementation-defined which numbering sequences, additional to those listed above, are supported. If an implementation does not support a numbering sequence represented by the given token, it must use a format token of 1. (See fn:format-integer.)
For all format tokens other than a digit-pattern, there may be implementation-defined lower and upper bounds on the range of numbers that can be formatted using this format token; indeed, for some numbering sequences there may be intrinsic limits. For example, the format token U+2460 (CIRCLED DIGIT ONE, ①) has a range imposed by the Unicode character repertoire — zero to 20 in Unicode versions prior to 3.2, or zero to 50 in subsequent versions. For the numbering sequences described above any upper bound imposed by the implementation must not be less than 1000 (one thousand) and any lower bound must not be greater than 1. Numbers that fall outside this range must be formatted using the format token 1. (See fn:format-integer.)
The set of languages for which numbering is supported is implementation-defined. If the $language argument is absent, or is set to the empty sequence, or is invalid, or is not a language supported by the implementation, then the number is formatted using the default language from the dynamic context. (See fn:format-integer.)
...either a or t, to indicate alphabetic or traditional numbering respectively, the default being implementation-defined. (See fn:format-integer.)
The string of characters between the parentheses, if present, is used to select between other possible variations of cardinal or ordinal numbering sequences. The interpretation of this string is implementation-defined. No error occurs if the implementation does not define any interpretation for the defined string. (See fn:format-integer.)
It is implementation-defined what combinations of values of the format token, the language, and the cardinal/ordinal modifier are supported. If ordinal numbering is not supported for the combination of the format token, the language, and the string appearing in parentheses, the request is ignored and cardinal numbers are generated instead. (See fn:format-integer.)
The use of the a or t modifier disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. In the absence of the a or t modifier, the default is implementation-defined. (See fn:format-integer.)
The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined. (See Defining a decimal format.)
IEEE states that the preferred quantum is language-defined. In this specification, it is implementation-defined. (See Trigonometric and exponential functions.)
IEEE defines various rounding algorithms for inexact results, and states that the choice of rounding direction, and the mechanisms for influencing this choice, are language-defined. In this specification, the rounding direction and any mechanisms for influencing it are implementation-defined. (See Trigonometric and exponential functions.)
The map returned by the fn:random-number-generator function may contain additional entries beyond those specified here, but it must match the record type defined above. The meaning of any additional entries is implementation-defined. To avoid conflict with any future version of this specification, the keys of any such entries should start with an underscore character. (See fn:random-number-generator.)
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:codepoints-to-string.)
If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not defined in this specification then the meaning is implementation-defined. If the implementation recognizes the meaning of the keyword and value then it should interpret it accordingly; if it does not recognize the keyword or value then if the fallback parameter is present with the value no it should reject the collation as unsupported, otherwise it should ignore the unrecognized parameter. (See The Unicode Collation Algorithm.)
The following query parameters are defined. If any parameter is absent, the default is implementation-defined except where otherwise stated. The meaning given for each parameter is non-normative; the normative specification is found in [UTS #35]. (See The Unicode Collation Algorithm.)
Because the set of collations that are supported is implementation-defined, an implementation has the option to support all collation URIs, in which case it will never raise this error. (See Choosing a collation.)
The properties available are as defined for the Unicode Collation Algorithm (see 5.3.4 The Unicode Collation Algorithm). Additional implementation-defined properties may be specified as described in the rules for UCA collation URIs. (See fn:collation.)
It is possible to define collations that do not have the ability to generate collation keys. Supplying such a collation will cause the function to fail. The ability to generate collation keys is an implementation-defined property of the collation. (See fn:collation-key.)
Conforming implementations must support normalization form NFC and may support normalization forms NFD, NFKC, NFKD, and FULLY-NORMALIZED. They may also support other normalization forms with implementation-defined semantics. (See fn:normalize-unicode.)
It is implementation-defined which version of Unicode (and therefore, of the normalization algorithms and their underlying data) is supported by the implementation. See [UAX #15] for details of the stability policy regarding changes to the normalization rules in future versions of Unicode. If the input string contains codepoints that are unassigned in the relevant version of Unicode, or for which no normalization rules are defined, the fn:normalize-unicode function leaves such codepoints unchanged. If the implementation supports the requested normalization form then it must be able to handle every input string without raising an error. (See fn:normalize-unicode.)
It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property. (See Functions based on substring matching.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:analyze-string.)
Some URI schemes are hierarchical and some are non-hierarchical. Implementations must treat the following schemes as non-hierarchical: jar, mailto, news, tag, tel, and urn. Whether additional schemes are known to be non-hierarchical is implementation-defined. If a scheme is not known to be non-hierarchical, it must be treated as hierarchical. (See Parsing and building URIs.)
If the omit-default-ports option is true, the port is discarded and set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:parse-uri.)
If the omit-default-ports option is true then the $port is set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:build-uri.)
Processors may support a greater range and/or precision. The limits are implementation-defined. (See Limits and precision.)
Similarly, a processor may be unable accurately to represent the result of dividing a duration by 2, or multiplying a duration by 0.5. A processor that limits the precision of the seconds component of duration values must deliver a result that is as close as possible to the mathematically precise result, given these limits; if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
All conforming processors must support year values in the range 1 to 9999, and a minimum fractional second precision of 1 millisecond or three digits (that is, s.sss). However, processors may set larger implementation-defined limits on the maximum number of digits they support in these two situations. Processors may also choose to support the year 0 and years with negative values. The results of operations on dates that cross the year 0 are implementation-defined. (See Limits and precision.)
Similarly, a processor that limits the precision of the seconds component of date and time or duration values may need to deliver a rounded result for arithmetic operations. Such a processor must deliver a result that is as close as possible to the mathematically precise result, given these limits: if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
...the format token n, N, or Nn, indicating that the value of the component is to be output by name, in lower-case, upper-case, or title-case respectively. Components that can be output by name include (but are not limited to) months, days of the week, timezones, and eras. If the processor cannot output these components by name for the chosen calendar and language then it must use an implementation-defined fallback representation. (See The picture string.)
...indicates alphabetic or traditional numbering respectively, the default being implementation-defined. This has the same meaning as in the second argument of fn:format-integer. (See The picture string.)
The sequence of characters in the (adjusted) first presentation modifier is reversed (for example, 999'### becomes ###'999). If the result is not a valid decimal digit pattern, then the output is implementation-defined. (See Formatting Fractional Seconds.)
The output for these components is entirely implementation-defined. The default presentation modifier for these components is n, indicating that they are output as names (or conventional abbreviations), and the chosen names will in many cases depend on the chosen language: see 9.9.4.8 The language, calendar, and place arguments. (See Formatting Other Components.)
The set of languages, calendars, and places that are supported in the date formatting functions is implementation-defined. When any of these arguments is omitted or is the empty sequence, an implementation-defined default value is used. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language is implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German, one implementation might represent Saturday as Samstag while another uses Sonnabend. Implementations may provide mechanisms allowing users to control such choices. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language for calendar units such as days of the week and months of the year is implementation-defined. (See The language, calendar, and place arguments.)
The calendar value if present must be a valid EQName (dynamic error: [err:FOFD1340]). If it is a lexical QName then it is expanded into an expanded QName using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace. If the expanded QName is in no namespace, then it must identify a calendar with a designator specified below (dynamic error: [err:FOFD1340]). If the expanded QName is in a namespace then it identifies the calendar in an implementation-defined way. (See The language, calendar, and place arguments.)
At least one of the above calendars must be supported. It is implementation-defined which calendars are supported. (See The language, calendar, and place arguments.)
If the arguments to fn:function-lookup identify a function that is present in the static context of the function call, the function will always return the same function that a static reference to this function would bind to. If there is no such function in the static context, then the results depend on what is present in the dynamic context, which is implementation-defined. (See fn:function-lookup.)
It is to some extent implementation-defined whether two maps or arrays have the same function identity. Processors should ensure as a minimum that when a variable $m is bound to a map or array, calling jtree($m) more than once (with the same variable reference) will deliver the same JNode each time. (See fn:jtree.)
The requirement to deliver a deterministic result has performance implications, and for this reason implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:doc.)
Various aspects of this processing are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:doc.)
It is implementation-defined whether DTD validation and/or schema validation is applied to the source document. (See fn:doc.)
The effect of a fragment identifier in the supplied URI is implementation-defined. One possible interpretation is to treat the fragment identifier as an ID attribute value, and to return a document node having the element with the selected ID value as its only child. (See fn:doc.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:collection.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:uri-collection.)
It is no longer automatically an error if the resource (after decoding) contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:unparsed-text.)
...the encoding inferred from the initial octets of the resource, or from implementation-defined heuristics as defined by the rules of the bin:infer-encoding function. (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-binary.)
The collation used for matching names is implementation-defined, but must be the same as the collation used to ensure that the names of all environment variables are unique. (See fn:environment-variable.)
Except to the extent defined by these options, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml.)
Options set in $options may be supplemented or modified based on configuration options defined externally using implementation-defined mechanisms. (See fn:parse-xml.)
Except as explicitly defined, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml-fragment.)
If the second argument is omitted, or is supplied in the form of an output:serialization-parameters element, then the values of any serialization parameters that are not explicitly specified are implementation-defined, and may depend on the context. (See fn:serialize.)
A list of target namespaces identifying schema components to be used for validation. The way in which the processor locates schema components for the specified target namespaces is implementation-defined. A zero-length string denotes a no-namespace schema.... (See fn:xsd-validator.)
Set to the decimal value 1.0 or 1.1 to indicate which version of XSD is to be used. The default is implementation-defined. A processor may use a later version of XSD than the version requested, but must not use an earlier version.... (See fn:xsd-validator.)
The XSD specification allows a schema to be used for validation even when it contains unresolved references to absent schema components. It is implementation-defined whether this function allows the schema to be incomplete in this way. For example, some processors might allow validation using a schema in which an element declaration contains a reference to a type declaration that is not present in the schema, provided that the element declaration is never needed in the course of a particular validation episode. (See fn:xsd-validator.)
...error-details as map(*)*. This field is present only when (a) the option return-error-details was set to true, and (b) the supplied document was found to be invalid. The value is a sequence of maps, each containing details of one invalidity that was found. The precise details of the invalidities are implementation-defined, but they may include the following fields, if the information is available:... (See fn:xsd-validator.)
Because the [DOM: Living Standard] and [HTML: Living Standard] are not fixed, it is implementation-defined which versions are used. (See XDM Mapping from HTML DOM Nodes.)
If an implementation allows these nodes to be passed in via an API or similar mechanism, their behaviour is implementation-defined. (See XDM Mapping from HTML DOM Nodes.)
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName. (See node-name Accessor.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised [err:FOJS0001] if the input does not conform to the grammar. (See fn:parse-json.)
The supplied function is called to process the string value of any JSON number in the input. By default, numbers are processed by converting to xs:double using the XPath casting rules. Supplying the value xs:decimal#1 will instead convert to xs:decimal (which potentially retains more precision, but disallows exponential notation), while supplying a function that casts to (xs:decimal | xs:double) will treat the value as xs:decimal if there is no exponent, or as xs:double otherwise. Supplying the value fn:identity#1 causes the value to be retained unchanged as an xs:untypedAtomic. If the liberal option is false (the default), then the supplied number-parser is called if and only if the value conforms to the JSON grammar for numbers (for example, a leading plus sign and redundant leading zeroes are not allowed). If the liberal option is true then it is also called if the value conforms to an implementation-defined extension of this grammar. (See fn:parse-json.)
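As a rough illustration of how a pluggable number parser changes the result, here is a minimal sketch in Python rather than XPath. It assumes the `parse_float` hook of Python's standard `json` module as a loose analogue of the number-parser option. It is not the fn:parse-json API, and the variable names are illustrative only.

```python
import json
from decimal import Decimal

text = '{"price": 0.1, "big": 12345678901234567890.123456789}'

# Default behaviour: numbers with a fraction become binary floating point
# (loosely comparable to converting to xs:double).
as_double = json.loads(text)

# Decimal parser: the lexical form of each number is passed to Decimal,
# so no digits are lost (loosely comparable to supplying xs:decimal#1).
as_decimal = json.loads(text, parse_float=Decimal)

# Identity-style parser: keep the lexical form as text
# (loosely comparable to supplying fn:identity#1).
as_text = json.loads(text, parse_float=str)

print(type(as_double["price"]).__name__)
print(as_decimal["big"])
print(as_text["price"])
```

One difference from the rule above: Python calls `parse_float` only for numbers that contain a fraction or exponent, and routes plain integers through a separate `parse_int` hook, whereas the text describes a single function applied to every JSON number.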
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:json-doc.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised (see below) if the input does not conform to the grammar. (See fn:json-to-xml.)
Default: Implementation-defined. (See fn:json-to-xml.)
Indicates that the resulting XDM instance must be typed; that is, the element and attribute nodes must carry the type annotations that result from validation against the schema given at D.2 Schema for the result of fn:json-to-xml, or against an implementation-defined schema if the liberal option has the value true. (See fn:json-to-xml.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:csv-to-xml.)
Additional, implementation-defined options may be available, for example, to control aspects of the XML serialization, to specify the grammar start symbol, or to produce output formats other than XML. (See fn:invisible-xml.)
Default: The version given in the prolog of the library module; or implementation-defined if this is absent. (See fn:load-xquery-module.)
A sequence of URIs (in the form of xs:string values) which may be used or ignored in an implementation-defined way.... (See fn:load-xquery-module.)
Values for vendor-defined configuration options for the XQuery processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XQuery processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. The option parameter conventions do not apply to this contained map.... (See fn:load-xquery-module.)
It is implementation-defined whether constructs in the library module are evaluated in the same execution scope as the calling module. (See fn:load-xquery-module.)
The library module that is loaded may import schema declarations using an import schema declaration. It is implementation-defined whether schema components in the in-scope schema definitions of the calling module are automatically added to the in-scope schema definitions of the dynamically loaded module. The in-scope schema definitions of the calling and called modules must be consistent, according to the rules defined in 2.2.5 Consistency Constraints ^XQ31. (See fn:load-xquery-module.)
The serialized result is written to persistent storage. This means that the fn:transform function has side-effects and becomes nondeterministic, so the option should be used with care, and the precise behavior may be implementation-defined. When this option is used, the URIs used for the base-output-uri and the URIs of any secondary result documents must be writable locations. (See fn:transform.)
Indicates whether any xsl:message instructions in the stylesheet are to be evaluated. The destination and formatting of any such messages is implementation-defined. (See fn:transform.)
Default: Implementation-defined. (See fn:transform.)
Default: Implementation-defined. (See fn:transform.)
If the implementation provides a way of writing or invoking functions with side-effects, this post-processing function might be used to save a copy of the result document to persistent storage. For example, if the implementation provides access to the EXPath File library [EXPath], then a serialized document might be written to filestore by calling the file:write function. Similar mechanisms might be used to issue an HTTP POST request that posts the result to an HTTP server, or to send the document to an email recipient. The semantics of calling functions with side-effects are entirely implementation-defined. (See fn:transform.)
Calls to fn:transform can potentially have side-effects even in the absence of the post-processing option, because the XSLT specification allows a stylesheet to invoke extension functions that have side-effects. The semantics in this case are implementation-defined. (See fn:transform.)
A string intended to be used as the static base URI of the principal stylesheet module. This value must be used if no other static base URI is available. If the supplied stylesheet already has a base URI (which will generally be the case if the stylesheet is supplied using stylesheet-node or stylesheet-location) then it is implementation-defined whether this parameter has any effect. If the value is a relative reference, it is resolved against the executable base URI^XP of the fn:transform function call.... (See fn:transform.)
Values for vendor-defined configuration options for the XSLT processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XSLT processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. Default is the empty map.... (See fn:transform.)
It is implementation-defined whether the XSLT transformation is executed within the same execution scope as the calling code. (See fn:transform.)
XSLT 1.0 does not define any error codes, so this is the likely outcome with an XSLT 1.0 processor. XSLT 2.0 and 3.0 do define error codes, but some APIs do not expose them. If multiple errors are signaled by the transformation (which is most likely to happen with static errors) then the error code should where possible be that of one of these errors, chosen arbitrarily; the processor may make details of additional errors available to the application in an implementation-defined way. (See fn:transform.)
In addition, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:trace.)
Supposing that $v is an xs:decimal with the value 124.84, the function returns the value 124.84, while outputting a message such as "the value of $v is: 124.84" to an implementation-defined destination. The format of the message is also implementation-defined. (See fn:trace.)
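The behaviour in this example, returning the input unchanged while emitting a message as a side effect, can be sketched in Python. This is only an analogue under stated assumptions: the function name `trace`, its `destination` parameter, and the message layout are hypothetical choices made for the sketch, since both the destination and the format are implementation-defined.

```python
import io
import sys
from decimal import Decimal

def trace(value, label="", destination=sys.stderr):
    """Return value unchanged, writing a diagnostic line as a side effect."""
    # The message layout is a choice made for this sketch only.
    message = f"{label} {value}".strip()
    print(message, file=destination)
    return value

log = io.StringIO()  # stands in for the implementation-defined destination
v = Decimal("124.84")
result = trace(v, "the value of $v is:", destination=log)
```

Here `result` is the same value that was passed in, and the diagnostic text ends up in `log` rather than in the result.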
Similar to fn:trace, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:message.)
If ST is xs:float or xs:double, then TV is the xs:decimal value, within the set of xs:decimal values that the implementation is capable of representing, that is numerically closest to SV. If two values are equally close, then the one that is closer to zero is chosen. If SV is too large to be accommodated as an xs:decimal (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values), a dynamic error is raised [err:FOCA0001]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:decimal.)
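The "closest representable value, ties toward zero" rule can be illustrated with a minimal Python sketch. It assumes a hypothetical implementation that keeps a fixed number of fractional digits (the `fraction_digits` parameter is invented for the sketch) and uses the `decimal` module's `ROUND_HALF_DOWN` mode, which rounds to nearest with ties going toward zero. The overflow case [err:FOCA0001] is not modelled.

```python
import math
from decimal import Decimal, ROUND_HALF_DOWN

def cast_double_to_decimal(sv: float, fraction_digits: int) -> Decimal:
    """Nearest decimal with the given number of fractional digits, ties toward zero."""
    if math.isnan(sv) or math.isinf(sv):
        # Corresponds to the FOCA0002 case in the rule above.
        raise ValueError("FOCA0002: NaN, INF and -INF have no decimal value")
    exact = Decimal(sv)  # exact value of the binary double, no rounding yet
    step = Decimal(1).scaleb(-fraction_digits)  # e.g. 0.01 for two digits
    return exact.quantize(step, rounding=ROUND_HALF_DOWN)

print(cast_double_to_decimal(0.125, 2))
print(cast_double_to_decimal(-0.125, 2))
print(cast_double_to_decimal(0.1, 3))
```

The inputs 0.125 and -0.125 are exactly representable as doubles and lie exactly halfway between two two-digit decimals, so they show the tie-breaking rule directly.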
In casting to xs:decimal or to a type derived from xs:decimal, if the value is not too large or too small but nevertheless cannot be represented accurately with the number of decimal digits available to the implementation, the implementation may round to the nearest representable value or may raise a dynamic error [err:FOCA0006]. The choice of rounding algorithm and the choice between rounding and error behavior is implementation-defined. (See Casting from xs:string and xs:untypedAtomic.)
If ST is xs:decimal, xs:float or xs:double, then TV is SV with the fractional part discarded and the value converted to xs:integer. Thus, casting 3.1456 returns 3, while casting -17.89 returns -17. Casting 3.124E1 returns 31. If SV is too large to be accommodated as an integer (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values), a dynamic error is raised [err:FOCA0003]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:integer.)
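The truncation rule and its worked values can be checked with a short Python sketch. The function name `cast_to_integer` is hypothetical, and the size limit behind [err:FOCA0003] is not modelled because Python integers are unbounded.

```python
import math

def cast_to_integer(sv: float) -> int:
    """Discard the fractional part (truncate toward zero)."""
    if math.isnan(sv) or math.isinf(sv):
        # Corresponds to the FOCA0002 case in the rule above.
        raise ValueError("FOCA0002: NaN, INF and -INF cannot be cast to an integer")
    return math.trunc(sv)

for value in (3.1456, -17.89, 3.124E1):
    print(value, "->", cast_to_integer(value))
```

Truncation differs from rounding down for negative inputs: -17.89 becomes -17, not -18.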
The tz timezone database, available at http://www.iana.org/time-zones. It is implementation-defined which version of the database is used. (See IANA Timezone Database.)
Unicode Standard Annex #15: Unicode Normalization Forms. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-14. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr15/. (See UAX #15.)
Unicode Standard Annex #29: Unicode Text Segmentation. Ed. Josh Hadley, Unicode Consortium. The current version is 16.0.0, dated 2024-08-28. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr29/. (See UAX #29.)
The Unicode Consortium, Reading, MA, Addison-Wesley, 2016. The Unicode Standard as updated from time to time by the publication of new versions. See http://www.unicode.org/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database. The version of Unicode to be used is implementation-defined, but implementations are recommended to use the latest Unicode version; currently, Version 9.0.0. (See The Unicode Standard.)
Unicode Technical Standard #10: Unicode Collation Algorithm. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-22. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr10/. (See UTS #10.)
Unicode Technical Standard #35: Unicode Locale Data Markup Language. Ed. Mark Davis et al., Unicode Consortium. The current version is 47, dated 2025-03-11. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr35/. (See UTS #35.)

W3C Editor's Draft 312 March 2026

Abstract

Status of this Document

Dedication

1 Introduction

1.2 Conformance

1.7 Options

1.9 Terminology

1.9.1 Atomic items

1.9.2 Strings, characters, and codepoints

1.9.3 Namespaces and URIs

1.9.4 Conformance terminology

1.9.5 Properties of functions

2 Processing sequences

2.2 Comparison functions

4 Processing numerics

4.7 Formatting numbers

4.7.1 Defining a decimal format

4.7.3 Syntax of the picture string

5 Processing strings

5.3 Comparison of strings

5.3.1 Collations

5.3.3 The Unicode Codepoint Collation

5.5 Functions based on substring matching

6 Regular expressions

6.1 Regular expression syntax

6.1.1 Processing model for regular expressions

6.1.5 Captured groups

9 Processing dates and times

9.1 Date and time types

9.9 Formatting dates and times

9.9.4 The date/time formatting functions

13 Processing function items

14 Processing maps

14.2 Composing and Decomposing Maps

17 External resources and data formats

17.5 Functions on CSV Data

G Implementation-defined features (Non-Normative)