XPath and XQuery Functions and Operators 4.0

1 Introduction

Changes in 4.0

If a section of this specification has been updated since version 3.1, an overview of the changes is provided, along with links to navigate to the next or previous change.
Sections with significant changes are marked with a ✭ symbol in the table of contents. New functions are indicated by ✚.

The purpose of this document is to define functions and operators for inclusion in XPath 4.0, XQuery 4.0, and XSLT 4.0. The exact syntax used to call these functions and operators is specified in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0].

This document defines three classes of functions:

General purpose functions, available for direct use in user-written queries, stylesheets, and XPath expressions, whose arguments and results are values defined by the [XQuery and XPath Data Model (XDM) 4.0].
Constructor functions, used for creating instances of a datatype from values of (in general) a different datatype. These functions are also available for general use; they are named after the datatype that they return, and they always take a single argument.
Functions that specify the semantics of operators defined in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language]. These exist for specification purposes only, and are not intended for direct calling from user-written code.

[XML Schema Part 2: Datatypes Second Edition] defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines functions and operations on these datatypes as well as the other types (for example, nodes and sequences of nodes) defined in 2.7 Schema Information^DM31 of the [XQuery and XPath Data Model (XDM) 4.0]. These functions and operations are available for use in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and any other host language that chooses to reference them. In particular, they may be referenced in future versions of XSLT and related XML standards.

[XSD 1.1 Part 2] adds to the datatypes defined in [XML Schema Part 2: Datatypes Second Edition]. It introduces a new derived type xs:dateTimeStamp, and it incorporates as built-in types the two types xs:yearMonthDuration and xs:dayTimeDuration which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many aspects of the definitions of the existing datatypes: for example, it extends the value space of xs:double to allow both positive and negative zero, and extends the lexical space to allow +INF; it modifies the value space of xs:Name to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime values; and it allows any character string to appear as the value of an xs:anyURI item. Implementations of this specification may support either XSD 1.0 or XSD 1.1 or both.

In some cases, this specification references XSD for the semantics of operations such as the effect of matching using regular expressions, or conversion of atomic items to strings. In most such cases there is no intended technical difference between the XSD 1.0 and XSD 1.1 specifications, but the 1.1 version often provides clearer explanations and sometimes also corrects technical errors. In such cases this specification often chooses to reference the XSD 1.1 specification. This should not be taken as implying that it is necessary to invoke an XSD 1.1 processor.

References to specific sections of some of the above documents are indicated by cross-document links in this document. Each such link consists of a pointer to a specific section followed by a superscript specifying the linked document. The superscripts have the following meanings: XQ [XQuery 4.0: An XML Query Language], XT [XSL Transformations (XSLT) Version 4.0], XP [XML Path Language (XPath) 4.0], and DM [XQuery and XPath Data Model (XDM) 4.0].

1.2 Conformance

Changes in 4.0

Higher-order functions are no longer an optional feature. [Issue 205 PR 326 1 February 2023]

This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:

For all combinations of valid inputs to the function (both explicit arguments and implicit context dependencies), the result of the function meets the mandatory requirements of this specification.
For all invalid inputs to the function, the implementation raises (in some way appropriate to the calling environment) a dynamic error.
For a sequence of calls within the same execution scope, the requirements of this recommendation regarding the determinism of results are satisfied (see 1.9.5 Properties of functions).

Other recommendations (“host languages”) that reference this document may dictate:

Subsets or supersets of this set of functions to be available in particular environments;
Mechanisms for invoking functions, supplying arguments, initializing the static and dynamic context, receiving results, and handling errors;
A concrete realization of concepts such as execution scope;
Which versions of other specifications referenced herein (for example, XML, XSD, or Unicode) are to be used.

Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.

Note:

Adding such constraints in a host language, however, is discouraged because it makes it difficult to reuse implementations of the function library across host languages.

This specification allows flexibility in the choice of versions of specifications on which it depends:

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used.
It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1.
It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1.

Note:

The XML Schema 1.1 recommendation introduces one new concrete datatype: xs:dateTimeStamp; it also incorporates the types xs:dayTimeDuration, xs:yearMonthDuration, and xs:anyAtomicType, which versions of the data model preceding [XQuery and XPath Data Model (XDM) 4.0] had defined as additions to the type system. Furthermore, XSD 1.1 includes the option of supporting revised definitions of types such as xs:NCName based on the rules in XML 1.1 rather than 1.0.

The [XQuery and XPath Data Model (XDM) 4.0] allows flexibility in the repertoire of characters permitted during processing that goes beyond the choice of XML version. A processor may allow the user to construct nodes and atomic items that contain characters not allowed by any version of XML. [Definition: A permitted character is one within the repertoire accepted by the implementation.]

In this document, text labeled as an example or as a note is provided for explanatory purposes and is not normative.

1.7 Options

Changes in 4.0

Use of an option keyword that is not defined in the specification and is not known to the implementation now results in a dynamic error; previously it was ignored. [Issue 1019 PR 1059 26 March 2024]

As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps were introduced as a datatype in XPath 3.1.

For example, the function fn:xml-to-json has an options parameter allowing specification of whether the output is to be indented. A call might be written:

xml-to-json($input, { 'indent': true() })

[Definition: Functions that take an options parameter adopt common conventions on how the options are used. These are referred to as the option parameter conventions. These rules apply only to functions that explicitly refer to them.]

Where a function adopts the option parameter conventions, the following rules apply:

The value of the relevant argument must be a map. The entries in the map are referred to as options: the key of the entry is called the option name, and the associated value is the option value. Option names defined in this specification are always strings (single xs:string values). Option values may be of any type.
The type of the options parameter in the function signature is always given as map(*).
Although option names are described above as strings, the actual key may be any value that is the same key as the required string. For example, instances of xs:untypedAtomic or xs:anyURI are equally acceptable.
Note:
This means that the implementation of the function can check for the presence and value of particular options using the functions map:contains and/or map:get.
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace.
If an option is present whose key is not described in the specification, then a type error [err:XPTY0004]^XP must be raised unless either (a) the key is recognized by the implementation, or (b) the key is a value of type xs:QName with a non-absent namespace.
All entries in the options map are optional, and supplying the empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.
The ordering of the options map is immaterial.
For each named option, the function specification defines a required type for the option value. The value that is actually supplied in the map is converted to this required type using the coercion rules^XP. This will result in an error (typically [err:XPTY0004]^XP or [err:FORG0001]^FO) if conversion of the supplied value to the required type is not possible. A type error also occurs if this conversion delivers a coerced function whose invocation fails with a type error. A dynamic error occurs if the supplied value after conversion is not one of the permitted values for the option in question: the error codes for this error are defined in the specification of each function.
Note:
It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function-calling rules.
In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
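The rules above can be sketched in Python as a single resolution step that an implementation might run before evaluating a function. This is a simplified model under stated assumptions, not a definitive implementation: the options map is a Python dict, string option names are Python str values, xs:QName keys are modeled by a hypothetical QName class, and each documented option is described by a hypothetical (coerce, default) pair. The names XPathError, QName, to_boolean, and apply_options are illustrative only. For brevity every coercion failure is reported as XPTY0004, although a real implementation may report FORG0001 in some cases, as noted above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, Mapping, Optional, Tuple


class XPathError(Exception):
    """Hypothetical error type carrying an XPath error code."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


@dataclass(frozen=True)
class QName:
    """Hypothetical stand-in for an xs:QName key (None models an absent namespace)."""
    namespace: Optional[str]
    local: str


def to_boolean(value: Any) -> bool:
    """Strict coercion to a boolean: no guessing from strings or numbers."""
    if isinstance(value, bool):
        return value
    raise TypeError("expected xs:boolean")


def apply_options(
    options: Mapping[Hashable, Any],
    spec: Mapping[str, Tuple[Callable[[Any], Any], Any]],
    recognized: frozenset = frozenset(),
) -> Dict[Hashable, Any]:
    """Resolve an options map against spec, which maps each name to (coerce, default)."""
    # Absent options take their defaults, so {} behaves like omitting the argument.
    resolved: Dict[Hashable, Any] = {name: default for name, (_, default) in spec.items()}
    for key, value in options.items():  # the ordering of the map is immaterial
        if isinstance(key, str) and key in spec:
            coerce, _ = spec[key]
            try:
                resolved[key] = coerce(value)
            except (TypeError, ValueError) as exc:
                raise XPathError("XPTY0004", f"invalid value for {key!r}") from exc
        elif key in recognized:
            resolved[key] = value  # implementation-defined extension option
        elif isinstance(key, QName) and key.namespace is not None:
            continue  # unknown namespaced key: not an error; ignored in this sketch
        else:
            raise XPathError("XPTY0004", f"unknown option {key!r}")
    return resolved
```

For example, with `spec = {"indent": (to_boolean, False)}`, an empty map resolves to `{"indent": False}`, while a misspelled key such as `"indnet"` raises an error with code XPTY0004.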

1.9 Terminology

Changes in 4.0

The term atomic value has been replaced by atomic item. [Issue 1337 PR 1361 2 August 2024]

The terminology used to describe the functions and operators on types defined in [XML Schema Part 2: Datatypes Second Edition] is defined in the body of this specification. The terms defined in this section are used in building those definitions.

Note:

Following in the tradition of [XML Schema Part 2: Datatypes Second Edition], the terms type and datatype are used interchangeably.

1.9.1 Atomic items

The following definitions are adopted from [XQuery and XPath Data Model (XDM) 4.0].

[Definition: An atomic item is a pair (T, D) where T (the type annotation) is an atomic type, and D (the datum) is a point in the value space of T.]
[Definition: A primitive type is one of the 19 primitive atomic types defined in 3.2 Primitive datatypes^XS2 of [XML Schema Part 2: Datatypes Second Edition], or the type xs:untypedAtomic defined in [XQuery and XPath Data Model (XDM) 4.0].]
[Definition: The datum of an atomic item is a point in the value space of its type, which is also a point in the value space of the primitive type from which that type is derived.] There are 20 primitive atomic types (19 defined in XSD, plus xs:untypedAtomic), and these have non-overlapping value spaces, so each datum belongs to exactly one primitive atomic type.
[Definition: The type annotation of an atomic item is the most specific atomic type that it is an instance of (it is also an instance of every type from which that type is derived).]

Note:

The term value space is defined in [XSD 1.1 Part 2] as a set of values. The term datum is used here in preference to value, because value has a different meaning in this data model.
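As an illustration only, the following Python sketch models an atomic item as a (type annotation, datum) pair and walks an excerpt of the XSD derivation hierarchy to find an item's primitive type and the types it is an instance of. The tables BASE_TYPE and PRIMITIVE_TYPES are hypothetical and deliberately partial; a real processor would use the complete built-in hierarchy.

```python
from dataclasses import dataclass
from typing import Any

# Partial excerpt of the XSD derivation hierarchy: derived type -> base type.
BASE_TYPE = {
    "xs:byte": "xs:short",
    "xs:short": "xs:int",
    "xs:int": "xs:long",
    "xs:long": "xs:integer",
    "xs:integer": "xs:decimal",
    "xs:NCName": "xs:Name",
    "xs:Name": "xs:token",
    "xs:token": "xs:normalizedString",
    "xs:normalizedString": "xs:string",
}
# A few of the 20 primitive types (19 from XSD plus xs:untypedAtomic).
PRIMITIVE_TYPES = {"xs:decimal", "xs:string", "xs:double", "xs:untypedAtomic"}


def primitive_type(type_name: str) -> str:
    """Follow base types upward until a primitive type is reached."""
    while type_name not in PRIMITIVE_TYPES:
        type_name = BASE_TYPE[type_name]
    return type_name


@dataclass(frozen=True)
class AtomicItem:
    type_annotation: str  # T: the most specific atomic type of the item
    datum: Any            # D: a point in the value space of T

    def instance_of(self, type_name: str) -> bool:
        """True for the annotation itself and for every type it is derived from."""
        t = self.type_annotation
        while True:
            if t == type_name:
                return True
            if t in PRIMITIVE_TYPES:
                return False
            t = BASE_TYPE[t]
```

Here an item annotated as xs:byte is also an instance of xs:integer and xs:decimal, and its datum belongs to the value space of the primitive type xs:decimal.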

1.9.2 Strings, characters, and codepoints

This document uses the terms string, character, and codepoint with meanings that are normatively defined in [XQuery and XPath Data Model (XDM) 4.0], and which are paraphrased here for ease of reference:

[Definition: A character is an instance of the Char^XML production of [Extensible Markup Language (XML) 1.0 (Fifth Edition)].]

Note:

This definition excludes Unicode characters in the surrogate blocks as well as U+FFFE and U+FFFF, while including characters with codepoints greater than U+FFFF which some programming languages treat as two characters. The valid characters are defined by their codepoints, and include some whose codepoints have not been assigned by the Unicode consortium to any character.
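Membership in the Char production can be decided from the codepoint alone. The Python function below is a sketch (the name is_xml10_char is hypothetical) that encodes the XML 1.0 ranges, showing why surrogate codepoints, U+FFFE, and U+FFFF are excluded while codepoints above U+FFFF are included. It says nothing about whether a codepoint has been assigned to a character.

```python
def is_xml10_char(cp: int) -> bool:
    """True if codepoint cp matches the Char production of XML 1.0 (Fifth Edition)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF      # stops short of the surrogate block
        or 0xE000 <= cp <= 0xFFFD    # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF  # supplementary planes
    )
```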

[Definition: A string is a sequence of zero or more characters, or equivalently, a value in the value space of the xs:string datatype.]

[Definition: A codepoint is an integer assigned to a character by the Unicode consortium, or reserved for future assignment to a character.]

Note:

The set of codepoints is thus wider than the set of characters.

This specification spells “codepoint” as one word; the Unicode specification spells it as “code point”. Equivalent terms found in other specifications are “character number” or “code position”. See [Character Model for the World Wide Web 1.0: Fundamentals].

Because these terms appear so frequently, they are hyperlinked to the definition only when there is a particular desire to draw the reader’s attention to the definition; the absence of a hyperlink does not mean that the term is being used in some other sense.

It is implementation-defined which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used.

This specification adopts the Unicode notation U+xxxx to refer to a codepoint by its hexadecimal value (always four to six hexadecimal digits). This is followed where appropriate by the official Unicode character name and its graphical representation: for example U+20AC (EURO SIGN, €).
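The notation can be produced mechanically. In the Python sketch below (helper names are hypothetical), the codepoint is zero-padded to at least four uppercase hexadecimal digits, matching the example above, and the character name is taken from Python's standard unicodedata module. That module reflects the Unicode version bundled with Python, which may differ from the version a processor supports, and it raises ValueError for codepoints that have no name.

```python
import unicodedata


def u_notation(ch: str) -> str:
    """Format one character's codepoint as U+ followed by at least four hex digits."""
    return f"U+{ord(ch):04X}"


def describe(ch: str) -> str:
    """Render the style used in this specification: U+xxxx (NAME, glyph)."""
    return f"{u_notation(ch)} ({unicodedata.name(ch)}, {ch})"
```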

Unless explicitly stated, the functions in this document do not ensure that any returned xs:string values are normalized in the sense of [Character Model for the World Wide Web 1.0: Fundamentals].

Note:

In functions that involve character counting such as fn:substring, fn:string-length and fn:translate, what is counted is the number of XML characters in the string (or equivalently, the number of Unicode codepoints). Some implementations may represent a codepoint above U+FFFF using two 16-bit values known as a surrogate pair. A surrogate pair counts as one character, not two.
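The difference between counting characters and counting 16-bit units is easy to demonstrate. A Python str is a sequence of codepoints, so len() counts characters in the sense used here, whereas encoding to UTF-16 exposes the surrogate pair. (By contrast, Java's String.length() counts UTF-16 code units.) The helper names below are hypothetical.

```python
def string_length(s: str) -> int:
    """Character count in the sense of fn:string-length: one per codepoint."""
    return len(s)  # Python str is a sequence of codepoints


def utf16_code_units(s: str) -> int:
    """16-bit units needed by a UTF-16 representation (a surrogate pair counts as 2)."""
    return len(s.encode("utf-16-le")) // 2
```

For the string `"a\U0001D11Eb"` (containing U+1D11E MUSICAL SYMBOL G CLEF), string_length returns 3 while utf16_code_units returns 4.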

Wherever encoding names (such as UTF-8 and UTF-16) are used in this specification, they are compared without regard to case: the strings "UTF-8" and "utf-8" both refer to the same encoding.

1.9.3 Namespaces and URIs

This document uses the phrase “namespace URI” to identify the concept identified in [Namespaces in XML] as “namespace name”, and the phrase “local name” to identify the concept identified in [Namespaces in XML] as “local part”.

It also uses the term “expanded-QName” defined below.

[Definition: An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 4.0]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. Two expanded QNames are equal if the namespace URIs are the same (or both absent) and the local names are the same. The prefix plays no part in the comparison, but is used only if the expanded QName needs to be converted back to a string.]
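A minimal Python model of this equality rule, assuming a hypothetical ExpandedQName class: the prefix is stored but excluded from equality and hashing via dataclasses.field(compare=False), so two names that differ only in prefix compare equal, while the prefix is still available when converting back to a string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ExpandedQName:
    """Hypothetical model of an expanded-QName."""
    local_name: str
    namespace_uri: Optional[str] = None  # None models an absent namespace URI
    prefix: Optional[str] = field(default=None, compare=False)  # excluded from == and hash

    def __str__(self) -> str:
        # The prefix matters only when converting back to a lexical QName.
        return f"{self.prefix}:{self.local_name}" if self.prefix else self.local_name
```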

The term URI is used as follows:

[Definition: Within this specification, the term URI refers to Uniform Resource Identifiers as defined in [RFC 3986] and extended in [RFC 3987] with a new name IRI. The term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI datatype as defined in [XML Schema Part 2: Datatypes Second Edition].]

Note:

This means, in practice, that where this specification requires a “URI Reference”, an IRI as defined in [RFC 3987] will be accepted, provided that other relevant specifications also permit an IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as “Base URI” that are defined or referenced across the whole family of XML specifications. Note also that the definition of xs:anyURI is a wider definition than the definition in [RFC 3987]; for example it does not require non-ASCII characters to be escaped.

1.9.4 Conformance terminology

In this specification:

The auxiliary verb must, when rendered in small capitals, indicates a precondition for conformance.
- When the sentence relates to an implementation of a function (for example "All implementations must recognize URIs of the form ...") then an implementation is not conformant unless it behaves as stated.
- When the sentence relates to the result of a function (for example "The result must have the same type as $arg") then the implementation is not conformant unless it delivers a result as stated.
- When the sentence relates to the arguments to a function (for example "The value of $arg must be a valid regular expression") then the implementation is not conformant unless it enforces the condition by raising a dynamic error whenever the condition is not satisfied.
The auxiliary verb may, when rendered in small capitals, indicates optional or discretionary behavior. The statement “An implementation may do X” implies that it is implementation-dependent whether or not it does X.
The auxiliary verb should, when rendered in small capitals, indicates desirable or recommended behavior. The statement “An implementation should do X” implies that it is desirable to do X, but implementations may choose to do otherwise if this is judged appropriate.

[Definition: Where behavior is described as implementation-defined, variations between processors are permitted, but a conformant implementation must document the choices it has made.]

[Definition: Where behavior is described as implementation-dependent, variations between processors are permitted, and conformant implementations are not required to document the choices they have made.]

Note:

Where this specification states that something is implementation-defined or implementation-dependent, it is open to host languages to place further constraints on the behavior.

1.9.5 Properties of functions

This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.

In this section the term function, unless otherwise specified, applies equally to function definitions^XP (which can be the target of a static function call) and function items^DM (which can be the target of a dynamic function call).

[Definition: An execution scope is a sequence of calls to the function library during which certain aspects of the state are required to remain invariant. For example, two calls to fn:current-dateTime within the same execution scope will return the same result. The execution scope is defined by the host language that invokes the function library.] In XSLT, for example, any two function calls executed during the same transformation are in the same execution scope (except that static expressions, such as those used in use-when attributes, are in a separate execution scope).
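One way a host might honor this invariant is to read the clock once per execution scope and reuse the value. The Python sketch below assumes a hypothetical ExecutionScope class; using UTC in place of the implicit timezone is also an assumption made for brevity.

```python
from datetime import datetime, timezone
from typing import Optional


class ExecutionScope:
    """Hypothetical host-language execution scope."""

    def __init__(self) -> None:
        self._now: Optional[datetime] = None

    def current_datetime(self) -> datetime:
        # The clock is read on first use only; later calls in the same scope
        # return the same value, so repeated calls agree.
        if self._now is None:
            # Assumption: UTC stands in for the implicit timezone.
            self._now = datetime.now(timezone.utc)
        return self._now
```

A new scope (for example, a new transformation) would create a new ExecutionScope and therefore observe a fresh value.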

The following definition explains more precisely what it means for two function calls to return the same result:

[Definition: Two values $V1 and $V2 are defined to be identical if they contain the same number of items and the items are pairwise identical. Two items are identical if and only if one of the following conditions applies:]

Both items are atomic items, of precisely the same type, and the values are equal as defined using the eq operator, using the Unicode codepoint collation when comparing strings.
Both items are nodes, and represent the same node.
Both items are maps, both maps have the same number of entries, and for every entry E₁ in the first map there is an entry E₂ in the second map such that the keys of E₁ and E₂ are the same key, and the corresponding values V₁ and V₂ are identical.
Both items are arrays, both arrays have the same number of members, and the members are pairwise identical.
Both items are function items, neither item is a map or array, and the two function items have the same function identity. The concept of function identity is explained in [XQuery and XPath Data Model (XDM) 4.0] section 8.1 Function Items.
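The conditions above translate fairly directly into a recursive check. In this Python sketch, which is a simplified model rather than a definitive implementation, sequences are tuples, maps are dicts, arrays are lists whose members are tuples, atomic items are instances of a hypothetical Atomic class, and nodes are instances of a hypothetical Node class compared by object identity. Map keys are matched with Python equality on Atomic, which is a simplification of the "same key" rules. Function items are not modeled.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Atomic:
    """Hypothetical atomic item: (type annotation, datum)."""
    type_name: str
    datum: Any


class Node:
    """Hypothetical node: identity is Python object identity."""


def identical_items(a: Any, b: Any) -> bool:
    if isinstance(a, Atomic) and isinstance(b, Atomic):
        # Same type and equal values; Python compares str by codepoint.
        return a.type_name == b.type_name and a.datum == b.datum
    if isinstance(a, Node) and isinstance(b, Node):
        return a is b  # the same node, not merely equal content
    if isinstance(a, dict) and isinstance(b, dict):
        # Maps: same size, and each key of a is in b with an identical value.
        return len(a) == len(b) and all(
            k in b and identical(v, b[k]) for k, v in a.items()
        )
    if isinstance(a, list) and isinstance(b, list):
        # Arrays: same size, members pairwise identical.
        return len(a) == len(b) and all(identical(x, y) for x, y in zip(a, b))
    return False  # different kinds of item (function items not modeled)


def identical(v1: Tuple[Any, ...], v2: Tuple[Any, ...]) -> bool:
    """Values are identical if they have the same length and pairwise identical items."""
    return len(v1) == len(v2) and all(identical_items(x, y) for x, y in zip(v1, v2))
```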

Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.

[Definition: A function definition^XP may have the property of being context-dependent: the result of such a function depends on the values of properties in the static and dynamic evaluation context of the caller as well as on the actual supplied arguments (if any). A function definition may be context-dependent for some arities in its arity range, and context-independent for others: for example fn:name#0 is context-dependent while fn:name#1 is context-independent.]

[Definition: A function definition^XP that is not context-dependent is called context-independent.]

The main categories of context-dependent functions are:

Functions that explicitly deliver the value of a component of the static or dynamic context, for example fn:static-base-uri, fn:default-collation, fn:position, or fn:last.
Functions with an optional parameter whose default value is taken from the static or dynamic context of the caller, usually either the context value (for example, fn:node-name) or the default collation (for example, fn:index-of).
Functions that use the static context of the caller to expand or disambiguate the values of supplied arguments: for example fn:doc expands its first argument using the static base URI of the caller, and xs:QName expands its first argument using the in-scope namespaces of the caller.

[Definition: A function is focus-dependent if its result depends on the focus^XP31 (that is, the context item, position, or size) of the caller.]

[Definition: A function that is not focus-dependent is called focus-independent.]

Note:

Some functions depend on aspects of the dynamic context that remain invariant within an execution scope, such as the implicit timezone. Formally this is treated in the same way as any other context dependency, but internally, the implementation may be able to take advantage of the fact that the value is invariant.

Note:

User-defined functions in XQuery and XSLT may depend on the static context of the function definition (for example, the in-scope namespaces) and also in a limited way on the dynamic context (for example, the values of global variables). However, the only way they can depend on the static or dynamic context of the caller — which is what concerns us here — is by defining optional parameters whose default values are context-dependent.

Note:

Because the focus is a specific part of the dynamic context, all focus-dependent functions are also context-dependent. A context-dependent function, however, may be either focus-dependent or focus-independent.

A function definition that is context-dependent can be used as the target of a named function reference, can be partially applied, and can be found using fn:function-lookup. The principle in such cases is that the static context used for the function evaluation is taken from the static context of the named function reference, partial function application, or the call on fn:function-lookup; and the dynamic context for the function evaluation is taken from the dynamic context of the evaluation of the named function reference, partial function application, or the call of fn:function-lookup. These constructs all deliver a function item^DM having a captured context based on the static and dynamic context of the construct that created the function item. This captured context forms part of the closure of the function item.

The result of a dynamic call to a function item never depends on the static or dynamic context of the dynamic function call, only (where relevant) on the captured context held within the function item itself.
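Python closures offer a rough analogy for a captured context. In the sketch below, the context is modeled as a dict, and the names default_collation and make_function_item are hypothetical. The function item snapshots the context in which it was created, so changes to the caller's context afterwards do not affect its result.

```python
from typing import Callable, Mapping

Context = Mapping[str, str]


def default_collation(ctx: Context) -> str:
    """A context-dependent definition: it reads a component of the context."""
    return ctx["default-collation"]


def make_function_item(definition: Callable[[Context], str],
                       creating_ctx: Context) -> Callable[[], str]:
    """Model a named function reference: snapshot the creating context into a closure."""
    captured = dict(creating_ctx)  # a copy, so later changes are not seen
    return lambda: definition(captured)
```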

The fn:function-lookup function is a special case because it is potentially dependent on everything in the static and dynamic context. This is because the static and dynamic context of the call to fn:function-lookup form the captured context of the function item that fn:function-lookup returns.

[Definition: A function that is guaranteed to produce identical results from repeated calls within a single execution scope if the explicit and implicit arguments are identical is referred to as deterministic.]

[Definition: A function that is not deterministic is referred to as nondeterministic.]

All functions defined in this specification are deterministic unless otherwise stated. Exceptions include the following:

[Definition: Some functions (such as fn:in-scope-prefixes, fn:load-xquery-module, and fn:unordered) produce result sequences or result maps in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.]
Some functions (such as fn:analyze-string, fn:parse-xml, fn:parse-xml-fragment, fn:parse-html, and fn:json-to-xml) construct a tree of nodes to represent their results. There is no guarantee that repeated calls with the same arguments will return the same identical node (in the sense of the is operator). However, if non-identical nodes are returned, their content will be the same in the sense of the fn:deep-equal function. Such a function is said to be nondeterministic with respect to node identity.
Some functions (such as fn:doc and fn:collection) create new nodes by reading external documents. Such functions are guaranteed to be deterministic by default (some such functions have an option "stable":false() that makes them nondeterministic as a user option, and implementations may also provide configuration options to change the default).
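The distinction between equal content and identical results can be illustrated by analogy in Python, using JSON parsing in place of tree construction. parse_fresh behaves like a function that is nondeterministic with respect to node identity: each call builds a new object with equal content. parse_stable memoizes its result, so repeated calls return the same object, much as a stable fn:doc returns the same document. Both names are hypothetical, and functools.lru_cache lives for the whole process rather than one execution scope, so a real implementation would need a cache bounded by the scope.

```python
import json
from functools import lru_cache


def parse_fresh(text: str):
    """Each call builds a new tree: equal content, different identity."""
    return json.loads(text)


@lru_cache(maxsize=None)
def parse_stable(text: str):
    """Repeated calls with the same input return the very same tree."""
    return json.loads(text)
```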

Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results.

[Definition: The function fn:concat is defined to be variadic: it accepts any number of arguments. No other function has this property.]

2 Processing sequences

A sequence is an ordered collection of zero or more items. An item is a node, an atomic item, or a function, such as a map or an array. The terms sequence and item are defined formally in [XQuery 4.0: An XML Query Language] and [XML Path Language (XPath) 4.0].

2.2 Comparison functions

The functions in this section perform comparisons between the items in one or more sequences.

Many of these functions require atomic items to be compared for equality.

[Definition: Two atomic items A and B are said to be contextually equal if the function call fn:compare(A, B) returns zero when evaluated with a specified or context-determined collation and implicit timezone.] If two values are not contextually equal, they are considered to be contextually unequal, even in the case when comparing them using fn:compare raises an error.

Note:

Except where explicitly stated otherwise, an appeal to contextual equality implies that NaN is treated as equal to NaN.

Function	Meaning
`fn:atomic-equal`	Determines whether two atomic items are equal, under the rules used for comparing keys in a map.
`fn:compare`	Returns `-1`, `0`, or `1`, depending on whether the first value is less than, equal to, or greater than the second value.
`fn:contains-subsequence`	Determines whether one sequence contains another as a contiguous subsequence, using a supplied callback function to compare items.
`fn:deep-equal`	This function assesses whether two sequences are deep-equal to each other. To be deep-equal, they must contain items that are pairwise deep-equal; and for two items to be deep-equal, they must either be atomic items that compare equal, or nodes of the same kind, with the same name, whose children are deep-equal, or maps with matching entries, or arrays with matching members.
`fn:distinct-values`	Returns the values that appear in a sequence, with duplicates eliminated.
`fn:duplicate-values`	Returns the values that appear in a sequence more than once.
`fn:ends-with-subsequence`	Determines whether one sequence ends with another, using a supplied callback function to compare items.
`fn:index-of`	Returns a sequence of positive integers giving the positions within the sequence `$input` of items that are contextually equal to `$target`.
`fn:starts-with-subsequence`	Determines whether one sequence starts with another, using a supplied callback function to compare items.
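The definition of contextual equality, together with the note on NaN, can be sketched in Python, with fn:index-of as a client. This is a simplified model under stated assumptions: the hypothetical compare function handles only numbers and strings, uses codepoint order for strings (Python's default), ignores collations and timezones, and signals incomparable values with TypeError, which contextual equality then treats as "unequal".

```python
import math
from typing import Any, List, Sequence


def _is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def compare(a: Any, b: Any) -> int:
    """Simplified stand-in for fn:compare: numbers and strings only."""
    if _is_number(a) and _is_number(b):
        a_nan = isinstance(a, float) and math.isnan(a)
        b_nan = isinstance(b, float) and math.isnan(b)
        if a_nan or b_nan:
            if a_nan and b_nan:
                return 0  # NaN is treated as equal to NaN
            return -1 if a_nan else 1  # any non-zero result means "unequal"
    elif not (isinstance(a, str) and isinstance(b, str)):
        raise TypeError("values are not comparable")
    return (a > b) - (a < b)  # numeric order, or codepoint order for strings


def contextually_equal(a: Any, b: Any) -> bool:
    try:
        return compare(a, b) == 0
    except TypeError:
        return False  # a comparison error counts as contextually unequal


def index_of(values: Sequence[Any], target: Any) -> List[int]:
    """1-based positions of items contextually equal to target."""
    return [i for i, v in enumerate(values, start=1) if contextually_equal(v, target)]
```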

2.3 Asserting cardinality

The following functions assert the cardinality of their sequence arguments.

Function	Meaning
`fn:exactly-one`	Returns `$input` if it contains exactly one item. Otherwise, raises an error.
`fn:one-or-more`	Returns `$input` if it contains one or more items. Otherwise, raises an error.
`fn:zero-or-one`	Returns `$input` if it contains zero or one items. Otherwise, raises an error.

The functions fn:zero-or-one, fn:one-or-more, and fn:exactly-one defined in this section check that the cardinality of a sequence is in the expected range. These functions were originally defined for use with processors that enforced strict static typing. For example, the function call fn:remove($seq, fn:index-of($seq2, 'abc')) requires the result of the call on fn:index-of to be a singleton integer, but the static type system could not infer this; writing the expression as fn:remove($seq, fn:exactly-one(fn:index-of($seq2, 'abc'))) would provide a suitable static type at query analysis time, and ensure that the length of the sequence is correct with a dynamic check at query execution time.

The 4.0 specifications no longer define strict static typing as an option, so the utility of these functions has declined. They may still serve a purpose, however, as assertions signaling expected preconditions both to the processor and to anyone reading the code.

The type signatures for these functions deliberately declare the argument type as item()*, permitting a sequence of any length. A more restrictive signature would defeat the purpose of the function, which is to defer cardinality checking until query execution time.
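Because the three functions differ only in the accepted range and the error code, a Python sketch of their run-time behavior is short. Sequences are modeled as Python sequences, and XPathError is a hypothetical exception type carrying the error code. Each function returns its input unchanged when the cardinality is acceptable.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


class XPathError(Exception):
    """Hypothetical error type carrying an XPath error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def exactly_one(seq: Sequence[T]) -> Sequence[T]:
    if len(seq) != 1:
        raise XPathError("FORG0005")
    return seq  # returned unchanged


def one_or_more(seq: Sequence[T]) -> Sequence[T]:
    if len(seq) == 0:
        raise XPathError("FORG0004")
    return seq


def zero_or_one(seq: Sequence[T]) -> Sequence[T]:
    if len(seq) > 1:
        raise XPathError("FORG0003")
    return seq
```

For instance, `exactly_one((3,))` returns `(3,)`, while `exactly_one(())` and `exactly_one((1, 2))` both raise XPathError with code FORG0005; this is the dynamic check that fn:exactly-one adds in the fn:remove example above.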

2.3.1 fn:exactly-one

Summary

Returns $input if it contains exactly one item. Otherwise, raises an error.

Signature

`fn:exactly-one`(
`$input`	`as` `item()*`
) `as` `item()`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

Except in error cases, the function returns $input unchanged.

Formal Equivalent

The effect of the function is equivalent to the result of the following XPath expression, except in error cases.

if (count($input) eq 1) 
then $input 
else error(parse-QName('Q{http://www.w3.org/2005/xqt-errors}FORG0005'))

Error Conditions

A dynamic error is raised [err:FORG0005] if $input is an empty sequence or a sequence containing more than one item.

2.3.2 fn:one-or-more

Summary

Returns $input if it contains one or more items. Otherwise, raises an error.

Signature

`fn:one-or-more`(
`$input`	`as` `item()*`
) `as` `item()+`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

Except in error cases, the function returns $input unchanged.

Formal Equivalent

The effect of the function is equivalent to the result of the following XPath expression, except in error cases.

if (count($input) ge 1) 
then $input 
else error(parse-QName('Q{http://www.w3.org/2005/xqt-errors}FORG0004'))

Error Conditions

A dynamic error is raised [err:FORG0004] if $input is an empty sequence.

2.3.3 fn:zero-or-one

Summary

Returns $input if it contains zero or one items. Otherwise, raises an error.

Signature

`fn:zero-or-one`(
`$input`	`as` `item()*`
) `as` `item()?`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

Except in error cases, the function returns $input unchanged.

Formal Equivalent

The effect of the function is equivalent to the result of the following XPath expression, except in error cases.

if (count($input) le 1) 
then $input 
else error(parse-QName('Q{http://www.w3.org/2005/xqt-errors}FORG0003'))

Error Conditions

A dynamic error is raised [err:FORG0003] if $input contains more than one item.

2.5 Basic higher-order functions

The following functions take function items as an argument.

Function	Meaning
`fn:apply`	Makes a dynamic call on a function with an argument list supplied in the form of an array.
`fn:do-until`	Processes a supplied value repeatedly, continuing while some condition is false, and returning the first value that satisfies the condition.
`fn:every`	Returns `true` if every item in the input sequence matches a supplied predicate.
`fn:filter`	Returns those items from the sequence `$input` for which the supplied function `$predicate` returns `true`.
`fn:fold-left`	Processes the supplied sequence from left to right, applying the supplied function repeatedly to each item in turn, together with an accumulated result value.
`fn:fold-right`	Processes the supplied sequence from right to left, applying the supplied function repeatedly to each item in turn, together with an accumulated result value.
`fn:for-each`	Applies the function item `$action` to every item from the sequence `$input` in turn, returning the concatenation of the resulting sequences in order.
`fn:for-each-pair`	Applies the function item `$action` to successive pairs of items taken one from `$input1` and one from `$input2`, returning the concatenation of the resulting sequences in order.
`fn:highest`	Returns those items from a supplied sequence that have the highest value of a sort key, where the sort key can be computed using a caller-supplied function.
`fn:index-where`	Returns the positions in an input sequence of items that match a supplied predicate.
`fn:lowest`	Returns those items from a supplied sequence that have the lowest value of a sort key, where the sort key can be computed using a caller-supplied function.
`fn:partial-apply`	Performs partial application of a function item by binding values to selected arguments.
`fn:partition`	Partitions a sequence of items into a sequence of non-empty arrays containing the same items, starting a new partition when a supplied condition is true.
`fn:scan-left`	Produces the sequence of successive partial results from the evaluation of `fn:fold-left` with the same arguments.
`fn:scan-right`	Produces the sequence of successive partial results from the evaluation of `fn:fold-right` with the same arguments.
`fn:some`	Returns `true` if at least one item in the input sequence matches a supplied predicate.
`fn:sort`	Sorts a supplied sequence, based on the value of a sort key supplied as a function.
`fn:sort-by`	Sorts a supplied sequence, based on the value of a number of sort keys supplied as functions.
`fn:sort-with`	Sorts a supplied sequence, according to the order induced by the supplied comparator functions.
`fn:subsequence-where`	Returns a contiguous sequence of items from `$input`, with the start and end points located by applying predicates.
`fn:take-while`	Returns items from the input sequence prior to the first one that fails to match a supplied predicate.
`fn:transitive-closure`	Returns all the GNodes reachable from a given start GNode by applying a supplied function repeatedly.
`fn:while-do`	Processes a supplied value repeatedly, continuing while some condition remains true, and returning the first value that does not satisfy the condition.

With all these functions, if the caller-supplied function fails with a dynamic error, this error is propagated as an error from the higher-order function itself.

2.5.18 fn:sort-by

Changes in 4.0 (next | previous)

New in 4.0. [Issue 1085 PR 2001 19 May 2025]

Summary

Sorts a supplied sequence, based on the value of a number of sort keys supplied as functions.

Signature

`fn:sort-by`(
`$input`	`as` `item()*`,
`$keys`	`as` `record(key? as (fn(item()) as xs:anyAtomicType*)?, collation? as xs:string?, order? as enum('ascending', 'descending')?)*`	`:=` `()`
) `as` `item()*`

Properties

This function is deterministic, context-dependent, and focus-independent. It depends on collations.

Rules

The result of the function is a sequence that contains all the items from $input, typically in a different order, the order being defined by the supplied sort key definitions.

A sort key definition is a record with three parts:

key: A sort key function, which is applied to each item in the input sequence to determine a sort key value. If no function is supplied, the default is fn:data#1, which atomizes the item.
collation: A collation, which is used when comparing sort key values that are of type xs:string or xs:untypedAtomic. If no collation is supplied, the default collation from the static context is used.
When comparing values of types other than xs:string or xs:untypedAtomic, the collation is ignored (but an error may be reported if it is invalid). For more information see 5.3.7 Choosing a collation.
order: An order direction, either "ascending" or "descending". The default is "ascending".

The number of sort key definitions is determined by the number of records supplied in the $keys argument. If the argument is absent or empty, the default is a single sort key definition that uses fn:data#1 as the key function, the default collation from the static context, and ascending order.

The result of the fn:sort-by function is obtained as follows:

The result sequence contains the same items as the input sequence $input, but generally in a different order.
The sort key definitions are established as described above. The sort key definitions are in major-to-minor order. That is, the position of two items $A and $B in the result sequence is determined first by the relative magnitude of their primary sort key values, which are computed by evaluating the sort key function in the first sort key definition. If those two sort key values are equal, then the position is determined by the relative magnitude of their secondary sort key values, computed by evaluating the sort key function in the second sort key definition, and so on.
When a pair of corresponding sort key values of $A and $B is found to be unequal, $A precedes $B in the result sequence if both of the following conditions are true, or if both conditions are false:
1. The sort key value for $A is less than the sort key value for $B, as defined below.
2. The order direction in the corresponding sort key definition is "ascending".
If all the sort key values for $A and $B are pairwise equal, then $A precedes $B in the result sequence if and only if $A precedes $B in the input sequence.
Note:
That is, the sort is stable.
Each sort key value for a given item is obtained by applying the sort key function of the corresponding sort key definition to that item. The result of this function is in the general case a sequence of atomic items. Two sort key values $a and $b are compared as follows:
1. Let $C be the collation in the corresponding sort key definition.
2. Let $REL be the result of evaluating op:lexicographic-compare($key($A), $key($B), $C), where $key is the sort key function of the corresponding sort key definition. op:lexicographic-compare($a, $b, $C) is defined as follows:
```
if (empty($a) and empty($b)) then 0 
else if (empty($a)) then -1
else if (empty($b)) then +1
else let $rel := op:simple-compare(head($a), head($b), $C)
     return if ($rel eq 0)
            then op:lexicographic-compare(tail($a), tail($b), $C)
            else $rel
```
3. Here op:simple-compare($k1, $k2, $C) is defined as follows:
```
if ($k1 instance of (xs:string | xs:anyURI | xs:untypedAtomic)
    and $k2 instance of (xs:string | xs:anyURI | xs:untypedAtomic))
then compare($k1, $k2, $C)
else if ($k1 instance of xs:numeric and $k2 instance of xs:numeric)
then compare($k1, $k2)
else if ($k1 eq $k2) then 0
else if ($k1 lt $k2) then -1
else +1
```
  Note:
  This raises an error if two keys are not comparable, for example if one is a string and the other is a number, or if both belong to a non-ordered type such as xs:QName.
4. If $REL is zero, then the two sort key values are deemed equal; if $REL is -1, then $a is deemed less than $b; and if $REL is +1, then $a is deemed greater than $b.
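The comparison rules above can be modelled in Python as a sketch. It rests on these assumptions:

- Python `str` stands in for xs:string, xs:anyURI and xs:untypedAtomic.
- `int` and `float` stand in for xs:numeric.
- Plain codepoint order replaces the collation $C.
- A key function returns either a single value or a list, where the list represents a sequence.

The names `simple_compare`, `lexicographic_compare` and `sort_by` are hypothetical. The sketch shows the rules only; it is not an implementation of the function.

```python
from functools import cmp_to_key
from numbers import Number


def simple_compare(k1, k2) -> int:
    """Model of op:simple-compare: -1, 0 or +1 for two atomic sort key values."""
    both_strings = isinstance(k1, str) and isinstance(k2, str)
    both_numbers = isinstance(k1, Number) and isinstance(k2, Number)
    if not (both_strings or both_numbers or type(k1) is type(k2)):
        # Mirrors the XPTY0004 type error for keys that are not comparable.
        raise TypeError("XPTY0004: sort key values are not comparable")
    if k1 == k2:
        return 0
    return -1 if k1 < k2 else 1


def lexicographic_compare(a: list, b: list) -> int:
    """Model of op:lexicographic-compare over two sequences of atomic values."""
    for x, y in zip(a, b):
        rel = simple_compare(x, y)
        if rel != 0:
            return rel
    # The common prefix is equal, so the shorter sequence (e.g. an empty one) sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def sort_by(items, keys=None):
    """Model of fn:sort-by. Each key definition is a dict with optional
    'key' (a function) and 'order' ('ascending' or 'descending') entries."""
    if not keys:
        keys = [{}]  # a single definition: identity key, ascending order

    def as_sequence(value):
        return value if isinstance(value, list) else [value]

    def compare_items(a, b) -> int:
        for definition in keys:  # major-to-minor
            key = definition.get("key", lambda item: item)  # stand-in for fn:data#1
            rel = lexicographic_compare(as_sequence(key(a)), as_sequence(key(b)))
            if rel != 0:
                return -rel if definition.get("order") == "descending" else rel
        return 0  # all keys equal: the stable sort keeps the input order

    return sorted(items, key=cmp_to_key(compare_items))
```

Python's `sorted` is stable, so items whose sort key values are all pairwise equal keep their input order. This is the stability rule stated above. For example, `sort_by([1, -2, 5, 10, -10, 10, 8], [{"key": abs}])` reproduces the order of the `abs#1` example in this section.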

Error Conditions

If the set of computed sort keys contains values that are not comparable using the lt operator then the sort operation will fail with a type error ([err:XPTY0004]^XP).

Notes

The function is a generalization of the fn:sort function available in 3.1, which is retained for compatibility. The enhancements allow multiple sort keys to be defined, each potentially with a different collation, and allow sorting in descending order.

If the sort key for an item evaluates to the empty sequence, the effect of the rules is that this item precedes any item for which the key is non-empty. This is equivalent to the effect of the XQuery option empty least. To get the effect of the option empty greatest for a sort key function K, add an extra sort key definition { 'key': fn { empty(K(.)) } } immediately before the definition that uses K. This works because false precedes true when boolean sort keys are compared.
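Python tuple comparison follows the same lexicographic rule, including the rule that an empty sequence sorts before a non-empty one. This makes the empty-greatest technique easy to demonstrate. In the sketch below, the records are hypothetical and each name key is held as a tuple of zero or one strings.

```python
# Hypothetical records whose sort key (a name) may be empty.
people = [
    {"id": 1, "name": ("Ng",)},
    {"id": 2, "name": ()},
    {"id": 3, "name": ("Abe",)},
]

# Empty least: () compares lower than any non-empty tuple.
empty_least = [p["id"] for p in sorted(people, key=lambda p: p["name"])]

# Empty greatest: an extra, more major boolean key "is the name empty?"
# moves empty names to the end, because False sorts before True.
empty_greatest = [
    p["id"] for p in sorted(people, key=lambda p: (len(p["name"]) == 0, p["name"]))
]
```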

Examples

Expression:	`sort-by((1, 4, 6, 5, 3), ())`
Result:	`1, 3, 4, 5, 6`
Expression:	`sort-by((1, 4, 4e0, 6, 5, 3), { 'order': 'descending' })`
Result:	`6, 5, 4, 4e0, 3, 1`
Expression:	`sort-by((1, -2, 5, 10, -10, 10, 8), { 'key': abs#1 })`
Result:	`1, -2, 5, 8, 10, -10, 10`
Expression:	`let $SWEDISH := collation({ 'lang': 'sv' }) return sort-by(//name, { 'collation': $SWEDISH })`
Result:	The names in `//name` sorted using Swedish collation.
Expression:	`sort-by(//employee, { 'key': fn { name ! (last, first) } })`
Result:	Sorts a sequence of employees by last name as the major sort key and first name as the minor sort key, using the default collation.
Expression:	`sort-by(//employee, ({ 'key': fn { name/last }, 'collation': collation({ 'lang': 'sv' }) }, { 'key': fn { xs:decimal(salary) }, 'order': 'descending' }))`
Result:	Sorts a sequence of employees first by increasing last name (using Swedish collation order) and then by decreasing salary.

4 Processing numerics

This section specifies arithmetic operators on the numeric datatypes defined in [XML Schema Part 2: Datatypes Second Edition].

4.6 Formatting integers

Function	Meaning
`fn:format-integer`	Formats an integer according to a given picture string, using the conventions of a given natural language if specified.

4.6.1 fn:format-integer

Changes in 4.0 (next | previous)

The function has been extended to allow output in a radix other than 10, for example in hexadecimal. [Issue 241 PR 434 7 April 2023]

Summary

Formats an integer according to a given picture string, using the conventions of a given natural language if specified.

Signature

`fn:format-integer`(
`$value`	`as` `xs:integer?`,
`$picture`	`as` `xs:string`,
`$language`	`as` `xs:string?`	`:=` `()`
) `as` `xs:string`

Properties

The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on default language.

The three-argument form of this function is deterministic, context-independent, and focus-independent.

Rules

If $value is the empty sequence, the function returns a zero-length string.

In all other cases, the $picture argument describes the format in which $value is output.

The rules that follow describe how non-negative numbers are output. If the value of $value is negative, the rules below are applied to the absolute value of $value, and a minus sign is prepended to the result.

The value of $picture consists of the following, in order:

An optional radix, which is an integer in the range 2 to 36, written using ASCII digits (0-9) without any leading zero.
A circumflex (^), which is present if the radix is present, and absent otherwise.
A circumflex is recognized as marking the presence of a radix only if (a) it is immediately preceded by an integer in the range 2 to 36, and (b) it is followed (somewhere within the primary format token) by an "X" or "x". In other cases, the circumflex is treated as a grouping separator. For example, the picture 9^000 outputs the number 2345 as "2^345", whereas 9^XXX outputs "3185". This rule is to ensure backwards compatibility.
A primary format token. This is always present and must not be zero-length.
An optional format modifier.
If the string contains one or more semicolons then the last semicolon is taken as terminating the primary format token, and everything that follows is taken as the format modifier; if the string contains no semicolon then the format modifier is taken to be absent (which is equivalent to supplying a zero-length string).
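The splitting rules above can be sketched in Python. `split_picture` is a hypothetical helper and is not part of any API. It returns three values: the radix (or None), the primary format token, and the format modifier (None when absent). It does not check the remaining validity rules.

```python
import re

# A radix is an integer from 2 to 36, written without a leading zero,
# followed by a circumflex.
_RADIX = re.compile(r"([2-9]|[12][0-9]|3[0-6])\^(.*)")


def split_picture(picture: str):
    """Return (radix, primary_token, modifier) for a format-integer picture."""
    if ";" in picture:
        # The LAST semicolon terminates the primary format token.
        primary, modifier = picture.rsplit(";", 1)
    else:
        primary, modifier = picture, None
    m = _RADIX.fullmatch(primary)
    # The circumflex marks a radix only if an X or x follows it within the
    # primary format token; otherwise it is an ordinary grouping separator.
    if m and re.search("[Xx]", m.group(2)):
        return int(m.group(1)), m.group(2), modifier
    return None, primary, modifier
```

For example, `split_picture("9^000")` finds no radix, while `split_picture("9^XXX")` finds radix 9. This matches the backwards-compatibility rule for the circumflex.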

If a radix is present, then the primary format token must follow the rules for a digit-pattern.

The primary format token is classified as one of the following:

A digit-pattern made up of optional-digit-signs, mandatory-digit-signs, and grouping-separator-signs.
- The optional-digit-sign is the character #.
- If the radix is absent, then a mandatory-digit-sign is a character in Unicode category Nd. All mandatory-digit-signs within the format token must be from the same digit family, where a digit family is a sequence of ten consecutive characters in Unicode category Nd, having digit values 0 through 9. Within the format token, these digits are interchangeable: a three-digit number may thus be indicated equivalently by 000, 001, or 999.
  If the primary format token contains at least one Unicode digit, then the primary format token is taken as a decimal digit pattern, and in this case it must match the regular expression ^((\p{Nd}|#|[^\p{N}\p{L}])+?)$. If it contains a digit but does not match this pattern, a dynamic error is raised [err:FODF1310].
- If the radix (call it R) is present (including the case where an explicit radix of 10 is used), then the character used as the mandatory-digit-sign is either "x" or "X". If any mandatory-digit-sign is upper-case "X", then all mandatory-digit-signs must be upper-case "X". The digit family used in the output comprises the first R characters of the alphabet 0123456789abcdefghijklmnopqrstuvwxyz, but using upper-case letters in place of lower-case if an upper-case "X" is used as the mandatory-digit-sign.
  In this case the primary format token must match the regular expression ^(([Xx#]|[^\p{N}\p{L}])+?)$
- A grouping-separator-sign is a non-alphanumeric character, that is, a character whose Unicode category is other than Nd, Nl, No, Lu, Ll, Lt, Lm or Lo.
Note:
If a semicolon is to be used as a grouping separator, then the primary format token as a whole must be followed by another semicolon, to ensure that the grouping separator is not mistaken as a separator between the primary format token and the format modifier.
There must be at least one mandatory-digit-sign. There may be zero or more optional-digit-signs, and (if present) these must precede all mandatory-digit-signs. There may be zero or more grouping-separator-signs. A grouping-separator-sign must not appear at the start or end of the digit-pattern, nor adjacent to another grouping-separator-sign.
The corresponding output is a number in the specified radix, using this digit family, with at least as many digits as there are mandatory-digit-signs in the format token. Thus:
- A format token 1 generates the sequence 0 1 2 ... 10 11 12 ...
- A format token 01 (or equivalently, 00 or 99) generates the sequence 00 01 02 ... 09 10 11 12 ... 99 100 101
- A format token of U+0661 (ARABIC-INDIC DIGIT ONE, ١) generates the sequence ١ then ٢ then ٣ ...
- A format token of 16^xx generates the sequence 00 01 02 03 ... 08 09 0a 0b 0c 0d 0e 0f 10 11 ...
- A format token of 16^X generates the sequence 0 1 2 3 ... 8 9 A B C D E F 10 11 ...
The grouping-separator-signs are handled as follows:
1. The position of grouping separators within the format token, counting backwards from the last digit, indicates the position of grouping separators to appear within the formatted number, and the character used as the grouping-separator-sign within the format token indicates the character to be used as the corresponding grouping separator in the formatted number.
2. More specifically, the position of a grouping separator is the number of optional-digit-signs and mandatory-digit-signs appearing between the grouping separator and the right-hand end of the primary format token.
3. Grouping separators are defined to be regular if the following conditions apply:
  1. There is at least one grouping separator.
  2. Every grouping separator is the same character (call it C).
  3. There is a positive integer G (the grouping size) such that:
    1. The position of every grouping separator is an integer multiple of G, and
    2. Every positive integer multiple of G that is less than the number of optional-digit-signs and mandatory-digit-signs in the primary format token is the position of a grouping separator.
4. The grouping separator template is a (possibly infinite) set of (position, character) pairs.
5. If grouping separators are regular, then the grouping separator template contains one pair of the form (n×G, C) for every positive integer n where G is the grouping size and C is the grouping character.
6. Otherwise (when grouping separators are not regular), the grouping separator template contains one pair of the form (P, C) for every grouping separator found in the primary formatting token, where C is the grouping separator character and P is its position.
7. Note:
  If there are no grouping separators, then the grouping separator template is the empty set.
The number is formatted as follows:
1. Let S₁ be the result of formatting the supplied number in the appropriate radix: for radix 10 this will be the value obtained by casting it to xs:string.
2. Let S₂ be the result of padding S₁ on the left with as many leading zeroes as are needed to ensure that it contains at least as many digits as the number of mandatory-digit-signs in the primary format token.
3. Let S₃ be the result of replacing all decimal digits (0-9) in S₂ with the corresponding digits from the selected digit family. (This has no effect when the selected digit family uses ASCII digits (0-9), which will always be the case if a radix is specified.)
4. Let S₄ be the result of inserting grouping separators into S₃: for every (position P, character C) pair in the grouping separator template where P is less than the number of digits in S₃, insert character C into S₃ at position P, counting from the right-hand end.
5. Let S₅ be the result of converting S₄ into ordinal form, if an ordinal modifier is present, as described below.
6. The result of the function is then S₅.
The format token A, which generates the sequence A B C ... Z AA AB AC....
The format token a, which generates the sequence a b c ... z aa ab ac....
The format token i, which generates the sequence i ii iii iv v vi vii viii ix x ....
The format token I, which generates the sequence I II III IV V VI VII VIII IX X ....
The format token w, which generates numbers written as lower-case words, for example in English, one two three four ...
The format token W, which generates numbers written as upper-case words, for example in English, ONE TWO THREE FOUR ...
The format token Ww, which generates numbers written as title-case words, for example in English, One Two Three Four ...
Any other format token, which indicates a numbering sequence in which that token represents the number 1 (one) (but see the note below). It is implementation-defined which numbering sequences, additional to those listed above, are supported. If an implementation does not support a numbering sequence represented by the given token, it must use a format token of 1.
Note:
In some traditional numbering sequences, additional signs are added to denote that the letters should be interpreted as numbers: for example, in ancient Greek, U+0374 (DEXIA KERAIA, ʹ) and sometimes U+0375 (ARISTERI KERAIA, ͵). These should not be included in the format token.
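The digit-pattern rules given earlier in this section (the grouping separator template and formatting steps S₁ to S₄) can be modelled in Python for ASCII digit patterns. This is a sketch under these assumptions:

- The picture has already been split into radix and primary format token, and is valid.
- Only ASCII decimal digits, or x/X when a radix is present, count as mandatory-digit-signs.
- Steps S₃ (digit family substitution) and S₅ (ordinals) are not modelled.

`grouping_template` and `format_digit_pattern` are hypothetical names.

```python
# Digits available for radixes up to 36.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def is_digit_sign(ch, radix):
    """True for optional (#) or mandatory digit signs; anything else is a separator."""
    if ch == "#":
        return True
    return ch in "Xx" if radix else "0" <= ch <= "9"


def grouping_template(token, radix):
    """Map a position (counted from the right) to a separator character, or None."""
    seps = []    # (position, character) for each grouping-separator-sign
    digits = 0   # digit signs seen so far, scanning from the right-hand end
    for ch in reversed(token):
        if is_digit_sign(ch, radix):
            digits += 1
        else:
            seps.append((digits, ch))
    positions = {p for p, _ in seps}
    if seps and len({c for _, c in seps}) == 1:
        g = min(positions)  # if a grouping size exists, it must be this one
        if all(p % g == 0 for p in positions) and all(
            m in positions for m in range(g, digits, g)
        ):
            c = seps[0][1]
            return lambda p: c if p % g == 0 else None  # regular: extrapolate
    return dict(seps).get  # irregular (or none): only the listed positions


def format_digit_pattern(value, token, radix=None):
    """Steps S1, S2 and S4 for a valid ASCII digit pattern (S3 and S5 omitted)."""
    base = radix or 10
    mandatory = sum(1 for ch in token if ch != "#" and is_digit_sign(ch, radix))
    n, s1 = abs(value), ""
    while True:  # S1: digits of the absolute value in the chosen radix
        n, d = divmod(n, base)
        s1 = ALPHABET[d] + s1
        if n == 0:
            break
    if radix and "X" in token:
        s1 = s1.upper()
    s2 = s1.rjust(mandatory, "0")  # S2: pad on the left with zeros
    sep_at = grouping_template(token, radix)
    out = []
    for i, digit in enumerate(reversed(s2)):  # i = digits to the right of this one
        sep = sep_at(i) if i > 0 else None  # S4: insert only where P < len(S3)
        if sep is not None:
            out.append(sep)
        out.append(digit)
    return ("-" if value < 0 else "") + "".join(reversed(out))
```

The two kinds of template behave differently:

- With the regular pattern `0'000`, the value 1000000 is formatted as `1'000'000`, because a regular template is extrapolated to the left.
- The irregular pattern `#,##,##0` applied to 12345678 yields `123,45,678`, because separators appear only at the positions listed in the token.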

For all format tokens other than a digit-pattern, there may be implementation-defined lower and upper bounds on the range of numbers that can be formatted using this format token; indeed, for some numbering sequences there may be intrinsic limits. For example, the format token U+2460 (CIRCLED DIGIT ONE, ①) has a range imposed by the Unicode character repertoire — zero to 20 in Unicode versions prior to 3.2, or zero to 50 in subsequent versions. For the numbering sequences described above any upper bound imposed by the implementation must not be less than 1000 (one thousand) and any lower bound must not be greater than 1. Numbers that fall outside this range must be formatted using the format token 1.

The above expansions of numbering sequences for format tokens such as a and i are indicative but not prescriptive. There are various conventions in use for how alphabetic sequences continue when the alphabet is exhausted, and differing conventions for how roman numerals are written (for example, IV versus IIII as the representation of the number 4). Sometimes alphabetic sequences are used that omit letters such as i and o. This specification does not prescribe the detail of any sequence other than those sequences consisting entirely of decimal digits.
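Because the sequences for a and i are indicative only, the following Python sketch shows just one common convention for each:

- Bijective base-26 lettering, so that 27 becomes aa.
- Subtractive roman numerals, so that 4 is IV rather than IIII.

The function names are hypothetical, and other conventions remain permitted.

```python
def alphabetic(n: int, letters: str = "abcdefghijklmnopqrstuvwxyz") -> str:
    """Bijective base-26 lettering: 1 -> a, 26 -> z, 27 -> aa, 28 -> ab."""
    out = ""
    while n > 0:
        n, r = divmod(n - 1, len(letters))
        out = letters[r] + out
    return out


# Value/symbol pairs for subtractive roman numerals, largest first.
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def roman(n: int) -> str:
    """Upper-case subtractive roman numerals for 1 <= n <= 3999."""
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)
```

Lower-case roman numerals for the token i can be obtained with `roman(n).lower()`.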

Many numbering sequences are language-sensitive. This applies especially to the sequence selected by the tokens w, W, and Ww. It also applies to other sequences. For example, different languages using the Cyrillic alphabet use different sequences of characters, each starting with the letter U+0410 (CYRILLIC CAPITAL LETTER A, А). In such cases, the $language argument specifies which language conventions are to be used. If the argument is specified, the value should be either the empty sequence or a value that would be valid for the xml:lang attribute (see [Extensible Markup Language (XML) 1.0 (Fifth Edition)]). Note that this permits the identification of sublanguages based on country codes (from ISO 3166-1) as well as identification of dialects and regions within a country.

The set of languages for which numbering is supported is implementation-defined. If the $language argument is absent, or is set to the empty sequence, or is invalid, or is not a language supported by the implementation, then the number is formatted using the default language from the dynamic context.

The format modifier must be a string that matches the regular expression ^([co](\(.+\))?)?[at]?$. That is, if it is present, it must consist of one or more of the following, in order:

either c or o, optionally followed by a sequence of characters enclosed between parentheses, to indicate cardinal or ordinal numbering respectively, the default being cardinal numbering
either a or t, to indicate alphabetic or traditional numbering respectively, the default being implementation-defined.
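The modifier grammar can be checked, and its parts extracted, with an equivalent Python regular expression. This is a sketch: `parse_modifier` is a hypothetical helper that returns None for an invalid modifier instead of raising an error.

```python
import re

# Equivalent of ^([co](\(.+\))?)?[at]?$ with a capturing group for each part.
MODIFIER = re.compile(r"(?:([co])(?:\((.+)\))?)?([at])?")


def parse_modifier(modifier: str):
    """Return (numbering, variation, style), or None if the modifier is invalid.

    numbering is 'c' (cardinal, the default) or 'o' (ordinal); variation is the
    text inside the parentheses, if any; style is 'a', 't', or None when the
    choice is left to the implementation.
    """
    m = MODIFIER.fullmatch(modifier)
    if m is None:
        return None
    numbering, variation, style = m.groups()
    return (numbering or "c", variation, style)
```

For example, the modifier in `'Ww;o(-e)'` parses as ordinal numbering with the variation string `-e`.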

If the o modifier is present, this indicates a request to output ordinal numbers rather than cardinal numbers. For example, in English, when used with the format token 1, this outputs the sequence 1st 2nd 3rd 4th ..., and when used with the format token w outputs the sequence first second third fourth ....

The string of characters between the parentheses, if present, is used to select between other possible variations of cardinal or ordinal numbering sequences. The interpretation of this string is implementation-defined. No error occurs if the implementation does not define any interpretation for the defined string.

It is implementation-defined what combinations of values of the format token, the language, and the cardinal/ordinal modifier are supported. If ordinal numbering is not supported for the combination of the format token, the language, and the string appearing in parentheses, the request is ignored and cardinal numbers are generated instead.

The use of the a or t modifier disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. In the absence of the a or t modifier, the default is implementation-defined.

Error Conditions

A dynamic error is raised [err:FODF1310] if the format token is invalid, that is, if it violates any mandatory rules (indicated by an emphasized must or required keyword in the above rules). For example, the error is raised if the primary format token contains a digit but does not match the required regular expression.

Notes

Note the careful distinction between conditions that are errors and conditions where fallback occurs. The principle is that an error in the syntax of the format picture will be reported by all processors, while a construct that is recognized by some implementations but not others will never result in an error, but will instead cause a fallback representation of the integer to be used.
The following notes apply when a digit-pattern is used:
1. If grouping-separator-signs appear at regular intervals within the format token, then the sequence is extrapolated to the left, so grouping separators will be used in the formatted number at every multiple of the grouping size G. For example, if the format token is 0'000 then the number one million will be formatted as 1'000'000, while the number fifteen will be formatted as 0'015.
2. The only purpose of optional-digit-signs is to mark the position of grouping-separator-signs. For example, if the format token is #'##0 then the number one million will be formatted as 1'000'000, while the number fifteen will be formatted as 15. A grouping separator is included in the formatted number only if there is a digit to its left, which will only be the case if either (a) the number is large enough to require that digit, or (b) the number of mandatory-digit-signs in the format token requires insignificant leading zeros to be present.
3. Grouping separators are not designed for effects such as formatting a US telephone number as (365)123-9876. In general they are not suitable for such purposes because (a) only single characters are allowed, and (b) they cannot appear at the beginning or end of the number.
4. Numbers will never be truncated. Given the digit-pattern 01, the number three hundred will be output as 300, despite the absence of any optional-digit-sign.
The following notes apply when ordinal numbering is selected using the o modifier.
In some languages, the form of numbers (especially ordinal numbers) varies depending on the grammatical context: they may have different genders and may decline with the noun that they qualify. In such cases the string appearing in parentheses after the letter c or o may be used to indicate the variation of the cardinal or ordinal number required.
The way in which the variation is indicated will depend on the conventions of the language.
For inflected languages that vary the ending of the word, the approach recommended in the previous version of this specification was to indicate the required ending, preceded by a hyphen: for example in German, appropriate values might be o(-e), o(-er), o(-es), o(-en).
Another approach, which might usefully be adopted by an implementation based on the open-source ICU localization library [ICU], or any other library making use of the Unicode Common Locale Data Repository [Unicode CLDR], is to allow the value in parentheses to be the name of a registered numbering rule set for the language in question, conventionally prefixed with a percent sign: for example, o(%spellout-ordinal-masculine), or c(%spellout-cardinal-year).
The following notes apply when the primary format token is neither a digit-pattern nor one of the seven other defined format tokens (A, a, i, I, w, W, Ww), but is an arbitrary token representing the number 1:
Unexpected results may occur for traditional numbering. For example, in an implementation that supports a traditional numbering system in Greek, the example format-integer(19, "α;t") might return δπιιιι or ιθ, depending on whether the ancient acrophonic system or the late antique alphabetic system is supported.
Unexpected results may also occur for alphabetic numbering. For example, in an implementation that supports an alphabetic numbering system in Greek, someone writing format-integer(19, "α;a") might expect the nineteenth Greek letter, U+03C4 (GREEK SMALL LETTER TAU, τ). However, the implementation might return the eighteenth letter, U+03C3 (GREEK SMALL LETTER SIGMA, σ), because that is the nineteenth item in the sequence of lowercase Greek letters in Unicode. The sequence is interrupted by the final form of the sigma, U+03C2 (GREEK SMALL LETTER FINAL SIGMA, ς). Because Greek never had a final capital sigma, Unicode has marked U+03A2, the eighteenth codepoint in the sequence of Greek capital letters, as reserved. This ensures that every Greek uppercase letter is always 32 codepoints less than its lowercase counterpart. Therefore, someone writing format-integer(18, "Α;a") might expect the eighteenth Greek capital letter, U+03A3 (GREEK CAPITAL LETTER SIGMA, Σ), but an implementation might return U+03A2. That is the eighteenth position in the sequence of Greek capital letters, but it is not assigned to any character.
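The Greek pitfall comes from computing letters as offsets from a starting code point. The Python sketch below contrasts that naive approach with indexing into the 24 lowercase Greek letters, built by skipping U+03C2 (final sigma). Both schemes are hypothetical illustrations, not a description of any particular implementation.

```python
import unicodedata

# The 24 lowercase Greek letters: U+03B1..U+03C9, skipping final sigma U+03C2.
GREEK_SMALL = "".join(chr(c) for c in range(0x3B1, 0x3CA) if c != 0x3C2)


def naive_letter(n: int, first: str) -> str:
    """Hypothetical naive scheme: the n-th code point counting from `first`."""
    return chr(ord(first) + n - 1)


def alphabet_letter(n: int, letters: str = GREEK_SMALL) -> str:
    """Hypothetical alphabet-based scheme: the n-th letter of an explicit list."""
    return letters[n - 1]


for n in (18, 19):
    ch = naive_letter(n, "\u03b1")  # counting from GREEK SMALL LETTER ALPHA
    print(n, ch, unicodedata.name(ch), "|", alphabet_letter(n))

# Counting from GREEK CAPITAL LETTER ALPHA reaches the reserved U+03A2,
# which has no character name.
print(unicodedata.name(naive_letter(18, "\u0391"), "<unassigned>"))
```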

Examples

Expression	Result
`format-integer(123, '0000')`	`"0123"`
`format-integer(123, 'w')`	Depending on the default language, the expression might return the string `"one hundred and twenty-three"`
`format-integer(21, '1;o', 'en')`	`"21st"`
`format-integer(14, 'Ww;o(-e)', 'de')`	If supported, might return the string `"Vierzehnte"`.
`(1 to 10) ! format-integer(., "1;o(-º)", language:="it")`	This requests ordinal numbering in Italian: if supported, this should produce the sequence: `1º 2º 3º 4º ...`
`(1 to 10) ! format-integer(., "Ww;o", language:="it")`	This requests ordinal numbering in Italian, spelled out as words: if supported, this should produce the sequence: `Primo Secondo Terzo Quarto Quinto ...`
`format-integer(7, 'a')`	`"g"`
`format-integer(27, 'a')`	`"aa"`
`format-integer(57, 'I')`	`"LVII"`
`format-integer(1234, '#;##0;')`	`"1;234"`
`format-integer(1234, '16^xxxx')`	`"04d2"`
`format-integer(1234, '16^X')`	`"4D2"`
`format-integer(12345678, '16^xxxx_xxxx')`	`"00bc_614e"`
`format-integer(12345678, '16^#_xxxx')`	`"bc_614e"`
`format-integer(255, '2^xxxx xxxx')`	`"1111 1111"`
`format-integer(1023, '32^XXXX')`	`"00VV"`
`format-integer(1023, '10^XXXX')`	`"1023"`
`format-integer(1023, '10^00')`	`"10^23"`
`format-integer(-5, '001')`	`"-005"`
`format-integer(-5, 'a')`	`"-e"`
`format-integer(-12, '16^XX')`	`"-0C"`
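Two of the numbering schemes in the table can be sketched in Python. The first is alphabetic numbering for the a and A tokens, which is bijective base 26 and therefore has no letter for zero. The second is radix numbering for pictures such as 16^xxxx. This is a minimal sketch under stated assumptions:

- format_alpha and format_radix are hypothetical helpers.
- The picture string is not parsed; the number of x or X characters is passed as min_width.
- Grouping separators, the # character, and zero in alphabetic numbering are not handled.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def format_alpha(n: int, upper: bool = False) -> str:
    """Alphabetic numbering for the 'a'/'A' token: 1 -> a, 26 -> z, 27 -> aa."""
    if n == 0:
        raise ValueError("zero is not handled by this sketch")
    sign = "-" if n < 0 else ""
    n = abs(n)
    first = ord("A") if upper else ord("a")
    letters = []
    while n > 0:
        # Bijective base 26: there is no "zero" letter, hence the n - 1.
        n, r = divmod(n - 1, 26)
        letters.append(chr(first + r))
    return sign + "".join(reversed(letters))


def format_radix(n: int, radix: int, min_width: int = 1, upper: bool = False) -> str:
    """Format n in base `radix` (2..36), left-padded with zeros to `min_width` digits."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be in the range 2 to 36")
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = []
    while True:
        n, r = divmod(n, radix)
        digits.append(DIGITS[r])
        if n == 0:
            break
    text = "".join(reversed(digits)).rjust(min_width, "0")
    return sign + (text.upper() if upper else text)


# Rows from the table above, with the picture translated by hand:
print(format_alpha(27))                       # aa     ('a')
print(format_radix(1234, 16, 4))              # 04d2   ('16^xxxx')
print(format_radix(1023, 32, 4, upper=True))  # 00VV   ('32^XXXX')
print(format_radix(-12, 16, 2, upper=True))   # -0C    ('16^XX')
```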

4.7 Formatting numbers

This section defines a function for formatting decimal and floating point numbers.

Function	Meaning
`fn:format-number`	Returns a string containing a number formatted according to a given picture string and decimal format.

Note:

This function can be used to format any numeric quantity, including an integer. For integers, however, the fn:format-integer function offers additional possibilities. Note also that the picture strings used by the two functions are not fully compatible, though they share some options.

4.7.1 Defining a decimal format

Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.

The static context provides a set of decimal formats. One of the decimal formats is unnamed; the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined.

Each decimal format provides a set of named properties.

Note:

A phrase such as "The minus-sign^XP31 character" is to be read as “the character assigned to the minus-sign^XP31 property in the relevant decimal format”.

[Definition: The decimal digit family of a decimal format is the sequence of ten digits with consecutive Unicode codepoints starting with the character that is the value of the zero-digit^XP31 property.]

[Definition: The optional digit character is the character that is the value of the digit^XP31 property.]

For any decimal format, the properties representing characters used in a picture string must have distinct values. These properties are decimal-separator^XP31, grouping-separator^XP31, exponent-separator^XP31, percent^XP31, per-mille^XP31, digit^XP31, and pattern-separator^XP31. Furthermore, none of these properties may be equal to any character in the decimal digit family.
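The digit family and the distinctness constraint can be modelled directly. In this Python sketch, DecimalFormat is a hypothetical class. Its default property values (the common ASCII choices, plus U+2030 for per-mille) are assumptions made for illustration. They are not the implementation-defined contents of any particular unnamed decimal format.

```python
from dataclasses import dataclass


@dataclass
class DecimalFormat:
    # Hypothetical defaults; a real decimal format comes from the static context.
    decimal_separator: str = "."
    grouping_separator: str = ","
    exponent_separator: str = "e"
    percent: str = "%"
    per_mille: str = "\u2030"
    digit: str = "#"
    pattern_separator: str = ";"
    zero_digit: str = "0"

    def digit_family(self) -> str:
        """The ten consecutive codepoints starting at the zero-digit character."""
        z = ord(self.zero_digit)
        return "".join(chr(z + i) for i in range(10))

    def check(self) -> None:
        """Raise ValueError if the picture-string properties clash."""
        props = [self.decimal_separator, self.grouping_separator,
                 self.exponent_separator, self.percent, self.per_mille,
                 self.digit, self.pattern_separator]
        if len(set(props)) != len(props):
            raise ValueError("picture-string properties must be distinct")
        clash = set(self.digit_family()).intersection(props)
        if clash:
            raise ValueError(f"properties overlap the digit family: {sorted(clash)}")


# Arabic-Indic digits: zero-digit U+0660 gives the family U+0660..U+0669.
arabic = DecimalFormat(zero_digit="\u0660")
print([f"U+{ord(c):04X}" for c in arabic.digit_family()])
```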

4.7.3 Syntax of the picture string

Note:

This differs from the format-number function previously defined in XSLT 2.0 in that any digit can be used in the picture string to represent a mandatory digit: for example the picture strings "000", "001", and "999" are equivalent. The digits will all be from the same decimal digit family, specifically, the sequence of ten consecutive digits starting with the digit assigned to the zero-digit property. This change is to align format-number (which previously used "000") with format-dateTime (which used 001).

[Definition: The formatting of a number is controlled by a picture string. The picture string is a sequence of characters, in which the characters assigned to the properties decimal-separator^XP31, exponent-separator^XP31, grouping-separator^XP31, digit^XP31, and pattern-separator^XP31, together with the members of the decimal digit family, are classified as active characters, and all other characters (including the values of the properties percent^XP31 and per-mille^XP31) are classified as passive characters.]

A dynamic error is raised [err:FODF1310] if the picture string does not conform to the following rules. Note that in these rules the words "preceded" and "followed" refer to characters anywhere in the string; they are not to be read as "immediately preceded" and "immediately followed".

A picture-string consists either of a sub-picture, or of two sub-pictures separated by the pattern-separator^XP31 character. A picture-string must not contain more than one instance of the pattern-separator^XP31 character. If the picture-string contains two sub-pictures, the first is used for positive and unsigned zero values and the second for negative values.
A sub-picture must not contain more than one instance of the decimal-separator^XP31 character.
A sub-picture must not contain more than one instance of the percent^XP31 or per-mille^XP31 characters, and it must not contain one of each.
The mantissa part of a sub-picture (defined below) must contain at least one character that is either an optional digit character or a member of the decimal digit family.
A sub-picture must not contain a passive character that is preceded by an active character and that is followed by another active character.
A sub-picture must not contain a grouping-separator^XP31 character that appears adjacent to a decimal-separator^XP31 character, or in the absence of a decimal-separator^XP31 character, at the end of the integer part.
A sub-picture must not contain two adjacent instances of the grouping-separator^XP31 character.
The integer part of a sub-picture (defined below) must not contain a member of the decimal digit family that is followed by an instance of the optional digit character. The fractional part of a sub-picture (defined below) must not contain an instance of the optional digit character that is followed by a member of the decimal digit family.
A character that matches the exponent-separator^XP31 property is treated as an exponent-separator-sign if it is both preceded and followed within the sub-picture by an active character. Otherwise, it is treated as a passive character. A sub-picture must not contain more than one character that is treated as an exponent-separator-sign.
A sub-picture that contains a percent^XP31 or per-mille^XP31 character must not contain a character treated as an exponent-separator-sign.
If a sub-picture contains a character treated as an exponent-separator-sign then this must be followed by one or more characters that are members of the decimal digit family, and it must not be followed by any active character that is not a member of the decimal digit family.
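Several of the rules above can be checked mechanically. This Python sketch assumes a decimal format with these characters:

- "." as the decimal-separator and "," as the grouping-separator
- "e" as the exponent-separator
- "#" as the digit character and ";" as the pattern-separator
- the ASCII digits as the decimal digit family

It raises ValueError where a processor would raise err:FODF1310. It covers only a subset of the rules: the pattern-separator and decimal-separator counts, the percent/per-mille rule, the presence of at least one digit (checked over the whole sub-picture rather than the mantissa part), adjacent or misplaced grouping-separators, and passive characters between active ones. The function check_picture is hypothetical.

```python
from typing import List

# Active characters of an assumed default decimal format ("e" is handled separately).
ACTIVE = set(".,#0123456789")
PERCENT_LIKE = {"%", "\u2030"}  # percent and per-mille are passive characters


def check_picture(picture: str) -> List[str]:
    """Split `picture` into sub-pictures, raising ValueError (standing in for
    err:FODF1310) when one of the checked rules is broken."""
    subs = picture.split(";")
    if len(subs) > 2:
        raise ValueError("more than one pattern-separator")
    for sub in subs:
        if sub.count(".") > 1:
            raise ValueError("more than one decimal-separator")
        if sum(sub.count(c) for c in PERCENT_LIKE) > 1:
            raise ValueError("more than one percent/per-mille character")
        if not any(c in "#0123456789" for c in sub):
            raise ValueError("no digit or optional digit character")
        if ",," in sub or ",." in sub or ".," in sub:
            raise ValueError("misplaced grouping-separator")

        def is_active(i: int) -> bool:
            # "e" is an exponent-separator-sign (hence active) only when
            # other active characters occur both before and after it.
            if sub[i] == "e":
                return (any(c in ACTIVE for c in sub[:i])
                        and any(c in ACTIVE for c in sub[i + 1:]))
            return sub[i] in ACTIVE

        active = [i for i in range(len(sub)) if is_active(i)]
        if active and not all(is_active(i) for i in range(active[0], active[-1] + 1)):
            raise ValueError("passive character between active characters")
    return subs


print(check_picture("#,##0.00;(#,##0.00)"))
```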

The mantissa part of the sub-picture is defined as the part that appears to the left of the exponent-separator-sign if there is one, or the entire sub-picture otherwise. The exponent part of the sub-picture is defined as the part that appears to the right of the exponent-separator-sign; if there is no exponent-separator-sign then the exponent part is absent.

The integer part of the sub-picture is defined as the part that appears to the left of the decimal-separator^XP31 character if there is one, or the entire mantissa part otherwise.

The fractional part of the sub-picture is defined as that part of the mantissa part that appears to the right of the decimal-separator^XP31 character if there is one, or the part that appears to the right of the rightmost active character otherwise. The fractional part may be zero-length.
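The part definitions translate into string slicing. This Python sketch assumes a decimal format whose active characters are ".", ",", "#" and the ASCII digits, with "e" as the exponent-separator. The function split_sub_picture is hypothetical and does not validate the sub-picture.

```python
from typing import Dict, Optional

ACTIVE = set(".,#0123456789")  # assumed default decimal format


def split_sub_picture(sub: str) -> Dict[str, Optional[str]]:
    # An "e" is an exponent-separator-sign only if active characters precede and follow it.
    exp_at = None
    for i, c in enumerate(sub):
        if (c == "e" and any(ch in ACTIVE for ch in sub[:i])
                and any(ch in ACTIVE for ch in sub[i + 1:])):
            exp_at = i
            break
    mantissa = sub if exp_at is None else sub[:exp_at]
    exponent = None if exp_at is None else sub[exp_at + 1:]  # None: exponent part absent

    if "." in mantissa:
        dot = mantissa.index(".")
        integer, fractional = mantissa[:dot], mantissa[dot + 1:]
    else:
        # No decimal-separator: the integer part is the whole mantissa, and the
        # fractional part is whatever follows the rightmost active character.
        integer = mantissa
        last = max((i for i, c in enumerate(mantissa) if c in ACTIVE), default=-1)
        fractional = mantissa[last + 1:]
    return {"mantissa": mantissa, "integer": integer,
            "fractional": fractional, "exponent": exponent}


print(split_sub_picture("#,##0.00"))
print(split_sub_picture("0.0e00"))
```

For a sub-picture without a decimal-separator, such as "#,##0%", these definitions make the integer part the whole mantissa. The fractional part is then just the trailing passive text "%".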

5 Processing strings

This section specifies functions and operators on the [XML Schema Part 2: Datatypes Second Edition] xs:string datatype and the datatypes derived from it.

5.3 Comparison of strings

Function	Meaning
`fn:codepoint-equal`	Returns `true` if two strings are equal, considered codepoint-by-codepoint.
`fn:collation`	Constructs a collation URI with requested properties.
`fn:collation-available`	Asks whether a collation URI is recognized by the implementation, and whether it has required properties.
`fn:collation-key`	Given a string value and a collation, generates an internal value called a collation key, with the property that the matching and ordering of collation keys reflects the matching and ordering of strings under the specified collation.
`fn:contains-token`	Determines whether or not any of the supplied strings, when tokenized at whitespace boundaries, contains the supplied token, under the rules of the supplied collation.

5.3.1 Collations

[Definition: A collation is an algorithm that determines, for any two given strings S₁ and S₂, whether S₁ is less than, equal to, or greater than S₂. In this specification, a collation is identified by an absolute URI.]

The [Character Model for the World Wide Web 1.0: Fundamentals] observes that different applications may require different comparison and ordering behaviors. Similarly, different users with different linguistic expectations may require different behaviors. Consequently, the collation must be taken into account when comparing strings.

Collations can indicate that two different codepoints are to be considered equal for comparison purposes (for example, “v” and “w” are considered equivalent in some Swedish collations). Strings can be compared codepoint-by-codepoint or in a linguistically appropriate manner.

Note:

Some sources, for example [UTS #10], use the term collation to refer more generically to a set of sorting rules that can be further parameterized or “tailored”. In this specification the term is always used for a specific algorithm in which all such parameters have defined values.

This specification defines some collation URIs that provide interoperable sorting behavior across applications. Other collation URIs are defined only partially (leaving some aspects implementation-defined). Implementations may define further collation URIs, or may allow users or third parties to define them.

The Unicode codepoint collation is available in every implementation. This collation sorts based on codepoint values. For further details see 5.3.3 The Unicode Codepoint Collation.

Collations may or may not perform Unicode normalization on strings before comparing them.

This specification allows a collation name to be provided as an argument to many string functions. Although collations are defined to be URIs, they are supplied as instances of xs:string.

The XQuery/XPath static context supplies a default collation for use when the collation argument is not specified (see 2.1.1 Static Context ^XP31). If the default collation is not specified by the user or the system, the default collation is the Unicode codepoint collation.

If the collation is specified using a relative URI reference, it is resolved relative to an implementation-defined base URI.

Note:

Previous versions of this specification stated that it must be resolved against the Static Base URI^XP, but this is not always operationally convenient. It is recommended that processors provide a means of setting the base URI for resolving collation URIs independently of the Static Base URI^XP, though for backwards compatibility, the Static Base URI^XP or Executable Base URI^XP should be used as a default.

This specification does not define whether or not the collation URI is dereferenced. The collation URI may be an abstract identifier, or it may refer to an actual resource describing the collation. If it refers to a resource, this specification does not define the nature of that resource. One possible candidate is that the resource is a locale description expressed using the Locale Data Markup Language: see [UTS #35].

Note:

The ability to access external resources depends on whether the calling code is trusted^XP.

Note:

XML allows elements to specify the xml:lang attribute to indicate the language associated with the content of such an element. This specification does not use xml:lang to identify the default collation because using xml:lang does not produce desired effects when the two strings to be compared have different xml:lang values or when a string is multilingual.

5.3.3 The Unicode Codepoint Collation

[Definition: The collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint identifies a collation which must be recognized by every implementation: it is referred to as the Unicode codepoint collation (not to be confused with the Unicode collation algorithm).]

The Unicode codepoint collation does not perform any normalization on the supplied strings.

The collation is defined as follows. Each of the two strings is converted to a sequence of integers using the fn:string-to-codepoints function. These two sequences $A and $B are then compared as follows:

If both sequences are empty, the strings are equal.
If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.
If the first integer in $A is less than the first integer in $B, then the string corresponding to $A is less than the string corresponding to $B.
If the first integer in $A is greater than the first integer in $B, then the string corresponding to $A is greater than the string corresponding to $B.
Otherwise (the first integers in $A and $B are equal), the result is obtained by applying the same rules recursively to fn:tail($A) and fn:tail($B).
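The recursive definition can be written as a loop. In this Python sketch, codepoint_compare is an illustrative name. Indexing a Python str yields codepoints, so ord plays the part of fn:string-to-codepoints. No normalization is applied, so a precomposed é compares as unequal to e followed by a combining acute accent.

```python
def codepoint_compare(s1: str, s2: str) -> int:
    """Return -1, 0 or 1 following the rules of the Unicode codepoint collation."""
    a = [ord(c) for c in s1]  # analogue of fn:string-to-codepoints
    b = [ord(c) for c in s2]
    # Iterative form of the recursion: compare first integers, then the tails.
    for x, y in zip(a, b):
        if x < y:
            return -1
        if x > y:
            return 1
    # One sequence is exhausted: the shorter (empty-tail) string is the lesser.
    if len(a) == len(b):
        return 0
    return -1 if len(a) < len(b) else 1


print(codepoint_compare("Zebra", "apple"))     # -1 ("Z" is U+005A, "a" is U+0061)
print(codepoint_compare("\u00e9", "e\u0301"))  # 1  (no normalization)
```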

Note:

While the Unicode codepoint collation does not produce results suitable for quality publishing of printed indexes or directories, it is adequate for many purposes where a restricted alphabet is used, such as sorting of vehicle registrations.

Note:

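The Unicode codepoint collation differs from the default sort order used in programming languages that sort strings based on UTF-16 code units. In UTF-16, characters above U+FFFF are represented by surrogate pairs whose code units lie in the range D800 to DFFF. An ordering by code units therefore places these characters before characters in the range U+E000 to U+FFFF, whereas the codepoint collation places them after.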
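The difference shows up only for characters outside the Basic Multilingual Plane. This Python sketch sorts the same two characters in two ways: by codepoint, which is Python's native string order, and by UTF-16 code units, obtained by comparing big-endian UTF-16 encodings byte by byte. The helper utf16_key is introduced for the illustration.

```python
def utf16_key(s: str) -> bytes:
    # Big-endian UTF-16 bytes compare in the same order as the 16-bit code units.
    return s.encode("utf-16-be")


chars = ["\uFF61", "\U0001F600"]  # HALFWIDTH IDEOGRAPHIC FULL STOP, GRINNING FACE
by_codepoint = sorted(chars)                 # Python compares str by codepoint
by_code_unit = sorted(chars, key=utf16_key)  # U+1F600 is D83D DE00 in UTF-16

print([f"U+{ord(c):04X}" for c in by_codepoint])  # ['U+FF61', 'U+1F600']
print([f"U+{ord(c):04X}" for c in by_code_unit])  # ['U+1F600', 'U+FF61']
```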

5.5 Functions based on substring matching

The functions described in this section examine a string $value to see whether it contains another string $substring as a substring. The result depends on whether $substring is a substring of $value, and if so, on the range of characters in $value which $substring matches.

When the Unicode codepoint collation is used, this simply involves determining whether $value contains a contiguous sequence of characters whose codepoints are the same, one for one, with the codepoints of the characters in $substring.

When a collation is specified, the rules are more complex.

All collations support the capability of deciding whether two strings are considered equal, and if not, which of the strings should be regarded as preceding the other. For functions such as fn:compare, this is all that is required. For other functions, such as fn:contains, the collation needs to support an additional property: it must be able to decompose the string into a sequence of collation units, each unit consisting of one or more characters, such that two strings can be compared by pairwise comparison of these units.

[Definition: The term collation unit as used in this specification is equivalent to the term collation element used in [UTS #10].]

The string Q is then considered to contain P as a substring if the sequence of collation units corresponding to P is a subsequence of the sequence of collation units corresponding to Q. The characters in P that match are the characters corresponding to these collation units.

This rule may occasionally lead to surprises. For example, consider a collation that treats "Jaeger" and "Jäger" as equal. It might do this by treating "ä" as representing two collation units, in which case the expression fn:contains("Jäger", "eg") will return true. Alternatively, a collation might treat "ae" as a single collation unit, in which case the expression fn:contains("Jaeger", "eg") will return false. The results of these functions thus depend strongly on the properties of the collation that is used.

In addition, collations may specify that some collation units should be ignored during matching. If the hyphen is an ignored collation unit, then fn:contains("code-point", "codepoint") will be true. fn:contains("codepoint", "-") will also be true, because "-" then reduces to an empty sequence of collation units.
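The effect of the collation's choice of units can be reproduced with a toy model. In this Python sketch, ToyCollation and contains are hypothetical. The class only splits strings into collation units, with expansions, contractions and ignorable units given by hand. It does not model the weights or ordering of a real collation such as one based on [UTS #10].

```python
from typing import Dict, List, Set


class ToyCollation:
    """Hypothetical collation reduced to splitting strings into collation units."""

    def __init__(self, units_for: Dict[str, List[str]], ignorable: Set[str]):
        self.units_for = units_for  # character(s) -> collation unit(s)
        self.ignorable = ignorable  # units dropped before matching
        # Try longer keys first so that a contraction such as "ae" wins over "a".
        self.keys = sorted(units_for, key=len, reverse=True)

    def units(self, s: str) -> List[str]:
        out: List[str] = []
        i = 0
        while i < len(s):
            for k in self.keys:
                if s.startswith(k, i):
                    out.extend(self.units_for[k])
                    i += len(k)
                    break
            else:
                out.append(s[i])  # default: one character, one unit
                i += 1
        return [u for u in out if u not in self.ignorable]


def contains(value: str, substring: str, coll: ToyCollation) -> bool:
    q, p = coll.units(value), coll.units(substring)
    # Q contains P if P's units occur as a contiguous run within Q's units.
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


# Expansion: "ä" becomes two units; the hyphen is ignorable.
expanding = ToyCollation({"ä": ["a", "e"]}, ignorable={"-"})
# Contraction: "ae" (and "ä") form a single unit.
contracting = ToyCollation({"ae": ["ae"], "ä": ["ae"]}, ignorable=set())

print(contains("Jäger", "eg", expanding))              # True
print(contains("Jaeger", "eg", contracting))           # False
print(contains("code-point", "codepoint", expanding))  # True
print(contains("codepoint", "-", expanding))           # True
```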

In the rules for the functions defined in this section, we use the following terms taken from [UTS #10]:

[Definition: The term match is used in the sense of definition DS2 from [UTS #10].]
[Definition: The term minimal match is used in the sense of definition DS4 from [UTS #10].]

In the definitions in [UTS #10], these rules involve a number of parameters. In the context of the functions defined in this section, these parameters are interpreted as follows:

C is the collation; that is, the value of the $collation argument if specified, otherwise the default collation.
P is the (candidate) substring, the value of the $substring argument to the function.
Q is the (candidate) containing string, the value of the $value argument to the function.
The boundary condition B is satisfied at the start and end of a string, and between any two characters that belong to different collation units (“collation elements” in the language of [UTS #10]). It is not satisfied between two characters that belong to the same collation unit.

It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property.

Function	Meaning
`fn:contains`	Returns `true` if the string `$value` contains `$substring` as a substring, taking collations into account.
`fn:starts-with`	Returns `true` if the string `$value` contains `$substring` as a leading substring, taking collations into account.
`fn:ends-with`	Returns `true` if the string `$value` contains `$substring` as a trailing substring, taking collations into account.
`fn:substring-before`	Returns the part of `$value` that precedes the first occurrence of `$substring`, taking collations into account.
`fn:substring-after`	Returns the part of `$value` that follows the first occurrence of `$substring`, taking collations into account.

6 Regular expressions

The functions described in this section make use of a regular expression syntax for pattern matching. The syntax and semantics of regular expressions are defined in this section.

6.1 Regular expression syntax

Changes in 4.0

Regular expressions can include comments (starting and ending with #) if the c flag is set. [Issue 999 PR 1022 20 February 2024]
Word boundaries can be matched. Lookahead and lookbehind assertions are supported. Assertions (including ^ and $) can no longer be followed by a quantifier. [Issues 998 1006 PR 1856]

The regular expression syntax used by these functions is defined in terms of the regular expression syntax specified in XSD 1.1 (see [XSD 1.1 Part 2]), which in turn is based on the established conventions of languages such as Perl. However, because XML Schema uses regular expressions only for validity checking, it omits some facilities that are widely used with other languages. XPath, therefore, extends the XML Schema regular expression syntax to reinstate some of these capabilities.

Note:

Implementers should consult [UTS #18] for information on using regular expression processing on Unicode characters.

The regular expression syntax and semantics are identical to those defined in [XSD 1.1 Part 2] with the additions described in the following subsections.

Note:

In [XSD 1.1 Part 2] there are no substantive technical changes to the syntax or semantics of regular expressions relative to [XML Schema Part 2: Datatypes Second Edition], but a number of errors and ambiguities have been resolved. For example, the rules for the interpretation of hyphens within square brackets in a regular expression have been clarified; and the semantics of regular expressions are no longer tied to a specific version of Unicode.

XSD 1.1 is therefore used as the specification baseline, even for processors that only support XSD 1.0.

6.1.1 Processing model for regular expressions

As well as extending the XSD 1.1 syntax for regular expressions, this specification also extends the processing model.

In XSD, a regular expression is defined to denote a set of strings, and the only functionality offered is to test whether a string matches a regular expression: that is, whether it is a member of the set of strings denoted by the regular expression.

In this specification, matching a string S against a regular expression delivers a more complex outcome.

First some terminology:

[Definition: A string of length N has N+1 character positions: one immediately before each character in the string, and one after the last character. In interfaces where character positions are exposed, they are numbered from 1 to N+1.]
[Definition: A segment of a string S is a sequence of zero or more contiguous characters starting at a given character position within S.] Segments of a string are uniquely identified by their start position and length. The sequence of characters making up a segment is referred to as the string value of the segment.
[Definition: The end position of a segment is the start position of the segment plus its length.]

The operation of matching a string S against a regular expression delivers:

A set of matching segments. The string S as a whole is said to match the regular expression if the set of matching segments is non-empty.
For each matching segment M, a collection of captured groups. This is a mapping from positive integers to segments. The integer is called the group number, and corresponds to the ordinal sequence of opening parentheses of capturing subexpressions within the regular expression, as explained below. The corresponding segment is always a segment of S, but in the case of capturing expressions within lookahead assertions, it is not necessarily a segment of M.

The semantics of particular constructs in a regular expression are affected by a set of flags. The available flags and their effect are defined in 6.2 Flags.

The different functions available, such as fn:replace and fn:tokenize, are defined in terms of this outcome. For example:

The function fn:matches returns true if the set of matching segments is non-empty.
The function fn:replace replaces matching segments of the input string with a replacement string.
The function fn:tokenize returns the segments of the input string that appear between the matching segments.

In principle the set of segments that match a regular expression can be determined by enumerating all the segments of the input string and examining each one independently to establish whether it matches. In practice, however:

If several matching segments have the same starting position, then only one of them is returned. This is chosen as follows:
- In the case of a choice (operator "|") the first matching branch is chosen.
- In the case of a repetition with a greedy quantifier (for example "+" or "*") the longest matching segment is chosen.
- In the case of a repetition with a reluctant quantifier (for example "+?" or "*?") the shortest matching segment is chosen.
A matching segment is not included in the result if it overlaps an earlier matching segment: specifically, a segment with start position S₁ is excluded if there is a segment that has start position S₀ and length L₀, where S₀ < S₁ < S₀+L₀.

Note:

Two segments can be adjacent: that is, the start position of one segment can be equal to the end position of the previous segment. This is true even when the second segment is zero-length (the two segments are not considered to be overlapping, even though they have the same end position). This means, for example, that the regular expression a*(?=x) has two non-overlapping matches against the string aaax, one at position 1 and the other at position 4.
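Python's re.finditer applies similar selection rules for simple patterns, so it can illustrate them. This is an analogy rather than an implementation. Python's regular expression dialect differs from the one defined here. The results shown assume Python 3.7 or later, which permits an empty match adjacent to a previous non-empty match. The helper disjoint_segments is illustrative and reports 1-based (start position, length) pairs.

```python
import re
from typing import List, Tuple


def disjoint_segments(pattern: str, s: str) -> List[Tuple[int, int]]:
    """(start position, length) of each match found by re.finditer, 1-based."""
    return [(m.start() + 1, m.end() - m.start()) for m in re.finditer(pattern, s)]


print(disjoint_segments(r"a*(?=x)", "aaax"))  # [(1, 3), (4, 0)]  adjacent segments
print(disjoint_segments(r"ab|abc", "abc"))    # [(1, 2)]  first matching branch
print(disjoint_segments(r"a+", "aaa"))        # [(1, 3)]  greedy: longest
print(disjoint_segments(r"a+?", "aaa"))       # [(1, 1), (2, 1), (3, 1)]  reluctant
```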

[Definition: The disjoint matching segments obtained by applying a regular expression R to a string S in the presence of a set of flags F are the segments of S that match R (using flags F), after elimination of overlapping segments.]

The semantics of a regular expression are thus defined by stating which segments of an input string it matches, and what the captured groups corresponding to this match are. This is defined recursively for each construct that may appear within a regular expression, in terms of the outcome of applying its subexpressions.

For constructs defined in XSD 1.1 (branch, piece, NormalChar, charClass), XSD defines a set of strings denoted by the construct. The corresponding semantics for this specification are that the segments matched by such a construct are the segments whose string value is contained in this set.

For constructs added to the XSD 1.1 baseline by this specification, the semantics are defined in the sections that follow.

6.1.5 Captured groups

The regular expression syntax defined by [XML Schema Part 2: Datatypes Second Edition] allows a regular expression to contain parenthesized subexpressions, but attaches no special significance to them. Some operations associated with regular expressions (for example, back-references, and the fn:replace function) allow access to the parts of the input string that matched a parenthesized subexpression (called captured groups).

[Definition: A left parenthesis is recognized as a capturing left parenthesis provided it is not immediately followed by ? or * (see below), is not within a character group (square brackets), and is not escaped with a backslash. The sub-expression enclosed by a capturing left parenthesis and its matching right parenthesis is referred to as a capturing subexpression.]

More specifically, the capturing subexpression enclosed by the Nth capturing left parenthesis within the regular expression (determined by its character position in left-to-right order, and counting from one) is referred to as the Nth capturing subexpression.

For example, in the regular expression A(BC(?:D(EF(GH[()])))), the subexpression BC(?:D(EF(GH[()]))) is capturing subexpression 1, the subexpression EF(GH[()]) is capturing subexpression 2, and the subexpression GH[()] is capturing subexpression 3.
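The numbering rule can be applied with a single left-to-right scan. This Python sketch uses a hypothetical helper, capturing_subexpressions, and assumes the regular expression is well-formed. It does four things:

- skips escaped characters;
- ignores parentheses inside square brackets;
- treats a left parenthesis followed by ? or * as non-capturing;
- pairs parentheses with a stack.

```python
from typing import Dict, List, Optional, Tuple


def capturing_subexpressions(regex: str) -> List[str]:
    """Return capturing subexpressions 1, 2, ... in order of their left parentheses."""
    stack: List[Tuple[int, Optional[int]]] = []  # (index of "(", group number or None)
    found: Dict[int, str] = {}
    count = 0
    class_depth = 0  # > 0 while inside square brackets
    i = 0
    while i < len(regex):
        c = regex[i]
        if c == "\\":
            i += 2  # skip the backslash and the escaped character
            continue
        if class_depth:
            # Inside a character group only "[" (class subtraction) and "]" matter.
            if c == "[":
                class_depth += 1
            elif c == "]":
                class_depth -= 1
        elif c == "[":
            class_depth = 1
        elif c == "(":
            nxt = regex[i + 1] if i + 1 < len(regex) else ""
            if nxt in ("?", "*"):
                stack.append((i, None))  # non-capturing
            else:
                count += 1
                stack.append((i, count))
        elif c == ")":
            start, number = stack.pop()
            if number is not None:
                found[number] = regex[start + 1:i]
        i += 1
    return [found[n] for n in sorted(found)]


print(capturing_subexpressions("A(BC(?:D(EF(GH[()]))))"))
# ['BC(?:D(EF(GH[()])))', 'EF(GH[()])', 'GH[()]']
```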

When, in the course of evaluating a regular expression, a particular segment of the input matches a capturing subexpression, that segment becomes available as a captured group. The segment matched by the Nth capturing subexpression is referred to as the Nth captured group. By convention, the segment captured by the entire regular expression is treated as captured group 0 (zero).

When a capturing subexpression is matched more than once (because it is within a construct that allows repetition), then only the last substring that it matched will be captured. Note that this rule is not sufficient in all cases to ensure an unambiguous result, especially in cases where (a) the regular expression contains nested repeating constructs, and/or (b) the repeating construct matches a zero-length string. In such cases it is implementation-dependent which substring is captured. For example given the regular expression (a*)+ and the input string "aaaa", an implementation might legitimately capture either "aaaa" or a zero-length string as the content of the captured subgroup.

Parentheses that are required to group terms within the regular expression, but which are not required for capturing of substrings, can be represented using the syntax (?:xxxx).

In the absence of back-references (see below), the presence of the optional ?: has no effect on the set of strings that match the regular expression, but causes the left parenthesis not to be counted by operations (such as fn:replace and back-references) that number the capturing sub-expressions within a regular expression.
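Both rules can be observed in Python's re module, used here only as an analogy because its syntax and replacement-string conventions differ from those of fn:replace. The unambiguous case (\d)+ is shown rather than (a*)+, whose captured group is implementation-dependent as described above.

```python
import re

# A repeated capturing group keeps only the segment from its last iteration.
last_digit = re.fullmatch(r"(\d)+", "123").group(1)

# "(?:" groups without counting, so the capturing group around "c" is group 1.
# (Python writes the back-reference in the replacement as \1; fn:replace uses $1.)
replaced = re.sub(r"(?:ab)(c)", r"[\1]", "xabcx")
group_counts = (re.compile(r"(ab)(c)").groups, re.compile(r"(?:ab)(c)").groups)

print(last_digit)    # 3
print(replaced)      # x[c]x
print(group_counts)  # (2, 1)
```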

9 Processing dates and times

This section defines operations on the [XML Schema Part 2: Datatypes Second Edition] date and time types.

See [Working With Timezones] for a disquisition on working with date and time values with and without timezones.

9.1 Date and time types

[Definition: The eight primitive types xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, and xs:gDay are referred to collectively as the Gregorian types.]

This section describes operations on atomic items of these types.

Values of these types are modeled as comprising one or more of the seven components year, month, day, hour, minute, second, and timezone.

The only operations defined on xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, and xs:gDay values are equality comparison and component extraction. For other types, further operations are provided, including order comparisons, arithmetic, formatted display, and timezone adjustment.

9.6 Extracting components of dates and times

The date and time datatypes may be considered to be composite datatypes in that they contain distinct properties or components. The extraction functions specified below extract a single component from a date or time value. In all cases the local value (that is, the original value as written, without any timezone adjustment) is used.

Note:

A time written as 24:00:00 is treated as 00:00:00 on the following day.

Function	Meaning
`fn:year-from-dateTime`	Returns the year component of a Gregorian value.
`fn:month-from-dateTime`	Returns the month component of a Gregorian value.
`fn:day-from-dateTime`	Returns the day component of a Gregorian value.
`fn:hours-from-dateTime`	Returns the hours component of a Gregorian value.
`fn:minutes-from-dateTime`	Returns the minute component of a Gregorian value.
`fn:seconds-from-dateTime`	Returns the seconds component of a Gregorian value.
`fn:timezone-from-dateTime`	Returns the timezone component of a Gregorian value.
`fn:year-from-date`	Returns the year component of an `xs:date`.
`fn:month-from-date`	Returns the month component of an `xs:date`.
`fn:day-from-date`	Returns the day component of an `xs:date`.
`fn:timezone-from-date`	Returns the timezone component of an `xs:date`.
`fn:hours-from-time`	Returns the hours component of an `xs:time`.
`fn:minutes-from-time`	Returns the minutes component of an `xs:time`.
`fn:seconds-from-time`	Returns the seconds component of an `xs:time`.
`fn:timezone-from-time`	Returns the timezone component of an `xs:time`.
`fn:parts-of-dateTime`	Returns all the components of a Gregorian value.

9.6.16 fn:parts-of-dateTime

Changes in 4.0

New in 4.0

Summary

Returns all the components of a Gregorian value.

Signature

`fn:parts-of-dateTime`(
`$value`	`as` `(xs:dateTime | xs:date | xs:time | xs:gYear | xs:gYearMonth | xs:gMonth | xs:gMonthDay | xs:gDay)?`
) `as` `dateTime-record?`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

If $value is the empty sequence, the function returns the empty sequence.

Otherwise, the function returns a record whose fields are as follows. All entries are present in the record even when the corresponding component is absent from $value; in that case the value of the entry is the empty sequence.

Key	Value
`year`	The value of `fn:year-from-dateTime($value)`
`month`	The value of `fn:month-from-dateTime($value)`
`day`	The value of `fn:day-from-dateTime($value)`
`hours`	The value of `fn:hours-from-dateTime($value)`
`minutes`	The value of `fn:minutes-from-dateTime($value)`
`seconds`	The value of `fn:seconds-from-dateTime($value)`
`timezone`	The value of `fn:timezone-from-dateTime($value)`

Examples

Expression:	parts-of-dateTime( xs:dateTime("1999-05-31T13:20:00-05:00") )
Result:	{ "year": 1999, "month": 5, "day": 31, "hours": 13, "minutes": 20, "seconds": 0, "timezone": xs:dayTimeDuration("-PT5H") }
Expression:	parts-of-dateTime( xs:time("13:30:04.2678") )
Result:	{ "year": (), "month": (), "day": (), "hours": 13, "minutes": 30, "seconds": 4.2678, "timezone": () }
Expression:	parts-of-dateTime( xs:gYearMonth("2007-05Z") )
Result:	{ "year": 2007, "month": 5, "day": (), "hours": (), "minutes": (), "seconds": (), "timezone": xs:dayTimeDuration("PT0S") }

9.9 Formatting dates and times

Function	Meaning
`fn:format-dateTime`	Returns a string containing an `xs:dateTime` value formatted for display.
`fn:format-date`	Returns a string containing an `xs:date` value formatted for display.
`fn:format-time`	Returns a string containing an `xs:time` value formatted for display.

Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar, language, and country. The functions are presented in their customary fashion, except for the rules and examples, which are described en bloc at 9.9.4 The date/time formatting functions and 9.9.5 Examples of date and time formatting.

9.9.4 The date/time formatting functions

The fn:format-dateTime, fn:format-date, and fn:format-time functions format $value as a string using the picture string specified by the $picture argument, the calendar specified by the $calendar argument, the language specified by the $language argument, and the country or other place name specified by the $place argument. The result of the function is the formatted string representation of the supplied xs:dateTime, xs:date, or xs:time value.

[Definition: The three functions fn:format-dateTime, fn:format-date, and fn:format-time are referred to collectively as the date formatting functions.]

If $value is the empty sequence, the function returns the empty sequence.

Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to the empty sequence.

For details of the $language, $calendar, and $place arguments, see 9.9.4.8 The language, calendar, and place arguments.

In general, the use of an invalid $picture, $language, $calendar, or $place argument results in a dynamic error [err:FOFD1340]. By contrast, use of an option in any of these arguments that is valid but not supported by the implementation is not an error, and in these cases the implementation is required to output the value in a fallback representation. More detailed rules are given below.
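The empty-sequence rule and the equivalence between the two-argument and five-argument forms can be illustrated with a minimal Python sketch. It rests on stated assumptions. `format_date` is a hypothetical name, and `None` models the empty sequence. Only numeric markers such as `[Y0001]`, `[M01]`, and `[D1]` (a component letter followed by zeros and a final `1`) are expanded. Any other marker is passed through unchanged, which the real functions would not do. The `language`, `calendar`, and `place` arguments are accepted but ignored.

```python
import re
from datetime import date

# Hypothetical subset of picture markers: component letter -> date attribute.
COMPONENTS = {"Y": "year", "M": "month", "D": "day"}
MARKER = re.compile(r"\[([YMD])(0*1)\]")


def format_date(value, picture, language=None, calendar=None, place=None):
    """Sketch of fn:format-date; None models the empty sequence."""
    if value is None:
        return None  # empty sequence in, empty sequence out
    # language, calendar and place are unused here. Calling with two
    # arguments leaves all three as None, mirroring the five-argument form.

    def expand(match):
        number = getattr(value, COMPONENTS[match.group(1)])
        width = len(match.group(2))  # "0001" -> minimum width 4
        return str(number).zfill(width)

    return MARKER.sub(expand, picture)
```

For example, `format_date(date(2002, 12, 31), "[Y0001]-[M01]-[D01]")` yields `"2002-12-31"`.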

10 Processing QNames and NOTATIONS

10.1 Functions to create a QName

In XPath 4.0, statically known QNames can be expressed using a QName literal such as #xml:space. Where the QName is not known statically, the xs:QName constructor function can be used.

In addition to the xs:QName constructor function, QName values can be constructed by combining a namespace URI, prefix, and local name, or by resolving a lexical QName against the in-scope namespaces of an element node. This section defines functions that perform these operations. Leading and trailing whitespace, if present, is stripped from string arguments before the result is constructed.

Function	Meaning
`fn:QName`	Returns an `xs:QName` value formed using a supplied namespace URI and lexical QName.
`fn:parse-QName`	Returns an `xs:QName` value formed by parsing an EQName.
`fn:resolve-QName`	Returns an `xs:QName` value (that is, an expanded-QName) by taking an `xs:string` that has the lexical form of an `xs:QName` (a string in the form `"prefix:local-name"` or `"local-name"`) and resolving it using the in-scope namespaces for a given element.

10.1.2 fn:parse-QName

Changes in 4.0 (next | previous)

New in 4.0 [Issue 1 PR 207 15 November 2022]

Summary

Returns an xs:QName value formed by parsing an EQName.

Signature

`fn:parse-QName`(
`$value`	`as` `xs:string?`
) `as` `xs:QName?`

Properties

This function is deterministic, context-dependent, and focus-independent. It depends on namespaces.

Rules

If $value is the empty sequence, the result is the empty sequence.

Otherwise, leading and trailing whitespace in $value is stripped; call the result V.

If V is castable to xs:NCName, the result is fn:QName("", V): that is, a QName in no namespace.

If V is in the lexical space of xs:QName (that is, if it is in the form prefix:local), the result is xs:QName(V). Note that this result depends on the in-scope prefixes in the static context, and may result in one of the error conditions listed below.

If V takes the form of an XPath URIQualifiedName^XP (that is, Q{uri}local, where the uri part may be zero-length, or Q{uri}prefix:local), then the result is fn:QName(uri, local) or fn:QName(uri, prefix:local) respectively.

The rules used for parsing a BracedURILiteral^XP within a URIQualifiedName^XP are the XPath rules, not the XQuery rules (the XQuery rules require special characters such as < and & to be escaped).

Error Conditions

A dynamic error is raised [err:FOCA0002] if the supplied value of $value, after whitespace normalization, does not match the XPath production EQName^XP, or if the input is a URIQualifiedName in which the namespace prefix is present but the namespace URI is absent.

A dynamic error is raised [err:FONS0004] if the supplied value of $value, after whitespace normalization, is in the form prefix:local (with a non-absent prefix), and the prefix cannot be resolved to a namespace URI using the in-scope namespace bindings from the static context.
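The parsing rules and error conditions above can be sketched in Python under several assumptions. A QName is modelled as a `(uri, prefix, local)` tuple. `namespaces` is a plain dict standing in for the in-scope namespace bindings of the static context. The NCName pattern is an ASCII-only approximation of the real production. Errors are reported as `ValueError` carrying the error code. `parse_qname` is a hypothetical name.

```python
import re

# ASCII-only approximation of NCName; the real production admits many more characters.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
LEXICAL_QNAME = re.compile(r"(?:(" + NCNAME + r"):)?(" + NCNAME + r")")
URI_QUALIFIED_NAME = re.compile(r"Q\{([^{}]*)\}(?:(" + NCNAME + r"):)?(" + NCNAME + r")")


def parse_qname(value, namespaces):
    """Hypothetical model of fn:parse-QName returning (uri, prefix, local)."""
    if value is None:
        return None  # empty sequence in, empty sequence out
    v = value.strip()  # strip leading and trailing whitespace first

    m = URI_QUALIFIED_NAME.fullmatch(v)
    if m:  # Q{uri}local or Q{uri}prefix:local
        uri, prefix, local = m.groups()
        if prefix and not uri:
            raise ValueError("FOCA0002: prefix present but namespace URI absent")
        return (uri, prefix or "", local)

    m = LEXICAL_QNAME.fullmatch(v)
    if not m:
        raise ValueError("FOCA0002: not a valid EQName")
    prefix, local = m.groups()
    if prefix is None:
        return ("", "", local)  # a bare NCName: a QName in no namespace
    if prefix not in namespaces:
        raise ValueError("FONS0004: no in-scope binding for prefix " + repr(prefix))
    return (namespaces[prefix], prefix, local)
```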

Examples

Expression:	`fn:parse-QName("Q{http://www.example.com/ns}person")`
Result:	`fn:QName("http://www.example.com/ns", "person")`
Expression:	`fn:parse-QName("person")`
Result:	`fn:QName("", "person")`
Expression:	`fn:parse-QName("Q{}person")`
Result:	`fn:QName("", "person")`
Expression:	`fn:parse-QName("Q{http://www.example.com/ns}xmp:person")`
Result:	`fn:QName("http://www.example.com/ns", "xmp:person")`
Expression:	`declare namespace xmp = "http://www.example.com/ns"; fn:parse-QName("xmp:person")`
Result:	`fn:QName("http://www.example.com/ns", "xmp:person")`

10.2 Functions and operators on QNames

This section specifies functions on QNames as defined in [XML Schema Part 2: Datatypes Second Edition].

Function	Meaning
`fn:prefix-from-QName`	Returns the prefix component of the supplied QName.
`fn:local-name-from-QName`	Returns the local part of the supplied QName.
`fn:namespace-uri-from-QName`	Returns the namespace URI part of the supplied QName.
`fn:expanded-QName`	Returns a string representation of an `xs:QName` in the format `Q{uri}local`.

10.2.2 fn:local-name-from-QName

Summary

Returns the local part of the supplied QName.

Signature

`fn:local-name-from-QName`(
`$value`	`as` `xs:QName?`
) `as` `xs:NCName?`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

If $value is the empty sequence the function returns the empty sequence.

Otherwise, the function returns an xs:NCName representing the local part of $value.

Examples

Expression:	`local-name-from-QName( QName("http://www.example.com/example", "person") )`
Result:	`"person"`
Expression:	`local-name-from-QName(#Q{http://www.example.com/ns}person)`
Result:	`"person"`

10.2.3 fn:namespace-uri-from-QName

Summary

Returns the namespace URI part of the supplied QName.

Signature

`fn:namespace-uri-from-QName`(
`$value`	`as` `xs:QName?`
) `as` `xs:anyURI?`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

If $value is the empty sequence the function returns the empty sequence.

Otherwise, the function returns an xs:anyURI representing the namespace URI part of $value.

If $value is in no namespace, the function returns the zero-length xs:anyURI.
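The one subtle point in `fn:local-name-from-QName` and `fn:namespace-uri-from-QName` is the difference between an empty-sequence argument and a QName in no namespace. A small Python model makes it visible. `QName` here is a hypothetical named tuple, not a library type. `None` models the empty sequence, and the zero-length string models the zero-length `xs:anyURI`.

```python
from typing import NamedTuple, Optional


class QName(NamedTuple):
    """Hypothetical stand-in for xs:QName; uri == "" means 'no namespace'."""
    uri: str
    prefix: str
    local: str


def local_name_from_qname(value: Optional[QName]) -> Optional[str]:
    # Empty sequence (None) in, empty sequence out.
    return None if value is None else value.local


def namespace_uri_from_qname(value: Optional[QName]) -> Optional[str]:
    # A QName in no namespace yields the zero-length URI, not the empty sequence.
    return None if value is None else value.uri
```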

Examples

Expression:	`namespace-uri-from-QName( QName("http://www.example.com/example", "person") )`
Result:	`xs:anyURI("http://www.example.com/example")`
Expression:	`namespace-uri-from-QName(#Q{http://www.example.com/ns}person)`
Result:	`xs:anyURI("http://www.example.com/ns")`

12 Processing nodes

12.2 Other properties of nodes

This section specifies further functions that return properties of nodes. Nodes are formally defined in 6 Nodes ^DM31.

Function	Meaning
`fn:has-children`	Returns `true` if the supplied GNode has one or more child nodes (of any kind).
`fn:in-scope-namespaces`	Returns the in-scope namespaces of an element node, as a map.
`fn:in-scope-prefixes`	Returns the prefixes of the in-scope namespaces for an element node.
`fn:lang`	This function tests whether the language of `$node`, or the context value if the second argument is omitted, as specified by `xml:lang` attributes, is the same as, or is a sublanguage of, the language specified by `$language`.
`fn:local-name`	Returns the local part of the name of `$node` as an `xs:string` that is either the zero-length string, or has the lexical form of an `xs:NCName`.
`fn:name`	Returns the name of a node, as an `xs:string` that is either the zero-length string, or has the lexical form of an `xs:QName`.
`fn:namespace-uri`	Returns the namespace URI part of the name of `$node`, as an `xs:anyURI` value.
`fn:namespace-uri-for-prefix`	Returns the namespace URI of one of the in-scope namespaces for `$element`, identified by its namespace prefix.
`fn:path`	Returns a path expression that can be used to select the supplied node relative to the root of its containing document.
`fn:root`	Returns the root of the tree to which `$node` belongs. The function can be applied both to XNodes^DM and to JNodes^DM.
`fn:siblings`	Returns the supplied GNode together with its siblings, in document order.

12.2.1 fn:has-children

Changes in 4.0 (next | previous)

Generalized to work with JNodes as well as XNodes. [Issue 2100 PR 2149 12 August 2025]

Summary

Returns true if the supplied GNode has one or more child nodes (of any kind).

Signature

`fn:has-children`(
`$node`	`as` `gnode()?`	`:=` `.`
) `as` `xs:boolean`

Properties

The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.

The one-argument form of this function is deterministic, context-independent, and focus-independent.

Rules

If the argument is omitted, it defaults to the context value (.).

Provided that the supplied argument $node matches the expected type gnode()?, the result of the function call fn:has-children($node) is defined to be the same as the result of the expression fn:exists($node/child::gnode()).

Error Conditions

The following errors may be raised when $node is omitted:

If the context value is absent^DM, dynamic error [err:XPDY0002]^XP.
If the context value is not an instance of the sequence type gnode()?, type error [err:XPTY0004]^XP.

Notes

If $node is the empty sequence the result is false.

The motivation for this function is to support streamed evaluation. According to the streaming rules in [XSL Transformations (XSLT) Version 4.0], the following construct is not streamable:

<xsl:if test="exists(row)">
  <ulist>
    <xsl:for-each select="row">
      <item><xsl:value-of select="."/></item>
    </xsl:for-each>
  </ulist>
</xsl:if>

This is because it makes two downward selections to read the child row elements. The use of fn:has-children in the xsl:if conditional is intended to circumvent this restriction.

Although the function was introduced to support streaming use cases, it has general utility as a convenience function.

If the supplied argument is a map or an array, it will automatically be coerced to a JNode.
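The equivalence with `fn:exists($node/child::gnode())` can be approximated in Python with the standard `xml.dom.minidom` module, whose `hasChildNodes()` method answers the same question for DOM nodes. This sketch rests on two assumptions. Python lists and dicts stand in for arrays and maps coerced to JNodes, and `None` stands in for the empty sequence. One real difference between the models is handled explicitly: a DOM `Attr` node holds a text child, whereas an XDM attribute node has no children.

```python
from xml.dom import Node, minidom


def has_children(node):
    """Sketch of fn:has-children over minidom nodes, lists (arrays) and dicts (maps)."""
    if node is None:
        return False  # the empty sequence has no children
    if isinstance(node, (list, dict)):
        # Models an array or map coerced to a JNode: its children are its members or entries.
        return len(node) > 0
    if node.nodeType == Node.ATTRIBUTE_NODE:
        return False  # XDM attributes have no children, although a DOM Attr holds a Text child
    return node.hasChildNodes()


doc = minidom.parseString('<doc><p id="alpha">One</p><p/><p>Three</p><?pi 3.14159?></doc>')
paras = doc.getElementsByTagName("p")
```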

Examples

Variables
let $e := <doc> <p id="alpha">One</p> <p/> <p>Three</p> <?pi 3.14159?> </doc>

Expression	Result
`has-children($e)`	`true()`
`has-children($e//p[1])`	`true()`
`has-children($e//p[2])`	`false()`
`has-children($e//p[3])`	`true()`
`has-children($e//processing-instruction())`	`false()`
`has-children($e//p[1]/text())`	`false()`
`has-children($e//p[1]/@id)`	`false()`
`[1,2,3] => has-children()`	`true()`
`jtree([1,2,3]) => has-children()`	`true()`
`[] => has-children()`	`false()`
`jtree([]) => has-children()`	`false()`

13 Processing function items

The functions included in this section operate on function items, that is, values referring to a function.

[Definition: Functions that accept functions among their arguments, or that return functions in their result, are described in this specification as higher-order functions.]

Note:

Some functions, such as fn:parse-json, allow the option of supplying a callback function, for example to define exception behavior. Where this is not essential to the use of the function, the function has not been classified as higher-order for this purpose; in applications where function items cannot be created, these particular options will not be available.

Function	Meaning
`fn:function-lookup`	Returns a function item having a given name and arity, if there is one.
`fn:function-name`	Returns the name of the function identified by a function item.
`fn:function-arity`	Returns the arity of the function identified by a function item.
`fn:function-identity`	Returns a string representing the identity of a function item.
`fn:function-annotations`	Returns the annotations of the function item.

13.1 fn:function-lookup

Summary

Returns a function item having a given name and arity, if there is one.

Signature

`fn:function-lookup`(
`$name`	`as` `xs:QName`,
`$arity`	`as` `xs:integer`
) `as` `fn(*)?`

Properties

This function is deterministic, context-dependent, and focus-dependent.

Rules

A call to fn:function-lookup starts by looking for a function definition^XP in the named functions component of the dynamic context (specifically, the dynamic context of the call to fn:function-lookup), using the expanded QName supplied as $name and the arity supplied as $arity. There can be at most one such function definition.

If no function definition can be identified (by name and arity), then the empty sequence is returned.

If a function definition is identified, then a function item is obtained from the function definition using the same rules as for evaluation of a named function reference (see [XML Path Language (XPath) 4.0] section 4.6.5 Named Function References). The captured context of the returned function item (if it is context dependent) is the static and dynamic context of the call on fn:function-lookup.

If the arguments to fn:function-lookup identify a function that is present in the static context of the function call, the function will always return the same function that a static reference to this function would bind to. If there is no such function in the static context, then the results depend on what is present in the dynamic context, which is implementation-defined.
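The lookup-by-name-and-arity rule, and the fallback pattern used in the Examples, can be modelled in Python with a dictionary keyed by `(expanded name, arity)`. Several parts are assumptions of this sketch: the `registry` contents, the `(namespace URI, local name)` tuple representation of expanded names, and the simplified `substring`, which handles positive integer start positions only. The sketch ignores context dependency and captured context entirely.

```python
from typing import Callable, Dict, Optional, Tuple

FN = "http://www.w3.org/2005/xpath-functions"
# Hypothetical stand-in for the named functions component of the dynamic context.
NamedFunctions = Dict[Tuple[Tuple[str, str], int], Callable]

registry: NamedFunctions = {
    # fn:substring#2 with XPath's 1-based start position (simplified).
    ((FN, "substring"), 2): lambda s, start: s[start - 1:],
}


def function_lookup(name: Tuple[str, str], arity: int,
                    functions: NamedFunctions = registry) -> Optional[Callable]:
    """Return the function with this name and arity, or None for the empty sequence."""
    # At most one definition can match a given (name, arity) pair.
    return functions.get((name, arity))


# Fallback pattern: call zip:binary-entry#2 only if it is available.
ZIP = "http://expath.org/ns/zip"
f = function_lookup((ZIP, "binary-entry"), 2)
result = f("file:///temp.zip", "index.xml") if f is not None else None
```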

Error Conditions

An error is raised if the identified function depends on components of the static or dynamic context that are not present, or that have unsuitable values. For example, [err:XPDY0002]^XP is raised for the call function-lookup( #fn:name, 0 ) if the context value is absent, and [err:FODC0001] is raised for the call function-lookup( #fn:id, 1 ) if the context value is not a single node in a tree that is rooted at a document node. The error that is raised is the same as the error that would be raised by the corresponding function if called with the same static and dynamic context.

Notes

This function can be useful where there is a need to make a dynamic decision on which of several statically known functions to call. It can thus be used as a substitute for polymorphism, in the case where the application has been designed so that several functions implement the same interface.

The function can also be useful in cases where a query or stylesheet module is written to work with alternative versions of a library module. In such cases the author of the main module might wish to test whether an imported library module contains or does not contain a particular function, and to call a function in that module only if it is available in the version that was imported. A static call would cause a static error if the function is not available, whereas getting the function using fn:function-lookup allows the caller to take fallback action in this situation.

If the function that is retrieved by fn:function-lookup is context-dependent, that is, if it has dependencies on the static or dynamic context of its caller, the context that applies is the static and/or dynamic context of the call to the fn:function-lookup function itself. The context thus effectively forms part of the closure of the returned function. This mainly applies when the target of fn:function-lookup is a built-in function, because user-defined functions typically have no dependency on the static or dynamic context of the function call (an exception arises when the expressions used to define default values for parameters are context-dependent). The rule applies recursively, since fn:function-lookup is itself a context-dependent built-in function.

However, the static and dynamic context of the call to fn:function-lookup may play a role even when the selected function definition is not itself context dependent, if the expressions used to establish default parameter values are context dependent.

User-defined XSLT or XQuery functions should be accessible to fn:function-lookup only if they are statically visible at the location where the call to fn:function-lookup appears. This means that private functions, if they are not statically visible in the containing module, should not be accessible using fn:function-lookup.

The function identity is determined in the same way as for a named function reference. Specifically, if there is no context dependency, two calls on fn:function-lookup with the same name and arity must return the same function.

These specifications do not define any circumstances in which the dynamic context will contain functions that are not present in the static context, but neither do they rule this out. For example an API may provide the ability to add functions to the dynamic context, and such functions may potentially be context-dependent.

The mere fact that a function exists and has a name does not of itself mean that the function is present in the dynamic context. For example, functions obtained through use of the fn:load-xquery-module function are not added to the dynamic context.

Examples

Expression:	`function-lookup( #fn:substring, 2 )( 'abcd', 2 )`
Result:	`'bcd'`
Expression:	(fn:function-lookup( #xs:dateTimeStamp, 1 ), xs:dateTime#1)[1] ('2011-11-11T11:11:11Z')
Result:	An `xs:dateTime` value set to the specified date, time, and timezone; if the implementation supports XSD 1.1 then the result will be an instance of the derived type `xs:dateTimeStamp`. The query is written to ensure that no failure occurs when the implementation does not recognize the type `xs:dateTimeStamp`.
Expression:	`declare namespace zip = "http://expath.org/ns/zip"; let $f := function-lookup( #zip:binary-entry, 2 ) return if (exists($f)) then $f("file:///temp.zip", "index.xml") else ()`
Result:	The result of calling `zip:binary-entry("file:///temp.zip", "index.xml")` if the function is available, or the empty sequence otherwise.

13.2 fn:function-name

Summary

Returns the name of the function identified by a function item.

Signature

`fn:function-name`(
`$function`	`as` `fn(*)`
) `as` `xs:QName?`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

If $function refers to a named function, fn:function-name($function) returns the name of that function.

Otherwise ($function refers to an anonymous function), fn:function-name($function) returns the empty sequence.

The prefix part of the returned QName is implementation-dependent.

Examples

Expression:	`function-name(substring#2)`
Result:	`#Q{http://www.w3.org/2005/xpath-functions}substring` (The namespace prefix of the returned QName is not predictable.)
Expression:	`function-name(fn($node) { count($node/*) })`
Result:	`()`

14 Processing maps

Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.

A map is a kind of item.

[Definition: A map consists of a sequence of entries, also known as key-value pairs. Each entry comprises a key which is an arbitrary atomic item, and an arbitrary sequence called the associated value.]

[Definition: Within a map, no two entries have the same key. Two atomic items K1 and K2 are the same key for this purpose if the function call fn:atomic-equal($K1, $K2) returns true.]

It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).

Maps are immutable, and have no identity separate from their content. For example, the map:remove function returns a map that differs from the supplied map by the omission (typically) of one entry, but the supplied map is not changed by the operation. Two calls on map:remove with the same arguments return maps that are indistinguishable from each other; there is no way of asking whether these are “the same map”.

A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicValue) as item()*. Calling the function has the same effect as calling the map:get function: the expression $map($key) returns the same result as map:get($map, $key). For example, if $books-by-isbn is a map whose keys are ISBNs and whose associated values are book elements, then the expression $books-by-isbn("0470192747") returns the book element with the given ISBN. The fact that a map is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.

14.2 Composing and Decomposing Maps

It is often useful to decompose a map into a sequence of entries, or key-value pairs (in which the key is an atomic item and the value is an arbitrary sequence). Subsequently it may be necessary to reconstruct a map from these components, typically after modification.

There are two conventional ways of representing a map as a sequence of key-value pairs, each with its own advantages and disadvantages. These are described below:

A map can be represented as a sequence of single-entry maps.
[Definition: A single-entry map is a map containing a single entry.]
It is possible to decompose any map into a sequence of single-entry maps, and to construct a map from a sequence of single-entry maps.
For example the map { "x": 1, "y": 2 } can be decomposed to the sequence ({ "x": 1 }, { "y": 2 }).
A map can be represented as a sequence of JNodes.
A JNode holds the map key in its ·selector· property and the corresponding value in its ·content· property.

The following table summarizes the way in which these two representations can be used to compose and decompose maps:

Operation	Single-Entry Maps	JNodes
Decompose a map	`map:entries($map)`	`$map/child::*`
Compose a map	`map:merge($entries)`	`map:build($jnodes, jnode-selector#1, jnode-content#1)`
Create a single entry	`map:entry($key, $value)`	`{$key : $value}/child::*`
Extract the key part of a single entry	`map:keys($entry)`	`jnode-selector($jnode)`
Extract the value part of a single entry	`map:items($entry)`	`jnode-content($jnode)`

It is also possible to decompose a map using:

The function map:for-each
The expression for key $k value $v in $map return ....

Example: Reordering the entries in a map

The examples below show several ways of constructing a map with the same entries as an input map, but with the entries sorted by key.

Using map:entries and map:merge:

map:entries($map) => sort-by({'key': map:keys#1}) => map:merge()

Using JNodes:

$map/* => sort-by({'key': jnode-selector#1}) => map:build(jnode-selector#1, jnode-content#1)

Using map:for-each:

map:merge( map:for-each($map, map:entry#2) => sort-by({'key': map:keys#1}) )

Using an XQuery FLWOR expression:

map:merge( for key $k value $v in $map order by $k return {$k : $v} )
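The first pipeline above (`map:entries`, then `sort-by`, then `map:merge`) can be mirrored in Python, assuming that a `dict`'s insertion order stands in for the entry order of an XDM map. The helper names `entries`, `merge`, and `key_of` are hypothetical models of `map:entries`, `map:merge`, and `map:keys`. Here `merge` simply adds entries in order, which matches `map:merge` only when, as in this example, no two entries share a key.

```python
def entries(m):
    """Model of map:entries: one single-entry dict per entry, in entry order."""
    return [{k: v} for k, v in m.items()]


def merge(single_entry_maps):
    """Model of map:merge for maps whose keys are all distinct."""
    result = {}
    for entry in single_entry_maps:
        result.update(entry)  # insertion order becomes entry order
    return result


def key_of(entry):
    """Model of map:keys applied to a single-entry map."""
    (key,) = entry  # unpacking a one-key dict yields its key
    return key


# map:entries($map) => sort-by({'key': map:keys#1}) => map:merge()
unsorted = {"y": 2, "x": 1, "z": 3}
reordered = merge(sorted(entries(unsorted), key=key_of))
```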

14.4 Functions that operate on maps

The functions defined in this section use a conventional namespace prefix map, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/map.

The function call map:get($map, $key) can be used to retrieve the value associated with a given key.

There is no operation to atomize a map or convert it to a string. The function fn:serialize can in some cases be used to produce a JSON representation of a map.

Note that when the required type of an argument to a function such as map:build is a map type, then the coercion rules ensure that a JNode can be supplied in the function call: if the ·content· property of the JNode is a map, then the map is automatically extracted as if by the jnode-content function.

Function	Meaning
`map:build`	Returns a map that typically contains one entry for each item in a supplied input sequence.
`map:contains`	Tests whether a supplied map contains an entry for a given key.
`map:empty`	Returns `true` if the supplied map contains no entries.
`map:entries`	Returns a sequence containing all the key-value pairs present in a map, each represented as a single-entry map.
`map:entry`	Returns a single-entry map that represents a single key-value pair.
`map:filter`	Selects entries from a map, returning a new map.
`map:find`	Searches the supplied input sequence and any contained maps and arrays for a map entry with the supplied key, and returns the corresponding values.
`map:for-each`	Applies a supplied function to every entry in a map, returning the sequence concatenation^XP of the results.
`map:get`	Returns the value associated with a supplied key in a given map.
`map:items`	Returns a sequence containing all the values present in a map, in order.
`map:keys`	Returns a sequence containing all the keys present in a map.
`map:keys-where`	Returns a sequence containing selected keys present in a map.
`map:merge`	Returns a map that combines the entries from a number of existing maps.
`map:put`	Returns a map containing all the contents of the supplied map, but with an additional entry, which replaces any existing entry for the same key.
`map:remove`	Returns a map containing all the entries from a supplied map, except those having a specified key.
`map:size`	Returns the number of entries in the supplied map.

14.4.1 map:build

Changes in 4.0 (next | previous)

New in 4.0 [Issue 151 PR 203 18 October 2022]

Summary

Returns a map that typically contains one entry for each item in a supplied input sequence.

Signature

`map:build`(
`$input`	`as` `item()*`,
`$key`	`as` `(fn($item as item(), $position as xs:integer) as xs:anyAtomicType*)?`	`:=` `fn:identity#1`,
`$value`	`as` `(fn($item as item(), $position as xs:integer) as item()*)?`	`:=` `fn:identity#1`,
`$options`	`as` `map(*)?`	`:=` `{}`
) `as` `map(*)`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

Informally, the function processes each item in $input in order. It calls the $key function on that item to obtain a sequence of key values, and the $value function to obtain an associated value. Then, for each key value:

If the key is not already present in the target map, the processor adds a new key-value pair to the map, with that key and that value.

If the key is already present, the processor combines the new value for the key with the existing value; the way they are combined is determined by the duplicates option.

By default, when two duplicate entries occur:

A single combined entry will be present in the result.
This entry will contain the sequence concatenation^XP of the supplied values.
The position of the combined entry in the entry order^DM of the result map will correspond to the position of the first of the duplicates.
The key of the combined entry will correspond to the key of one of the duplicates: it is implementation-dependent which one is chosen. (It is possible for two keys to be considered duplicates even if they differ: for example, they may have different type annotations, or they may be xs:dateTime values in different timezones.)

The $options argument can be used to control the way in which duplicate keys are handled. The option parameter conventions apply. The entries that may appear in the $options map are as follows:

`record(`
`duplicates?`	`as` `(enum( "reject", "use-first", "use-last", "use-any", "combine") \| fn(item(), item()) as item()*)?`
`)`

Key	Value	Meaning
`duplicates?`	Determines the policy for handling duplicate keys: specifically, the action to be taken if two entries in the input sequence have key values `K₁` and `K₂` where `K₁` and `K₂` are the same key. Type: `(enum( "reject", "use-first", "use-last", "use-any", "combine") \| fn(item(), item()) as item()*)?` Default: `"combine"`
	`"reject"`	Equivalent to supplying a function that raises a dynamic error with error code "FOJS0003". The effect is that duplicate keys result in an error.
	`"use-first"`	Equivalent to supplying the function `fn($a, $b){ $a }`. The effect is that the first of the duplicates is chosen.
	`"use-last"`	Equivalent to supplying the function `fn($a, $b){ $b }`. The effect is that the last of the duplicates is chosen.
	`"use-any"`	Equivalent to supplying the function `fn($a, $b){ one-of($a, $b) }` where `one-of` chooses either `$a` or `$b` in an implementation-dependent way. The effect is that it is implementation-dependent which of the duplicates is chosen.
	`"combine"`	Equivalent to supplying the function `fn($a, $b){ $a, $b }` (or equivalently, the function `op(",")`). The effect is that the result contains the sequence concatenation^XP of the values having the same key, retaining order.
	`function(*)`	A function with signature `fn(item(), item()) as item()*`. The function is called for any entry in the input sequence that has the same key as a previous entry. The first argument is the existing value associated with the key; the second argument is the value associated with the key in the duplicate input entry, and the result is the new value to be associated with the key. The effect is cumulative: for example if there are three values `X`, `Y`, and `Z` associated with the same key, and the supplied function is `F`, then the result is an entry whose value is `X => F(Y) => F(Z)`.
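The rules and the `duplicates` option can be modelled in Python as a sketch under assumptions, not as the normative definition (which is the Formal Equivalent). XPath sequences are modelled as tuples, and `build` and `POLICIES` are hypothetical names. Key and value functions receive the item and its 1-based position, and only callables (not maps) are accepted as the `key` function. Because reassigning an existing `dict` key keeps its position, a combined entry stays where the first duplicate appeared.

```python
# Duplicate policies from the options table; a callable is used as-is.
# "use-any" may pick either value; this sketch happens to keep the first.
POLICIES = {
    "use-first": lambda a, b: a,
    "use-last": lambda a, b: b,
    "use-any": lambda a, b: a,
    "combine": lambda a, b: a + b,  # sequence concatenation
}


def build(items, key=None, value=None, duplicates="combine"):
    """Sketch of map:build; XPath sequences are modelled as Python tuples."""
    key = key or (lambda item, pos: (item,))      # default: identity
    value = value or (lambda item, pos: (item,))  # default: identity
    result = {}
    for pos, item in enumerate(items, start=1):   # positions are 1-based
        for k in key(item, pos):
            v = value(item, pos)
            if k not in result:
                result[k] = v
            elif duplicates == "reject":
                raise ValueError(f"FOJS0003: duplicate key {k!r}")
            else:
                combine = duplicates if callable(duplicates) else POLICIES[duplicates]
                result[k] = combine(result[k], v)
    return result
```

For instance, `build(range(1, 11), key=lambda x, pos: (x % 3,))` returns `{1: (1, 4, 7, 10), 2: (2, 5, 8), 0: (3, 6, 9)}`, which has the same entries as the `. mod 3` example below.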

Formal Equivalent

The effect of the function is equivalent to the result of the following XPath expression.

for-each(
  $input,
  fn($item, $pos) {
    for-each($key($item, $pos), fn($k) {
      map:entry($k, $value($item, $pos))
    })
  }
)
=> map:merge($options)

Error Conditions

An error is raised [err:FOJS0003] if the value of $options indicates that duplicates are to be rejected, and a duplicate key is encountered.

An error is raised [err:FOJS0005] if the value of $options includes an entry whose key is defined in this specification, and whose value is not a permitted value for that key.

Notes

The default function for both $key and $value is the identity function. Although it is permitted to default both, this serves little purpose: usually at least one of these arguments will be supplied.

Examples

Expression:	`map:build((), string#1)`
Result:	`{}`
Expression:	`map:build(1 to 10, fn { . mod 3 })`
Result:	`{ 0: (3, 6, 9), 1: (1, 4, 7, 10), 2: (2, 5, 8) }` (Returns a map with one entry for each distinct value of `. mod 3`. The function to compute the value is the identity function, and duplicates are combined by sequence concatenation.)
Expression:	map:build( 1 to 5, value := format-integer(?, "w") )
Result:	{ 1: "one", 2: "two", 3: "three", 4: "four", 5: "five" } (Returns a map with five entries. The function to compute the key is an identity function, the function to compute the value invokes `fn:format-integer`.)
Expression:	map:build( ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"), substring(?, 1, 1) )
Result:	{ "A": ("April", "August"), "D": ("December"), "F": ("February"), "J": ("January", "June", "July"), "M": ("March", "May"), "N": ("November"), "O": ("October"), "S": ("September") }
Expression:	map:build(1 to 5, { 1: ("eins", "one"), 4: ("vier", "four") })
Result:	{ "eins": 1, "one": 1, "vier": 4, "four": 4 }
Expression:	map:build( ("apple", "apricot", "banana", "blueberry", "cherry"), substring(?, 1, 1), string-length#1, { "duplicates": op("+") } )
Result:	{ "a": 12, "b": 15, "c": 6 } (Constructs a map where the key is the first character of an input item, and where the corresponding value is the total string-length of the items starting with that character.)
Expression:	map:build( ('Wang', 'Liu', 'Zhao'), key := fn($name, $pos) { $name }, value := fn($name, $pos) { $pos } )
Result:	{ "Wang": 1, "Liu": 2, "Zhao": 3 } (Returns an inverted index for the input sequence with the string stored as key and the position stored as value.)
Expression:	let $titles := <titles> <title>A Beginner’s Guide to <ix>Java</ix></title> <title>Learning <ix>XML</ix></title> <title>Using <ix>XML</ix> with <ix>Java</ix></title> </titles> return map:build($titles/title, fn($title) { $title/ix })
Result:	{ "Java": ( <title>A Beginner’s Guide to <ix>Java</ix></title>, <title>Using <ix>XML</ix> with <ix>Java</ix></title> ), "XML": ( <title>Learning <ix>XML</ix></title>, <title>Using <ix>XML</ix> with <ix>Java</ix></title> ) }
Expression:	map:build(//employee, fn { @ssn })
Result:	A map whose keys are employee `@ssn` values, and whose corresponding values are the employee nodes
Expression:	map:build(//employee, fn { @location }, fn { 1 }, { "duplicates": op("+") })
Result:	A map whose keys are employee `@location` values, and whose corresponding values represent the number of employees at each distinct location. Any employees that lack an `@location` attribute will be excluded from the result.
Expression:	map:build( //employee, key := fn { @location }, options := { "duplicates": fn($a, $b) { highest(($a, $b), (), fn { xs:decimal(@salary) }) } } )
Result:	A map whose keys are employee `@location` values, and whose corresponding values contain the employee node for the highest-paid employee at each distinct location
Expression:	map:build(//*, generate-id#1)
Result:	A map allowing efficient access to every element in a document by means of its `fn:generate-id` value.
Expression:	let $tree := parse-json('{ "type": "package", "name": "org", "content": [ { "type": "package", "name": "xml", "content": [ { "type": "package", "name": "sax", "content": [ { "type": "class", "name": "Attributes"}, { "type": "class", "name": "ContentHandler"}, { "type": "class", "name": "XMLReader"} ] }] }] }') return map:build($tree/descendant-or-self::jnode(*, record(type, name, content?)), fn{./ancestor-or-self::jnode()?name => string-join(".")}, fn{`{?type} {?name}`})
Result:	{ "org": "package org", "org.xml": "package xml", "org.xml.sax": "package sax", "org.xml.sax.Attributes": "class Attributes", "org.xml.sax.ContentHandler": "class ContentHandler", "org.xml.sax.XMLReader": "class XMLReader" } (Constructs a map allowing efficient access to values in a recursive JSON structure using hierarchic paths.)

15 Processing arrays

Arrays were introduced as a new datatype in XDM 3.1. This section describes functions that operate on arrays.

An array is an additional kind of item. An array of size N is a mapping from the integers (1 to N) to a set of values, called the members of the array, each of which is an arbitrary sequence. Because an array is an item, and therefore a sequence, arrays can be nested.

An array acts as a function from integer positions to associated values, so the function call $array($index) can be used to retrieve the array member at a given position. The function corresponding to the array has the signature function($index as xs:integer) as item()*. The fact that an array is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.
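A small Python model of this idea follows; the `XArray` class is hypothetical and only illustrates that an array is callable with a 1-based integer position, and can therefore be handed to any higher-order function that expects a function. Members are tuples, so a member may be empty, hold several items, or contain another array.

```python
from typing import Any, Tuple


class XArray:
    """Hypothetical model of an XDM array: positions 1..N map to members."""

    def __init__(self, *members: Tuple[Any, ...]) -> None:
        # Each member is an arbitrary sequence (a tuple) and may itself hold an XArray.
        self._members = tuple(members)

    def size(self) -> int:
        return len(self._members)

    def __call__(self, index: int) -> Tuple[Any, ...]:
        # Calling the array like a function retrieves the member at a 1-based position.
        if not 1 <= index <= len(self._members):
            raise IndexError(f"position {index} is outside 1..{len(self._members)}")
        return self._members[index - 1]


# The array is callable, so it can be passed to a higher-order function such as map().
nested = XArray((1,), ("x", "y"), (XArray((2,)),))
picked = list(map(nested, [2, 1]))  # [('x', 'y'), (1,)]
```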

15.2 Functions that operate on arrays

The functions defined in this section use a conventional namespace prefix array, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/array.

As with all other values, arrays are treated as immutable. For example, the array:reverse function returns an array that differs from the supplied array in the order of its members, but the supplied array is not changed by the operation. Two calls on array:reverse with the same argument will return arrays that are indistinguishable from each other; there is no way of asking whether these are “the same array”. Like sequences, arrays have no identity.

All functionality on arrays is defined in terms of two primitives:

The function array:members decomposes an array to a sequence of value records.
The function array:of-members composes an array from a sequence of value records.

A value record here is an item that encapsulates an arbitrary value; the representation chosen for a value record is record(value as item()*), that is, a map containing a single entry whose key is the string "value" and whose value is the encapsulated sequence.
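The role of the two primitives can be sketched in Python, assuming lists for arrays, tuples for members, and single-entry dicts for value records; `members`, `of_members`, and `reverse` are illustrative names, not the library functions themselves. The wrapping is what preserves member boundaries: concatenating the members `(1)`, `()`, and `(2, 3)` directly would give the flat sequence `(1, 2, 3)`, losing both the empty member and the split between the others.

```python
from typing import Any, Dict, List, Tuple

Member = Tuple[Any, ...]         # a member: an arbitrary sequence, modelled as a tuple
ValueRecord = Dict[str, Member]  # record(value as item()*), modelled as {"value": ...}


def members(array: List[Member]) -> List[ValueRecord]:
    """Decompose an array into a sequence of value records."""
    return [{"value": m} for m in array]


def of_members(records: List[ValueRecord]) -> List[Member]:
    """Compose an array from a sequence of value records."""
    return [r["value"] for r in records]


def reverse(array: List[Member]) -> List[Member]:
    # A derived operation written only in terms of the two primitives.
    # The input list is not modified, mirroring the immutability of arrays.
    return of_members(list(reversed(members(array))))
```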

Note that when the required type of an argument to a function such as array:size is an array type, the coercion rules ensure that a JNode can be supplied in the function call: if the ·content· property of the JNode is an array, then the array is automatically extracted as if by the fn:jnode-content function.

Function	Meaning
`array:append`	Returns an array containing all the members of a supplied array, plus one additional member at the end.
`array:build`	Returns an array obtained by evaluating the supplied function once for each item in the input sequence.
`array:empty`	Returns `true` if the supplied array contains no members.
`array:filter`	Returns an array containing those members of the `$array` for which `$predicate` returns `true`. A return value of `()` is treated as `false`.
`array:flatten`	Replaces any array appearing in a supplied sequence with the members of the array, recursively.
`array:fold-left`	Evaluates the supplied function cumulatively on successive members of the supplied array.
`array:fold-right`	Evaluates the supplied function cumulatively on successive members of the supplied array.
`array:foot`	Returns the last member of an array.
`array:for-each`	Returns an array whose size is the same as `array:size($array)`, in which each member is computed by applying `$action` to the corresponding member of `$array`.
`array:for-each-pair`	Returns an array obtained by evaluating the supplied function once for each pair of members at the same position in the two supplied arrays.
`array:get`	Returns the value at the specified position in the supplied array (counting from 1).
`array:head`	Returns the first member of an array, that is `$array(1)`.
`array:index-of`	Returns a sequence of positive integers giving the positions within the array `$array` of members that are equal to `$target`.
`array:index-where`	Returns the positions in an input array of members that match a supplied predicate.
`array:insert-before`	Returns an array containing all the members of the supplied array, with one additional member at a specified position.
`array:items`	Returns the sequence concatenation of the members of an array.
`array:join`	Concatenates the contents of several arrays into a single array.
`array:members`	Delivers the contents of an array as a sequence of value records.
`array:of-members`	Constructs an array from the contents of a sequence of value records.
`array:put`	Returns an array containing all the members of a supplied array, except for one member which is replaced with a new value.
`array:remove`	Returns an array containing all the members of the supplied array, except for the members at specified positions.
`array:reverse`	Returns an array containing all the members of a supplied array, but in reverse order.
`array:size`	Returns the number of members in the supplied array.
`array:slice`	Returns an array containing selected members of a supplied input array based on their position.
`array:sort`	Sorts a supplied array, based on the value of a sort key supplied as a function.
`array:sort-by`	Sorts a supplied array, based on the value of a number of sort keys supplied as functions.
`array:sort-with`	Sorts a supplied array, according to the order induced by the supplied comparator functions.
`array:split`	Delivers the contents of an array as a sequence of single-member arrays.
`array:subarray`	Returns an array containing all members from a supplied array starting at a supplied position, up to a specified length.
`array:tail`	Returns an array containing all members except the first from a supplied array.
`array:trunk`	Returns an array containing all members except the last from a supplied array.

15.2.26 array:sort-by

Changes in 4.0 (next | previous)

New in 4.0. [Issue 1085 PR 2001 19 May 2025]

Summary

Sorts a supplied array, based on the value of a number of sort keys supplied as functions.

Signature

`array:sort-by`(
`$array`	`as` `array(*)`,
`$keys`	`as` `record(key? as (fn(item()*) as xs:anyAtomicType*)?, collation? as xs:string?, order? as enum('ascending', 'descending')?)*`
) `as` `array(*)`

Properties

This function is deterministic, context-dependent, and focus-independent. It depends on collations.

Rules

The result of the function is an array that contains the same members as $array, typically in a different order, the order being defined by the supplied sort key definitions.

A sort key definition is a record with three parts:

key: A sort key function, which is applied to each member of the input array to determine a sort key value. If no function is supplied, the default is fn:data#1, which atomizes the value of the array member.
collation: A collation, which is used when comparing sort key values that are of type xs:string or xs:untypedAtomic. If no collation is supplied, the default collation from the static context is used.
When comparing values of types other than xs:string or xs:untypedAtomic, the collation is ignored (but an error may be reported if it is invalid). For more information see 5.3.7 Choosing a collation.
order: An order direction, either "ascending" or "descending". The default is "ascending".

The result of the array:sort-by function is obtained as follows:

The result array contains the same members as $array, but generally in a different order.
The sort key definitions are established as described above. The sort key definitions are in major-to-minor order. That is, the position of two members $A and $B in the result array is determined first by the relative magnitude of their primary sort key values, which are computed by evaluating the sort key function in the first sort key definition. If those two sort key values are equal, then the position is determined by the relative magnitude of their secondary sort key values, computed by evaluating the sort key function in the second sort key definition, and so on.
When a pair of corresponding sort key values of $A and $B are found to be not equal, then $A precedes $B in the result array if both of the following conditions are true, or if both conditions are false:
1. The sort key value for $A is less than the sort key value for $B, as defined below.
2. The order direction in the corresponding sort key definition is "ascending".
If all the sort key values for $A and $B are pairwise equal, then $A precedes $B in the result array if and only if $A precedes $B in the input array.
Note:
That is, the sort is stable.
Each sort key value for a given array member is obtained by applying the sort key function of the corresponding sort key definition to that member. The result of this function is in the general case a sequence of atomic items. Two sort key values $a and $b are compared as follows:
1. Let $C be the collation in the corresponding sort key definition.
2. Let $REL be the result of evaluating op:lexicographic-compare($key($A), $key($B), $C) where op:lexicographic-compare($a, $b, $C) is defined as follows:
```
if (empty($a) and empty($b)) then 0 
else if (empty($a)) then -1
else if (empty($b)) then +1
else let $rel := op:simple-compare(head($a), head($b), $C)
     return if ($rel eq 0)
            then op:lexicographic-compare(tail($a), tail($b), $C)
            else $rel
```
3. Here op:simple-compare($k1, $k2, $C) is defined as follows:
```
if ($k1 instance of (xs:string | xs:anyURI | xs:untypedAtomic)
    and $k2 instance of (xs:string | xs:anyURI | xs:untypedAtomic))
then compare($k1, $k2, $C)
else if ($k1 instance of xs:numeric and $k2 instance of xs:numeric)
then compare($k1, $k2)
else if ($k1 eq $k2) then 0
else if ($k1 lt $k2) then -1
else +1
```
  Note:
  This raises an error if two keys are not comparable, for example if one is a string and the other is a number, or if both belong to a non-ordered type such as xs:QName.
4. If $REL is zero, then the two sort key values are deemed equal; if $REL is -1, then $a is deemed less than $b; and if $REL is +1, then $a is deemed greater than $b.
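The comparison rules can be modelled in Python roughly as follows. This is a sketch under stated assumptions, not a conforming implementation: members are tuples, a collation is modelled as a Python function mapping a string to a comparison key (the real argument is a collation URI), codepoint order is used when no collation is given, and incomparable keys surface as Python's `TypeError` rather than err:XPTY0004. `array_sort_by` and its helpers are hypothetical names.

```python
from functools import cmp_to_key
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Member = Tuple[Any, ...]


def to_seq(x: Any) -> Tuple[Any, ...]:
    return x if isinstance(x, tuple) else (x,)


def simple_compare(k1: Any, k2: Any, collation: Optional[Callable[[str], Any]]) -> int:
    if isinstance(k1, str) and isinstance(k2, str) and collation is not None:
        k1, k2 = collation(k1), collation(k2)  # collation modelled as a key function
    # Incomparable pairs (e.g. str vs int) raise TypeError, mirroring err:XPTY0004.
    return (k1 > k2) - (k1 < k2)


def lexicographic_compare(a: Sequence[Any], b: Sequence[Any], collation=None) -> int:
    for x, y in zip(a, b):
        rel = simple_compare(x, y, collation)
        if rel != 0:
            return rel
    # Equal up to the shorter length: the shorter (possibly empty) sequence is less.
    return (len(a) > len(b)) - (len(a) < len(b))


def array_sort_by(array: List[Member], keys: List[Dict[str, Any]]) -> List[Member]:
    specs = keys or [{}]  # mirrors ($keys otherwise {})

    def compare(m1: Member, m2: Member) -> int:
        for spec in specs:  # sort key definitions in major-to-minor order
            key = spec.get("key", lambda m: m)  # default: the member's atomic items
            rel = lexicographic_compare(
                to_seq(key(m1)), to_seq(key(m2)), spec.get("collation"))
            if rel != 0:
                return -rel if spec.get("order") == "descending" else rel
        return 0

    # sorted() is stable, so members with equal keys keep their input order.
    return sorted(array, key=cmp_to_key(compare))
```

Because `sorted` is stable and a descending key only negates the comparison result, members with equal keys keep their input order in both directions, in line with the stability rule above.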

Formal Equivalent

The effect of the function is equivalent to the result of the following XPath expression.

$array
=> array:members()
=> fn:sort-by(
  for $key-spec in ($keys otherwise {})
  return map:put($key-spec, 'key', fn($member as record(value)) as xs:anyAtomicType* {
    map:get($key-spec, 'key', fn:data#1)(map:get($member, 'value'))
  })
)
=> array:of-members()

Error Conditions

If the set of computed sort keys contains values that are not comparable using the lt operator, then the sort operation will fail with a type error ([err:XPTY0004]^XP).

Notes

The function is a generalization of the array:sort function available in 3.1, which is retained for compatibility. The enhancements allow multiple sort keys to be defined, each potentially with a different collation, and allow sorting in descending order.

If the sort key for a member evaluates to the empty sequence, the effect of the rules is that this member precedes any member for which the key is non-empty. This is equivalent to the effect of the XQuery option empty least. The effect of the option empty greatest can be achieved by adding, before the sort key definition that uses a key function K, an extra sort key definition {'key': fn{empty(K(.))}}: when comparing boolean sort keys, false precedes true.
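The empty greatest technique amounts to a two-level sort key. In this hypothetical Python sketch, `None` stands for an empty sort key value, and the key tuple lists the boolean emptiness test first (major) and the original key second (minor), mirroring the order of the two sort key definitions.

```python
from typing import Any, Callable, List, Optional, TypeVar

T = TypeVar("T")


def sort_empty_greatest(items: List[T], k: Callable[[T], Optional[Any]]) -> List[T]:
    """Sort on k, placing items whose key is empty (None here) last."""

    def sort_keys(item: T):
        value = k(item)
        is_empty = value is None
        # Major key: empty(K(.)) as a boolean; False sorts before True.
        # Minor key: K(.) itself; empty keys get a placeholder that only ties with other empties.
        return (is_empty, 0 if is_empty else value)

    return sorted(items, key=sort_keys)
```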

Examples

Expression:	`array:sort-by([1, 4, 6, 5, 3], {})`
Result:	`[1, 3, 4, 5, 6]`
Expression:	`array:sort-by([1, 4, 4e0, 6, 5, 3], {'order': 'descending'})`
Result:	`[6, 5, 4, 4e0, 3, 1]`
Expression:	`array:sort-by([(1,4,3), 4, (5,6), 2], ({'key': count#1}, {}))`
Result:	`[2, 4, (5,6), (1,4,3)]`

16 Processing JNodes

Changes in 4.0 (next | previous)

Introduced the concept of JNodes. [Issue 2025 PR 2031 11 June 2025]

A JNode^DM is a wrapper around a map or array, or around a value that appears within the content of a map or array. JNodes are described at [XQuery and XPath Data Model (XDM) 4.0] section 8.4 JNodes. Wrapping a map or array in a JNode enables the use of path expressions such as $jnode/descendant::title, as described at [XML Path Language (XPath) 4.0] section 4.7 Path Expressions.

In addition to the functions defined in this section, functions that operate on JNodes include:

fn:distinct-ordered-nodes
fn:generate-id
fn:has-children
fn:innermost
fn:outermost
fn:path
fn:root
fn:siblings
fn:transitive-closure

16.1 Functions on JNodes

16.1.1 fn:jtree

Changes in 4.0 (next | previous)

New in 4.0 [Issue 2025 PR 2031 12 June 2025]

Summary

Delivers a root JNode^DM wrapping a map or array, enabling the use of lookup expressions to navigate a JTree^DM rooted at that map or array.

Signature

`fn:jtree`(
`$input`	`as` `(map()|array())`
) `as` `jnode((), (map()|array()))`

Properties

This function is nondeterministic, context-independent, and focus-independent.

Rules

The function creates a JNode^DM that wraps the supplied map or array. Specifically, it creates a root JNode whose ·content· property is $input, and whose ·parent·, ·position·, and ·selector· properties are absent.

This has the effect that lookup expressions starting from this JNode retain information for subsequent navigation.

A JNode has unique identity. If two maps or arrays M₁ and M₂ have the same function identity, as determined by the function-identity function, then the expression jtree(M₁) is jtree(M₂) must return true: that is, the same JNode must be delivered for both.
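One way a processor might honour this identity requirement is to memoize root JNodes by the identity of the wrapped value. The Python sketch below assumes that object identity (`id()`) plays the role of function identity; the `JNode` class and `jtree` function are hypothetical models, and the cache is never cleared, which a real implementation would have to address.

```python
from typing import Any, Dict, Optional


class JNode:
    """Hypothetical minimal JNode: content plus parent/selector/position properties."""

    def __init__(self, content: Any, parent: Optional["JNode"] = None,
                 selector: Any = None, position: Optional[int] = None) -> None:
        self.content = content
        self.parent = parent  # None models an absent property (root JNode)
        self.selector = selector
        self.position = position


# Root JNodes already delivered, keyed by the identity of the wrapped dict or list.
# The cache also keeps the wrapped value alive, so id() values cannot be reused.
_roots: Dict[int, JNode] = {}


def jtree(value: Any) -> JNode:
    if isinstance(value, JNode):
        return value  # short-circuit instead of unwrapping and rewrapping
    if not isinstance(value, (dict, list)):
        raise TypeError("jtree expects a map (dict) or an array (list)")
    node = _roots.get(id(value))
    if node is None:
        node = _roots[id(value)] = JNode(value)
    return node
```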

Notes

It is to some extent implementation-defined whether two maps or arrays have the same function identity. Processors should ensure as a minimum that when a variable $m is bound to a map or array, calling jtree($m) more than once (with the same variable reference) will deliver the same JNode each time.

The effect of the coercion rules is technically that if an existing JNode is supplied as $input, the wrapped value will be extracted, and then rewrapped as a JNode: in practice, this can be short-circuited by returning the supplied JNode unchanged.

Although fn:jtree is available as a function for user applications to call explicitly, it is also invoked implicitly by some expressions, notably when a path expression is written in a form such as $map/child::*. Specifically, if the left-hand operand of the / operator is a map or array, then the supplied map or array is implicitly wrapped in a JNode.

The effect of applying fn:jtree to a map or array is that subsequent retrieval operations within the wrapped map or array return results that retain useful information about where the results were found. For example, consider an expression such as json-doc($source)//name. This expression returns a set of JNodes representing all entries in the JTree having the key "name"; each of these JNodes contains not only the value of the relevant "name" entry, but also the key (which in this simple example is always "name") and the containing map. This means, for example, that if $result is the result of the expression json-doc($source)//name, then:

$result / .. / ssn locates the map that contained each name, and returns the value of the ssn entry in that map.
$result / ancestor::course returns any course entries in containing maps.
$result / ancestor::* => jnode-selector() returns a sequence of map keys and array index values representing the location of the found entries within the JSON structure.
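The navigation in these examples depends on each JNode remembering its parent and selector. The following Python model, with hypothetical `JNode`, `children`, `descendants`, and `selector_path` helpers and a made-up `students` document, imitates `//name`, `$result/../ssn`, and the chain of selectors from the root; it ignores the ·position· property, which matters only for multi-item values.

```python
from typing import Any, Iterator, List, Optional


class JNode:
    """Hypothetical JNode with a parent pointer and a selector (position ignored)."""

    def __init__(self, content: Any, parent: Optional["JNode"] = None, selector: Any = None):
        self.content, self.parent, self.selector = content, parent, selector


def children(node: JNode) -> Iterator[JNode]:
    if isinstance(node.content, dict):
        for key, value in node.content.items():
            yield JNode(value, node, key)  # map entries are selected by key
    elif isinstance(node.content, list):
        for index, value in enumerate(node.content, start=1):  # 1-based array selectors
            yield JNode(value, node, index)


def descendants(node: JNode) -> Iterator[JNode]:
    for child in children(node):  # document order: a node, then its subtree
        yield child
        yield from descendants(child)


def selector_path(node: JNode) -> List[Any]:
    """Selectors from the root down to node (the root itself has no selector)."""
    path = []
    while node.parent is not None:
        path.append(node.selector)
        node = node.parent
    return path[::-1]


doc = {"students": [{"name": "Ann", "ssn": "111"}, {"name": "Bob", "ssn": "222"}]}
names = [n for n in descendants(JNode(doc)) if n.selector == "name"]  # like //name
```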

An alternative way of wrapping a map or array, rather than calling jtree($X), is to use the path expression `$X/.`.

There are two situations where a map or array is implicitly wrapped in a JNode:

When the value of the left-hand operand of the / operator includes a map or array;
When the context value for evaluation of an AxisStep includes a map or array.

Examples

Expression:	`jtree([ "a", "b", "c" ])/[1]/../[last()] => string()`
Result:	`"c"` (The call on `fn:jtree` would happen automatically.)
Expression:	`jtree([ "a", "b", "c", "d" ])/* =!> jnode-selector()`
Result:	`1, 2, 3, 4`
Expression:	let $data := { "fr": { "capital": "Paris", "languages": [ "French" ] }, "de": { "capital": "Berlin", "languages": [ "German" ] } } return jtree($data)//languages[. = 'German']/../capital =!> string()
Result:	"Berlin"

16.1.4 fn:jnode-position

Changes in 4.0 (next | previous)

New in 4.0 [Issue 2025 PR 2031 12 June 2025]

Summary

Returns the ·position· property of a JNode.

Signature

`fn:jnode-position`(
`$input`	`as` `jnode()?`	`:=` `.`
) `as` `xs:integer?`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

If the argument is omitted, it defaults to the context value (.).

If $input is the empty sequence, the function returns the empty sequence.

If $input is a root JNode (one in which the ·position· property is absent), the function returns the empty sequence.

Otherwise, the function returns the ·position· property of $input. The value of this property will be 1 (one) except in cases where the value of an entry in a map, or a member in an array, is a sequence that contains multiple items including maps and/or arrays; in such cases the position will be the 1-based position of the relevant map or array.

Error Conditions

The following errors may be raised when $input is omitted:

If the context value is absent^DM, type error [err:XPDY0002]^XP.
If the context value is not an instance of the sequence type jnode()?, type error [err:XPTY0004]^XP.

Notes

This function is relevant only when there are maps whose entries are multi-item sequences that include maps and arrays, or arrays whose members include such multi-item sequences. Such structures are uncommon, and never arise from parsing of JSON source text. It is generally best to avoid such structures by using arrays rather than sequences within array and map content; apart from other considerations, this allows the data to be serialized in JSON format.

If an entry within a map, or a member of an array, contains a sequence of items that mixes arrays and maps with other content (for example the array [1, 2, ([1,2], [3,4], 5)]), then a lookup using the child axis will only construct JNodes in respect of those items that are non-empty maps or arrays. This may leave gaps in the position numbering sequence, as illustrated in the examples below.
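The position numbering can be reproduced with a short Python model, in which a tuple stands for a multi-item sequence, dicts for maps, and lists for arrays; `child_jnodes` is a hypothetical helper that returns plain dicts rather than JNodes. Applied to the value of the `b` entry in the first example below, it yields positions 1, 1, 1, 4, 4, 4.

```python
from typing import Any, Dict, List


def child_jnodes(content: Any) -> List[Dict[str, Any]]:
    """Child JNodes of a JNode whose content is `content`.

    A tuple models a multi-item sequence; dicts model maps and lists model arrays.
    """
    items = content if isinstance(content, tuple) else (content,)
    children: List[Dict[str, Any]] = []
    for position, item in enumerate(items, start=1):  # position within the sequence
        if isinstance(item, dict):
            pairs = list(item.items())
        elif isinstance(item, list):
            pairs = list(enumerate(item, start=1))  # array members have 1-based selectors
        else:
            continue  # atomic items yield no child JNodes, leaving a gap in positions
        for selector, value in pairs:
            children.append({"position": position, "selector": selector, "value": value})
    return children
```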

Examples

Expression:	let $input := { "a": [10, 20, 30], "b": ([40, 50, 60], [], 0, [70, 80, (90, 100)]) } return $input / child::b / * ! { "position": jnode-position(), "index": jnode-selector(), "value": jnode-content() }
Result:	{ "position": 1, "index": 1, "value": 40 }, { "position": 1, "index": 2, "value": 50 }, { "position": 1, "index": 3, "value": 60 }, { "position": 4, "index": 1, "value": 70 }, { "position": 4, "index": 2, "value": 80 }, { "position": 4, "index": 3, "value": (90, 100) }
Expression:	let $input := { "a": {"x": 10, "y": 20, "z": 30}, "b": ( {"x": 40, "y": 50, "z": 60}, {}, {"x": 70, "y": 80, "z": (90, 100)}) } return $input / child::b / * ! { "position": jnode-position(), "key": jnode-selector(), "value": jnode-content() }
Result:	{ "position": 1, "key": "x", "value": 40 }, { "position": 1, "key": "y", "value": 50 }, { "position": 1, "key": "z", "value": 60 }, { "position": 3, "key": "x", "value": 70 }, { "position": 3, "key": "y", "value": 80 }, { "position": 3, "key": "z", "value": (90, 100) }

17 External resources and data formats

The functions in this section access resources external to a query or stylesheet, and convert between external file formats and their XPath and XQuery data model representation.

17.5 Functions on CSV Data

Changes in 4.0 (next | previous)

New functions are available for processing input data in CSV (comma separated values) format. [Issue 413 PRs 533 719 834]

This section describes functions that parse CSV data.

[Definition: The term comma separated values or CSV refers to a wide variety of plain-text tabular data formats with fields and records separated by standard character delimiters (often, but not invariably, commas).]

A CSV is a 2-dimensional tabular data structure consisting of multiple rows (also known as records). Each row contains multiple fields. Fields occupying the same position in successive rows constitute a column. Columns are identified by position and optionally by name. Column names can be assigned within a CSV using an optional header row.

CSV has developed informally for decades, and many variations are found. This specification refers to [RFC 4180], which provides a standardized grammar. This specification extends the grammar defined in [RFC 4180] as follows:

This specification uses the term row where RFC 4180 uses record.
Line endings are normalized: specifically, the character sequences U+000D (CARRIAGE RETURN), or U+000D (CARRIAGE RETURN) followed by U+000A (NEWLINE), are converted to a single U+000A (NEWLINE) character. This applies whether or not the line ending appears within a quoted string, and whether or not U+000A (NEWLINE) is the chosen row delimiter.
Row delimiters other than newline are recognized.
Field delimiters other than U+002C (COMMA, ,) are recognized.
Quote characters other than U+0022 (QUOTATION MARK, ") are recognized.
Non-ASCII characters are recognized.
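To make these extensions concrete, here is a lenient Python sketch of an RFC 4180-style parser with line-ending normalization and a configurable single-character field delimiter, row delimiter, and quote character. It is not the algorithm of fn:csv-to-arrays: the function and parameter names are hypothetical, whitespace trimming and error reporting for malformed input are omitted, and a quote character appearing mid-field simply opens a quoted run.

```python
from typing import List


def csv_to_rows(text: str, field_delim: str = ",", row_delim: str = "\n",
                quote: str = '"') -> List[List[str]]:
    """Lenient RFC 4180-style parser; all delimiters are single characters."""
    # Normalize CR LF and lone CR to LF everywhere, even inside quoted fields.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    rows: List[List[str]] = []
    row: List[str] = []
    field: List[str] = []
    in_quotes = False
    row_started = False  # distinguishes "no final row" from a final row holding ""
    i = 0
    while i < len(text):
        ch = text[i]
        row_started = True
        if in_quotes:
            if ch == quote and i + 1 < len(text) and text[i + 1] == quote:
                field.append(quote)  # a doubled quote is a literal quote character
                i += 1
            elif ch == quote:
                in_quotes = False  # closing quote
            else:
                field.append(ch)  # delimiters are literal inside quotes
        elif ch == quote:
            in_quotes = True
        elif ch == field_delim:
            row.append("".join(field))
            field = []
        elif ch == row_delim:
            row.append("".join(field))
            rows.append(row)
            row, field, row_started = [], [], False
        else:
            field.append(ch)
        i += 1
    if row_started:  # the last row had no trailing row delimiter
        row.append("".join(field))
        rows.append(row)
    return rows
```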

This specification defines a mapping from this extended grammar to constructs in the XDM model, and provides illustrative examples of how these constructs can be combined with other language features to process CSV data.

Function	Meaning
`fn:csv-to-arrays`	Parses CSV data supplied as a string, returning the results in the form of a sequence of arrays of strings.
`fn:parse-csv`	Parses CSV data, returning the results in the form of a record containing information about the names in the header, as well as the data itself.
`fn:csv-doc`	Reads an external resource containing CSV, and returns the results as a record containing information about the names in the header, as well as the data itself.
`fn:csv-to-xml`	Parses CSV data supplied as a string, returning the results as an XML document, as described by 17.5.9 Representing CSV data as XML.

The most basic function for parsing CSV is fn:csv-to-arrays, which recognizes the delimiters for rows and fields and returns a sequence of arrays, each corresponding to one row. The fields within each array are represented as instances of xs:string.

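The fn:parse-csv and fn:csv-to-xml functions additionally recognize column names, making it easier to address individual fields by name. The fn:parse-csv function delivers this capability using XDM maps and functions, while the fn:csv-to-xml function represents the information using XDM element nodes; fn:csv-doc delivers the same kind of result as fn:parse-csv, but reads the CSV from an external resource.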
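Addressing fields by column name comes down to a lookup from header names to positions. The Python sketch below is hypothetical and deliberately simpler than the structures returned by fn:parse-csv or fn:csv-to-xml; in particular, keeping the first of several columns with the same name, and reading missing trailing fields as empty strings, are assumptions of the sketch.

```python
from typing import Dict, List, Optional


def column_index(header: List[str]) -> Dict[str, int]:
    """Map each column name to its 1-based position."""
    index: Dict[str, int] = {}
    for position, name in enumerate(header, start=1):
        index.setdefault(name, position)  # keep the first column with a given name
    return index


def field(row: List[str], index: Dict[str, int], name: str) -> Optional[str]:
    """Return the named field of a row, or None if the column name is unknown."""
    position = index.get(name)
    if position is None:
        return None
    # A short row is treated as having empty trailing fields.
    return row[position - 1] if position <= len(row) else ""


rows = [["id", "name", "city"], ["1", "Ann", "Oslo"], ["2", "Bob"]]
header, *data = rows
idx = column_index(header)
cities = [field(r, idx, "city") for r in data]  # ['Oslo', '']
```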

21 Errors and diagnostics

21.2 Diagnostic tracing

21.2.1 fn:trace

Changes in 4.0 (next | previous)

The $label argument can now be set to the empty sequence. Previously if $label was supplied, it could not be empty. [Issue 895 PR 901 16 December 2023]

Summary

Provides an execution trace intended to be used in debugging queries.

Signature

`fn:trace`(
`$input`	`as` `item()*`,
`$label`	`as` `xs:string?`	`:=` `()`
) `as` `item()*`

Properties

This function is deterministic, context-independent, and focus-independent.

Rules

The function returns $input, unchanged.

In addition, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination.

Any serialization of the implementation’s trace output must not raise an error. This can be achieved (for example) by using a serialization method that can handle arbitrary input, such as the adaptive output method (see 10 Adaptive Output Method ^SER31).

The format of the trace output and its order are implementation-dependent. Therefore, the order in which the output appears is not predictable. This also means that if dynamic errors occur (whether or not they are caught using try/catch), it may be unpredictable whether any output is reported before the error occurs.
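The essential contract, returning the input unchanged while emitting a best-effort description elsewhere, can be modelled in Python. In this hypothetical sketch `repr()` stands in for a serialization that must not fail, standard error is the default destination, and the exact output format is an arbitrary choice, just as the rules leave it to the implementation.

```python
import io
import sys
from typing import Any, Optional, TextIO


def trace(value: Any, label: Optional[str] = None, dest: TextIO = sys.stderr) -> Any:
    """Return value unchanged, writing a best-effort description to dest."""
    try:
        text = repr(value)  # stands in for a serialization that handles anything
    except Exception:
        text = "<unprintable value>"  # tracing itself must never raise an error
    prefix = f"{label} " if label else ""  # an empty label behaves like no label
    print(prefix + text, file=dest)
    return value


# Usage: capture the trace output in a buffer instead of standard error.
buffer = io.StringIO()
result = trace(124.84, "the value of $v is:", buffer)
```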

Notes

If the trace information is unrelated to a specific value, fn:message can be used instead.

Examples

Expression:	let $v := 124.84 return fn:trace($v, 'the value of $v is: ')
Result:	The function returns the `xs:decimal` value `124.84`, while outputting a message such as "the value of $v is: 124.84" to an implementation-defined destination. The format of the message is also implementation-defined.
Expression:	//book[xs:decimal(@price) gt 100] => trace('books more expensive than €100:')
Result:	The result of the expression is the same as the result of `//book[xs:decimal(@price) gt 100]`, but evaluation has the side-effect of providing diagnostic feedback to the user.

XPath and XQuery Functions and Operators 4.0

W3C Editor's Draft 312 March 2026
