This document is also available in these non-normative formats: Specification in XML format and XML function catalog.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document defines constructor functions, operators, and functions on the datatypes defined in [XML Schema Part 2: Datatypes Second Edition] and the datatypes defined in [XQuery and XPath Data Model (XDM) 4.0]. It also defines functions and operators on nodes and node sequences as defined in the [XQuery and XPath Data Model (XDM) 4.0]. These functions and operators are defined for use in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0] and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions/.
A summary of changes since version 3.1 is provided at H Changes since 3.1.
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
This document is a working draft developed and maintained by a W3C Community Group, the XQuery and XSLT Extensions Community Group unofficially known as QT4CG (where "QT" denotes Query and Transformation). This draft is work in progress and should not be considered either stable or complete. Standard W3C copyright and patent conditions apply.
The community group welcomes comments on the specification. Comments are best submitted as issues on the group's GitHub repository.
As the Community Group moves towards publishing dated, stable drafts, some features that the group considers likely to be removed or substantially changed are marked “at risk” in their changes section. In this draft:
The community group maintains two extensive test suites, one oriented to XQuery and XPath, the other to XSLT. These can be found at qt4tests and xslt40-test respectively. New tests, or suggestions for correcting existing tests, are welcome. The test suites include extensive metadata describing the conditions for applicability of each test case as well as the expected results. They do not include any test drivers for executing the tests: each implementation is expected to provide its own test driver.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Changes in 4.0
If a section of this specification has been updated since version 3.1, an overview of the changes is provided, along with links to navigate to the next or previous change.
Sections with significant changes are marked with a ✭ symbol in the table of contents. New functions are indicated by ✚.
The purpose of this document is to define functions and operators for inclusion in XPath 4.0, XQuery 4.0, and XSLT 4.0. The exact syntax used to call these functions and operators is specified in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0].
This document defines three classes of functions:
General purpose functions, available for direct use in user-written queries, stylesheets, and XPath expressions, whose arguments and results are values defined by the [XQuery and XPath Data Model (XDM) 4.0].
Constructor functions, used for creating instances of a datatype from values of (in general) a different datatype. These functions are also available for general use; they are named after the datatype that they return, and they always take a single argument.
Functions that specify the semantics of operators defined in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language]. These exist for specification purposes only, and are not intended for direct calling from user-written code.
[XML Schema Part 2: Datatypes Second Edition] defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines functions and operations on these datatypes as well as the other types (for example, nodes and sequences of nodes) defined in 2.7 Schema Information DM31 of the [XQuery and XPath Data Model (XDM) 4.0]. These functions and operations are available for use in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and any other host language that chooses to reference them. In particular, they may be referenced in future versions of XSLT and related XML standards.
[XSD 1.1 Part 2] adds to the datatypes defined in [XML Schema Part 2: Datatypes Second Edition]. It introduces a new derived type xs:dateTimeStamp, and it incorporates as built-in types the two types xs:yearMonthDuration and xs:dayTimeDuration which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many aspects of the definitions of the existing datatypes: for example, it extends the value space of xs:double to allow both positive and negative zero, and extends the lexical space to allow +INF; it modifies the value space of xs:Name to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime values; and it allows any character string to appear as the value of an xs:anyURI item. Implementations of this specification may support either XSD 1.0 or XSD 1.1 or both.
In some cases, this specification references XSD for the semantics of operations such as the effect of matching using regular expressions, or conversion of atomic items to strings. In most such cases there is no intended technical difference between the XSD 1.0 and XSD 1.1 specifications, but the 1.1 version often provides clearer explanations and sometimes also corrects technical errors. In such cases this specification often chooses to reference the XSD 1.1 specification. This should not be taken as implying that it is necessary to invoke an XSD 1.1 processor.
References to specific sections of some of the above documents are indicated by cross-document links in this document. Each such link consists of a pointer to a specific section followed by a superscript specifying the linked document. The superscripts have the following meanings: XQ [XQuery 4.0: An XML Query Language], XT [XSL Transformations (XSLT) Version 4.0], XP [XML Path Language (XPath) 4.0], and DM [XQuery and XPath Data Model (XDM) 4.0].
This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:
For all combinations of valid inputs to the function (both explicit arguments and implicit context dependencies), the result of the function meets the mandatory requirements of this specification.
For all invalid inputs to the function, the implementation raises (in some way appropriate to the calling environment) a dynamic error.
For a sequence of calls within the same execution scope, the requirements of this recommendation regarding the determinism of results are satisfied (see 1.9.5 Properties of functions).
Other recommendations (“host languages”) that reference this document may dictate:
Subsets or supersets of this set of functions to be available in particular environments;
Mechanisms for invoking functions, supplying arguments, initializing the static and dynamic context, receiving results, and handling errors;
A concrete realization of concepts such as execution scope;
Which versions of other specifications referenced herein (for example, XML, XSD, or Unicode) are to be used.
Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.
Note:
Adding such constraints in a host language, however, is discouraged because it makes it difficult to reuse implementations of the function library across host languages.
This specification allows flexibility in the choice of versions of specifications on which it depends:
It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used.
It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1.
It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1.
Note:
The XML Schema 1.1 recommendation introduces one new concrete datatype: xs:dateTimeStamp; it also incorporates the types xs:dayTimeDuration, xs:yearMonthDuration, and xs:anyAtomicType which were previously defined in earlier versions of [XQuery and XPath Data Model (XDM) 4.0]. Furthermore, XSD 1.1 includes the option of supporting revised definitions of types such as xs:NCName based on the rules in XML 1.1 rather than 1.0.
The [XQuery and XPath Data Model (XDM) 4.0] allows flexibility in the repertoire of characters permitted during processing that goes beyond even what version of XML is supported. A processor may allow the user to construct nodes and atomic items that contain characters not allowed by any version of XML. [Definition: A permitted character is one within the repertoire accepted by the implementation.]
In this document, text labeled as an example or as a note is provided for explanatory purposes and is not normative.
As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps are a new datatype introduced in XPath 3.1.
For example, the function fn:xml-to-json has an options parameter allowing specification of whether the output is to be indented. A call might be written:
xml-to-json($input, { 'indent': true() })
[Definition: Functions that take an options parameter adopt common conventions on how the options are used. These are referred to as the option parameter conventions. These rules apply only to functions that explicitly refer to them.]
Where a function adopts the option parameter conventions, the following rules apply:
The value of the relevant argument must be a map. The entries in the map are referred to as options: the key of the entry is called the option name, and the associated value is the option value. Option names defined in this specification are always strings (single xs:string values). Option values may be of any type.
The type of the options parameter in the function signature is always given as map(*).
Although option names are described above as strings, the actual key may be any value that is the same key as the required string. For example, instances of xs:untypedAtomic or xs:anyURI are equally acceptable.
Note:
This means that the implementation of the function can check for the presence and value of particular options using the functions map:contains and/or map:get.
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace.
If an option is present whose key is not described in the specification, then a type error [err:XPTY0004]XP must be raised unless either (a) the key is recognized by the implementation, or (b) the key is a value of type xs:QName with a non-absent namespace.
All entries in the options map are optional, and supplying the empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.
The ordering of the options map is immaterial.
For each named option, the function specification defines a required type for the option value. The value that is actually supplied in the map is converted to this required type using the coercion rulesXP. This will result in an error (typically [err:XPTY0004]XP or [err:FORG0001]FO) if conversion of the supplied value to the required type is not possible. A type error also occurs if this conversion delivers a coerced function whose invocation fails with a type error. A dynamic error occurs if the supplied value after conversion is not one of the permitted values for the option in question: the error codes for this error are defined in the specification of each function.
Note:
It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function-calling rules.
In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
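The key-matching rule above can be illustrated with a short sketch. It assumes that the prefix map is bound to the namespace http://www.w3.org/2005/xpath-functions/map, and it uses an illustrative options map rather than a call to any particular function.

```xpath
(: Sketch only: an xs:untypedAtomic key is the same key as the string 'indent',
   so both expressions in the result locate the same entry. :)
let $options := { 'indent': true() }
return (
  map:contains($options, 'indent'),               (: true :)
  map:get($options, xs:untypedAtomic('indent'))   (: true :)
)
```

This is how a function implementation might test for the presence and value of an option, as the note above suggests.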
The terminology used to describe the functions and operators on types defined in [XML Schema Part 2: Datatypes Second Edition] is defined in the body of this specification. The terms defined in this section are used in building those definitions.
Note:
Following in the tradition of [XML Schema Part 2: Datatypes Second Edition], the terms type and datatype are used interchangeably.
The following definitions are adopted from [XQuery and XPath Data Model (XDM) 4.0].
[Definition: An atomic item is a pair (T, D) where T (the type annotation) is an atomic type, and D (the datum) is a point in the value space of T.]
[Definition: A primitive type is one of the 19 primitive atomic types defined in 3.2 Primitive datatypesXS2 of [XML Schema Part 2: Datatypes Second Edition], or the type xs:untypedAtomic defined in [XQuery and XPath Data Model (XDM) 4.0].]
[Definition: The datum of an atomic item is a point in the value space of its type, which is also a point in the value space of the primitive type from which that type is derived.] There are 20 primitive atomic types (19 defined in XSD, plus xs:untypedAtomic), and these have non-overlapping value spaces, so each datum belongs to exactly one primitive atomic type.
[Definition: The type annotation of an atomic item is the most specific atomic type that it is an instance of (it is also an instance of every type from which that type is derived).]
Note:
The term value space is defined in [XSD 1.1 Part 2] as a set of values. The term datum is used here in preference to value, because value has a different meaning in this data model.
This document uses the terms string, character, and codepoint with meanings that are normatively defined in [XQuery and XPath Data Model (XDM) 4.0], and which are paraphrased here for ease of reference:
[Definition: A character is an instance of the CharXML production of [Extensible Markup Language (XML) 1.0 (Fifth Edition)].]
Note:
This definition excludes Unicode characters in the surrogate blocks as well as U+FFFE and U+FFFF, while including characters with codepoints greater than U+FFFF which some programming languages treat as two characters. The valid characters are defined by their codepoints, and include some whose codepoints have not been assigned by the Unicode consortium to any character.
[Definition: A string is a sequence of zero or more characters, or equivalently, a value in the value space of the xs:string datatype.]
[Definition: A codepoint is an integer assigned to a character by the Unicode consortium, or reserved for future assignment to a character.]
Note:
The set of codepoints is thus wider than the set of characters.
This specification spells “codepoint” as one word; the Unicode specification spells it as “code point”. Equivalent terms found in other specifications are “character number” or “code position”. See [Character Model for the World Wide Web 1.0: Fundamentals].
Because these terms appear so frequently, they are hyperlinked to the definition only when there is a particular desire to draw the reader’s attention to the definition; the absence of a hyperlink does not mean that the term is being used in some other sense.
It is implementation-defined which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used.
This specification adopts the Unicode notation U+xxxx to refer to a codepoint by its hexadecimal value (always four to six hexadecimal digits). This is followed where appropriate by the official Unicode character name and its graphical representation: for example U+20AC (EURO SIGN, €).
Unless explicitly stated, the functions in this document do not ensure that any returned xs:string values are normalized in the sense of [Character Model for the World Wide Web 1.0: Fundamentals].
Note:
In functions that involve character counting such as fn:substring, fn:string-length and fn:translate, what is counted is the number of XML characters in the string (or equivalently, the number of Unicode codepoints). Some implementations may represent a codepoint above U+FFFF using two 16-bit values known as a surrogate pair. A surrogate pair counts as one character, not two.
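As a brief illustration of the counting rule in this note, the following sketch builds a string containing U+1D11E (MUSICAL SYMBOL G CLEF, decimal 119070), a codepoint above U+FFFF. The choice of character is only an example.

```xpath
(: Sketch only: the string is "a", U+1D11E, "b". :)
let $s := codepoints-to-string((97, 119070, 98))
return (
  string-length($s),                          (: 3, not 4 :)
  string-to-codepoints(substring($s, 2, 1))   (: 119070 :)
)
```

The character above U+FFFF counts as a single character even if the implementation stores it internally as a surrogate pair.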
Wherever encoding names (such as UTF-8 and UTF-16) are used in this specification, they are compared without regard to case: the strings "UTF-8" and "utf-8" both refer to the same encoding.
This document uses the phrase “namespace URI” to identify the concept identified in [Namespaces in XML] as “namespace name”, and the phrase “local name” to identify the concept identified in [Namespaces in XML] as “local part”.
It also uses the term “expanded-QName” defined below.
[Definition: An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 4.0]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. Two expanded QNames are equal if the namespace URIs are the same (or both absent) and the local names are the same. The prefix plays no part in the comparison, but is used only if the expanded QName needs to be converted back to a string.]
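The role of the prefix can be seen in a small sketch. The namespace URI http://example.com/ns and the prefixes a and b are purely illustrative.

```xpath
(: Sketch only: two expanded QNames that differ solely in their prefix. :)
let $q1 := QName("http://example.com/ns", "a:item"),
    $q2 := QName("http://example.com/ns", "b:item")
return (
  $q1 eq $q2,               (: true: same namespace URI and local name :)
  prefix-from-QName($q1),   (: "a" :)
  prefix-from-QName($q2)    (: "b" :)
)
```

The prefixes are retained and can be retrieved, but they do not affect the comparison.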
The term URI is used as follows:
[Definition: Within this specification, the term URI refers to Uniform Resource Identifiers as defined in [RFC 3986] and extended in [RFC 3987] with a new name IRI. The term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI datatype as defined in [XML Schema Part 2: Datatypes Second Edition].]
Note:
This means, in practice, that where this specification requires a “URI Reference”, an IRI as defined in [RFC 3987] will be accepted, provided that other relevant specifications also permit an IRI. The term URI has been retained in preference to IRI to avoid introducing new names for concepts such as “Base URI” that are defined or referenced across the whole family of XML specifications. Note also that the definition of xs:anyURI is a wider definition than the definition in [RFC 3987]; for example it does not require non-ASCII characters to be escaped.
In this specification:
The auxiliary verb must, when rendered in small capitals, indicates a precondition for conformance.
When the sentence relates to an implementation of a function (for example "All implementations must recognize URIs of the form ...") then an implementation is not conformant unless it behaves as stated.
When the sentence relates to the result of a function (for example "The result must have the same type as $arg") then the implementation is not conformant unless it delivers a result as stated.
When the sentence relates to the arguments to a function (for example "The value of $arg must be a valid regular expression") then the implementation is not conformant unless it enforces the condition by raising a dynamic error whenever the condition is not satisfied.
The auxiliary verb may, when rendered in small capitals, indicates optional or discretionary behavior. The statement “An implementation may do X” implies that it is implementation-dependent whether or not it does X.
The auxiliary verb should, when rendered in small capitals, indicates desirable or recommended behavior. The statement “An implementation should do X” implies that it is desirable to do X, but implementations may choose to do otherwise if this is judged appropriate.
[Definition: Where behavior is described as implementation-defined, variations between processors are permitted, but a conformant implementation must document the choices it has made.]
[Definition: Where behavior is described as implementation-dependent, variations between processors are permitted, and conformant implementations are not required to document the choices they have made.]
Note:
Where this specification states that something is implementation-defined or implementation-dependent, it is open to host languages to place further constraints on the behavior.
This section is concerned with the question of whether two calls on a function, with the same arguments, may produce different results.
In this section the term function, unless otherwise specified, applies equally to function definitionsXP (which can be the target of a static function call) and function itemsDM (which can be the target of a dynamic function call).
[Definition: An execution scope is a sequence of calls to the function library during which certain aspects of the state are required to remain invariant. For example, two calls to fn:current-dateTime within the same execution scope will return the same result. The execution scope is defined by the host language that invokes the function library.] In XSLT, for example, any two function calls executed during the same transformation are in the same execution scope (except that static expressions, such as those used in use-when attributes, are in a separate execution scope).
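A minimal sketch of this invariance, assuming both calls are evaluated within the same execution scope (for example, within a single expression):

```xpath
(: Sketch only: both calls return the same xs:dateTime, however long
   the evaluation takes, so the comparison yields true. :)
current-dateTime() eq current-dateTime()
```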
The following definition explains more precisely what it means for two function calls to return the same result:
[Definition: Two values $V1 and $V2 are defined to be identical if they contain the same number of items and the items are pairwise identical. Two items are identical if and only if one of the following conditions applies:]
Both items are atomic items, of precisely the same type, and the values are equal as defined using the eq operator, using the Unicode codepoint collation when comparing strings.
Both items are nodes, and represent the same node.
Both items are maps, both maps have the same number of entries, and for every entry E1 in the first map there is an entry E2 in the second map such that the keys of E1 and E2 are the same key, and the corresponding values V1 and V2 are identical.
Both items are arrays, both arrays have the same number of members, and the members are pairwise identical.
Both items are function items, neither item is a map or array, and the two function items have the same function identity. The concept of function identity is explained in [XQuery and XPath Data Model (XDM) 4.0] section 8.1 Function Items.
Some functions produce results that depend not only on their explicit arguments, but also on the static and dynamic context.
[Definition: A function definitionXP may have the property of being context-dependent: the result of such a function depends on the values of properties in the static and dynamic evaluation context of the caller as well as on the actual supplied arguments (if any). A function definition may be context-dependent for some arities in its arity range, and context-independent for others: for example fn:name#0 is context-dependent while fn:name#1 is context-independent.]
[Definition: A function definitionXP that is not context-dependent is called context-independent.]
The main categories of context-dependent functions are:
Functions that explicitly deliver the value of a component of the static or dynamic context, for example fn:static-base-uri, fn:default-collation, fn:position, or fn:last.
Functions with an optional parameter whose default value is taken from the static or dynamic context of the caller, usually either the context value (for example, fn:node-name) or the default collation (for example, fn:index-of).
Functions that use the static context of the caller to expand or disambiguate the values of supplied arguments: for example fn:doc expands its first argument using the static base URI of the caller, and xs:QName expands its first argument using the in-scope namespaces of the caller.
[Definition: A function is focus-dependent if its result depends on the focusXP31 (that is, the context item, position, or size) of the caller.]
[Definition: A function that is not focus-dependent is called focus-independent.]
Note:
Some functions depend on aspects of the dynamic context that remain invariant within an execution scope, such as the implicit timezone. Formally this is treated in the same way as any other context dependency, but internally, the implementation may be able to take advantage of the fact that the value is invariant.
Note:
User-defined functions in XQuery and XSLT may depend on the static context of the function definition (for example, the in-scope namespaces) and also in a limited way on the dynamic context (for example, the values of global variables). However, the only way they can depend on the static or dynamic context of the caller — which is what concerns us here — is by defining optional parameters whose default values are context-dependent.
Note:
Because the focus is a specific part of the dynamic context, all focus-dependent functions are also context-dependent. A context-dependent function, however, may be either focus-dependent or focus-independent.
A function definition that is context-dependent can be used as the target of a named function reference, can be partially applied, and can be found using fn:function-lookup. The principle in such cases is that the static context used for the function evaluation is taken from the static context of the named function reference, partial function application, or the call on fn:function-lookup; and the dynamic context for the function evaluation is taken from the dynamic context of the evaluation of the named function reference, partial function application, or the call of fn:function-lookup. These constructs all deliver a function itemDM having a captured context based on the static and dynamic context of the construct that created the function item. This captured context forms part of the closure of the function item.
The result of a dynamic call to a function item never depends on the static or dynamic context of the dynamic function call, only (where relevant) on the captured context held within the function item itself.
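The captured-context rule can be sketched with a named function reference to the focus-dependent function fn:string-length#0. The strings used here are illustrative only.

```xpath
(: Sketch only: the function item captures the focus in effect where the
   named function reference is evaluated, here the context item "abc". :)
let $f := "abc" ! string-length#0
(: The dynamic call is made with a different context item, "wxyz!",
   but the result still reflects the captured context: 3, not 5. :)
return "wxyz!" ! $f()
```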
The fn:function-lookup function is a special case because it is potentially dependent on everything in the static and dynamic context. This is because the static and dynamic context of the call to fn:function-lookup form the captured context of the function item that fn:function-lookup returns.
[Definition: A function that is guaranteed to produce identical results from repeated calls within a single execution scope if the explicit and implicit arguments are identical is referred to as deterministic.]
[Definition: A function that is not deterministic is referred to as nondeterministic.]
All functions defined in this specification are deterministic unless otherwise stated. Exceptions include the following:
[Definition: Some functions (such as fn:in-scope-prefixes, fn:load-xquery-module, and fn:unordered) produce result sequences or result maps in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.]
Some functions (such as fn:analyze-string, fn:parse-xml, fn:parse-xml-fragment, fn:parse-html, and fn:json-to-xml) construct a tree of nodes to represent their results. There is no guarantee that repeated calls with the same arguments will return the same identical node (in the sense of the is operator). However, if non-identical nodes are returned, their content will be the same in the sense of the fn:deep-equal function. Such a function is said to be nondeterministic with respect to node identity.
Some functions (such as fn:doc and fn:collection) create new nodes by reading external documents. Such functions are guaranteed to be deterministic by default (some such functions have an option "stable":false() that makes them nondeterministic as a user option, and implementations may also provide configuration options to change the default).
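Nondeterminism with respect to node identity, as described for functions such as fn:parse-xml, can be illustrated as follows. The input document is an arbitrary example.

```xpath
(: Sketch only: two calls to fn:parse-xml with the same argument. :)
let $a := parse-xml("<doc/>"),
    $b := parse-xml("<doc/>")
return (
  deep-equal($a, $b),   (: true: the content is the same :)
  $a is $b              (: not guaranteed either way; no expected value is stated :)
)
```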
Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results.
[Definition: The function fn:concat is defined to be variadic: it accepts any number of arguments. No other function has this property.]
A sequence is an ordered collection of zero or more items. An item is a node, an atomic item, or a function, such as a map or an array. The terms sequence and item are defined formally in [XQuery 4.0: An XML Query Language] and [XML Path Language (XPath) 4.0].
The functions in this section perform comparisons between the items in one or more sequences.
Many of these functions require atomic items to be compared for equality.
[Definition: Two atomic items A and B are said to be contextually equal if the function call fn:compare(A, B) returns zero when evaluated with a specified or context-determined collation and implicit timezone.] If two values are not contextually equal, they are considered to be contextually unequal, even in the case when comparing them using fn:compare raises an error.
Note:
Except where explicitly stated otherwise, an appeal to contextual equality implies that NaN is treated as equal to NaN.
| Function | Meaning |
|---|---|
| fn:atomic-equal | Determines whether two atomic items are equal, under the rules used for comparing keys in a map. |
| fn:compare | Returns -1, 0, or 1, depending on whether the first value is less than, equal to, or greater than the second value. |
| fn:contains-subsequence | Determines whether one sequence contains another as a contiguous subsequence, using a supplied callback function to compare items. |
| fn:deep-equal | This function assesses whether two sequences are deep-equal to each other. To be deep-equal, they must contain items that are pairwise deep-equal; and for two items to be deep-equal, they must either be atomic items that compare equal, or nodes of the same kind, with the same name, whose children are deep-equal, or maps with matching entries, or arrays with matching members. |
| fn:distinct-values | Returns the values that appear in a sequence, with duplicates eliminated. |
| fn:duplicate-values | Returns the values that appear in a sequence more than once. |
| fn:ends-with-subsequence | Determines whether one sequence ends with another, using a supplied callback function to compare items. |
| fn:index-of | Returns a sequence of positive integers giving the positions within the sequence $input of items that are contextually equal to $target. |
| fn:starts-with-subsequence | Determines whether one sequence starts with another, using a supplied callback function to compare items. |
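A few of the functions in this table can be sketched with literal arguments. The values are illustrative, and the codepoint collation is named explicitly so that the result of fn:compare does not depend on the default collation.

```xpath
(: Sketch only: each member of the result sequence is annotated with its value. :)
(
  index-of((10, 20, 30, 20), 20),   (: 2, 4 :)
  compare("a", "b",
          "http://www.w3.org/2005/xpath-functions/collation/codepoint"),   (: -1 :)
  deep-equal((1, "a"), (1, "a"))    (: true :)
)
```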
The following functions assert the cardinality of their sequence arguments.
| Function | Meaning |
|---|---|
| fn:exactly-one | Returns $input if it contains exactly one item. Otherwise, raises an error. |
| fn:one-or-more | Returns $input if it contains one or more items. Otherwise, raises an error. |
| fn:zero-or-one | Returns $input if it contains zero or one items. Otherwise, raises an error. |
The functions fn:zero-or-one, fn:one-or-more, and fn:exactly-one, defined in this section, check that the cardinality of a sequence is in the expected range. These functions were originally defined for use with processors that enforced strict static typing. For example, the function call fn:remove($seq, fn:index-of($seq2, 'abc')) requires the result of the call on fn:index-of to be a singleton integer, but the static type system could not infer this; writing the expression as fn:remove($seq, fn:exactly-one(fn:index-of($seq2, 'abc'))) would provide a suitable static type at query analysis time, and ensure that the length of the sequence is correct with a dynamic check at query execution time.
The 4.0 specifications no longer define strict static typing as an option, so the utility of these functions has declined. They may still serve a purpose, however, as assertions signaling expected preconditions both to the processor and to anyone reading the code.
The type signatures for these functions deliberately declare the argument type as item()*, permitting a sequence of any length. A more restrictive signature would defeat the purpose of the function, which is to defer cardinality checking until query execution time.
Returns $input if it contains exactly one item. Otherwise, raises an error.
fn:exactly-one(
  $input as item()*
) as item()
This function is deterministic, context-independent, and focus-independent.
Except in error cases, the function returns $input unchanged.
The effect of the function is equivalent to the result of the following XPath expression, except in error cases.
if (count($input) eq 1)
then $input
else error(parse-QName('Q{http://www.w3.org/2005/xqt-errors}FORG0005'))
A dynamic error is raised [err:FORG0005] if $input is an empty sequence or a sequence containing more than one item.
Returns $input if it contains one or more items. Otherwise, raises an error.
fn:one-or-more(
  $input as item()*
) as item()+
This function is deterministic, context-independent, and focus-independent.
Except in error cases, the function returns $input unchanged.
The effect of the function is equivalent to the result of the following XPath expression, except in error cases.
if (count($input) ge 1)
then $input
else error(parse-QName('Q{http://www.w3.org/2005/xqt-errors}FORG0004'))
A dynamic error is raised [err:FORG0004] if $input is an empty sequence.
Returns $input if it contains zero or one items. Otherwise, raises an error.
fn:zero-or-one(
  $input as item()*
) as item()?
This function is deterministic, context-independent, and focus-independent.
Except in error cases, the function returns $input unchanged.
The effect of the function is equivalent to the result of the following XPath expression, except in error cases.
if (count($input) le 1)
then $input
else error(parse-QName('Q{http://www.w3.org/2005/xqt-errors}FORG0003'))
A dynamic error is raised [err:FORG0003] if $input contains more than one item.
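The three cardinality checks can be modelled outside XPath. The following Python fragment is an illustrative sketch only: it represents a sequence as a list, and the names CardinalityError, error_code and the three function names are hypothetical, chosen to mirror the error codes given above.

```python
class CardinalityError(Exception):
    """Hypothetical stand-in for a dynamic error such as err:FORG0005."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def _check(items, ok, code):
    # Return the sequence unchanged when the cardinality test passes.
    if ok(len(items)):
        return items
    raise CardinalityError(code)


def zero_or_one(items):
    return _check(items, lambda n: n <= 1, "FORG0003")


def one_or_more(items):
    return _check(items, lambda n: n >= 1, "FORG0004")


def exactly_one(items):
    return _check(items, lambda n: n == 1, "FORG0005")


def error_code(function, items):
    """Return the error code raised by the call, or None if it succeeds."""
    try:
        function(items)
        return None
    except CardinalityError as err:
        return err.code
```

In this model, exactly_one(["abc"]) returns the list unchanged, while error_code(exactly_one, []) reports FORG0005.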
The following functions take function items as an argument.
| Function | Meaning |
|---|---|
fn:apply | Makes a dynamic call on a function with an argument list supplied in the form of an array. |
fn:do-until | Processes a supplied value repeatedly, continuing when some condition is false, and returning the value that satisfies the condition. |
fn:every | Returns true if every item in the input sequence matches a supplied predicate. |
fn:filter | Returns those items from the sequence $input for which the supplied function $predicate returns true. |
fn:fold-left | Processes the supplied sequence from left to right, applying the supplied function repeatedly to each item in turn, together with an accumulated result value. |
fn:fold-right | Processes the supplied sequence from right to left, applying the supplied function repeatedly to each item in turn, together with an accumulated result value. |
fn:for-each | Applies the function item $action to every item from the sequence $input in turn, returning the concatenation of the resulting sequences in order. |
fn:for-each-pair | Applies the function item $action to successive pairs of items taken one from $input1 and one from $input2, returning the concatenation of the resulting sequences in order. |
fn:highest | Returns those items from a supplied sequence that have the highest value of a sort key, where the sort key can be computed using a caller-supplied function. |
fn:index-where | Returns the positions in an input sequence of items that match a supplied predicate. |
fn:lowest | Returns those items from a supplied sequence that have the lowest value of a sort key, where the sort key can be computed using a caller-supplied function. |
fn:partial-apply | Performs partial application of a function item by binding values to selected arguments. |
fn:partition | Partitions a sequence of items into a sequence of non-empty arrays containing the same items, starting a new partition when a supplied condition is true. |
fn:scan-left | Produces the sequence of successive partial results from the evaluation of fn:fold-left with the same arguments. |
fn:scan-right | Produces the sequence of successive partial results from the evaluation of fn:fold-right with the same arguments. |
fn:some | Returns true if at least one item in the input sequence matches a supplied predicate. |
fn:sort | Sorts a supplied sequence, based on the value of a sort key supplied as a function. |
fn:sort-by | Sorts a supplied sequence, based on the value of a number of sort keys supplied as functions. |
fn:sort-with | Sorts a supplied sequence, according to the order induced by the supplied comparator functions. |
fn:subsequence-where | Returns a contiguous sequence of items from $input, with the start and end points located by applying predicates. |
fn:take-while | Returns items from the input sequence prior to the first one that fails to match a supplied predicate. |
fn:transitive-closure | Returns all the GNodes reachable from a given start GNode by applying a supplied function repeatedly. |
fn:while-do | Processes a supplied value repeatedly, continuing while some condition remains true, and returning the first value that does not satisfy the condition. |
With all these functions, if the caller-supplied function fails with a dynamic error, this error is propagated as an error from the higher-order function itself.
Sorts a supplied sequence, based on the value of a number of sort keys supplied as functions.
fn:sort-by( | ||
$input | as , | |
$keys | as | |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on collations.
The result of the function is a sequence that contains all the items from $input, typically in a different order, the order being defined by the supplied sort key definitions.
A sort key definition is a record with three parts:
key: A sort key function, which is applied to each item in the input sequence to determine a sort key value. If no function is supplied, the default is fn:data#1, which atomizes the item.
collation: A collation, which is used when comparing sort key values that are of type xs:string or xs:untypedAtomic. If no collation is supplied, the default collation from the static context is used.
When comparing values of types other than xs:string or xs:untypedAtomic, the collation is ignored (but an error may be reported if it is invalid). For more information see 5.3.7 Choosing a collation.
order: An order direction, either "ascending" or "descending". The default is "ascending".
The number of sort key definitions is determined by the number of records supplied in the $keys argument. If the argument is absent or empty, the default is a single sort key definition using the function data#1, using the default collation from the static context, and with order ascending.
The result of the fn:sort-by function is obtained as follows:
The result sequence contains the same items as the input sequence $input, but generally in a different order.
The sort key definitions are established as described above. The sort key definitions are in major-to-minor order. That is, the position of two items $A and $B in the result sequence is determined first by the relative magnitude of their primary sort key values, which are computed by evaluating the sort key function in the first sort key definition. If those two sort key values are equal, then the position is determined by the relative magnitude of their secondary sort key values, computed by evaluating the sort key function in the second sort key definition, and so on.
When a pair of corresponding sort key values of $A and $B are found to be not equal, then $A precedes $B in the result sequence if both the following conditions are true, or if both conditions are false:
The sort key value for $A is less than the sort key value for $B, as defined below.
The order direction in the corresponding sort key definition is "ascending".
If all the sort key values for $A and $B are pairwise equal, then $A precedes $B in the result sequence if and only if $A precedes $B in the input sequence.
Note:
That is, the sort is stable.
Each sort key value for a given item is obtained by applying the sort key function of the corresponding sort key definition to that item. The result of this function is in the general case a sequence of atomic items. Two sort key values $a and $b are compared as follows:
Let $C be the collation in the corresponding sort key definition.
Let $REL be the result of evaluating op:lexicographic-compare($key($A), $key($B), $C) where op:lexicographic-compare($a, $b, $C) is defined as follows:
if (empty($a) and empty($b)) then 0
else if (empty($a)) then -1
else if (empty($b)) then +1
else let $rel := op:simple-compare(head($a), head($b), $C)
     return if ($rel eq 0)
            then op:lexicographic-compare(tail($a), tail($b), $C)
            else $rel
Here op:simple-compare($k1, $k2, $C) is defined as follows:
if ($k1 instance of (xs:string | xs:anyURI | xs:untypedAtomic)
    and $k2 instance of (xs:string | xs:anyURI | xs:untypedAtomic))
then compare($k1, $k2, $C)
else if ($k1 instance of xs:numeric and $k2 instance of xs:numeric)
then compare($k1, $k2)
else if ($k1 eq $k2) then 0
else if ($k1 lt $k2) then -1
else +1
Note:
This raises an error if two keys are not comparable, for example if one is a string and the other is a number, or if both belong to a non-ordered type such as xs:QName.
If $REL is zero, then the two sort key values are deemed equal; if $REL is -1 then $a is deemed less than $b, and if $REL is +1 then $a is deemed greater than $b.
If the set of computed sort keys contains values that are not comparable using the lt operator then the sort operation will fail with a type error ([err:XPTY0004]XP).
The function is a generalization of the fn:sort function available in 3.1, which is retained for compatibility. The enhancements allow multiple sort keys to be defined, each potentially with a different collation, and allow sorting in descending order.
If the sort key for an item evaluates to the empty sequence, the effect of the rules is that this item precedes any value for which the key is non-empty. This is equivalent to the effect of the XQuery option empty least. The effect of the option empty greatest can be achieved by adding an extra sort key definition with { 'key': fn { empty(K(.)) } }, where K is the original sort key function: when comparing boolean sort keys, false precedes true.
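The comparison rules can be sketched in Python. This is a simplified model and not a definitive implementation: sequences of atomic items are represented as lists, collations are ignored (strings are compared by codepoint), and the names simple_compare, lexicographic_compare and sort_by are hypothetical. Values that Python cannot order, such as a string and a number, raise TypeError, which plays the role of the type error described above.

```python
from functools import cmp_to_key


def simple_compare(k1, k2):
    # Assumption: no collation support; Python's own ordering is used.
    if k1 == k2:
        return 0
    return -1 if k1 < k2 else 1  # raises TypeError for non-comparable values


def lexicographic_compare(a, b):
    # An empty key sorts before any non-empty key ("empty least").
    if not a and not b:
        return 0
    if not a:
        return -1
    if not b:
        return 1
    rel = simple_compare(a[0], b[0])
    return lexicographic_compare(a[1:], b[1:]) if rel == 0 else rel


def sort_by(items, keys):
    """keys: list of dicts with a 'key' callable (returning a list) and an optional 'order'."""
    def compare(x, y):
        # Sort key definitions are consulted in major-to-minor order.
        for definition in keys:
            rel = lexicographic_compare(definition["key"](x), definition["key"](y))
            if rel != 0:
                ascending = definition.get("order", "ascending") == "ascending"
                return rel if ascending else -rel
        return 0
    # sorted() is stable, so items with equal keys keep their input order.
    return sorted(items, key=cmp_to_key(compare))


employees = [
    {"last": "Berg", "salary": 300},
    {"last": "Alm", "salary": 100},
    {"last": "Berg", "salary": 500},
]
by_name_then_salary = sort_by(
    employees,
    [{"key": lambda e: [e["last"]]},
     {"key": lambda e: [e["salary"]], "order": "descending"}],
)
```

Placing an extra definition such as {"key": lambda e: [len(K(e)) == 0]} before the existing one (K being the existing key function) moves items with an empty key to the end, because False sorts before True.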
| Expression: | let $SWEDISH := collation({ 'lang': 'se' }) return sort-by($in, { 'collation': $SWEDISH }) |
|---|---|
| Result: | The names in |
| Expression: | sort-by(//employee, { 'key': fn { name ! (last, first) } }) |
| Result: | Sorts a sequence of employees by last name as the major sort key and first name as the minor sort key, using the default collation |
| Expression: | sort-by($employees, ({ 'key': fn { name/last }, 'collation': collation({ 'lang': 'se' }) }, { 'key': fn { xs:decimal(salary) }, 'order': 'descending' })) |
| Result: | Sorts a sequence of employees first by increasing last name (using Swedish collation order) and then by decreasing salary |
This section specifies arithmetic operators on the numeric datatypes defined in [XML Schema Part 2: Datatypes Second Edition].
| Function | Meaning |
|---|---|
fn:format-integer | Formats an integer according to a given picture string, using the conventions of a given natural language if specified. |
Formats an integer according to a given picture string, using the conventions of a given natural language if specified.
fn:format-integer(
  $value as xs:integer?,
  $picture as xs:string,
  $language as xs:string? := ()
) as xs:string
The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on default language.
The three-argument form of this function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the function returns a zero-length string.
In all other cases, the $picture argument describes the format in which $value is output.
The rules that follow describe how non-negative numbers are output. If the value of $value is negative, the rules below are applied to the absolute value of $value, and a minus sign is prepended to the result.
The value of $picture consists of the following, in order:
An optional radix, which is an integer in the range 2 to 36, written using ASCII digits (0-9) without any leading zero;
A circumflex (^), which is present if the radix is present, and absent otherwise.
A circumflex is recognized as marking the presence of a radix only if (a) it is immediately preceded by an integer in the range 2 to 36, and (b) it is followed (somewhere within the primary format token) by an "X" or "x". In other cases, the circumflex is treated as a grouping separator. For example, the picture 9^000 outputs the number 2345 as "2^345", whereas 9^XXX outputs "3185". This rule is to ensure backwards compatibility.
A primary format token. This is always present and must not be zero-length.
An optional format modifier.
If the string contains one or more semicolons then the last semicolon is taken as terminating the primary format token, and everything that follows is taken as the format modifier; if the string contains no semicolon then the format modifier is taken to be absent (which is equivalent to supplying a zero-length string).
If a radix is present, then the primary format token must follow the rules for a digit-pattern.
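The decomposition of the picture string can be sketched as follows. This Python fragment is illustrative only; the function name split_picture is hypothetical, and the sketch assumes that the radix, when present, starts at the beginning of the picture. It does not validate the primary format token.

```python
import re


def split_picture(picture):
    """Return (radix, primary_format_token, format_modifier)."""
    # The last semicolon, if any, terminates the primary format token.
    if ";" in picture:
        primary, modifier = picture.rsplit(";", 1)
    else:
        primary, modifier = picture, ""

    radix = None
    # A circumflex marks a radix only if it follows an integer in the
    # range 2 to 36 and an "X" or "x" appears later in the token.
    match = re.match(r"^([1-9][0-9]?)\^(.*)$", primary, re.S)
    if match and 2 <= int(match.group(1)) <= 36 and re.search(r"[Xx]", match.group(2)):
        radix = int(match.group(1))
        primary = match.group(2)
    return radix, primary, modifier
```

Under these assumptions, 9^000 is left intact as a primary format token in which the circumflex is a grouping separator, while 9^XXX yields a radix of 9 and the digit-pattern XXX.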
The primary format token is classified as one of the following:
A digit-pattern made up of optional-digit-signs, mandatory-digit-signs, and grouping-separator-signs.
The optional-digit-sign is the character #.
If the radix is absent, then a mandatory-digit-sign is a character in Unicode category Nd. All mandatory-digit-signs within the format token must be from the same digit family, where a digit family is a sequence of ten consecutive characters in Unicode category Nd, having digit values 0 through 9. Within the format token, these digits are interchangeable: a three-digit number may thus be indicated equivalently by 000, 001, or 999.
If the primary format token contains at least one Unicode digit, then the primary format token is taken as a decimal digit pattern, and in this case it must match the regular expression ^((\p{Nd}|#|[^\p{N}\p{L}])+?)$. If it contains a digit but does not match this pattern, a dynamic error is raised [err:FODF1310].
If the radix (call it R) is present (including the case where an explicit radix of 10 is used), then the character used as the mandatory-digit-sign is either "x" or "X". If any mandatory-digit-sign is upper-case "X", then all mandatory-digit-signs must be upper-case "X". The digit family used in the output comprises the first R characters of the alphabet 0123456789abcdefghijklmnopqrstuvwxyz, but using upper-case letters in place of lower-case if an upper-case "X" is used as the mandatory-digit-sign.
In this case the primary format token must match the regular expression ^(([Xx#]|[^\p{N}\p{L}])+?)$
A grouping-separator-sign is a non-alphanumeric character, that is, a character whose Unicode category is other than Nd, Nl, No, Lu, Ll, Lt, Lm or Lo.
Note:
If a semicolon is to be used as a grouping separator, then the primary format token as a whole must be followed by another semicolon, to ensure that the grouping separator is not mistaken as a separator between the primary format token and the format modifier.
There must be at least one mandatory-digit-sign. There may be zero or more optional-digit-signs, and (if present) these must precede all mandatory-digit-signs. There may be zero or more grouping-separator-signs. A grouping-separator-sign must not appear at the start or end of the digit-pattern, nor adjacent to another grouping-separator-sign.
The corresponding output is a number in the specified radix, using this digit family, with at least as many digits as there are mandatory-digit-signs in the format token. Thus:
A format token 1 generates the sequence 0 1 2 ... 10 11 12 ...
A format token 01 (or equivalently, 00 or 99) generates the sequence 00 01 02 ... 09 10 11 12 ... 99 100 101
A format token of U+0661 (ARABIC-INDIC DIGIT ONE, ١) generates the sequence ١ then ٢ then ٣ ...
A format token of 16^xx generates the sequence 00 01 02 03 ... 08 09 0a 0b 0c 0d 0e 0f 10 11 ...
A format token of 16^X generates the sequence 0 1 2 3 ... 8 9 A B C D E F 10 11 ...
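The radix examples above can be reproduced with a short Python sketch. The function name to_radix and its parameters are hypothetical; min_digits stands for the number of mandatory-digit-signs and upper for the use of an upper-case "X".

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_radix(value, radix, min_digits=1, upper=False):
    """Format a non-negative integer in the given radix, padded with zeros."""
    if not 2 <= radix <= 36:
        raise ValueError("radix must be in the range 2 to 36")
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    digits = ""
    remaining = value
    while True:
        remaining, digit = divmod(remaining, radix)
        digits = DIGITS[digit] + digits
        if remaining == 0:
            break
    # Pad on the left up to the number of mandatory-digit-signs.
    digits = digits.rjust(min_digits, "0")
    return digits.upper() if upper else digits
```

For instance, to_radix(10, 16, 2) corresponds to the picture 16^xx applied to the number ten, and to_radix(2345, 9, 3, upper=True) corresponds to 9^XXX applied to 2345.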
The grouping-separator-signs are handled as follows:
The position of grouping separators within the format token, counting backwards from the last digit, indicates the position of grouping separators to appear within the formatted number, and the character used as the grouping-separator-sign within the format token indicates the character to be used as the corresponding grouping separator in the formatted number.
More specifically, the position of a grouping separator is the number of optional-digit-signs and mandatory-digit-signs appearing between the grouping separator and the right-hand end of the primary format token.
Grouping separators are defined to be regular if the following conditions apply:
There is at least one grouping separator.
Every grouping separator is the same character (call it C).
There is a positive integer G (the grouping size) such that:
The position of every grouping separator is an integer multiple of G, and
Every positive integer multiple of G that is less than the number of optional-digit-signs and mandatory-digit-signs in the primary format token is the position of a grouping separator.
The grouping separator template is a (possibly infinite) set of (position, character) pairs.
If grouping separators are regular, then the grouping separator template contains one pair of the form (n×G, C) for every positive integer n where G is the grouping size and C is the grouping character.
Otherwise (when grouping separators are not regular), the grouping separator template contains one pair of the form (P, C) for every grouping separator found in the primary format token, where C is the grouping separator character and P is its position.
Note:
If there are no grouping separators, then the grouping separator template is the empty set.
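The definitions of position and regularity can be sketched in Python. The fragment below is a simplified model that covers only the case where no radix is present, assumes the primary format token has already been validated, and uses the hypothetical names separator_positions and grouping_size.

```python
import unicodedata


def _is_digit_sign(ch):
    # Optional-digit-sign (#) or a mandatory-digit-sign in category Nd.
    return ch == "#" or unicodedata.category(ch) == "Nd"


def separator_positions(token):
    """Return (position, character) pairs, nearest the right-hand end first."""
    pairs = []
    digits_seen = 0
    for ch in reversed(token):
        if _is_digit_sign(ch):
            digits_seen += 1
        else:
            # Position = number of digit signs to the right of the separator.
            pairs.append((digits_seen, ch))
    return pairs


def grouping_size(token):
    """Return the grouping size G if separators are regular, otherwise None."""
    pairs = separator_positions(token)
    if not pairs or len({ch for _, ch in pairs}) != 1:
        return None
    total = sum(1 for ch in token if _is_digit_sign(ch))
    positions = {p for p, _ in pairs}
    g = min(positions)
    if g == 0:
        return None  # cannot occur in a valid digit-pattern
    # Every multiple of G below the digit count must be a separator position.
    return g if positions == set(range(g, total, g)) else None
```

When grouping_size returns a value G, the template is the infinite set of pairs (n×G, C); otherwise the template is exactly the list returned by separator_positions.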
The number is formatted as follows:
Let S1 be the result of formatting the supplied number in the appropriate radix: for radix 10 this will be the value obtained by casting it to xs:string.
Let S2 be the result of padding S1 on the left with as many leading zeroes as are needed to ensure that it contains at least as many digits as the number of mandatory-digit-signs in the primary format token.
Let S3 be the result of replacing all decimal digits (0-9) in S2 with the corresponding digits from the selected digit family. (This has no effect when the selected digit family uses ASCII digits (0-9), which will always be the case if a radix is specified.)
Let S4 be the result of inserting grouping separators into S3: for every (position P, character C) pair in the grouping separator template where P is less than the number of digits in S3, insert character C into S3 at position P, counting from the right-hand end.
Let S5 be the result of converting S4 into ordinal form, if an ordinal modifier is present, as described below.
The result of the function is then S5.
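Steps S1 to S4 can be sketched in Python for the decimal case. This is an illustrative model under stated assumptions: no radix, no ordinal modifier (so S5 is omitted), and the grouping separator template supplied as a function separator_at that maps a position to a character or None. All names are hypothetical.

```python
def format_with_pattern(value, mandatory_digits, separator_at=lambda p: None, zero_digit="0"):
    # S1: the absolute value in radix 10.
    s1 = str(abs(value))
    # S2: pad on the left to the number of mandatory-digit-signs.
    s2 = s1.rjust(mandatory_digits, "0")
    # S3: map ASCII digits onto the selected digit family.
    s3 = "".join(chr(ord(zero_digit) + int(d)) for d in s2)
    # S4: insert separators, counting positions from the right-hand end.
    pieces = []
    for index, ch in enumerate(s3):
        pieces.append(ch)
        remaining = len(s3) - index - 1
        separator = separator_at(remaining) if remaining > 0 else None
        if separator is not None:
            pieces.append(separator)
    s4 = "".join(pieces)
    # A minus sign is prepended for negative values.
    return "-" + s4 if value < 0 else s4


def every_third(position):
    # Regular template with grouping size 3 and an apostrophe as separator.
    return "'" if position % 3 == 0 else None
```

With every_third as the template, format_with_pattern(1000000, 4, every_third) models the picture 0'000 and format_with_pattern(15, 1, every_third) models #'##0. An irregular template can be passed as the get method of a dictionary, for example {3: ",", 5: ","}.get.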
The format token A, which generates the sequence A B C ... Z AA AB AC....
The format token a, which generates the sequence a b c ... z aa ab ac....
The format token i, which generates the sequence i ii iii iv v vi vii viii ix x ....
The format token I, which generates the sequence I II III IV V VI VII VIII IX X ....
The format token w, which generates numbers written as lower-case words, for example in English, one two three four ...
The format token W, which generates numbers written as upper-case words, for example in English, ONE TWO THREE FOUR ...
The format token Ww, which generates numbers written as title-case words, for example in English, One Two Three Four ...
Any other format token, which indicates a numbering sequence in which that token represents the number 1 (one) (but see the note below). It is implementation-defined which numbering sequences, additional to those listed above, are supported. If an implementation does not support a numbering sequence represented by the given token, it must use a format token of 1.
Note:
In some traditional numbering sequences additional signs are added to denote that the letters should be interpreted as numbers, for example, in ancient Greek U+0374 (DEXIA KERAIA, ʹ) and sometimes U+0375 (ARISTERI KERAIA, ͵). These should not be included in the format token.
For all format tokens other than a digit-pattern, there may be implementation-defined lower and upper bounds on the range of numbers that can be formatted using this format token; indeed, for some numbering sequences there may be intrinsic limits. For example, the format token U+2460 (CIRCLED DIGIT ONE, ①) has a range imposed by the Unicode character repertoire — zero to 20 in Unicode versions prior to 3.2, or zero to 50 in subsequent versions. For the numbering sequences described above any upper bound imposed by the implementation must not be less than 1000 (one thousand) and any lower bound must not be greater than 1. Numbers that fall outside this range must be formatted using the format token 1.
The above expansions of numbering sequences for format tokens such as a and i are indicative but not prescriptive. There are various conventions in use for how alphabetic sequences continue when the alphabet is exhausted, and differing conventions for how roman numerals are written (for example, IV versus IIII as the representation of the number 4). Sometimes alphabetic sequences are used that omit letters such as i and o. This specification does not prescribe the detail of any sequence other than those sequences consisting entirely of decimal digits.
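Since these sequences are indicative rather than prescriptive, the following Python sketch shows only one possible convention for each: alphabetic numbering that continues Z with AA, AB, and roman numerals in subtractive form (IV rather than IIII). The function names and the upper limit of 3999 for roman numerals are assumptions of the sketch, not requirements of this specification.

```python
def alphabetic(number, first="A"):
    """One convention for the tokens A and a: A..Z, AA, AB, ..."""
    if number < 1:
        raise ValueError("the sequence starts at 1")
    letters = ""
    while number > 0:
        # Bijective base-26: there is no zero digit.
        number, offset = divmod(number - 1, 26)
        letters = chr(ord(first) + offset) + letters
    return letters


ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
         (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def roman(number):
    """One convention for the token I; use roman(n).lower() for the token i."""
    if not 1 <= number <= 3999:
        raise ValueError("outside the range supported by this sketch")
    parts = []
    for value, symbol in ROMAN:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)
```

An implementation that prefers IIII, or that omits letters such as i and o from an alphabetic sequence, would be equally conformant.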
Many numbering sequences are language-sensitive. This applies especially to the sequence selected by the tokens w, W, and Ww. It also applies to other sequences: for example, different languages using the Cyrillic alphabet use different sequences of characters, each starting with the letter U+0410 (CYRILLIC CAPITAL LETTER A, А). In such cases, the $language argument specifies which language conventions are to be used. If the argument is specified, the value should be either the empty sequence or a value that would be valid for the xml:lang attribute (see [Extensible Markup Language (XML) 1.0 (Fifth Edition)]). Note that this permits the identification of sublanguages based on country codes (from ISO 3166-1) as well as identification of dialects and regions within a country.
The set of languages for which numbering is supported is implementation-defined. If the $language argument is absent, or is set to the empty sequence, or is invalid, or is not a language supported by the implementation, then the number is formatted using the default language from the dynamic context.
The format modifier must be a string that matches the regular expression ^([co](\(.+\))?)?[at]?$. That is, if it is present it must consist of one or more of the following, in order:
either c or o, optionally followed by a sequence of characters enclosed between parentheses, to indicate cardinal or ordinal numbering respectively, the default being cardinal numbering
either a or t, to indicate alphabetic or traditional numbering respectively, the default being implementation-defined.
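The structure of the format modifier can be checked with the regular expression given above. The following Python sketch is illustrative: parse_modifier and the keys of the returned dictionary are hypothetical names, and ValueError stands in for the dynamic error raised for an invalid picture.

```python
import re

# Same grammar as the regular expression in the text, with capturing groups
# for the cardinal/ordinal letter, the parenthesized string, and a or t.
MODIFIER = re.compile(r"(?:([co])(?:\((.+)\))?)?([at])?")


def parse_modifier(modifier):
    match = MODIFIER.fullmatch(modifier)
    if match is None:
        raise ValueError("invalid format modifier: " + repr(modifier))
    kind, variation, numbering = match.groups()
    return {
        "ordinal": kind == "o",   # cardinal numbering is the default
        "variation": variation,   # interpretation is implementation-defined
        "numbering": numbering,   # "a", "t", or None when unspecified
    }
```

For example, parse_modifier("o(-e)") reports ordinal numbering with the variation string -e, while a modifier such as "to" is rejected because the parts are out of order.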
If the o modifier is present, this indicates a request to output ordinal numbers rather than cardinal numbers. For example, in English, when used with the format token 1, this outputs the sequence 1st 2nd 3rd 4th ..., and when used with the format token w outputs the sequence first second third fourth ....
The string of characters between the parentheses, if present, is used to select between other possible variations of cardinal or ordinal numbering sequences. The interpretation of this string is implementation-defined. No error occurs if the implementation does not define any interpretation for the defined string.
It is implementation-defined what combinations of values of the format token, the language, and the cardinal/ordinal modifier are supported. If ordinal numbering is not supported for the combination of the format token, the language, and the string appearing in parentheses, the request is ignored and cardinal numbers are generated instead.
The use of the a or t modifier disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. In the absence of the a or t modifier, the default is implementation-defined.
A dynamic error is raised [err:FODF1310] if the format token is invalid, that is, if it violates any mandatory rules (indicated by an emphasized must or required keyword in the above rules). For example, the error is raised if the primary format token contains a digit but does not match the required regular expression.
Note the careful distinction between conditions that are errors and conditions where fallback occurs. The principle is that an error in the syntax of the format picture will be reported by all processors, while a construct that is recognized by some implementations but not others will never result in an error, but will instead cause a fallback representation of the integer to be used.
The following notes apply when a digit-pattern is used:
If grouping-separator-signs appear at regular intervals within the format token, then the sequence is extrapolated to the left, so grouping separators will be used in the formatted number at every multiple of N. For example, if the format token is 0'000 then the number one million will be formatted as 1'000'000, while the number fifteen will be formatted as 0'015.
The only purpose of optional-digit-signs is to mark the position of grouping-separator-signs. For example, if the format token is #'##0 then the number one million will be formatted as 1'000'000, while the number fifteen will be formatted as 15. A grouping separator is included in the formatted number only if there is a digit to its left, which will only be the case if either (a) the number is large enough to require that digit, or (b) the number of mandatory-digit-signs in the format token requires insignificant leading zeros to be present.
Grouping separators are not designed for effects such as formatting a US telephone number as (365)123-9876. In general they are not suitable for such purposes because (a) only single characters are allowed, and (b) they cannot appear at the beginning or end of the number.
Numbers will never be truncated. Given the digit-pattern 01, the number three hundred will be output as 300, despite the absence of any optional-digit-sign.
The following notes apply when ordinal numbering is selected using the o modifier.
In some languages, the form of numbers (especially ordinal numbers) varies depending on the grammatical context: they may have different genders and may decline with the noun that they qualify. In such cases the string appearing in parentheses after the letter c or o may be used to indicate the variation of the cardinal or ordinal number required.
The way in which the variation is indicated will depend on the conventions of the language.
For inflected languages that vary the ending of the word, the approach recommended in the previous version of this specification was to indicate the required ending, preceded by a hyphen: for example in German, appropriate values might be o(-e), o(-er), o(-es), o(-en).
Another approach, which might usefully be adopted by an implementation based on the open-source ICU localization library [ICU], or any other library making use of the Unicode Common Locale Data Repository [Unicode CLDR], is to allow the value in parentheses to be the name of a registered numbering rule set for the language in question, conventionally prefixed with a percent sign: for example, o(%spellout-ordinal-masculine), or c(%spellout-cardinal-year).
The following notes apply when the primary format token is neither a digit-pattern nor one of the seven other defined format tokens (A, a, i, I, w, W, Ww), but is an arbitrary token representing the number 1:
Unexpected results may occur for traditional numbering. For example, in an implementation that supports a traditional numbering system in Greek, the example format-integer(19, "α;t") might return δπιιιι or ιθ, depending upon whether the ancient acrophonic or late antique alphabetic system is supported.
Unexpected results may also occur for alphabetic numbering. For example, in an implementation that supports an alphabetic numbering system in Greek, someone writing format-integer(19, "α;a") might expect the nineteenth Greek letter, U+03C4 (GREEK SMALL LETTER TAU, τ), but the implementation might return the eighteenth one, U+03C3 (GREEK SMALL LETTER SIGMA, σ), because the latter is the nineteenth item in the sequence of lowercase Greek letters in Unicode (the sequence is interrupted because of the final form of the sigma, U+03C2 (GREEK SMALL LETTER FINAL SIGMA, ς)). Because Greek never had a final capital sigma, Unicode has marked U+03A2, the eighteenth codepoint in the sequence of Greek capital letters, as reserved, to ensure that every Greek uppercase letter is always 32 codepoints less than its lowercase counterpart. Therefore, someone writing format-integer(18, "Α;a") might expect the eighteenth Greek capital letter, U+03A3 (GREEK CAPITAL LETTER SIGMA, Σ), but an implementation might return U+03A2, which is the eighteenth position in the sequence of Greek capital letters but is not assigned to any character.
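The codepoint arithmetic behind this note can be checked with Python's unicodedata module. The sketch contrasts a naive implementation that counts codepoints with one that uses an explicit alphabet; the names by_codepoint, by_alphabet and GREEK_LOWER are hypothetical.

```python
import unicodedata

# The 24 letters of the modern Greek alphabet, without the final sigma.
GREEK_LOWER = "αβγδεζηθικλμνξοπρστυφχψω"


def by_codepoint(first, number):
    """Naive approach: the character `number - 1` codepoints after `first`."""
    return chr(ord(first) + number - 1)


def by_alphabet(number):
    """Approach based on an explicit list of letters."""
    return GREEK_LOWER[number - 1]


naive_19 = by_codepoint("\u03b1", 19)          # starts from small alpha
expected_19 = by_alphabet(19)
naive_capital_18 = by_codepoint("\u0391", 18)  # starts from capital alpha
```

Here naive_19 is U+03C3 (sigma) whereas expected_19 is U+03C4 (tau), and unicodedata.category(naive_capital_18) reports Cn, the category of unassigned codepoints.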
| Expression | Result |
|---|---|
| | Depending on the default language, the expression might return the string |
| | If supported, might return the string |
| | This requests ordinal numbering in Italian: if supported, this should produce the sequence: |
| | This requests ordinal numbering in Italian, spelled out as words: if supported, this should produce the sequence: |
This section defines a function for formatting decimal and floating point numbers.
| Function | Meaning |
|---|---|
fn:format-number | Returns a string containing a number formatted according to a given picture string and decimal format. |
Note:
This function can be used to format any numeric quantity, including an integer. For integers, however, the fn:format-integer function offers additional possibilities. Note also that the picture strings used by the two functions are not 100% compatible, though they share some options in common.
Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.
The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined.
Each decimal format provides a set of named properties.
Note:
A phrase such as "The minus-signXP31 character" is to be read as “the character assigned to the minus-signXP31 property in the relevant decimal format”.
[Definition: The decimal digit family of a decimal format is the sequence of ten digits with consecutive Unicode codepoints starting with the character that is the value of the zero-digitXP31 property.]
[Definition: The optional digit character is the character that is the value of the digitXP31 property.]
For any decimal format, the properties representing characters used in a picture string must have distinct values. These properties are decimal-separatorXP31, grouping-separatorXP31, exponent-separatorXP31, percentXP31, per-milleXP31, digitXP31, and pattern-separatorXP31. Furthermore, none of these properties may be equal to any character in the decimal digit family.
Note:
This differs from the format-number function previously defined in XSLT 2.0 in that any digit can be used in the picture string to represent a mandatory digit: for example, the picture strings "000", "001", and "999" are equivalent. The digits will all be from the same decimal digit family, specifically, the sequence of ten consecutive digits starting with the digit assigned to the zero-digit property. This change is to align format-number (which previously used "000") with format-dateTime (which used "001").
[Definition: The formatting of a number is controlled by a picture string. The picture string is a sequence of characters, in which the characters assigned to the properties decimal-separatorXP31, exponent-separatorXP31, grouping-separatorXP31, digitXP31, and pattern-separatorXP31, and the members of the decimal digit family, are classified as active characters, and all other characters (including the values of the properties percentXP31 and per-milleXP31) are classified as passive characters.]
A dynamic error is raised [err:FODF1310] if the picture string does not conform to the following rules. Note that in these rules the words "preceded" and "followed" refer to characters anywhere in the string; they are not to be read as "immediately preceded" and "immediately followed".
A picture-string consists either of a sub-picture, or of two sub-pictures separated by the pattern-separatorXP31 character. A picture-string must not contain more than one instance of the pattern-separatorXP31 character. If the picture-string contains two sub-pictures, the first is used for positive and unsigned zero values and the second for negative values.
A sub-picture must not contain more than one instance of the decimal-separatorXP31 character.
A sub-picture must not contain more than one instance of the percentXP31 or per-milleXP31 characters, and it must not contain one of each.
The mantissa part of a sub-picture (defined below) must contain at least one character that is either an optional digit character or a member of the decimal digit family.
A sub-picture must not contain a passive character that is preceded by an active character and that is followed by another active character.
A sub-picture must not contain a grouping-separatorXP31 character that appears adjacent to a decimal-separatorXP31 character, or in the absence of a decimal-separatorXP31 character, at the end of the integer part.
A sub-picture must not contain two adjacent instances of the grouping-separatorXP31 character.
The integer part of a sub-picture (defined below) must not contain a member of the decimal digit family that is followed by an instance of the optional digit character. The fractional part of a sub-picture (defined below) must not contain an instance of the optional digit character that is followed by a member of the decimal digit family.
A character that matches the exponent-separatorXP31 property is treated as an exponent-separator-sign if it is both preceded and followed within the sub-picture by an active character. Otherwise, it is treated as a passive character. A sub-picture must not contain more than one character that is treated as an exponent-separator-sign.
A sub-picture that contains a percentXP31 or per-milleXP31 character must not contain a character treated as an exponent-separator-sign.
If a sub-picture contains a character treated as an exponent-separator-sign then this must be followed by one or more characters that are members of the decimal digit family, and it must not be followed by any active character that is not a member of the decimal digit family.
The mantissa part of the sub-picture is defined as the part that appears to the left of the exponent-separator-sign if there is one, or the entire sub-picture otherwise. The exponent part of the sub-picture is defined as the part that appears to the right of the exponent-separator-sign; if there is no exponent-separator-sign then the exponent part is absent.
The integer part of the sub-picture is defined as the part that appears to the left of the decimal-separatorXP31 character if there is one, or the entire mantissa part otherwise.
The fractional part of the sub-picture is defined as that part of the mantissa part that appears to the right of the decimal-separatorXP31 character if there is one, or the part that appears to the right of the rightmost active character otherwise. The fractional part may be zero-length.
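The decomposition of a picture string into its parts can be sketched as follows. This is a minimal Python illustration that checks only a few of the rules above; the class DecimalFormat, its default property values, and the function names are assumptions made for the example and are not defined by this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecimalFormat:
    """Hypothetical holder for the picture-related properties (assumed defaults)."""
    decimal_separator: str = "."
    grouping_separator: str = ","
    exponent_separator: str = "e"
    zero_digit: str = "0"
    digit: str = "#"
    pattern_separator: str = ";"

    def digit_family(self):
        # Ten consecutive codepoints starting at the zero-digit character.
        return [chr(ord(self.zero_digit) + i) for i in range(10)]

    def is_active(self, ch):
        return ch in self.digit_family() or ch in (
            self.decimal_separator, self.grouping_separator,
            self.exponent_separator, self.digit, self.pattern_separator)


def split_picture(picture, fmt):
    """Split a picture string into one or two sub-pictures."""
    subs = picture.split(fmt.pattern_separator)
    if len(subs) > 2:
        raise ValueError("FODF1310: more than one pattern separator")
    return subs


def split_sub_picture(sub, fmt):
    """Return (integer part, fractional part, exponent part or None)."""
    if sub.count(fmt.decimal_separator) > 1:
        raise ValueError("FODF1310: more than one decimal separator")
    # An exponent separator is an exponent-separator-sign only when an active
    # character occurs somewhere before it and somewhere after it.
    signs = [i for i, ch in enumerate(sub)
             if ch == fmt.exponent_separator
             and any(fmt.is_active(c) for c in sub[:i])
             and any(fmt.is_active(c) for c in sub[i + 1:])]
    if len(signs) > 1:
        raise ValueError("FODF1310: more than one exponent-separator-sign")
    if signs:
        mantissa, exponent = sub[:signs[0]], sub[signs[0] + 1:]
    else:
        mantissa, exponent = sub, None
    if fmt.decimal_separator in mantissa:
        integer, fractional = mantissa.split(fmt.decimal_separator, 1)
    else:
        # No decimal separator: the integer part is the whole mantissa, and
        # the fractional part is whatever follows the rightmost active character.
        integer = mantissa
        last = max((i for i, c in enumerate(mantissa) if fmt.is_active(c)),
                   default=-1)
        fractional = mantissa[last + 1:]
    return integer, fractional, exponent
```

For example, under these assumed defaults the sub-picture "00.###e0" splits into the integer part "00", the fractional part "###", and the exponent part "0". A complete validator would also need the remaining rules, such as those on grouping separators and on the ordering of optional and mandatory digits.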
This section specifies functions and operators on the [XML Schema Part 2: Datatypes Second Edition] xs:string datatype and the datatypes derived from it.
| Function | Meaning |
|---|---|
| fn:codepoint-equal | Returns true if two strings are equal, considered codepoint-by-codepoint. |
| fn:collation | Constructs a collation URI with requested properties. |
| fn:collation-available | Asks whether a collation URI is recognized by the implementation, and whether it has required properties. |
| fn:collation-key | Given a string value and a collation, generates an internal value called a collation key, with the property that the matching and ordering of collation keys reflects the matching and ordering of strings under the specified collation. |
| fn:contains-token | Determines whether or not any of the supplied strings, when tokenized at whitespace boundaries, contains the supplied token, under the rules of the supplied collation. |
[Definition: A collation is an algorithm that determines, for any two given strings S1 and S2, whether S1 is less than, equal to, or greater than S2. In this specification, a collation is identified by an absolute URI.]
The [Character Model for the World Wide Web 1.0: Fundamentals] observes that different applications may require different comparison and ordering behaviors. Similarly, different users with different linguistic expectations may require different behaviors. Consequently, the collation must be taken into account when comparing strings.
Collations can indicate that two different codepoints are to be considered equal for comparison purposes (for example, “v” and “w” are considered equivalent in some Swedish collations). Strings can be compared codepoint-by-codepoint or in a linguistically appropriate manner.
Note:
Some sources, for example [UTS #10], use the term collation to refer more generically to a set of sorting rules that can be further parameterized or “tailored”. In this specification the term is always used for a specific algorithm in which all such parameters have defined values.
This specification defines some collation URIs that provide interoperable sorting behavior across applications. Other collation URIs are defined only partially (leaving some aspects implementation-defined). Implementations may define further collation URIs, or may allow users or third parties to define them.
The Unicode codepoint collation is available in every implementation. This collation sorts based on codepoint values. For further details see 5.3.3 The Unicode Codepoint Collation.
Collations may or may not perform Unicode normalization on strings before comparing them.
This specification allows a collation name to be provided as an argument to many string functions. Although collations are defined to be URIs, they are supplied as instances of xs:string.
The XQuery/XPath static context supplies a default collation for use when the collation argument is not specified (see 2.1.1 Static Context XP31). If the default collation is not specified by the user or the system, the default collation is the Unicode codepoint collation.
If the collation is specified using a relative URI reference, it is resolved relative to an implementation-defined base URI.
Note:
Previous versions of this specification stated that it must be resolved against the Static Base URIXP, but this is not always operationally convenient. It is recommended that processors should provide a means of setting the base URI for resolving collation URIs independently of the Static Base URIXP, though for backwards compatibility, the Static Base URIXP or Executable Base URIXP should be used as a default.
This specification does not define whether or not the collation URI is dereferenced. The collation URI may be an abstract identifier, or it may refer to an actual resource describing the collation. If it refers to a resource, this specification does not define the nature of that resource. One possible candidate is that the resource is a locale description expressed using the Locale Data Markup Language: see [UTS #35].
Note:
The ability to access external resources depends on whether the calling code is trustedXP.
Note:
XML allows elements to specify the xml:lang attribute to indicate the language associated with the content of such an element. This specification does not use xml:lang to identify the default collation because using xml:lang does not produce desired effects when the two strings to be compared have different xml:lang values or when a string is multilingual.
[Definition: The collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint identifies a collation which must be recognized by every implementation: it is referred to as the Unicode codepoint collation (not to be confused with the Unicode collation algorithm).]
The Unicode codepoint collation does not perform any normalization on the supplied strings.
The collation is defined as follows. Each of the two strings is converted to a sequence of integers using the fn:string-to-codepoints function. These two sequences $A and $B are then compared as follows:
If both sequences are empty, the strings are equal.
If one sequence is empty and the other is not, then the string corresponding to the empty sequence is less than the other string.
If the first integer in $A is less than the first integer in $B, then the string corresponding to $A is less than the string corresponding to $B.
If the first integer in $A is greater than the first integer in $B, then the string corresponding to $A is greater than the string corresponding to $B.
Otherwise (the first pair of integers are equal), the result is obtained by applying the same rules recursively to fn:tail($A) and fn:tail($B).
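The comparison rules above can be sketched in Python, where ord() on each character plays the role of fn:string-to-codepoints. The function name codepoint_compare and the convention of returning -1, 0, or 1 are choices made for this illustration; the recursion on the tails is written as a loop.

```python
def codepoint_compare(s1, s2):
    """Compare two strings under the rules of the Unicode codepoint collation.

    Returns -1 if s1 is less than s2, 0 if they are equal, 1 if s1 is greater.
    """
    a = [ord(ch) for ch in s1]
    b = [ord(ch) for ch in s2]
    while True:
        if not a and not b:
            return 0            # both sequences empty: equal
        if not a:
            return -1           # the empty sequence sorts first
        if not b:
            return 1
        if a[0] < b[0]:
            return -1
        if a[0] > b[0]:
            return 1
        a, b = a[1:], b[1:]     # first integers equal: compare the tails
```

For instance, codepoint_compare("abc", "abd") yields -1, and codepoint_compare("ab", "a") yields 1 because the shorter string is exhausted first.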
Note:
While the Unicode codepoint collation does not produce results suitable for quality publishing of printed indexes or directories, it is adequate for many purposes where a restricted alphabet is used, such as sorting of vehicle registrations.
Note:
The Unicode codepoint collation differs from the default sort order used in programming languages that sort strings based on UTF-16 code units, which may include surrogate pairs.
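The difference can be seen with one character from the upper end of the Basic Multilingual Plane and one supplementary character. The following Python sketch is illustrative only; the helper utf16_units is a hypothetical name, and the two sample characters are chosen for the example.

```python
def utf16_units(s):
    """Return the UTF-16 code units of a string as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp = "\uFF5E"          # a single code unit, 0xFF5E
astral = "\U00010000"   # a surrogate pair, 0xD800 0xDC00

# Python compares strings by codepoint, like the Unicode codepoint collation.
by_codepoint = sorted([astral, bmp])
# Sorting by UTF-16 code units puts the surrogate pair first instead.
by_utf16 = sorted([astral, bmp], key=utf16_units)
```

Here by_codepoint places U+FF5E before U+10000, while by_utf16 reverses the order because the leading surrogate 0xD800 is less than 0xFF5E.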
The functions described in this section examine a string $arg1 to see whether it contains another string $arg2 as a substring. The result depends on whether $arg2 is a substring of $arg1, and if so, on the range of characters in $arg1 which $arg2 matches.
When the Unicode codepoint collation is used, this simply involves determining whether $arg1 contains a contiguous sequence of characters whose codepoints are the same, one for one, with the codepoints of the characters in $arg2.
When a collation is specified, the rules are more complex.
All collations support the capability of deciding whether two strings are considered equal, and if not, which of the strings should be regarded as preceding the other. For functions such as fn:compare, this is all that is required. For other functions, such as fn:contains, the collation needs to support an additional property: it must be able to decompose the string into a sequence of collation units, each unit consisting of one or more characters, such that two strings can be compared by pairwise comparison of these units.
[Definition: The term collation unit as used in this specification is equivalent to the term collation element used in [UTS #10].]
The string Q is then considered to contain P as a substring if the sequence of collation units corresponding to P is a subsequence of the sequence of collation units corresponding to Q. The characters in P that match are the characters corresponding to these collation units.
This rule may occasionally lead to surprises. For example, consider a collation that treats "Jaeger" and "Jäger" as equal. It might do this by treating "ä" as representing two collation units, in which case the expression fn:contains("Jäger", "eg") will return true. Alternatively, a collation might treat "ae" as a single collation unit, in which case the expression fn:contains("Jaeger", "eg") will return false. The results of these functions thus depend strongly on the properties of the collation that is used.
In addition, collations may specify that some collation units should be ignored during matching. If hyphen is an ignored collation unit, then fn:contains("code-point", "codepoint") will be true, and fn:contains("codepoint", "-") will also be true.
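The effect of collation units on substring matching can be sketched with two toy collations in Python. Both decomposition functions are hypothetical and far simpler than a real collation: one expands "ä" into two units, the other contracts "ae" into one unit, and both ignore the hyphen. Matching is modeled here as a search for a contiguous run of units, which is an assumption of this sketch.

```python
def units_expanding(s):
    """Toy collation: 'ä' yields the two units 'a' and 'e'; '-' is ignored."""
    units = []
    for ch in s:
        if ch == "ä":
            units += ["a", "e"]
        elif ch != "-":
            units.append(ch)
    return units


def units_contracting(s):
    """Toy collation: 'ae' and 'ä' each yield the single unit 'ae'; '-' is ignored."""
    units, i = [], 0
    while i < len(s):
        if s[i:i + 2] == "ae":
            units.append("ae")
            i += 2
        else:
            if s[i] == "ä":
                units.append("ae")
            elif s[i] != "-":
                units.append(s[i])
            i += 1
    return units


def contains(value, substring, to_units):
    """True if the units of `substring` occur as a contiguous run in `value`."""
    q, p = to_units(value), to_units(substring)
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))
```

With these definitions, contains("Jäger", "eg", units_expanding) is true, contains("Jaeger", "eg", units_contracting) is false, and contains("codepoint", "-", units_expanding) is true because the hyphen contributes no units at all.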
In the rules for the functions defined in this section, we use the following terms taken from [UTS #10]:
[Definition: The term match is used in the sense of definition DS2 from [UTS #10].]
[Definition: The term minimal match is used in the sense of definition DS4 from [UTS #10].]
In the definitions in [UTS #10], these rules involve a number of parameters. In the context of the functions defined in this section, these parameters are interpreted as follows:
C is the collation; that is, the value of the $collation argument if specified, otherwise the default collation.
P is the (candidate) substring, the value of the $substring argument to the function.
Q is the (candidate) containing string, the value of the $value argument to the function.
The boundary condition B is satisfied at the start and end of a string, and between any two characters that belong to different collation units (“collation elements” in the language of [UTS #10]). It is not satisfied between two characters that belong to the same collation unit.
It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property.
| Function | Meaning |
|---|---|
| fn:contains | Returns true if the string $value contains $substring as a substring, taking collations into account. |
| fn:starts-with | Returns true if the string $value contains $substring as a leading substring, taking collations into account. |
| fn:ends-with | Returns true if the string $value contains $substring as a trailing substring, taking collations into account. |
| fn:substring-before | Returns the part of $value that precedes the first occurrence of $substring, taking collations into account. |
| fn:substring-after | Returns the part of $value that follows the first occurrence of $substring, taking collations into account. |
The functions described in this section make use of a regular expression syntax for pattern matching. The syntax and semantics of regular expressions are defined in this section.
Changes in 4.0
Regular expressions can include comments (starting and ending with #) if the c flag is set. [Issue 999 PR 1022 20 February 2024]
Word boundaries can be matched. Lookahead and lookbehind assertions are supported. Assertions (including ^ and $) can no longer be followed by a quantifier. [Issues 998 1006 PR 1856]
The regular expression syntax used by these functions is defined in terms of the regular expression syntax specified in XSD 1.1 (see [XSD 1.1 Part 2]), which in turn is based on the established conventions of languages such as Perl. However, because XML Schema uses regular expressions only for validity checking, it omits some facilities that are widely used with other languages. XPath, therefore, extends the XML Schema regular expression syntax to reinstate some of these capabilities.
Note:
Implementers should consult [UTS #18] for information on using regular expression processing on Unicode characters.
The regular expression syntax and semantics are identical to those defined in [XSD 1.1 Part 2] with the additions described in the following subsections.
Note:
In [XSD 1.1 Part 2] there are no substantive technical changes to the syntax or semantics of regular expressions relative to [XML Schema Part 2: Datatypes Second Edition], but a number of errors and ambiguities have been resolved. For example, the rules for the interpretation of hyphens within square brackets in a regular expression have been clarified; and the semantics of regular expressions are no longer tied to a specific version of Unicode.
XSD 1.1 is therefore used as the specification baseline, even for processors that only support XSD 1.0.
As well as extending the XSD 1.1 syntax for regular expressions, this specification also extends the processing model.
In XSD, a regular expression is defined to denote a set of strings, and the only functionality offered is to test whether a string matches a regular expression: that is, whether it is a member of the set of strings denoted by the regular expression.
In this specification, matching a string S against a regular expression delivers a more complex outcome.
First some terminology:
[Definition: A string of length N has N+1 character positions: one immediately before each character in the string, and one after the last character. In interfaces where character positions are exposed, they are numbered from 1 to N+1.]
[Definition: A segment of a string S is a sequence of zero or more contiguous characters starting at a given character position within S.] Segments of a string are uniquely identified by their start position and length. The sequence of characters making up a segment is referred to as the string value of the segment.
[Definition: The end position of a segment is the start position of the segment plus its length.]
The operation of matching a string S against a regular expression delivers:
A set of matching segments. The string S as a whole is said to match the regular expression if the set of matching segments is non-empty.
For each matching segment M, a collection of captured groups. This is a mapping from positive integers to segments. The integer is called the group number, and corresponds to the ordinal sequence of opening parentheses of capturing subexpressions within the regular expression, as explained below. The corresponding segment is always a segment of S, but in the case of capturing expressions within lookahead assertions, it is not necessarily a segment of M.
The semantics of particular constructs in a regular expression are affected by a set of flags. The available flags and their effect are defined in 6.2 Flags.
The different functions available, such as fn:replace and fn:tokenize, are defined in terms of this outcome. For example:
The function fn:matches returns true if the set of matching segments is non-empty.
The function fn:replace replaces matching segments of the input string with a replacement string.
The function fn:tokenize returns the segments of the input string that appear between the matching segments.
In principle the set of segments that match a regular expression can be determined by enumerating all the segments of the input string and examining each one independently to establish whether it matches. In practice, however:
If several matching segments have the same starting position, then only one of them is returned. This is chosen as follows:
In the case of a choice (operator "|") the first matching branch is chosen.
In the case of a repetition with a greedy quantifier (for example "+" or "*") the longest matching segment is chosen.
In the case of a repetition with a reluctant quantifier (for example "+?" or "*?") the shortest matching segment is chosen.
A matching segment is not included in the result if it overlaps an earlier matching segment: specifically, a segment with start position S1 is excluded if there is a segment that has start position S0 and length L0, where S0 < S1 < S0+L0.
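These selection rules resemble the behavior of many backtracking regular expression engines. The following sketch uses Python's re module purely as an analogy: Python's regular expression dialect is not the one defined in this specification, but for these simple patterns it is assumed to make the same choices.

```python
import re

# Choice: the first matching branch wins, even though "ab" would be longer.
first_branch = re.match(r"a|ab", "ab").group()

# Greedy repetition takes the longest segment, reluctant the shortest.
greedy = re.match(r"a+", "aaa").group()
reluctant = re.match(r"a+?", "aaa").group()

# Overlap elimination: "aa" also occurs at offset 1 of "aaaa", but that
# segment overlaps the earlier match at offset 0 and is not reported.
spans = [m.span() for m in re.finditer(r"aa", "aaaa")]
```

The values obtained are "a" for first_branch, "aaa" for greedy, "a" for reluctant, and the two zero-based spans (0, 2) and (2, 4).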
Note:
Two segments can be adjacent: that is, the start position of one segment can be equal to the end position of the previous segment. This is true even when the second segment is zero-length (the two segments are not considered to be overlapping, even though they have the same end position). This means, for example, that the regular expression a*(?=x) has two non-overlapping matches against the string aaax, one at position 1 and the other at position 4.
[Definition: The disjoint matching segments obtained by applying a regular expression R to a string S in the presence of a set of flags F are the segments of S that match R (using flags F), after elimination of overlapping segments.]
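The example of a*(?=x) matched against aaax can be reproduced with Python's re module, again only as an analogy to the processing model described here. The sketch assumes Python 3.7 or later, where an empty match may be adjacent to a preceding non-empty match; the function name disjoint_segments is hypothetical. Segments are reported as pairs of a 1-based start position and a length.

```python
import re


def disjoint_segments(pattern, s):
    """Return non-overlapping matches as (start position, length) pairs.

    Start positions are 1-based, following the numbering of character
    positions from 1 to N+1.
    """
    return [(m.start() + 1, m.end() - m.start())
            for m in re.finditer(pattern, s)]


segments = disjoint_segments(r"a*(?=x)", "aaax")
# The end position of a segment is its start position plus its length.
end_positions = [start + length for start, length in segments]
```

The result is the two segments (1, 3) and (4, 0): the zero-length segment starts at the end position of the first one, so the two are adjacent rather than overlapping.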
The semantics of a regular expression are thus defined by stating which segments of an input string it matches, and what the captured groups corresponding to this match are. This is defined recursively for each construct that may appear within a regular expression, in terms of the outcome of applying its subexpressions.
For constructs defined in XSD 1.1 (branch, piece, NormalChar, charClass), XSD defines a set of strings denoted by the construct. The corresponding semantics for this specification are that the segments matched by such a construct are the segments whose string value is contained in this set.
For constructs added to the XSD 1.1 baseline by this specification, the semantics are defined in the sections that follow.
The regular expression syntax defined by [XML Schema Part 2: Datatypes Second Edition] allows a regular expression to contain parenthesized subexpressions, but attaches no special significance to them. Some operations associated with regular expressions (for example, back-references, and the fn:replace function) allow access to the parts of the input string that matched a parenthesized subexpression (called captured groups).
[Definition: A left parenthesis is recognized as a capturing left parenthesis provided it is not immediately followed by ? or * (see below), is not within a character group (square brackets), and is not escaped with a backslash. The sub-expression enclosed by a capturing left parenthesis and its matching right parenthesis is referred to as a capturing subexpression.]
More specifically, the capturing subexpression enclosed by the Nth capturing left parenthesis within the regular expression (determined by its character position in left-to-right order, and counting from one) is referred to as the Nth capturing subexpression.
For example, in the regular expression A(BC(?:D(EF(GH[()])))), the subexpression BC(?:D(EF(GH[()]))) is capturing subexpression 1, the subexpression EF(GH[()]) is capturing subexpression 2, and the subexpression GH[()] is capturing subexpression 3.
When, in the course of evaluating a regular expression, a particular segment of the input matches a capturing subexpression, that segment becomes available as a captured group. The segment matched by the Nth capturing subexpression is referred to as the Nth captured group. By convention, the segment captured by the entire regular expression is treated as captured group 0 (zero).
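The numbering of capturing subexpressions in the example above can be checked with Python's re module, which numbers groups by the position of the opening parenthesis in the same way and also treats group 0 as the whole match. This is an analogy only; the input string "ABCDEFGH(" is chosen for the illustration.

```python
import re

pattern = re.compile(r"A(BC(?:D(EF(GH[()]))))")

# (?:...) is not counted, and the parentheses inside [()] are literals,
# so the pattern has exactly three capturing subexpressions.
group_count = pattern.groups

m = pattern.fullmatch("ABCDEFGH(")
whole = m.group(0)    # captured group 0: the entire match
first = m.group(1)    # matched by BC(?:D(EF(GH[()])))
second = m.group(2)   # matched by EF(GH[()])
third = m.group(3)    # matched by GH[()]
```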
When a capturing subexpression is matched more than once (because it is within a construct that allows repetition), then only the last substring that it matched will be captured. Note that this rule is not sufficient in all cases to ensure an unambiguous result, especially in cases where (a) the regular expression contains nested repeating constructs, and/or (b) the repeating construct matches a zero-length string. In such cases it is implementation-dependent which substring is captured. For example given the regular expression (a*)+ and the input string "aaaa", an implementation might legitimately capture either "aaaa" or a zero length string as the content of the captured subgroup.
Parentheses that are required to group terms within the regular expression, but which are not required for capturing of substrings, can be represented using the syntax (?:xxxx).
In the absence of back-references (see below), the presence of the optional ?: has no effect on the set of strings that match the regular expression, but causes the left parenthesis not to be counted by operations (such as fn:replace and back-references) that number the capturing sub-expressions within a regular expression.
This section defines operations on the [XML Schema Part 2: Datatypes Second Edition] date and time types.
See [Working With Timezones] for a disquisition on working with date and time values with and without timezones.
[Definition: The eight primitive types xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay are referred to collectively as the Gregorian types.]
This section describes operations on atomic items of these types.
Values of these types are modeled as comprising one or more of the seven components year, month, day, hour, minute, second, and timezone.
The only operations defined on xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, and xs:gDay values are equality comparison and component extraction. For other types, further operations are provided, including order comparisons, arithmetic, formatted display, and timezone adjustment.
The date and time datatypes may be considered to be composite datatypes in that they contain distinct properties or components. The extraction functions specified below extract a single component from a date or time value. In all cases the local value (that is, the original value as written, without any timezone adjustment) is used.
Note:
A time written as 24:00:00 is treated as 00:00:00 on the following day.
| Function | Meaning |
|---|---|
| fn:year-from-dateTime | Returns the year component of a Gregorian value. |
| fn:month-from-dateTime | Returns the month component of a Gregorian value. |
| fn:day-from-dateTime | Returns the day component of a Gregorian value. |
| fn:hours-from-dateTime | Returns the hours component of a Gregorian value. |
| fn:minutes-from-dateTime | Returns the minutes component of a Gregorian value. |
| fn:seconds-from-dateTime | Returns the seconds component of a Gregorian value. |
| fn:timezone-from-dateTime | Returns the timezone component of a Gregorian value. |
| fn:year-from-date | Returns the year component of an xs:date. |
| fn:month-from-date | Returns the month component of an xs:date. |
| fn:day-from-date | Returns the day component of an xs:date. |
| fn:timezone-from-date | Returns the timezone component of an xs:date. |
| fn:hours-from-time | Returns the hours component of an xs:time. |
| fn:minutes-from-time | Returns the minutes component of an xs:time. |
| fn:seconds-from-time | Returns the seconds component of an xs:time. |
| fn:timezone-from-time | Returns the timezone component of an xs:time. |
| fn:parts-of-dateTime | Returns all the components of a Gregorian value. |
Returns all the components of a Gregorian value.
fn:parts-of-dateTime( | ||
$value | as | |
) as dateTime-record? | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the function returns the empty sequence.
Otherwise, the function returns a record whose fields are as follows. All entries will be present, even when the value is an empty sequence.
| Key | Value |
|---|---|
| year | The value of fn:year-from-dateTime($value) |
| month | The value of fn:month-from-dateTime($value) |
| day | The value of fn:day-from-dateTime($value) |
| hours | The value of fn:hours-from-dateTime($value) |
| minutes | The value of fn:minutes-from-dateTime($value) |
| seconds | The value of fn:seconds-from-dateTime($value) |
| timezone | The value of fn:timezone-from-dateTime($value) |
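The shape of the returned record can be sketched in Python using the standard datetime types. This is a rough analogy, not an implementation of the function: Python has no counterparts for types such as xs:gYearMonth, None stands in for the empty sequence, a timedelta stands in for xs:dayTimeDuration, and the name parts_of_datetime is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from decimal import Decimal


def parts_of_datetime(value):
    """Return a dict with all seven keys; absent components map to None."""
    if value is None:
        return None
    has_date = isinstance(value, date)             # datetime is a subclass of date
    has_time = isinstance(value, (datetime, time))
    seconds = None
    if has_time:
        # Combine whole seconds and microseconds into one decimal value.
        seconds = (Decimal(value.second)
                   + Decimal(value.microsecond) / Decimal(1_000_000))
    return {
        "year": value.year if has_date else None,
        "month": value.month if has_date else None,
        "day": value.day if has_date else None,
        "hours": value.hour if has_time else None,
        "minutes": value.minute if has_time else None,
        "seconds": seconds,
        # utcoffset() returns None when no timezone is attached.
        "timezone": value.utcoffset() if has_time else None,
    }


stamp = datetime(1999, 5, 31, 13, 20, 0,
                 tzinfo=timezone(timedelta(hours=-5)))
record = parts_of_datetime(stamp)
```

Every key is present in the result even when its value is None, mirroring the rule that all entries are present even when the value is an empty sequence.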
| Expression: | parts-of-dateTime(xs:dateTime("1999-05-31T13:20:00-05:00")) |
|---|---|
| Result: | { "year": 1999, "month": 5, "day": 31, "hours": 13, "minutes": 20, "seconds": 0, "timezone": xs:dayTimeDuration("-PT5H") } |
| Expression: | parts-of-dateTime(xs:time("13:30:04.2678")) |
| Result: | { "year": (), "month": (), "day": (), "hours": 13, "minutes": 30, "seconds": 4.2678, "timezone": () } |
| Expression: | parts-of-dateTime(xs:gYearMonth("2007-05Z")) |
| Result: | { "year": 2007, "month": 5, "day": (), "hours": (), "minutes": (), "seconds": (), "timezone": xs:dayTimeDuration("PT0S") } |
| Function | Meaning |
|---|---|
| fn:format-dateTime | Returns a string containing an xs:dateTime value formatted for display. |
| fn:format-date | Returns a string containing an xs:date value formatted for display. |
| fn:format-time | Returns a string containing an xs:time value formatted for display. |
Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar, language, and country. The functions are presented in their customary fashion, except for the rules and examples, which are described en bloc at 9.9.4 The date/time formatting functions and 9.9.5 Examples of date and time formatting.
The fn:format-dateTime, fn:format-date, and fn:format-time functions format $value as a string using the picture string specified by the $picture argument, the calendar specified by the $calendar argument, the language specified by the $language argument, and the country or other place name specified by the $place argument. The result of the function is the formatted string representation of the supplied xs:dateTime, xs:date, or xs:time value.
[Definition: The three functions fn:format-dateTime, fn:format-date, and fn:format-time are referred to collectively as the date formatting functions.]
If $value is the empty sequence, the function returns the empty sequence.
Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to the empty sequence.
For details of the $language, $calendar, and $place arguments, see 9.9.4.8 The language, calendar, and place arguments.
In general, the use of an invalid $picture, $language, $calendar, or $place argument results in a dynamic error [err:FOFD1340]. By contrast, use of an option in any of these arguments that is valid but not supported by the implementation is not an error, and in these cases the implementation is required to output the value in a fallback representation. More detailed rules are given below.
In XPath 4.0, statically-known QNames can be expressed using a QName literal such as #xml:space. Where the QName is not known statically, the xs:QName constructor function can be used.
In addition to the xs:QName constructor function, QName values can be constructed by combining a namespace URI, prefix, and local name, or by resolving a lexical QName against the in-scope namespaces of an element node. This section defines functions that perform these operations. Leading and trailing whitespace, if present, is stripped from string arguments before the result is constructed.
| Function | Meaning |
|---|---|
fn:QName | Returns an xs:QName value formed using a supplied namespace URI and lexical QName. |
fn:parse-QName | Returns an xs:QName value formed by parsing an EQName. |
fn:resolve-QName | Returns an xs:QName value (that is, an expanded-QName) by taking an xs:string that has the lexical form of an xs:QName (a string in the form "prefix:local-name" or "local-name") and resolving it using the in-scope namespaces for a given element. |
Returns an xs:QName value formed by parsing an EQName.
fn:parse-QName( | ||
$value | as | |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on namespaces.
If $value is the empty sequence, the result is the empty sequence.
Otherwise, leading and trailing whitespace in $value is stripped: call the result V.
If V is castable to xs:NCName, the result is fn:QName("", $value): that is, a QName in no namespace.
If V is in the lexical space of xs:QName (that is, if it is in the form prefix:local), the result is xs:QName($value). Note that this result depends on the in-scope prefixes in the static context, and may result in various error conditions.
If V takes the form of an XPath URIQualifiedNameXP (that is, Q{uri}local, where the uri part may be zero-length, or Q{uri}prefix:local), then the result is fn:QName(uri, local) or fn:QName(uri, prefix:local) respectively.
The rules used for parsing a BracedURILiteralXP within a URIQualifiedNameXP are the XPath rules, not the XQuery rules (the XQuery rules require special characters such as < and & to be escaped).
A dynamic error is raised [err:FOCA0002] if the supplied value of $value, after whitespace normalization, does not match the XPath production EQNameXP, or if the input is a URIQualifiedName in which the namespace prefix is present but the namespace URI is absent.
A dynamic error is raised [err:FONS0004] if the supplied value of $value, after whitespace normalization, is in the form prefix:local (with a non-absent prefix), and the prefix cannot be resolved to a namespace URI using the in-scope namespace bindings from the static context.
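The case analysis above can be modelled outside XPath. The following Python sketch is an illustration only, not part of this specification: the function name parse_qname, the in_scope dictionary (standing in for the statically known namespace bindings), and the simplified ASCII-only NCName pattern are all assumptions made for the example.

```python
import re

# Simplified NCName pattern (assumption): ASCII letters, digits, "_", "-", ".".
# The real NCName production admits many more Unicode characters.
NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"


def parse_qname(value, in_scope):
    """Return (uri, prefix, local) for an EQName string, or None for None input.

    `in_scope` is a hypothetical dict mapping prefixes to namespace URIs.
    """
    if value is None:                  # empty sequence in, empty sequence out
        return None
    v = value.strip()                  # strip leading and trailing whitespace
    # Case 1: a bare NCName is a QName in no namespace.
    if re.fullmatch(NCNAME, v):
        return ("", "", v)
    # Case 2: prefix:local, resolved against the in-scope bindings.
    m = re.fullmatch(f"({NCNAME}):({NCNAME})", v)
    if m:
        prefix, local = m.groups()
        if prefix not in in_scope:
            raise ValueError("FONS0004: no binding for prefix " + prefix)
        return (in_scope[prefix], prefix, local)
    # Case 3: Q{uri}local or Q{uri}prefix:local.
    m = re.fullmatch(rf"Q\{{([^{{}}]*)\}}(?:({NCNAME}):)?({NCNAME})", v)
    if m:
        uri, prefix, local = m.group(1), m.group(2) or "", m.group(3)
        if prefix and not uri:
            raise ValueError("FOCA0002: prefix present but namespace URI absent")
        return (uri, prefix, local)
    raise ValueError("FOCA0002: not a valid EQName: " + v)
```

For instance, under these assumptions parse_qname("xmp:person", {"xmp": "http://www.example.com/ns"}) yields the triple ("http://www.example.com/ns", "xmp", "person").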
| Expression: | declare namespace xmp = "http://www.example.com/ns";
fn:parse-QName("xmp:person") |
|---|---|
| Result: | #Q{http://www.example.com/ns}xmp:person |
This section specifies functions on QNames as defined in [XML Schema Part 2: Datatypes Second Edition].
| Function | Meaning |
|---|---|
fn:prefix-from-QName | Returns the prefix component of the supplied QName. |
fn:local-name-from-QName | Returns the local part of the supplied QName. |
fn:namespace-uri-from-QName | Returns the namespace URI part of the supplied QName. |
fn:expanded-QName | Returns a string representation of an xs:QName in the format Q{uri}local. |
Returns the local part of the supplied QName.
fn:local-name-from-QName( | ||
$value | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence the function returns the empty sequence.
Otherwise, the function returns an xs:NCName representing the local part of $value.
| Expression: | local-name-from-QName(#Q{http://www.example.com/ns}person) |
|---|---|
| Result: | "person" |
Returns the namespace URI part of the supplied QName.
fn:namespace-uri-from-QName( | ||
$value | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence the function returns the empty sequence.
Otherwise, the function returns an xs:anyURI representing the namespace URI part of $value.
If $value is in no namespace, the function returns the zero-length xs:anyURI.
| Expression: | namespace-uri-from-QName(#Q{http://www.example.com/ns}person) |
|---|---|
| Result: | xs:anyURI("http://www.example.com/ns") |
This section specifies further functions that return properties of nodes. Nodes are formally defined in 6 Nodes DM31.
| Function | Meaning |
|---|---|
fn:has-children | Returns true if the supplied GNode has one or more child nodes (of any kind). |
fn:in-scope-namespaces | Returns the in-scope namespaces of an element node, as a map. |
fn:in-scope-prefixes | Returns the prefixes of the in-scope namespaces for an element node. |
fn:lang | This function tests whether the language of $node, or the context value if the second argument is omitted, as specified by xml:lang attributes is the same as, or is a sublanguage of, the language specified by $language. |
fn:local-name | Returns the local part of the name of $node as an xs:string that is either the zero-length string, or has the lexical form of an xs:NCName. |
fn:name | Returns the name of a node, as an xs:string that is either the zero-length string, or has the lexical form of an xs:QName. |
fn:namespace-uri | Returns the namespace URI part of the name of $node, as an xs:anyURI value. |
fn:namespace-uri-for-prefix | Returns the namespace URI of one of the in-scope namespaces for $element, identified by its namespace prefix. |
fn:path | Returns a path expression that can be used to select the supplied node relative to the root of its containing document. |
fn:root | Returns the root of the tree to which $node belongs. The function can be applied both to XNodesDM and to JNodesDM. |
fn:siblings | Returns the supplied GNode together with its siblings, in document order. |
Returns true if the supplied GNode has one or more child nodes (of any kind).
fn:has-children( | ||
$node | as | := . |
) as | ||
The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.
The one-argument form of this function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
Provided that the supplied argument $node matches the expected type gnode()?, the result of the function call fn:has-children($node) is defined to be the same as the result of the expression fn:exists($node/child::gnode()).
The following errors may be raised when $node is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type gnode()?, type error [err:XPTY0004]XP.
If $node is the empty sequence the result is false.
The motivation for this function is to support streamed evaluation. According to the streaming rules in [XSL Transformations (XSLT) Version 4.0], the following construct is not streamable:
<xsl:if test="exists(row)">
<ulist>
<xsl:for-each select="row">
<item><xsl:value-of select="."/></item>
</xsl:for-each>
</ulist>
</xsl:if>
This is because it makes two downward selections to read the child row elements. The use of fn:has-children in the xsl:if conditional is intended to circumvent this restriction.
Although the function was introduced to support streaming use cases, it has general utility as a convenience function.
If the supplied argument is a map or an array, it will automatically be coerced to a JNode.
| Variables | |
|---|---|
let $e := <doc> <p id="alpha">One</p> <p/> <p>Three</p> <?pi 3.14159?> </doc> | |
| Expression | Result |
|---|---|
The functions included in this section operate on function items, that is, values referring to a function.
[Definition: Functions that accept functions among their arguments, or that return functions in their result, are described in this specification as higher-order functions.]
Note:
Some functions, such as fn:parse-json, allow the option of supplying a callback function, for example to define exception behavior. Where this is not essential to the use of the function, the function has not been classified as higher-order for this purpose; in applications where function items cannot be created, these particular options will not be available.
| Function | Meaning |
|---|---|
fn:function-lookup | Returns a function item having a given name and arity, if there is one. |
fn:function-name | Returns the name of the function identified by a function item. |
fn:function-arity | Returns the arity of the function identified by a function item. |
fn:function-identity | Returns a string representing the identity of a function item. |
fn:function-annotations | Returns the annotations of the function item. |
Returns a function item having a given name and arity, if there is one.
fn:function-lookup( | ||
$name | as , | |
$arity | as | |
) as | ||
This function is deterministic, context-dependent, and focus-dependent.
A call to fn:function-lookup starts by looking for a function definitionXP in the named functions component of the dynamic context (specifically, the dynamic context of the call to fn:function-lookup), using the expanded QName supplied as $name and the arity supplied as $arity. There can be at most one such function definition.
If no function definition can be identified (by name and arity), then the empty sequence is returned.
If a function definition is identified, then a function item is obtained from the function definition using the same rules as for evaluation of a named function reference (see [XML Path Language (XPath) 4.0] section 4.6.5 Named Function References). The captured context of the returned function item (if it is context dependent) is the static and dynamic context of the call on fn:function-lookup.
If the arguments to fn:function-lookup identify a function that is present in the static context of the function call, the function will always return the same function that a static reference to this function would bind to. If there is no such function in the static context, then the results depend on what is present in the dynamic context, which is implementation-defined.
An error is raised if the identified function depends on components of the static or dynamic context that are not present, or that have unsuitable values. For example [err:XPDY0002]XP is raised for the call function-lookup( #fn:name, 0 ) if the context value is absent, and [err:FODC0001] is raised for the call function-lookup( #fn:id, 1 ) if the context value is not a single node in a tree that is rooted at a document node. The error that is raised is the same as the error that would be raised by the corresponding function if called with the same static and dynamic context.
This function can be useful where there is a need to make a dynamic decision on which of several statically known functions to call. It can thus be used as a substitute for polymorphism, in the case where the application has been designed so several functions implement the same interface.
The function can also be useful in cases where a query or stylesheet module is written to work with alternative versions of a library module. In such cases the author of the main module might wish to test whether an imported library module contains or does not contain a particular function, and to call a function in that module only if it is available in the version that was imported. A static call would cause a static error if the function is not available, whereas getting the function using fn:function-lookup allows the caller to take fallback action in this situation.
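The fallback pattern described above can be illustrated with a small Python analogy. This is a sketch only: the FUNCTIONS registry, the function names in it, and the helper shout are hypothetical, and the registry merely stands in for the set of named functions available to the caller.

```python
# Hypothetical registry standing in for the named functions available
# to the caller, keyed by (name, arity).
FUNCTIONS = {
    ("upper-case", 1): lambda s: s.upper(),
    ("concat", 2): lambda a, b: a + b,
}


def function_lookup(name, arity):
    """Return the function registered under (name, arity), or None if absent."""
    return FUNCTIONS.get((name, arity))


def shout(text):
    """Use an optional 'emphasise' function when present, else fall back."""
    f = function_lookup("emphasise", 1)   # not registered in this sketch
    if f is not None:
        return f(text)
    # Fallback action: a static call to a missing function would have failed,
    # whereas the lookup lets the caller choose an alternative.
    return function_lookup("upper-case", 1)(text)
```

The lookup is by name and arity together, so function_lookup("concat", 3) finds nothing even though a two-argument concat is registered.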
If the function that is retrieved by fn:function-lookup is context-dependent, that is, if it has dependencies on the static or dynamic context of its caller, the context that applies is the static and/or dynamic context of the call to the fn:function-lookup function itself. The context thus effectively forms part of the closure of the returned function. This mainly applies when the target of fn:function-lookup is a built-in function, because user-defined functions typically have no dependency on the static or dynamic context of the function call (an exception arises when the expressions used to define default values for parameters are context-dependent). The rule applies recursively, since fn:function-lookup is itself a context-dependent built-in function.
However, the static and dynamic context of the call to fn:function-lookup may play a role even when the selected function definition is not itself context dependent, if the expressions used to establish default parameter values are context dependent.
User-defined XSLT or XQuery functions should be accessible to fn:function-lookup only if they are statically visible at the location where the call to fn:function-lookup appears. This means that private functions, if they are not statically visible in the containing module, should not be accessible using fn:function-lookup.
The function identity is determined in the same way as for a named function reference. Specifically, if there is no context dependency, two calls on fn:function-lookup with the same name and arity must return the same function.
These specifications do not define any circumstances in which the dynamic context will contain functions that are not present in the static context, but neither do they rule this out. For example an API may provide the ability to add functions to the dynamic context, and such functions may potentially be context-dependent.
The mere fact that a function exists and has a name does not of itself mean that the function is present in the dynamic context. For example, functions obtained through use of the fn:load-xquery-module function are not added to the dynamic context.
| Expression: | (fn:function-lookup( #xs:dateTimeStamp, 1 ),
xs:dateTime#1)[1] ('2011-11-11T11:11:11Z') |
|---|---|
| Result: | An |
| Expression: | declare namespace zip = "http://expath.org/ns/zip";
let $f := function-lookup( #zip:binary-entry, 2 )
return if (exists($f)) then $f("file:///temp.zip", "index.xml") else () |
| Result: | The result of calling zip:binary-entry("file:///temp.zip", "index.xml") if the function is available, or the empty sequence otherwise. |
Returns the name of the function identified by a function item.
fn:function-name( | ||
$function | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $function refers to a named function, fn:function-name($function) returns the name of that function.
Otherwise ($function refers to an anonymous function), fn:function-name($function) returns the empty sequence.
The prefix part of the returned QName is implementation-dependent.
| Expression: |
|
|---|---|
| Result: | #Q{http://www.w3.org/2005/xpath-functions}substring (The namespace prefix of the returned QName is not predictable.) |
Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.
A map is a kind of item.
[Definition: A map consists of a sequence of entries, also known as key-value pairs. Each entry comprises a key which is an arbitrary atomic item, and an arbitrary sequence called the associated value.]
[Definition: Within a map, no two entries have the same key. Two atomic items K1 and K2 are the same key for this purpose if the function call fn:atomic-equal($K1, $K2) returns true.]
It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).
Maps are immutable, and have no identity separate from their content. For example, the map:remove function returns a map that differs from the supplied map by the omission (typically) of one entry, but the supplied map is not changed by the operation. Two calls on map:remove with the same arguments return maps that are indistinguishable from each other; there is no way of asking whether these are “the same map”.
A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicType) as item()*. Calling the function has the same effect as calling the map:get function: the expression $map($key) returns the same result as map:get($map, $key). For example, if $books-by-isbn is a map whose keys are ISBNs and whose associated values are book elements, then the expression $books-by-isbn("0470192747") returns the book element with the given ISBN. The fact that a map is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.
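The dual role of a map as data and as a function can be modelled outside XPath. The following Python sketch is only an analogy: the class name CallableMap, the helper apply_to_all, and the sample data are hypothetical; associated values are modelled as Python lists standing in for sequences; and Python key equality is not the same as fn:atomic-equal.

```python
class CallableMap(dict):
    """A dict that can also be called with a key, like $map($key)."""

    def __call__(self, key):
        # Mirrors map:get: an absent key yields the empty sequence,
        # modelled here as an empty list.
        return self.get(key, [])


def apply_to_all(f, keys):
    """A tiny higher-order function that accepts any one-argument callable."""
    return [f(k) for k in keys]


# Hypothetical sample data keyed by ISBN.
books_by_isbn = CallableMap({"0470192747": ["book-A"]})
```

Because the map is itself callable, it can be passed directly wherever a one-argument function is expected, as in apply_to_all(books_by_isbn, ["0470192747"]).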
It is often useful to decompose a map into a sequence of entries, or key-value pairs (in which the key is an atomic item and the value is an arbitrary sequence). Subsequently it may be necessary to reconstruct a map from these components, typically after modification.
There are two conventional ways of representing a map as a sequence of key-value pairs, each with its own advantages and disadvantages. These are described below:
A map can be represented as a sequence of single-entry maps.
[Definition: A single-entry map is a map containing a single entry.]
It is possible to decompose any map into a sequence of single-entry maps, and to construct a map from a sequence of single-entry maps.
For example the map { "x": 1, "y": 2 } can be decomposed to the sequence ({ "x": 1 }, { "y": 2 }).
A map can be represented as a sequence of JNodes.
A JNode holds the map key in its ·selector· property and the corresponding value in its ·content· property.
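The first of these representations, a sequence of single-entry maps, can be sketched in Python, where a dict preserves insertion order. This is an illustrative model under stated assumptions: the function names entries and merge are chosen to echo map:entries and map:merge but are plain Python helpers, and duplicate keys are not considered because a decomposed map has none.

```python
def entries(m):
    """Decompose a dict into a list of single-entry dicts, keeping entry order."""
    return [{k: v} for k, v in m.items()]


def merge(singles):
    """Compose a dict from single-entry dicts.

    Duplicate keys are not considered here; a decomposed map has none.
    """
    result = {}
    for single in singles:
        result.update(single)
    return result


def sorted_by_key(m):
    """Rebuild a map with the same entries, ordered by key."""
    return merge(sorted(entries(m), key=lambda e: next(iter(e))))
```

Decomposing and recomposing is a round trip: merge(entries(m)) gives back a map equal to m, and modifications such as sorting can be applied to the intermediate sequence.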
The following table summarizes the way in which these two representations can be used to compose and decompose maps:
| Operation | Single-Entry Maps | JNodes |
|---|---|---|
| Decompose a map | | |
| Compose a map | | |
| Create a single entry | | |
| Extract the key part of a single entry | | |
| Extract the value part of a single entry | | |
It is also possible to decompose a map using:
The function map:for-each
The expression for key $k value $v in $map return ....
The examples below show several ways of constructing a map with the same entries as an input map, but with the entries sorted by key.
Using map:entries and map:merge:
map:entries($map) => sort-by({'key': map:keys#1}) => map:merge()
Using JNodes:
$map/* => sort-by({'key': jnode-selector#1}) => map:build(jnode-selector#1, jnode-content#1)
Using map:for-each:
map:merge( map:for-each($map, map:entry#2) => sort-by({'key': map:keys#1}) )
Using an XQuery FLWOR expression:
map:merge( for key $k value $v in $map order by $k return {$k : $v} )
The functions defined in this section use a conventional namespace prefix map, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/map.
The function call map:get($map, $key) can be used to retrieve the value associated with a given key.
There is no operation to atomize a map or convert it to a string. The function fn:serialize can in some cases be used to produce a JSON representation of a map.
Note that when the required type of an argument to a function such as map:build is a map type, then the coercion rules ensure that a JNode can be supplied in the function call: if the ·content· property of the JNode is a map, then the map is automatically extracted as if by the jnode-content function.
| Function | Meaning |
|---|---|
map:build | Returns a map that typically contains one entry for each item in a supplied input sequence. |
map:contains | Tests whether a supplied map contains an entry for a given key. |
map:empty | Returns true if the supplied map contains no entries. |
map:entries | Returns a sequence containing all the key-value pairs present in a map, each represented as a single-entry map. |
map:entry | Returns a single-entry map that represents a single key-value pair. |
map:filter | Selects entries from a map, returning a new map. |
map:find | Searches the supplied input sequence and any contained maps and arrays for a map entry with the supplied key, and returns the corresponding values. |
map:for-each | Applies a supplied function to every entry in a map, returning the sequence concatenationXP of the results. |
map:get | Returns the value associated with a supplied key in a given map. |
map:items | Returns a sequence containing all the values present in a map, in order. |
map:keys | Returns a sequence containing all the keys present in a map. |
map:keys-where | Returns a sequence containing selected keys present in a map. |
map:merge | Returns a map that combines the entries from a number of existing maps. |
map:put | Returns a map containing all the contents of the supplied map, but with an additional entry, which replaces any existing entry for the same key. |
map:remove | Returns a map containing all the entries from a supplied map, except those having a specified key. |
map:size | Returns the number of entries in the supplied map. |
Returns a map that typically contains one entry for each item in a supplied input sequence.
map:build( | ||
$input | as , | |
$key | as | := fn:identity#1, |
$value | as | := fn:identity#1, |
$options | as | := {} |
) as | ||
This function is deterministic, context-independent, and focus-independent.
Informally, the function processes each item in $input in order. It calls the $key function on that item to obtain a sequence of key values, and the $value function to obtain an associated value. Then, for each key value:
If the key is not already present in the target map, the processor adds a new key-value pair to the map, with that key and that value.
If the key is already present, the processor combines the new value for the key with the existing value; the way they are combined is determined by the duplicates option.
By default, when two duplicate entries occur:
A single combined entry will be present in the result.
This entry will contain the sequence concatenationXP of the supplied values.
The position of the combined entry in the entry orderDM of the result map will correspond to the position of the first of the duplicates.
The key of the combined entry will correspond to the key of one of the duplicates: it is implementation-dependent which one is chosen. (It is possible for two keys to be considered duplicates even if they differ: for example, they may have different type annotations, or they may be xs:dateTime values in different timezones.)
The $options argument can be used to control the way in which duplicate keys are handled. The option parameter conventions apply. The entries that may appear in the $options map are as follows:
record(
  duplicates? as (enum("reject", "use-first", "use-last", "use-any", "combine") | fn(item()*, item()*) as item()*)?
)
| Key | Value | Meaning |
|---|---|---|
| duplicates | | Determines the policy for handling duplicate keys: specifically, the action to be taken if two entries in the input sequence have key values K1 and K2 where K1 and K2 are the same key. |
| | "reject" | Equivalent to supplying a function that raises a dynamic error with error code "FOJS0003". The effect is that duplicate keys result in an error. |
| | "use-first" | Equivalent to supplying the function fn($a, $b){ $a }. The effect is that the first of the duplicates is chosen. |
| | "use-last" | Equivalent to supplying the function fn($a, $b){ $b }. The effect is that the last of the duplicates is chosen. |
| | "use-any" | Equivalent to supplying the function fn($a, $b){ one-of($a, $b) } where one-of chooses either $a or $b in an implementation-dependent way. The effect is that it is implementation-dependent which of the duplicates is chosen. |
| | "combine" | Equivalent to supplying the function fn($a, $b){ $a, $b } (or equivalently, the function op(",")). The effect is that the result contains the sequence concatenationXP of the values having the same key, retaining order. |
| | function(*) | A function with signature fn(item()*, item()*) as item()*. The function is called for any entry in the input sequence that has the same key as a previous entry. The first argument is the existing value associated with the key; the second argument is the value associated with the key in the duplicate input entry, and the result is the new value to be associated with the key. The effect is cumulative: for example if there are three values X, Y, and Z associated with the same key, and the supplied function is F, then the result is an entry whose value is X => F(Y) => F(Z). |
The effect of the function is equivalent to the result of the following XPath expression.
for-each(
  $input,
  fn($item, $pos) {
    for-each($key($item, $pos), fn($k) {
      map:entry($k, $value($item, $pos))
    })
  }
)
=> map:merge($options)
An error is raised [err:FOJS0003] if the value of $options indicates that duplicates are to be rejected, and a duplicate key is encountered.
An error is raised [err:FOJS0005] if the value of $options includes an entry whose key is defined in this specification, and whose value is not a permitted value for that key.
The default function for both $key and $value is the identity function. Although it is permitted to default both, this serves little purpose: usually at least one of these arguments will be supplied.
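The informal processing rules and the duplicates policies can be modelled in a few lines of Python. This is a sketch under stated assumptions, not a definition of the function: map_build is a hypothetical helper, sequences are modelled as Python lists, the $key function is assumed to return a list of keys, positional arguments to the callbacks are omitted, and Python dict key equality differs from the same-key rule used by maps.

```python
def map_build(items, key=lambda x: [x], value=lambda x: [x], duplicates="combine"):
    """Model of the informal map:build rules using an insertion-ordered dict.

    `duplicates` is a policy name or a function (existing, new) -> combined.
    """
    policies = {
        "use-first": lambda a, b: a,
        "use-last": lambda a, b: b,
        "use-any": lambda a, b: a,       # any choice is allowed; this sketch picks the first
        "combine": lambda a, b: a + b,   # sequence concatenation, retaining order
    }
    result = {}
    for item in items:
        v = value(item)
        for k in key(item):
            if k not in result:
                result[k] = v            # new key: add the entry
            elif duplicates == "reject":
                raise ValueError("FOJS0003: duplicate key " + repr(k))
            else:
                combine = duplicates if callable(duplicates) else policies[duplicates]
                # Cumulative over repeats: F(F(X, Y), Z) for three values.
                result[k] = combine(result[k], v)
    return result
```

Because a Python dict keeps the position of the first insertion for a key, the combined entry stays at the position of the first of the duplicates, matching the default behavior described above.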
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
(Returns a map with one entry for each distinct value of |
| Expression: | map:build( 1 to 5, value := format-integer(?, "w") ) |
| Result: | { 1: "one", 2: "two", 3: "three", 4: "four", 5: "five" }(Returns a map with five entries. The function to compute the key is an identity function, the function to compute the value invokes |
| Expression: | map:build(
("January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"),
substring(?, 1, 1)
) |
| Result: | {
"A": ("April", "August"),
"D": ("December"),
"F": ("February"),
"J": ("January", "June", "July"),
"M": ("March", "May"),
"N": ("November"),
"O": ("October"),
"S": ("September")
} |
| Expression: | map:build(1 to 5, {
1: ("eins", "one"),
4: ("vier", "four")
}) |
| Result: | {
"eins": 1,
"one": 1,
"vier": 4,
"four": 4
} |
| Expression: | map:build(
("apple", "apricot", "banana", "blueberry", "cherry"),
substring(?, 1, 1),
string-length#1,
{ "duplicates": op("+") }
) |
| Result: | { "a": 12, "b": 15, "c": 6 } (Constructs a map where the key is the first character of an input item, and where the corresponding value is the total string-length of the items starting with that character.) |
| Expression: | map:build(
('Wang', 'Liu', 'Zhao'),
key := fn($name, $pos) { $name },
value := fn($name, $pos) { $pos }
) |
| Result: | { "Wang": 1, "Liu": 2, "Zhao": 3 } (Returns an inverted index for the input sequence with the string stored as key and the position stored as value.) |
| Expression: | let $titles := <titles>
<title>A Beginner’s Guide to <ix>Java</ix></title>
<title>Learning <ix>XML</ix></title>
<title>Using <ix>XML</ix> with <ix>Java</ix></title>
</titles>
return map:build($titles/title, fn($title) { $title/ix }) |
| Result: | {
"Java": (
<title>A Beginner’s Guide to <ix>Java</ix></title>,
<title>Using <ix>XML</ix> with <ix>Java</ix></title>
),
"XML": (
<title>Learning <ix>XML</ix></title>,
<title>Using <ix>XML</ix> with <ix>Java</ix></title>
)
} |
| Expression: | map:build(//employee, fn { @ssn }) |
| Result: | A map whose keys are employee |
| Expression: | map:build(//employee, fn { @location }, fn { 1 }, { "duplicates": op("+") }) |
| Result: | A map whose keys are employee |
| Expression: | map:build(
//employee,
key := fn { @location },
options := {"duplicates" : fn($a, $b) { highest(($a, $b), (), fn { xs:decimal(@salary) }) } }
) |
| Result: | A map whose keys are employee |
| Expression: | map:build(//*, generate-id#1) |
| Result: | A map allowing efficient access to every element in a document by means of its |
| Expression: | let $tree := parse-json('{
  "type": "package",
  "name": "org",
  "content": [
    { "type": "package",
      "name": "xml",
      "content": [
        { "type": "package",
          "name": "sax",
          "content": [
            { "type": "class",
              "name": "Attributes"},
            { "type": "class",
              "name": "ContentHandler"},
            { "type": "class",
              "name": "XMLReader"}
          ]
        }]
    }]
}')
return map:build($tree/descendant-or-self::jnode(*, record(type, name, content?)),
  fn{./ancestor-or-self::jnode()?name => string-join(".")},
  fn{`{?type} {?name}`}) |
| Result: | { "org":"package org",
  "org.xml":"package xml",
  "org.xml.sax":"package sax",
  "org.xml.sax.Attributes": "class Attributes",
  "org.xml.sax.ContentHandler": "class ContentHandler",
  "org.xml.sax.XMLReader": "class XMLReader" }
(Constructs a map allowing efficient access to values in a recursive JSON structure using hierarchic paths.) |
Arrays were introduced as a new datatype in XDM 3.1. This section describes functions that operate on arrays.
An array is an additional kind of item. An array of size N is a mapping from the integers (1 to N) to a set of values, called the members of the array, each of which is an arbitrary sequence. Because an array is an item, and therefore a sequence, arrays can be nested.
An array acts as a function from integer positions to associated values, so the function call $array($index) can be used to retrieve the array member at a given position. The function corresponding to the array has the signature function($index as xs:integer) as item()*. The fact that an array is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.
The functions defined in this section use a conventional namespace prefix array, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/array.
As with all other values, arrays are treated as immutable. For example, the array:reverse function returns an array that differs from the supplied array in the order of its members, but the supplied array is not changed by the operation. Two calls on array:reverse with the same argument will return arrays that are indistinguishable from each other; there is no way of asking whether these are “the same array”. Like sequences, arrays have no identity.
All functionality on arrays is defined in terms of two primitives:
The function array:members decomposes an array to a sequence of value records.
The function array:of-members composes an array from a sequence of value records.
A value record here is an item that encapsulates an arbitrary value; the representation chosen for a value record is record(value as item()*), that is, a map containing a single entry whose key is the string "value" and whose value is the encapsulated sequence.
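The two primitives and the value-record representation can be modelled in Python, with an array represented as a list whose members are themselves lists (standing in for arbitrary sequences). This is an illustrative sketch only: members and of_members are plain Python helpers named after the functions above, and array_reverse is a hypothetical example of defining another operation in terms of them.

```python
def members(array):
    """Decompose an array into a sequence of value records."""
    return [{"value": member} for member in array]


def of_members(records):
    """Compose an array from a sequence of value records."""
    return [record["value"] for record in records]


def array_reverse(array):
    """An operation expressed purely via the two primitives."""
    return of_members(list(reversed(members(array))))
```

Wrapping each member in a record keeps member boundaries intact: an empty member survives the round trip as {"value": []} rather than vanishing, which is why a flat sequence of the members' items would not suffice.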
Note that when the required type of an argument to a function such as array:build is an array type, then the coercion rules ensure that a JNode can be supplied in the function call: if the ·content· property of the JNode is an array, then the array is automatically extracted as if by the jnode-content function.
| Function | Meaning |
|---|---|
array:append | Returns an array containing all the members of a supplied array, plus one additional member at the end. |
array:build | Returns an array obtained by evaluating the supplied function once for each item in the input sequence. |
array:empty | Returns true if the supplied array contains no members. |
array:filter | Returns an array containing those members of the $array for which $predicate returns true. A return value of () is treated as false. |
array:flatten | Replaces any array appearing in a supplied sequence with the members of the array, recursively. |
array:fold-left | Evaluates the supplied function cumulatively on successive members of the supplied array. |
array:fold-right | Evaluates the supplied function cumulatively on successive values of the supplied array. |
array:foot | Returns the last member of an array. |
array:for-each | Returns an array whose size is the same as array:size($array), in which each member is computed by applying $action to the corresponding member of $array. |
array:for-each-pair | Returns an array obtained by evaluating the supplied function once for each pair of members at the same position in the two supplied arrays. |
array:get | Returns the value at the specified position in the supplied array (counting from 1). |
array:head | Returns the first member of an array, that is $array(1). |
array:index-of | Returns a sequence of positive integers giving the positions within the array $array of members that are equal to $target. |
array:index-where | Returns the positions in an input array of members that match a supplied predicate. |
array:insert-before | Returns an array containing all the members of the supplied array, with one additional member at a specified position. |
array:items | Returns the sequence concatenation of the members of an array. |
array:join | Concatenates the contents of several arrays into a single array. |
array:members | Delivers the contents of an array as a sequence of value records. |
array:of-members | Constructs an array from the contents of a sequence of value records. |
array:put | Returns an array containing all the members of a supplied array, except for one member which is replaced with a new value. |
array:remove | Returns an array containing all the members of the supplied array, except for the members at specified positions. |
array:reverse | Returns an array containing all the members of a supplied array, but in reverse order. |
array:size | Returns the number of members in the supplied array. |
array:slice | Returns an array containing selected members of a supplied input array based on their position. |
array:sort | Sorts a supplied array, based on the value of a sort key supplied as a function. |
array:sort-by | Sorts a supplied array, based on the value of a number of sort keys supplied as functions. |
array:sort-with | Sorts a supplied array, according to the order induced by the supplied comparator functions. |
array:split | Delivers the contents of an array as a sequence of single-member arrays. |
array:subarray | Returns an array containing all members from a supplied array starting at a supplied position, up to a specified length. |
array:tail | Returns an array containing all members except the first from a supplied array. |
array:trunk | Returns an array containing all members except the last from a supplied array. |
Sorts a supplied array, based on the value of a number of sort keys supplied as functions.
array:sort-by( | ||
$array | as , | |
$keys | as | |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on collations.
The result of the function is an array that contains the same members as $array, typically in a different order, the order being defined by the supplied sort key definitions.
A sort key definition is a record with three parts:
key: A sort key function, which is applied to each member of the input array to determine a sort key value. If no function is supplied, the default is fn:data#1, which atomizes the value of the array member.
collation: A collation, which is used when comparing sort key values that are of type xs:string or xs:untypedAtomic. If no collation is supplied, the default collation from the static context is used.
When comparing values of types other than xs:string or xs:untypedAtomic, the collation is ignored (but an error may be reported if it is invalid). For more information see 5.3.7 Choosing a collation.
order: An order direction, either "ascending" or "descending". The default is "ascending".
The number of sort key definitions is determined by the number of records supplied in the $keys argument. If the argument is absent or empty, the default is a single sort key definition using the function data#1, using the default collation from the static context, and with order ascending.
The result of the array:sort-by function is obtained as follows:
The result array contains the same members as $array, but generally in a different order.
The sort key definitions are established as described above. The sort key definitions are in major-to-minor order. That is, the position of two members $A and $B in the result array is determined first by the relative magnitude of their primary sort key values, which are computed by evaluating the sort key function in the first sort key definition. If those two sort key values are equal, then the position is determined by the relative magnitude of their secondary sort key values, computed by evaluating the sort key function in the second sort key definition, and so on.
When a pair of corresponding sort key values of $A and $B are found to be not equal, then $A precedes $B in the result array if both the following conditions are true, or if both conditions are false:
The sort key value for $A is less than the sort key value for $B, as defined below.
The order direction in the corresponding sort key definition is "ascending".
If all the sort key values for $A and $B are pairwise equal, then $A precedes $B in the result array if and only if $A precedes $B in the input array.
Note:
That is, the sort is stable.
Each sort key value for a given array member is obtained by applying the sort key function of the corresponding sort key definition to that member. The result of this function is in the general case a sequence of atomic items. Two sort key values $a and $b are compared as follows:
Let $C be the collation in the corresponding sort key definition.
Let $REL be the result of evaluating op:lexicographic-compare($key($A), $key($B), $C) where op:lexicographic-compare($a, $b, $C) is defined as follows:
if (empty($a) and empty($b)) then 0
else if (empty($a)) then -1
else if (empty($b)) then +1
else let $rel := op:simple-compare(head($a), head($b), $C)
     return if ($rel eq 0)
            then op:lexicographic-compare(tail($a), tail($b), $C)
            else $rel
Here op:simple-compare($k1, $k2, $C) is defined as follows:
if ($k1 instance of (xs:string | xs:anyURI | xs:untypedAtomic)
    and $k2 instance of (xs:string | xs:anyURI | xs:untypedAtomic))
then compare($k1, $k2, $C)
else if ($k1 instance of xs:numeric and $k2 instance of xs:numeric)
then compare($k1, $k2)
else if ($k1 eq $k2) then 0
else if ($k1 lt $k2) then -1
else +1
Note:
This raises an error if two keys are not comparable, for example if one is a string and the other is a number, or if both belong to a non-ordered type such as xs:QName.
If $REL is zero, then the two sort key values are deemed equal; if $REL is -1 then $a is deemed less than $b, and if $REL is +1 then $a is deemed greater than $b.
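The comparison rules above can be modelled informally in Python. This is a minimal sketch under stated assumptions: sort key values are tuples, collations are not modelled (strings are compared by Python's default codepoint order), and the names simple_compare, lexicographic_compare and sort_by are hypothetical helpers rather than functions of this specification.

```python
from functools import cmp_to_key


def simple_compare(k1, k2):
    """Compare two atomic keys, returning -1, 0 or +1.
    Mixing strings and numbers is an error, as in the rules above."""
    both_str = isinstance(k1, str) and isinstance(k2, str)
    both_num = isinstance(k1, (int, float)) and isinstance(k2, (int, float))
    if not (both_str or both_num):
        raise TypeError(f"keys {k1!r} and {k2!r} are not comparable")
    return (k1 > k2) - (k1 < k2)


def lexicographic_compare(a, b):
    """Compare two key sequences item by item; an empty sequence sorts first."""
    if not a and not b:
        return 0
    if not a:
        return -1
    if not b:
        return 1
    rel = simple_compare(a[0], b[0])
    return lexicographic_compare(a[1:], b[1:]) if rel == 0 else rel


def sort_by(members, key_specs):
    """Stable multi-key sort. Each key spec is a dict with a 'key' function
    returning a tuple, and an optional 'order' ('ascending' by default)."""
    def compare(m1, m2):
        for spec in key_specs:
            rel = lexicographic_compare(spec["key"](m1), spec["key"](m2))
            if rel != 0:
                ascending = spec.get("order", "ascending") == "ascending"
                return rel if ascending else -rel
        return 0  # all keys equal: sorted() keeps the input order
    return sorted(members, key=cmp_to_key(compare))


people = [("Smith", 30), ("Jones", 30), ("Adams", 25), ("Brown", None)]
result = sort_by(people, [
    # Primary key: age, descending; a missing age is an empty key.
    {"key": lambda p: () if p[1] is None else (p[1],), "order": "descending"},
    # Secondary key: name, ascending.
    {"key": lambda p: (p[0],)},
])
print(result)
```

Because Python's sorted is stable, members whose keys are all pairwise equal retain their input order, which corresponds to the stability requirement stated above.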
The effect of the function is equivalent to the result of the following XPath expression.
$array
=> array:members()
=> fn:sort-by(
     for $key-spec in ($keys otherwise {})
     return map:put($key-spec, 'key', fn($member as record(value)) as xs:anyAtomicType* {
       map:get($key-spec, 'key', fn:data#1)(map:get($member, 'value'))
     })
   )
=> array:of-members()
If the set of computed sort keys contains values that are not comparable using the lt operator then the sort operation will fail with a type error ([err:XPTY0004]XP).
The function is a generalization of the array:sort function available in 3.1, which is retained for compatibility. The enhancements allow multiple sort keys to be defined, each potentially with a different collation, and allow sorting in descending order.
If the sort key for an item evaluates to the empty sequence, the effect of the rules is that this item precedes any value for which the key is non-empty. This is equivalent to the effect of the XQuery option empty least. The effect of the option empty greatest can be achieved by adding an extra sort key definition with {'key': fn{empty(K(.))}}, where K is the original sort key function, ahead of the original one: when comparing boolean sort keys, false precedes true.
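The difference between the two treatments of empty keys can be illustrated with a short Python sketch. It is an informal model only: keys are tuples, an empty tuple stands for the empty sequence, and key_or_empty and the sample data are hypothetical.

```python
def key_or_empty(item):
    """Hypothetical sort key: the price if present, otherwise an empty key."""
    price = item.get("price")
    return () if price is None else (price,)


items = [{"id": "a", "price": 5}, {"id": "b"}, {"id": "c", "price": 2}]

# Empty least: an empty key sorts before every non-empty key.
empty_least = sorted(items, key=key_or_empty)

# Empty greatest: a leading boolean key that is True when the key is empty.
# False sorts before True, so items with a non-empty key come first.
empty_greatest = sorted(
    items,
    key=lambda it: (len(key_or_empty(it)) == 0, key_or_empty(it)),
)

print([it["id"] for it in empty_least])     # ['b', 'c', 'a']
print([it["id"] for it in empty_greatest])  # ['c', 'a', 'b']
```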
A JNode is a wrapper around a map or array, or around a value that appears within the content of a map or array. JNodes are described at [XQuery and XPath Data Model (XDM) 4.0] section 8.4 JNodes. Wrapping a map or array in a JNode enables the use of path expressions such as $jnode/descendant::title, as described at [XML Path Language (XPath) 4.0] section 4.7 Path Expressions.
In addition to the functions defined in this section, functions that operate on JNodes include:
fn:distinct-ordered-nodes
fn:generate-id
fn:has-children
fn:innermost
fn:outermost
fn:path
fn:root
fn:siblings
fn:transitive-closure
Delivers a root JNode wrapping a map or array, enabling the use of lookup expressions to navigate a JTree rooted at that map or array.
fn:jtree( | ||
$input | as | |
) as | ||
This function is nondeterministic, context-independent, and focus-independent.
The function creates a JNode that wraps the supplied map or array. Specifically, it creates a root JNode whose ·content· property is $input, and whose ·parent·, ·position·, and ·selector· properties are absent.
This has the effect that lookup expressions starting from this JNode retain information for subsequent navigation.
A JNode has unique identity. If two maps or arrays M1 and M2 have the same function identity, as determined by the function-identity function, then jtree(M1) is jtree(M2) must return true: that is, the same JNode must be delivered for both.
It is to some extent implementation-defined whether two maps or arrays have the same function identity. Processors should ensure as a minimum that when a variable $m is bound to a map or array, calling jtree($m) more than once (with the same variable reference) will deliver the same JNode each time.
The effect of the coercion rules is technically that if an existing JNode is supplied as $input, the wrapped value will be extracted, and then rewrapped as a JNode: in practice, this can be short-circuited by returning the supplied JNode unchanged.
Although fn:jtree is available as a function for user applications to call explicitly, it is also invoked implicitly by some expressions, notably when a path expression is written in a form such as $map/child::*. Specifically, if the left-hand operand of the / operator is a map or array, then the supplied map or array is implicitly wrapped in a JNode.
The effect of applying fn:jtree to a map or array is that subsequent retrieval operations within the wrapped map or array return results that retain useful information about where the results were found. For example, consider an expression such as json-doc($source)//name. This expression returns a set of JNodes representing all entries in the JTree having the key "name"; each of these JNodes contains not only the value of the relevant "name" entry, but also the key (which in this simple example is always "name") and the containing map. This means, for example, that if $result is the result of the expression json-doc($source)//name, then:
$result / .. / ssn locates the map that contained each name, and returns the value of the ssn entry in that map.
$result / ancestor::course returns any course entries in containing maps.
$result / ancestor::* => jnode-selector() returns a sequence of map keys and array index values representing the location of the found entries within the JSON structure.
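The way a JNode retains its context can be modelled informally in Python. The sketch below is a simplification under stated assumptions: maps are dicts, arrays are lists, every entry or member is a single item (so the ·position· property is not modelled), and the class JNode and the helpers jtree, children, descendants and ancestors are hypothetical stand-ins for the corresponding XPath constructs.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(eq=False)
class JNode:
    content: Any
    parent: Optional["JNode"] = None
    selector: Any = None  # map key or 1-based array index; None for a root


def jtree(value):
    """Wrap a map or array in a root JNode (no parent, no selector)."""
    return JNode(value)


def children(node):
    """Child axis: one JNode per map entry or array member."""
    c = node.content
    if isinstance(c, dict):
        return [JNode(v, node, k) for k, v in c.items()]
    if isinstance(c, list):
        return [JNode(v, node, i) for i, v in enumerate(c, start=1)]
    return []


def descendants(node):
    """Descendant axis, in document-like (depth-first) order."""
    for child in children(node):
        yield child
        yield from descendants(child)


def ancestors(node):
    """Ancestor axis, nearest ancestor first."""
    while node.parent is not None:
        node = node.parent
        yield node


data = {"course": {"title": "XPath", "students": [
    {"name": "Ann", "ssn": "111"},
    {"name": "Bob", "ssn": "222"},
]}}

root = jtree(data)
names = [n for n in descendants(root) if n.selector == "name"]

# Step to the containing map, then look up its "ssn" entry.
ssns = [n.parent.content["ssn"] for n in names]

# Selectors of the non-root ancestors locate each result in the structure.
paths = [[a.selector for a in ancestors(n) if a.parent is not None]
         for n in names]

print(ssns)
print(paths)
```

Keeping a reference to the parent in each wrapper is what makes upward navigation possible; a bare map or array value carries no such information.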
An alternative way of wrapping a map or array, rather than calling jtree($X), is to use the path expression $X/. (that is, $X followed by the / operator and the context item expression).
There are two situations where a map or array is implicitly wrapped in a JNode:
When the value of the left-hand operand of the / operator includes a map or array;
When the context value for evaluation of an AxisStep includes a map or array.
| Expression: | let $data := {
"fr": { "capital": "Paris", "languages": [ "French" ] },
"de": { "capital": "Berlin", "languages": [ "German" ] }
}
return jtree($data)//languages[. = 'German']/../capital =!> string() |
|---|---|
| Result: | "Berlin" |
Returns the ·position· property of a JNode.
fn:jnode-position( | ||
$input | as | := . |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
If $input is the empty sequence, the function returns the empty sequence.
If $input is a root JNode (one in which the ·position· property is absent), the function returns the empty sequence.
Otherwise, the function returns the ·position· property of $input. The value of this property will be 1 (one) except in cases where the value of an entry in a map, or a member in an array, is a sequence that contains multiple items including maps and/or arrays; in such cases the position will be the 1-based position of the relevant map or array.
The following errors may be raised when $input is omitted:
If the context value is absent, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type jnode()?, type error [err:XPTY0004]XP.
This function is relevant only when there are maps whose entries are multi-item sequences that include maps and arrays, or arrays whose members include such multi-item sequences. Such structures are uncommon, and never arise from parsing of JSON source text. It is generally best to avoid such structures by using arrays rather than sequences within array and map content; apart from other considerations, this allows the data to be serialized in JSON format.
If an entry within a map, or a member of an array, contains a sequence of items that mixes arrays and maps with other content (for example the array [1, 2, ([1,2], [3,4], 5)]), then a lookup using the child axis will only construct JNodes in respect of those items that are non-empty maps or arrays. This may leave gaps in the position numbering sequence, as illustrated in the examples below.
| Expression: | let $input := {
"a": [10, 20, 30],
"b": ([40, 50, 60], [], 0, [70, 80, (90, 100)])
}
return $input / child::b / *
! { "position": jnode-position(),
"index": jnode-selector(),
"value": jnode-content()
} |
|---|---|
| Result: | { "position": 1, "index": 1, "value": 40 },
{ "position": 1, "index": 2, "value": 50 },
{ "position": 1, "index": 3, "value": 60 },
{ "position": 4, "index": 1, "value": 70 },
{ "position": 4, "index": 2, "value": 80 },
{ "position": 4, "index": 3, "value": (90, 100) } |
| Expression: | let $input := {
"a": {"x": 10, "y": 20, "z": 30},
"b": ( {"x": 40, "y": 50, "z": 60},
{},
{"x": 70, "y": 80, "z": (90, 100)})
}
return $input / child::b / *
! { "position": jnode-position(),
"key": jnode-selector(),
"value": jnode-content()
} |
| Result: | { "position": 1, "key": "x", "value": 40 },
{ "position": 1, "key": "y", "value": 50 },
{ "position": 1, "key": "z", "value": 60 },
{ "position": 3, "key": "x", "value": 70 },
{ "position": 3, "key": "y", "value": 80 },
{ "position": 3, "key": "z", "value": (90, 100) } |
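The position numbering shown in the examples above can be reproduced with an informal Python model. In this sketch, which rests on assumptions made for illustration only, a multi-item sequence is a tuple, an array is a list, a map is a dict, and child_jnodes is a hypothetical helper yielding (position, selector, content) triples.

```python
def child_jnodes(value):
    """Yield (position, selector, content) for each child reachable from an
    entry value, which is either a single item or a tuple of items."""
    items = value if isinstance(value, tuple) else (value,)
    for position, item in enumerate(items, start=1):
        if isinstance(item, list):
            for index, member in enumerate(item, start=1):
                yield (position, index, member)
        elif isinstance(item, dict):
            for key, entry in item.items():
                yield (position, key, entry)
        # Atomic items, and empty maps or arrays, contribute no children,
        # which is what leaves gaps in the position numbering.


b_arrays = ([40, 50, 60], [], 0, [70, 80, (90, 100)])
b_maps = ({"x": 40, "y": 50, "z": 60}, {}, {"x": 70, "y": 80, "z": (90, 100)})

for triple in child_jnodes(b_arrays):
    print(triple)
for triple in child_jnodes(b_maps):
    print(triple)
```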
The functions in this section access resources external to a query or stylesheet, and convert between external file formats and their XPath and XQuery data model representation.
This section describes functions that parse CSV data.
[Definition: The term comma separated values or CSV refers to a wide variety of plain-text tabular data formats with fields and records separated by standard character delimiters (often, but not invariably, commas).]
A CSV is a 2-dimensional tabular data structure consisting of multiple rows (also known as records). Each row contains multiple fields. Fields occupying the same position in successive rows constitute a column. Columns are identified by position and optionally by name. Column names can be assigned within a CSV using an optional header row.
CSV has developed informally for decades, and many variations are found. This specification refers to [RFC 4180], which provides a standardized grammar. This specification extends the grammar defined in [RFC 4180] as follows:
This specification uses the term row where RFC 4180 uses record.
Line endings are normalized: specifically, the character sequences U+000D (CARRIAGE RETURN), or U+000D (CARRIAGE RETURN) followed by U+000A (NEWLINE), are converted to a single U+000A (NEWLINE) character. This applies whether or not the line ending appears within a quoted string, and whether or not U+000A (NEWLINE) is the chosen row delimiter.
Row delimiters other than newline are recognized.
Field delimiters other than U+002C (COMMA, ,) are recognized.
Quote characters other than U+0022 (QUOTATION MARK, ") are recognized.
Non-ASCII characters are recognized.
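The line-ending normalization described in the list above can be sketched in Python as a single substitution. The function name normalize_line_endings is hypothetical; the sketch applies the rule to the whole input, including any text inside quoted fields.

```python
import re


def normalize_line_endings(text: str) -> str:
    """Convert CR LF, or a lone CR, to a single LF."""
    return re.sub(r"\r\n?", "\n", text)


sample = 'a,b\r\n"multi\rline",c\n'
print(repr(normalize_line_endings(sample)))
```

Matching an optional U+000A after each U+000D ensures that a CR LF pair becomes one newline rather than two.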
This specification defines a mapping from this extended grammar to constructs in the XDM model, and provides illustrative examples of how these constructs can be combined with other language features to process CSV data.
| Function | Meaning |
|---|---|
fn:csv-to-arrays | Parses CSV data supplied as a string, returning the results in the form of a sequence of arrays of strings. |
fn:parse-csv | Parses CSV data, returning the results in the form of a record containing information about the names in the header, as well as the data itself. |
fn:csv-doc | Reads an external resource containing CSV, and returns the results as a record containing information about the names in the header, as well as the data itself. |
fn:csv-to-xml | Parses CSV data supplied as a string, returning the results as an XML document, as described by 17.5.9 Representing CSV data as XML. |
The most basic function for parsing CSV is fn:csv-to-arrays, which recognizes the delimiters for rows and fields and returns a sequence of arrays, each corresponding to one row. The fields within each array are represented as instances of xs:string.
The other functions recognize column names, and make it easier to address individual fields using these names. The fn:parse-csv and fn:csv-doc functions deliver this capability using XDM maps and functions, while the fn:csv-to-xml function represents the information using XDM element nodes.
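The distinction between positional and named access to fields can be illustrated with Python's standard csv module. This is a rough analogy only: Python's CSV dialect is not the grammar defined by this specification, and the helper names csv_to_rows and rows_by_name are hypothetical.

```python
import csv
import io


def csv_to_rows(text, field_delimiter=",", quote_character='"'):
    """Parse CSV text into a list of rows, each a list of strings."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=field_delimiter,
        quotechar=quote_character,
    )
    return [row for row in reader]


def rows_by_name(rows):
    """Treat the first row as a header and address fields by column name."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


text = 'name,city\nAnn,"Paris, FR"\nBob,Berlin\n'
rows = csv_to_rows(text)
print(rows)                           # positional access: rows[1][1]
print(rows_by_name(rows)[0]["city"])  # named access via the header row
```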
Provides an execution trace intended to be used in debugging queries.
fn:trace( | ||
$input | as , | |
$label | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns $input, unchanged.
In addition, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination.
Any serialization of the implementation’s trace output must not raise an error. This can be achieved (for example) by using a serialization method that can handle arbitrary input, such as the adaptive output method (see 10 Adaptive Output Method SER31).
The format of the trace output and its order are implementation-dependent. Therefore, the order in which the output appears is not predictable. This also means that if dynamic errors occur (whether or not they are caught using try/catch), it may be unpredictable whether any output is reported before the error occurs.
If the trace information is unrelated to a specific value, fn:message can be used instead.
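The contract described above (return the input unchanged, emit a best-effort rendering as a side effect, and never fail because of the rendering) can be sketched in Python. The function below is a hypothetical analogue, not an implementation of fn:trace; the choice of repr for rendering and of standard error as the default destination are assumptions.

```python
import io
import sys


def trace(value, label=None, destination=sys.stderr):
    """Return value unchanged; write a rendering of it as a side effect."""
    try:
        rendered = repr(value)
    except Exception:
        # Rendering the trace output must not itself raise an error.
        rendered = "<unserializable value>"
    prefix = f"{label} " if label else ""
    print(f"{prefix}{rendered}", file=destination)
    return value


# Capture the trace output in a buffer so that it can be inspected.
buffer = io.StringIO()
result = trace(124.84, "the value of $v is:", destination=buffer)
print(result)
print(buffer.getvalue(), end="")
```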
| Expression: | let $v := 124.84
return fn:trace($v, 'the value of $v is: ') |
|---|---|
| Result: | The function returns the value of $v, the xs:decimal 124.84, unchanged. The value and the label may also be written to an implementation-defined destination. |
| Expression: | //book[xs:decimal(@price) gt 100]
=> trace('books more expensive than €100:') |
| Result: | The result of the expression is the same as the result of //book[xs:decimal(@price) gt 100]. |
It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used. (See Conformance.)
It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1. (See Conformance.)
It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1. (See Conformance.)
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace. (See Options.)
It is implementation-defined which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used. (See Strings, characters, and codepoints.)
[Definition: Some functions (such as fn:in-scope-prefixes, fn:load-xquery-module, and fn:unordered) produce result sequences or result maps in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.] (See Properties of functions.)
Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results. (See Properties of functions.)
They may provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value. See [ISO 10967]. (See Arithmetic operators on numeric values.)
For xs:decimal values, let N be the number of digits of precision supported by the implementation, and let M (M <= N) be the minimum limit on the number of digits required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication operations, the returned result should be accurate to N digits of precision, and for division and modulus operations, the returned result should be accurate to at least M digits of precision. The actual precision is implementation-defined. If the number of digits in the mathematical result exceeds the number of digits that the implementation retains for that operation, the result is truncated or rounded in an implementation-defined manner. (See Arithmetic operators on numeric values.)
The [IEEE 754-2019] specification also describes handling of two exception conditions called divideByZero and invalidOperation. The IEEE divideByZero exception is raised not only by a direct attempt to divide by zero, but also by operations such as log(0). The IEEE invalidOperation exception is raised by attempts to call a function with an argument that is outside the function’s domain (for example, sqrt(-1) or log(-1)). Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in which the operation returns a defined result, typically positive or negative infinity, or NaN. With this function library, these IEEE exceptions do not cause a dynamic error at the application level; rather they result in the relevant function or operator returning the defined non-error result. The underlying IEEE exception may be notified to the application or to the user by some implementation-defined warning condition, but the observable effect on an application using the functions and operators defined in this specification is simply to return the defined result (typically -INF, +INF, or NaN) with no error. (See Arithmetic operators on numeric values.)
The [IEEE 754-2019] specification distinguishes two NaN values: a quiet NaN and a signaling NaN. These two values are not distinguishable in the XDM model: the value spaces of xs:float and xs:double each include only a single NaN value. This does not prevent the implementation distinguishing them internally, and triggering different implementation-defined warning conditions, but such distinctions do not affect the observable behavior of an application using the functions and operators defined in this specification. (See Arithmetic operators on numeric values.)
The implementation may adopt a different algorithm provided that it is equivalent to this formulation in all cases where implementation-dependent or implementation-defined behavior does not affect the outcome, for example, the implementation-defined precision of the result of xs:decimal division. (See op:numeric-integer-divide.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:divide-decimals.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round-half-to-even.)
XSD 1.1 allows the string +INF as a representation of positive infinity; XSD 1.0 does not. It is implementation-defined whether XSD 1.1 is supported. (See fn:number.)
Any other format token, which indicates a numbering sequence in which that token represents the number 1 (one) (but see the note below). It is implementation-defined which numbering sequences, additional to those listed above, are supported. If an implementation does not support a numbering sequence represented by the given token, it must use a format token of 1. (See fn:format-integer.)
For all format tokens other than a digit-pattern, there may be implementation-defined lower and upper bounds on the range of numbers that can be formatted using this format token; indeed, for some numbering sequences there may be intrinsic limits. For example, the format token U+2460 (CIRCLED DIGIT ONE, ①) has a range imposed by the Unicode character repertoire — zero to 20 in Unicode versions prior to 3.2, or zero to 50 in subsequent versions. For the numbering sequences described above any upper bound imposed by the implementation must not be less than 1000 (one thousand) and any lower bound must not be greater than 1. Numbers that fall outside this range must be formatted using the format token 1. (See fn:format-integer.)
The set of languages for which numbering is supported is implementation-defined. If the $language argument is absent, or is set to the empty sequence, or is invalid, or is not a language supported by the implementation, then the number is formatted using the default language from the dynamic context. (See fn:format-integer.)
...either a or t, to indicate alphabetic or traditional numbering respectively, the default being implementation-defined. (See fn:format-integer.)
The string of characters between the parentheses, if present, is used to select between other possible variations of cardinal or ordinal numbering sequences. The interpretation of this string is implementation-defined. No error occurs if the implementation does not define any interpretation for the defined string. (See fn:format-integer.)
It is implementation-defined what combinations of values of the format token, the language, and the cardinal/ordinal modifier are supported. If ordinal numbering is not supported for the combination of the format token, the language, and the string appearing in parentheses, the request is ignored and cardinal numbers are generated instead. (See fn:format-integer.)
The use of the a or t modifier disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. In the absence of the a or t modifier, the default is implementation-defined. (See fn:format-integer.)
The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined. (See Defining a decimal format.)
IEEE states that the preferred quantum is language-defined. In this specification, it is implementation-defined. (See Trigonometric and exponential functions.)
IEEE defines various rounding algorithms for inexact results, and states that the choice of rounding direction, and the mechanisms for influencing this choice, are language-defined. In this specification, the rounding direction and any mechanisms for influencing it are implementation-defined. (See Trigonometric and exponential functions.)
The map returned by the fn:random-number-generator function may contain additional entries beyond those specified here, but it must match the record type defined above. The meaning of any additional entries is implementation-defined. To avoid conflict with any future version of this specification, the keys of any such entries should start with an underscore character. (See fn:random-number-generator.)
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:codepoints-to-string.)
If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not defined in this specification then the meaning is implementation-defined. If the implementation recognizes the meaning of the keyword and value then it should interpret it accordingly; if it does not recognize the keyword or value then if the fallback parameter is present with the value no it should reject the collation as unsupported, otherwise it should ignore the unrecognized parameter. (See The Unicode Collation Algorithm.)
The following query parameters are defined. If any parameter is absent, the default is implementation-defined except where otherwise stated. The meaning given for each parameter is non-normative; the normative specification is found in [UTS #35]. (See The Unicode Collation Algorithm.)
Because the set of collations that are supported is implementation-defined, an implementation has the option to support all collation URIs, in which case it will never raise this error. (See Choosing a collation.)
The properties available are as defined for the Unicode Collation Algorithm (see 5.3.4 The Unicode Collation Algorithm). Additional implementation-defined properties may be specified as described in the rules for UCA collation URIs. (See fn:collation.)
It is possible to define collations that do not have the ability to generate collation keys. Supplying such a collation will cause the function to fail. The ability to generate collation keys is an implementation-defined property of the collation. (See fn:collation-key.)
Conforming implementations must support normalization form NFC and may support normalization forms NFD, NFKC, NFKD, and FULLY-NORMALIZED. They may also support other normalization forms with implementation-defined semantics. (See fn:normalize-unicode.)
It is implementation-defined which version of Unicode (and therefore, of the normalization algorithms and their underlying data) is supported by the implementation. See [UAX #15] for details of the stability policy regarding changes to the normalization rules in future versions of Unicode. If the input string contains codepoints that are unassigned in the relevant version of Unicode, or for which no normalization rules are defined, the fn:normalize-unicode function leaves such codepoints unchanged. If the implementation supports the requested normalization form then it must be able to handle every input string without raising an error. (See fn:normalize-unicode.)
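As a non-normative illustration, the following Python sketch shows how a processor that supports only the four standard normalization forms might behave. The function name `normalize_unicode` and the `SUPPORTED_FORMS` set are assumptions made for the example. The trimming and upper-casing of the form name, and the treatment of an empty form name, reflect our reading of fn:normalize-unicode rather than anything stated in this checklist. The Unicode version in use is whatever the host library provides (in Python, `unicodedata.unidata_version`).

```python
import unicodedata

# Assumption: this hypothetical processor does not offer FULLY-NORMALIZED.
SUPPORTED_FORMS = {"NFC", "NFD", "NFKC", "NFKD"}


def normalize_unicode(value, form="NFC"):
    """Normalize value using the named form (illustrative sketch only)."""
    name = form.strip().upper()
    if name == "":
        return value  # assumption: an empty form name leaves the input unchanged
    if name not in SUPPORTED_FORMS:
        raise ValueError("unsupported normalization form: " + name)
    # Codepoints with no normalization rules, including unassigned ones,
    # pass through unicodedata.normalize unchanged.
    return unicodedata.normalize(name, value)
```

Because a supported form must handle every input string, the only error path in the sketch is the one for an unsupported form name.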
It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property. (See Functions based on substring matching.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:analyze-string.)
Some URI schemes are hierarchical and some are non-hierarchical. Implementations must treat the following schemes as non-hierarchical: jar, mailto, news, tag, tel, and urn. Whether additional schemes are known to be non-hierarchical is implementation-defined. If a scheme is not known to be non-hierarchical, it must be treated as hierarchical. (See Parsing and building URIs.)
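The classification rule can be sketched as a simple set lookup. This Python sketch is illustrative only: the function name `is_hierarchical`, the `extra_non_hierarchical` parameter (standing in for any implementation-defined additions), and the case-insensitive comparison of scheme names are assumptions made for the example.

```python
# Schemes that every implementation must treat as non-hierarchical.
REQUIRED_NON_HIERARCHICAL = frozenset({"jar", "mailto", "news", "tag", "tel", "urn"})


def is_hierarchical(scheme, extra_non_hierarchical=frozenset()):
    """Return True unless the scheme is known to be non-hierarchical."""
    known = REQUIRED_NON_HIERARCHICAL | frozenset(extra_non_hierarchical)
    # Assumption: scheme names are compared case-insensitively.
    return scheme.lower() not in known
```

Any scheme absent from both sets, including one the processor has never encountered, falls through to the hierarchical default.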
If the omit-default-ports option is true, the port is discarded and set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:parse-uri.)
If the omit-default-ports option is true then the $port is set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:build-uri.)
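The omit-default-ports behaviour shared by fn:parse-uri and fn:build-uri might be sketched as follows. This is a hedged Python illustration, not the specified algorithm: the name `effective_port` and the `default_ports` parameter are assumptions made for the example, `None` stands in for the empty sequence, and the table holds only the four ports that implementations should recognize.

```python
# The default ports that implementations should recognize; others are optional.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ssh": 22}


def effective_port(scheme, port, omit_default_ports=False, default_ports=DEFAULT_PORTS):
    """Return the port to keep, or None (the empty sequence) if it is dropped."""
    if (
        omit_default_ports
        and port is not None
        and default_ports.get(scheme.lower()) == port
    ):
        return None  # the port equals the scheme's default, so it is discarded
    return port
```

A processor that recognizes further schemes would simply supply a larger table, which is why two conforming implementations can give different results for a less common scheme.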
Processors may support a greater range and/or precision. The limits are implementation-defined. (See Limits and precision.)
Similarly, a processor may be unable accurately to represent the result of dividing a duration by 2, or multiplying a duration by 0.5. A processor that limits the precision of the seconds component of duration values must deliver a result that is as close as possible to the mathematically precise result, given these limits; if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
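The tie case can be made concrete with a small Python sketch. It assumes a hypothetical processor that keeps seconds to millisecond precision; the function name `halve_seconds` and the choice of the two rounding rules are assumptions made for the example, not behaviour required by this specification.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Assumption: the processor stores seconds with three fractional digits.
MILLISECOND = Decimal("0.001")


def halve_seconds(seconds, tie_rule):
    """Divide a seconds value by 2 and round it to millisecond precision."""
    exact = Decimal(seconds) / 2
    return exact.quantize(MILLISECOND, rounding=tie_rule)


# 0.001 s divided by 2 is exactly 0.0005 s, which is equally close to 0.000 and 0.001.
print(halve_seconds("0.001", ROUND_HALF_EVEN))  # one conforming choice
print(halve_seconds("0.001", ROUND_HALF_UP))    # another conforming choice
```

Both results are as close as possible to the exact value, so either is permitted; only the choice between them is implementation-defined.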
All conforming processors must support year values in the range 1 to 9999, and a minimum fractional second precision of 1 millisecond or three digits (that is, s.sss). However, processors may set larger implementation-defined limits on the maximum number of digits they support in these two situations. Processors may also choose to support the year 0 and years with negative values. The results of operations on dates that cross the year 0 are implementation-defined. (See Limits and precision.)
Similarly, a processor that limits the precision of the seconds component of date and time or duration values may need to deliver a rounded result for arithmetic operations. Such a processor must deliver a result that is as close as possible to the mathematically precise result, given these limits: if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
...the format token n, N, or Nn, indicating that the value of the component is to be output by name, in lower-case, upper-case, or title-case respectively. Components that can be output by name include (but are not limited to) months, days of the week, timezones, and eras. If the processor cannot output these components by name for the chosen calendar and language then it must use an implementation-defined fallback representation. (See The picture string.)
...indicates alphabetic or traditional numbering respectively, the default being implementation-defined. This has the same meaning as in the second argument of fn:format-integer. (See The picture string.)
The sequence of characters in the (adjusted) first presentation modifier is reversed (for example, 999'### becomes ###'999). If the result is not a valid decimal digit pattern, then the output is implementation-defined. (See Formatting Fractional Seconds.)
The output for these components is entirely implementation-defined. The default presentation modifier for these components is n, indicating that they are output as names (or conventional abbreviations), and the chosen names will in many cases depend on the chosen language: see 9.9.4.8 The language, calendar, and place arguments. (See Formatting Other Components.)
The set of languages, calendars, and places that are supported in the date formatting functions is implementation-defined. When any of these arguments is omitted or is the empty sequence, an implementation-defined default value is used. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language is implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German, one implementation might represent Saturday as Samstag while another uses Sonnabend. Implementations may provide mechanisms allowing users to control such choices. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language for calendar units such as days of the week and months of the year is implementation-defined. (See The language, calendar, and place arguments.)
The calendar value if present must be a valid EQName (dynamic error: [err:FOFD1340]). If it is a lexical QName then it is expanded into an expanded QName using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace. If the expanded QName is in no namespace, then it must identify a calendar with a designator specified below (dynamic error: [err:FOFD1340]). If the expanded QName is in a namespace then it identifies the calendar in an implementation-defined way. (See The language, calendar, and place arguments.)
At least one of the above calendars must be supported. It is implementation-defined which calendars are supported. (See The language, calendar, and place arguments.)
If the arguments to fn:function-lookup identify a function that is present in the static context of the function call, the function will always return the same function that a static reference to this function would bind to. If there is no such function in the static context, then the results depend on what is present in the dynamic context, which is implementation-defined. (See fn:function-lookup.)
It is to some extent implementation-defined whether two maps or arrays have the same function identity. Processors should ensure as a minimum that when a variable $m is bound to a map or array, calling jtree($m) more than once (with the same variable reference) will deliver the same JNode each time. (See fn:jtree.)
The requirement to deliver a deterministic result has performance implications, and for this reason implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:doc.)
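One way to picture the determinism requirement is a per-execution cache keyed by URI. The Python sketch below is illustrative only: the class name `DocumentCache` and the `loader` callable are hypothetical, and a real processor would also have to deal with URI resolution and retrieval failures.

```python
class DocumentCache:
    """Return the same document object for repeated requests for one URI."""

    def __init__(self, loader):
        self._loader = loader      # hypothetical callable that fetches and parses a URI
        self._documents = {}

    def doc(self, uri):
        # The resource is retrieved at most once, so later calls cannot observe
        # a changed resource and always yield the identical result.
        if uri not in self._documents:
            self._documents[uri] = self._loader(uri)
        return self._documents[uri]
```

The cache also shows where the performance cost arises: every document retrieved must be retained for the rest of the execution, which is the cost a non-deterministic user option would avoid.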
Various aspects of this processing are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:doc.)
It is implementation-defined whether DTD validation and/or schema validation is applied to the source document. (See fn:doc.)
The effect of a fragment identifier in the supplied URI is implementation-defined. One possible interpretation is to treat the fragment identifier as an ID attribute value, and to return a document node having the element with the selected ID value as its only child. (See fn:doc.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:collection.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:uri-collection.)
It is no longer automatically an error if the resource (after decoding) contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:unparsed-text.)
...the encoding inferred from the initial octets of the resource, or from implementation-defined heuristics as defined by the rules of the bin:infer-encoding function. (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-binary.)
The collation used for matching names is implementation-defined, but must be the same as the collation used to ensure that the names of all environment variables are unique. (See fn:environment-variable.)
Except to the extent defined by these options, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml.)
Options set in $options may be supplemented or modified based on configuration options defined externally using implementation-defined mechanisms. (See fn:parse-xml.)
Except as explicitly defined, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml-fragment.)
If the second argument is omitted, or is supplied in the form of an output:serialization-parameters element, then the values of any serialization parameters that are not explicitly specified are implementation-defined, and may depend on the context. (See fn:serialize.)
A list of target namespaces identifying schema components to be used for validation. The way in which the processor locates schema components for the specified target namespaces is implementation-defined. A zero-length string denotes a no-namespace schema.... (See fn:xsd-validator.)
Set to the decimal value 1.0 or 1.1 to indicate which version of XSD is to be used. The default is implementation-defined. A processor may use a later version of XSD than the version requested, but must not use an earlier version.... (See fn:xsd-validator.)
The XSD specification allows a schema to be used for validation even when it contains unresolved references to absent schema components. It is implementation-defined whether this function allows the schema to be incomplete in this way. For example, some processors might allow validation using a schema in which an element declaration contains a reference to a type declaration that is not present in the schema, provided that the element declaration is never needed in the course of a particular validation episode. (See fn:xsd-validator.)
...error-details as map(*)*. This field is present only when (a) the option return-error-details was set to true, and (b) the supplied document was found to be invalid. The value is a sequence of maps, each containing details of one invalidity that was found. The precise details of the invalidities are implementation-defined, but they may include the following fields, if the information is available:... (See fn:xsd-validator.)
Because the [DOM: Living Standard] and [HTML: Living Standard] are not fixed, it is implementation-defined which versions are used. (See XDM Mapping from HTML DOM Nodes.)
If an implementation allows these nodes to be passed in via an API or similar mechanism, their behaviour is implementation-defined. (See XDM Mapping from HTML DOM Nodes.)
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName. (See node-name Accessor.)
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName. (See node-name Accessor.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised [err:FOJS0001] if the input does not conform to the grammar. (See fn:parse-json.)
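As an illustration of how parsers differ in the deviations they accept, the following Python sketch wraps the standard `json` module. That module rejects trailing commas and leading zeroes but, by default, accepts the non-standard literals `NaN`, `Infinity`, and `-Infinity`. The wrapper name `parse_json` and its `liberal` flag are assumptions made for the example; they only loosely mirror the liberal option of fn:parse-json.

```python
import json


def parse_json(text, liberal=False):
    """Parse JSON text, accepting the host parser's extensions only if liberal."""

    def reject_constant(name):
        # Called by the json module for NaN, Infinity, and -Infinity.
        raise ValueError(name + " is not permitted by the JSON grammar")

    if liberal:
        return json.loads(text)  # accepts whatever extensions the host parser allows
    return json.loads(text, parse_constant=reject_constant)
```

With this wrapper, `[NaN]` parses only when `liberal` is true, whereas `[1,]` is rejected either way because the underlying parser does not support that particular extension.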
The supplied function is called to process the string value of any JSON number in the input. By default, numbers are processed by converting to xs:double using the XPath casting rules. Supplying the value xs:decimal#1 will instead convert to xs:decimal (which potentially retains more precision, but disallows exponential notation), while supplying a function that casts to (xs:decimal | xs:double) will treat the value as xs:decimal if there is no exponent, or as xs:double otherwise. Supplying the value fn:identity#1 causes the value to be retained unchanged as an xs:untypedAtomic. If the liberal option is false (the default), then the supplied number-parser is called if and only if the value conforms to the JSON grammar for numbers (for example, a leading plus sign and redundant leading zeroes are not allowed). If the liberal option is true then it is also called if the value conforms to an implementation-defined extension of this grammar. (See fn:parse-json.)
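The idea of a caller-supplied number parser has a close analogue in Python's `json` module, whose `parse_int` and `parse_float` hooks receive the lexical form of each number. The sketch below is an analogy rather than a description of fn:parse-json: the helper names are assumptions made for the example, with `Decimal` standing in for xs:decimal and `float` for xs:double.

```python
import json
from decimal import Decimal


def decimal_or_double(lexical):
    """Use Decimal when there is no exponent, and float otherwise."""
    return float(lexical) if "e" in lexical.lower() else Decimal(lexical)


def parse_with_number_parser(text, number_parser=float):
    """Parse JSON, passing every number's lexical form to number_parser."""
    # parse_int handles plain integers; parse_float handles fractions and exponents.
    return json.loads(text, parse_int=number_parser, parse_float=number_parser)
```

Passing `str` as the parser retains each number's lexical form unchanged, which is loosely comparable to supplying fn:identity#1.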
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:json-doc.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised (see below) if the input does not conform to the grammar. (See fn:json-to-xml.)
Default: Implementation-defined. (See fn:json-to-xml.)
Indicates that the resulting XDM instance must be typed; that is, the element and attribute nodes must carry the type annotations that result from validation against the schema given at D.2 Schema for the result of fn:json-to-xml, or against an implementation-defined schema if the liberal option has the value true. (See fn:json-to-xml.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:csv-to-xml.)
Additional, implementation-defined options may be available, for example, to control aspects of the XML serialization, to specify the grammar start symbol, or to produce output formats other than XML. (See fn:invisible-xml.)
Default: The version given in the prolog of the library module; or implementation-defined if this is absent. (See fn:load-xquery-module.)
A sequence of URIs (in the form of xs:string values) which may be used or ignored in an implementation-defined way.... (See fn:load-xquery-module.)
Values for vendor-defined configuration options for the XQuery processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XQuery processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. The option parameter conventions do not apply to this contained map.... (See fn:load-xquery-module.)
It is implementation-defined whether constructs in the library module are evaluated in the same execution scope as the calling module. (See fn:load-xquery-module.)
The library module that is loaded may import schema declarations using an import schema declaration. It is implementation-defined whether schema components in the in-scope schema definitions of the calling module are automatically added to the in-scope schema definitions of the dynamically loaded module. The in-scope schema definitions of the calling and called modules must be consistent, according to the rules defined in 2.2.5 Consistency Constraints XQ31. (See fn:load-xquery-module.)
The serialized result is written to persistent storage. This means that the fn:transform function has side-effects and becomes nondeterministic, so the option should be used with care, and the precise behavior may be implementation-defined. When this option is used, the URIs used for the base-output-uri and the URIs of any secondary result documents must be writable locations. (See fn:transform.)
Indicates whether any xsl:message instructions in the stylesheet are to be evaluated. The destination and formatting of any such messages is implementation-defined. (See fn:transform.)
Default: Implementation-defined. (See fn:transform.)
Default: Implementation-defined. (See fn:transform.)
If the implementation provides a way of writing or invoking functions with side-effects, this post-processing function might be used to save a copy of the result document to persistent storage. For example, if the implementation provides access to the EXPath File library [EXPath], then a serialized document might be written to filestore by calling the file:write function. Similar mechanisms might be used to issue an HTTP POST request that posts the result to an HTTP server, or to send the document to an email recipient. The semantics of calling functions with side-effects are entirely implementation-defined. (See fn:transform.)
Calls to fn:transform can potentially have side-effects even in the absence of the post-processing option, because the XSLT specification allows a stylesheet to invoke extension functions that have side-effects. The semantics in this case are implementation-defined. (See fn:transform.)
A string intended to be used as the static base URI of the principal stylesheet module. This value must be used if no other static base URI is available. If the supplied stylesheet already has a base URI (which will generally be the case if the stylesheet is supplied using stylesheet-node or stylesheet-location) then it is implementation-defined whether this parameter has any effect. If the value is a relative reference, it is resolved against the executable base URI of the fn:transform function call.... (See fn:transform.)
Values for vendor-defined configuration options for the XSLT processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XSLT processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. Default is the empty map.... (See fn:transform.)
It is implementation-defined whether the XSLT transformation is executed within the same execution scope as the calling code. (See fn:transform.)
XSLT 1.0 does not define any error codes, so this is the likely outcome with an XSLT 1.0 processor. XSLT 2.0 and 3.0 do define error codes, but some APIs do not expose them. If multiple errors are signaled by the transformation (which is most likely to happen with static errors) then the error code should where possible be that of one of these errors, chosen arbitrarily; the processor may make details of additional errors available to the application in an implementation-defined way. (See fn:transform.)
In addition, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:trace.)
Supposing that $v is an xs:decimal with the value 124.84, the function returns the value 124.84, while outputting a message such as "the value of $v is: 124.84" to an implementation-defined destination. The format of the message is also implementation-defined. (See fn:trace.)
Similar to fn:trace, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:message.)
If ST is xs:float or xs:double, then TV is the xs:decimal value, within the set of xs:decimal values that the implementation is capable of representing, that is numerically closest to SV. If two values are equally close, then the one that is closest to zero is chosen. If SV is too large to be accommodated as an xs:decimal, (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values) a dynamic error is raised [err:FOCA0001]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:decimal.)
In casting to xs:decimal or to a type derived from xs:decimal, if the value is not too large or too small but nevertheless cannot be represented accurately with the number of decimal digits available to the implementation, the implementation may round to the nearest representable value or may raise a dynamic error [err:FOCA0006]. The choice of rounding algorithm and the choice between rounding and error behavior is implementation-defined. (See Casting from xs:string and xs:untypedAtomic.)
If ST is xs:decimal, xs:float or xs:double, then TV is SV with the fractional part discarded and the value converted to xs:integer. Thus, casting 3.1456 returns 3 while -17.89 returns -17. Casting 3.124E1 returns 31. If SV is too large to be accommodated as an integer, (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values) a dynamic error is raised [err:FOCA0003]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:integer.)
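The truncation rule for floating-point sources can be checked with a short Python sketch. It is illustrative only: the function name `cast_to_integer` is an assumption, Python's `float` stands in for xs:double, and because Python integers are unbounded the size limit that would raise [err:FOCA0003] never arises here.

```python
import math


def cast_to_integer(sv):
    """Cast a float to an integer by discarding the fractional part."""
    if math.isnan(sv) or math.isinf(sv):
        # Corresponds to the dynamic error raised for NaN, INF, and -INF.
        raise ValueError("FOCA0002: NaN and infinities cannot be cast to an integer")
    return math.trunc(sv)  # truncates toward zero, unlike floor for negative values


print(cast_to_integer(3.1456))   # 3
print(cast_to_integer(-17.89))   # -17
print(cast_to_integer(3.124e1))  # 31
```

Truncation toward zero is what distinguishes this cast from a floor operation: -17.89 yields -17, not -18.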
The tz timezone database, available at http://www.iana.org/time-zones. It is implementation-defined which version of the database is used. (See IANA Timezone Database.)
Unicode Standard Annex #15: Unicode Normalization Forms. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-14. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr15/. (See UAX #15.)
Unicode Standard Annex #29: Unicode Text Segmentation. Ed. Josh Hadley, Unicode Consortium. The current version is 16.0.0, dated 2024-08-28. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr29/. (See UAX #29.)
The Unicode Consortium, Reading, MA, Addison-Wesley, 2016. The Unicode Standard as updated from time to time by the publication of new versions. See http://www.unicode.org/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database. The version of Unicode to be used is implementation-defined, but implementations are recommended to use the latest Unicode version; currently, Version 9.0.0. (See The Unicode Standard.)
Unicode Technical Standard #10: Unicode Collation Algorithm. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-22. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr10/. (See UTS #10.)
Unicode Technical Standard #35: Unicode Locale Data Markup Language. Ed. Mark Davis et al., Unicode Consortium. The current version is 47, dated 2025-03-11. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr35/. (See UTS #35.)