This document is also available in these non-normative formats: Specification in XML format and XML function catalog.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document defines constructor functions, operators, and functions on the datatypes defined in [XML Schema Part 2: Datatypes Second Edition] and the datatypes defined in [XQuery and XPath Data Model (XDM) 4.0]. It also defines functions and operators on nodes and node sequences as defined in the [XQuery and XPath Data Model (XDM) 4.0]. These functions and operators are defined for use in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0] and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions/.
A summary of changes since version 3.1 is provided at H Changes since 3.1.
This section describes the status of this document at the time of its publication. Other documents may supersede this document.
This document is a working draft developed and maintained by a W3C Community Group, the XQuery and XSLT Extensions Community Group unofficially known as QT4CG (where "QT" denotes Query and Transformation). This draft is work in progress and should not be considered either stable or complete. Standard W3C copyright and patent conditions apply.
The community group welcomes comments on the specification. Comments are best submitted as issues on the group's GitHub repository.
As the Community Group moves towards publishing dated, stable drafts, some features that the group thinks may likely be removed or substantially changed are marked “at risk” in their changes section. In this draft:
The community group maintains two extensive test suites, one oriented to XQuery and XPath, the other to XSLT. These can be found at qt4tests and xslt40-test respectively. New tests, or suggestions for correcting existing tests, are welcome. The test suites include extensive metadata describing the conditions for applicability of each test case as well as the expected results. They do not include any test drivers for executing the tests: each implementation is expected to provide its own test driver.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Changes in 4.0 (next)
If a section of this specification has been updated since version 3.1, an overview of the changes is provided, along with links to navigate to the next or previous change.
Sections with significant changes are marked with a ✭ symbol in the table of contents. New functions are indicated by ✚.
The purpose of this document is to define functions and operators for inclusion in XPath 4.0, XQuery 4.0, and XSLT 4.0. The exact syntax used to call these functions and operators is specified in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0].
This document defines three classes of functions:
General purpose functions, available for direct use in user-written queries, stylesheets, and XPath expressions, whose arguments and results are values defined by the [XQuery and XPath Data Model (XDM) 4.0].
Constructor functions, used for creating instances of a datatype from values of (in general) a different datatype. These functions are also available for general use; they are named after the datatype that they return, and they always take a single argument.
Functions that specify the semantics of operators defined in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language]. These exist for specification purposes only, and are not intended for direct calling from user-written code.
[XML Schema Part 2: Datatypes Second Edition] defines a number of primitive and derived datatypes, collectively known as built-in datatypes. This document defines functions and operations on these datatypes as well as the other types (for example, nodes and sequences of nodes) defined in 2.7 Schema Information DM31 of the [XQuery and XPath Data Model (XDM) 4.0]. These functions and operations are available for use in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language] and any other host language that chooses to reference them. In particular, they may be referenced in future versions of XSLT and related XML standards.
[XSD 1.1 Part 2] adds to the datatypes defined in [XML Schema Part 2: Datatypes Second Edition]. It introduces a new derived type xs:dateTimeStamp, and it incorporates as built-in types the two types xs:yearMonthDuration and xs:dayTimeDuration which were previously XDM additions to the type system. In addition, XSD 1.1 clarifies and updates many aspects of the definitions of the existing datatypes: for example, it extends the value space of xs:double to allow both positive and negative zero, and extends the lexical space to allow +INF; it modifies the value space of xs:Name to permit additional Unicode characters; it allows year zero and disallows leap seconds in xs:dateTime values; and it allows any character string to appear as the value of an xs:anyURI item. Implementations of this specification may support either XSD 1.0 or XSD 1.1 or both.
In some cases, this specification references XSD for the semantics of operations such as the effect of matching using regular expressions, or conversion of atomic items to strings. In most such cases there is no intended technical difference between the XSD 1.0 and XSD 1.1 specifications, but the 1.1 version often provides clearer explanations and sometimes also corrects technical errors. In such cases this specification often chooses to reference the XSD 1.1 specification. This should not be taken as implying that it is necessary to invoke an XSD 1.1 processor.
References to specific sections of some of the above documents are indicated by cross-document links in this document. Each such link consists of a pointer to a specific section followed by a superscript specifying the linked document. The superscripts have the following meanings: XQ [XQuery 4.0: An XML Query Language], XT [XSL Transformations (XSLT) Version 4.0], XP [XML Path Language (XPath) 4.0], and DM [XQuery and XPath Data Model (XDM) 4.0].
As a matter of convention, a number of functions defined in this document take a parameter whose value is a map, defining options controlling the detail of how the function is evaluated. Maps are a new datatype introduced in XPath 3.1.
For example, the function fn:xml-to-json has an options parameter allowing specification of whether the output is to be indented. A call might be written:
xml-to-json($input, { 'indent': true() })
[Definition] Functions that take an options parameter adopt common conventions on how the options are used. These are referred to as the option parameter conventions. These rules apply only to functions that explicitly refer to them.
Where a function adopts the option parameter conventions, the following rules apply:
The value of the relevant argument must be a map. The entries in the map are referred to as options: the key of the entry is called the option name, and the associated value is the option value. Option names defined in this specification are always strings (single xs:string values). Option values may be of any type.
The type of the options parameter in the function signature is always given as map(*).
Although option names are described above as strings, the actual key may be any value that is the same key as the required string. For example, instances of xs:untypedAtomic or xs:anyURI are equally acceptable.
Note:
This means that the implementation of the function can check for the presence and value of particular options using the functions map:contains and/or map:get.
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace.
If an option is present whose key is not described in the specification, then a type error [err:XPTY0004]XP must be raised unless either (a) the key is recognized by the implementation, or (b) the key is a value of type xs:QName with a non-absent namespace.
All entries in the options map are optional, and supplying the empty map has the same effect as omitting the relevant argument in the function call, assuming this is permitted.
The ordering of the options map is immaterial.
For each named option, the function specification defines a required type for the option value. The value that is actually supplied in the map is converted to this required type using the coercion rulesXP. This will result in an error (typically [err:XPTY0004]XP or [err:FORG0001]FO) if conversion of the supplied value to the required type is not possible. A type error also occurs if this conversion delivers a coerced function whose invocation fails with a type error. A dynamic error occurs if the supplied value after conversion is not one of the permitted values for the option in question: the error codes for this error are defined in the specification of each function.
Note:
It is the responsibility of each function implementation to invoke this conversion; it does not happen automatically as a consequence of the function-calling rules.
In cases where the value of an option is itself a map, the specification of the particular function must indicate whether or not these rules apply recursively to the contents of that map.
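The conventions above can be modelled outside XPath. The following is a minimal Python sketch and not part of this specification. The names QName, OptionSpec, resolve_options and XPTY0004 are hypothetical, the coercion rules are reduced to a plain type check, and "recognized by the implementation" is modelled as an explicit set of extra option names.

```python
from dataclasses import dataclass
from typing import Any


class XPTY0004(TypeError):
    """Stand-in for the type error err:XPTY0004."""


@dataclass(frozen=True)
class QName:
    """Hypothetical stand-in for an xs:QName used as an option name."""
    namespace: str  # the empty string models an absent namespace
    local: str


@dataclass(frozen=True)
class OptionSpec:
    """Required type and default value of one option (illustrative only)."""
    required_type: type
    default: Any


def resolve_options(supplied, specs, recognized=frozenset()):
    """Return the effective option values for a map of supplied options."""
    for key in supplied:
        if key in specs or key in recognized:
            continue
        if isinstance(key, QName) and key.namespace:
            continue  # namespaced extension option: no error is raised
        raise XPTY0004(f"unknown option: {key!r}")
    effective = {}
    for name, spec in specs.items():
        value = supplied.get(name, spec.default)
        # Real coercion is richer; this sketch only checks the type.
        if name in supplied and not isinstance(value, spec.required_type):
            raise XPTY0004(f"option {name!r} has the wrong type")
        effective[name] = value
    return effective
```

With a hypothetical specification such as `{"indent": OptionSpec(bool, False)}`, supplying the empty map yields the defaults, which mirrors the rule that the empty map behaves like an omitted argument.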
A sequence is an ordered collection of zero or more items. An item is a node, an atomic item, or a function, such as a map or an array. The terms sequence and item are defined formally in [XQuery 4.0: An XML Query Language] and [XML Path Language (XPath) 4.0].
The following functions are defined on sequences. These functions work on any sequence, without performing any operations that are sensitive to the individual items in the sequence.
| Function | Meaning |
|---|---|
| fn:empty | Returns true if the argument is the empty sequence. |
| fn:exists | Returns true if the argument is a non-empty sequence. |
| fn:foot | Returns the last item in a sequence. |
| fn:head | Returns the first item in a sequence. |
| fn:identity | Returns its argument value. |
| fn:insert-before | Returns a sequence constructed by inserting an item or a sequence of items at a given position within an existing sequence. |
| fn:insert-separator | Inserts a separator between adjacent items in a sequence. |
| fn:items-at | Returns a sequence containing the items from $input at positions defined by $at, in the order specified. |
| fn:remove | Returns a new sequence containing all the items of $input except those at specified positions. |
| fn:replicate | Produces multiple copies of a sequence. |
| fn:reverse | Reverses the order of items in a sequence. |
| fn:slice | Returns a sequence containing selected items from a supplied input sequence based on their position. |
| fn:subsequence | Returns the contiguous sequence of items in $input beginning at the position indicated by $start and continuing for the number of items indicated by $length. |
| fn:tail | Returns all but the first item in a sequence. |
| fn:trunk | Returns all but the last item in a sequence. |
| fn:unordered | Returns the items of $input in an implementation-dependent order. |
| fn:void | Absorbs the argument. |
As in the previous section, for the illustrative examples below, assume an XQuery or transformation operating on a non-empty Purchase Order document containing a number of line-item elements. The variable $seq is bound to the sequence of line-item nodes in document order. The variables $item1, $item2, etc. are bound to separate, individual line-item nodes in the sequence.
Returns the first item in a sequence.
fn:head( | ||
$input | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns the first item in $input; if $input is empty, it returns the empty sequence.
The effect of the function is equivalent to the result of the following XPath expression.
filter($input, fn($item, $pos) { $pos eq 1 })
Returns a sequence containing the items from $input at positions defined by $at, in the order specified.
fn:items-at( | ||
$input | as , | |
$at | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
Returns the items in $input at the positions listed in $at, in order of the integers in the $at argument.
The effect of the function is equivalent to the result of the following XPath expression.
for-each($at, fn($index) { subsequence($input, $index, 1) })
In the simplest case where $at is a single integer, fn:items-at($input, 3) returns the same result as $input[3].
Compared with a simple positional filter expression, the function is useful because:
It can select items at multiple positions, and unlike fn:subsequence, these do not need to be contiguous.
The $at expression can depend on the focus.
The order of the returned items can differ from their order in the $input sequence.
If any integer in $at is outside the range 1 to count($input), that integer is effectively ignored: no error occurs.
If either of the arguments is the empty sequence, the result is the empty sequence.
If $at contains duplicate integers, the result also contains duplicates. No de-duplication occurs. If the input sequence contains nodes, these are not copied: instead, the result sequence contains multiple references to the same node.
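The rules above can be illustrated with a small model. This is a Python sketch over lists, not XPath, and the function name items_at is chosen here for illustration only. It shows that out-of-range positions are ignored, that the result follows the order of $at, and that duplicates are retained.

```python
def items_at(input_seq, at):
    """Model of fn:items-at over Python lists, using 1-based positions."""
    result = []
    for index in at:
        # Positions outside 1..len(input_seq) contribute nothing.
        if 1 <= index <= len(input_seq):
            result.append(input_seq[index - 1])
    return result


numbers = list(range(11, 21))      # models the sequence 11 to 20
print(items_at(numbers, [4]))      # [14]
print(items_at(numbers, [7, 3]))   # [17, 13]
print(items_at(numbers, [2, 2]))   # [12, 12]
print(items_at(numbers, [0, 25]))  # []
```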
Returns a new sequence containing all the items of $input except those at specified positions.
fn:remove( | ||
$input | as , | |
$positions | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns a sequence consisting of all items of $input whose 1-based position is not equal to any of the integers in $positions.
The effect of the function is equivalent to the result of the following XPath expression.
filter($input, fn($item, $pos) { not($pos = $positions) })
Any integer in $positions that is less than 1 or greater than the number of items in $input is effectively ignored.
If $input is the empty sequence, the empty sequence is returned.
If $positions is the empty sequence, the input sequence $input is returned unchanged.
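As an illustration of these rules, the following Python sketch models the function over lists. It is not XPath, and the name remove is simply reused here for readability.

```python
def remove(input_seq, positions):
    """Model of fn:remove: drop the items at the given 1-based positions."""
    drop = set(positions)
    return [item for pos, item in enumerate(input_seq, start=1)
            if pos not in drop]


abc = ["a", "b", "c"]
print(remove(abc, [1]))     # ['b', 'c']
print(remove(abc, [1, 3]))  # ['b']
print(remove(abc, [0, 6]))  # ['a', 'b', 'c'] (out-of-range positions are ignored)
print(remove(abc, []))      # ['a', 'b', 'c']
```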
| Variables | |
|---|---|
| let $abc := ("a", "b", "c") | |
Returns the contiguous sequence of items in $input beginning at the position indicated by $start and continuing for the number of items indicated by $length.
fn:subsequence( | ||
$input | as , | |
$start | as , | |
$length | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
In the two-argument case (or where the third argument is the empty sequence), the function returns:
$input[round($start) le position()]
In the three-argument case, the function returns:
$input[round($start) le position()
and position() lt round($start) + round($length)]
The effect of the function is equivalent to the result of the following XPath expression.
filter(
$input,
if (empty($length)) then (
fn($item, $pos) { round($start) le $pos }
) else (
fn($item, $pos) { round($start) le $pos and $pos lt round($start) + round($length) }
)
)
The first item of a sequence is located at position 1, not position 0.
If $input is the empty sequence, the empty sequence is returned.
In the two-argument case, the function returns a sequence comprising those items of $input whose 1-based position is greater than or equal to $start (rounded to an integer). No error occurs if $start is zero or negative.
In the three-argument case, the function returns a sequence comprising those items of $input whose 1-based position is greater than or equal to $start (rounded to an integer), and less than the sum of $start and $length (both rounded to integers). No error occurs if $start is zero or negative, or if $start plus $length exceeds the number of items in the sequence, or if $length is negative.
As a consequence of the general rules, if $start is -INF and $length is +INF, then fn:round($start) + fn:round($length) is NaN; since position() lt NaN always returns false, the result is the empty sequence.
The reason the function accepts arguments of type xs:double is that many computations on untyped data return an xs:double result; and the reason for the rounding rules is to compensate for any imprecision in these floating-point computations.
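The rounding and boundary rules can be checked with a small model. The following Python sketch is not XPath: xp_round is a hypothetical helper that models fn:round for xs:double values (ties round toward positive infinity, and NaN and the infinities are returned unchanged), and an omitted $length is modelled as None.

```python
import math


def xp_round(x):
    """Model of fn:round for doubles: nearest integer, ties toward +INF."""
    if math.isnan(x) or math.isinf(x):
        return x
    floor = math.floor(x)
    return float(floor + 1 if x - floor >= 0.5 else floor)


def subsequence(input_seq, start, length=None):
    """Model of fn:subsequence over Python lists, using 1-based positions."""
    first = xp_round(start)
    if length is None:
        return [item for pos, item in enumerate(input_seq, 1) if first <= pos]
    end = first + xp_round(length)  # NaN when first is -INF and length is +INF
    return [item for pos, item in enumerate(input_seq, 1)
            if first <= pos and pos < end]


seq = ["item1", "item2", "item3", "item4", "item5"]
print(subsequence(seq, 4))                            # ['item4', 'item5']
print(subsequence(seq, 3, 2))                         # ['item3', 'item4']
print(subsequence(seq, 0, 2))                         # ['item1']
print(subsequence(seq, float("-inf"), float("inf")))  # []
```

Comparisons against NaN are false in Python as they are in XPath, which is why the last call selects nothing.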
| Variables | |
|---|---|
| let $seq := ("item1", "item2", "item3", "item4", "item5") | |
Absorbs the argument.
fn:void( | ||
$input | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function absorbs the supplied $input argument and returns the empty sequence.
The function can be used to discard unneeded output of expressions (functions, third-party libraries, etc.).
It can also be used to discard results during development.
It is implementation-dependent whether the supplied argument is evaluated or ignored. An implementation may decide to evaluate nondeterministic expressions and ignore deterministic ones.
| Expression: | let $mapping := () return for-each(1 to 10, $mapping otherwise void#0) |
|---|---|
| Result: | () (Indicates that if no mapping is supplied, all items are dropped.) |
The functions in this section perform comparisons between the items in one or more sequences.
Many of these functions require atomic items to be compared for equality.
[Definition] Two atomic items A and B are said to be contextually equal if the function call fn:compare(A, B) returns zero when evaluated with a specified or context-determined collation and implicit timezone. If two values are not contextually equal, they are considered to be contextually unequal, even in the case when comparing them using fn:compare raises an error.
Note:
Except where explicitly stated otherwise, an appeal to contextual equality implies that NaN is treated as equal to NaN.
| Function | Meaning |
|---|---|
fn:atomic-equal | Determines whether two atomic items are equal, under the rules used for comparing keys in a map. |
fn:compare | Returns -1, 0, or 1, depending on whether the first value is less than, equal to, or greater than the second value. |
fn:contains-subsequence | Determines whether one sequence contains another as a contiguous subsequence, using a supplied callback function to compare items. |
fn:deep-equal | This function assesses whether two sequences are deep-equal to each other. To be deep-equal, they must contain items that are pairwise deep-equal; and for two items to be deep-equal, they must either be atomic items that compare equal, or nodes of the same kind, with the same name, whose children are deep-equal, or maps with matching entries, or arrays with matching members. |
fn:distinct-values | Returns the values that appear in a sequence, with duplicates eliminated. |
fn:duplicate-values | Returns the values that appear in a sequence more than once. |
fn:ends-with-subsequence | Determines whether one sequence ends with another, using a supplied callback function to compare items. |
fn:index-of | Returns a sequence of positive integers giving the positions within the sequence $input of items that are contextually equal to $target. |
fn:starts-with-subsequence | Determines whether one sequence starts with another, using a supplied callback function to compare items. |
Returns -1, 0, or 1, depending on whether the first value is less than, equal to, or greater than the second value.
fn:compare( | ||
$value1 | as , | |
$value2 | as , | |
$collation | as | := fn:default-collation() |
) as | ||
The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and implicit timezone.
The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and implicit timezone, and static base URI.
The function compares two atomic items $value1 and $value2 for order, and returns the integer value -1, 0, or +1, depending on whether $value1 is less than, equal to, or greater than $value2, respectively.
This function is transitive and symmetric. For example:
If compare(A, B) returns zero, then compare(B, A) returns zero.
If compare(A, B) returns -1, then compare(B, A) returns +1.
If compare(A, B) and compare(B, C) both return -1, then compare(A, C) also returns -1.
If either $value1 or $value2 is the empty sequence, the function returns the empty sequence.
Otherwise, the result is determined as follows:
If $value1 is an instance of xs:string, xs:anyURI or xs:untypedAtomic, and if $value2 is an instance of xs:string, xs:anyURI or xs:untypedAtomic, the values are compared as strings, and the result reflects the order according to the rules of the collation that is used.
The collation is determined according to the rules in 5.3.7 Choosing a collation.
Note:
Using the default collation may be inappropriate for some strings, for example URIs or manufacturing part numbers. In such cases it is safest to supply "http://www.w3.org/2005/xpath-functions/collation/codepoint" explicitly as the third argument.
If both $value1 and $value2 are instances of xs:numeric, the function relies on a total order, which is defined as follows:
A value $f of type xs:float is in all cases equal to the value xs:double($f). The remaining rules therefore only consider instances of xs:double and xs:decimal.
NaN is equal to itself and less than any other value.
Negative infinity is equal to itself and less than any other value except NaN.
Positive infinity is equal to itself and greater than any other value.
Negative zero is equal to positive zero.
Other xs:double and xs:decimal values (that is, values other than the infinities, NaN, and negative zero) are ordered according to their mathematical magnitude, the comparison being done without any rounding or loss of precision. This effect can be achieved by converting xs:double values to xs:decimal using an implementation of xs:decimal that imposes no limits on precision or scale, or an implementation whose limits are such that all xs:double values can be represented precisely.
Note:
Every xs:double other than NaN and ±INF, has a mathematical value of the form m × 2^e, where m is an integer whose absolute value is less than 2^53, and e is an integer between -1075 and 970, inclusive. This is the value that is used in comparisons.
Practical difficulties arise because the typical string representations of an xs:double, such as 3.1, cannot be precisely represented by values of the form m × 2^e, but are instead converted to the best available approximation, which will often not be exactly equal to an xs:decimal expressed using the same lexical form.
If both $value1 and $value2 are instances of xs:boolean, the result is fn:compare(xs:integer($value1), xs:integer($value2))
Note:
This means that false is treated as less than true.
If $value1 is an instance of xs:hexBinary or xs:base64Binary, and if $value2 is an instance of xs:hexBinary or xs:base64Binary, then:
Let $A be the sequence of integers, in the range (0 to 255), representing the octets of $value1, in order; and let $B similarly be the sequence of integers representing the octets of $value2.
If $A is empty and $B is empty, return zero.
If $A is empty and $B is not empty, return -1.
If $A is not empty and $B is empty, return +1 (this follows from the symmetry of the function).
Let $C be the value of fn:compare(fn:head($A), fn:head($B)).
If $C is non-zero, then return $C.
Otherwise, return the result of applying these rules recursively to fn:tail($A) and fn:tail($B)
If both $value1 and $value2 are instances of the same primitive type T, where T is one of the types xs:dateTime, xs:date, xs:time, xs:gYear, xs:gYearMonth, xs:gMonth, xs:gMonthDay, or xs:gDay, then:
Each of the values is converted to an xs:dateTime value as follows:
The value is considered as a tuple with seven fields (year, month, day, hours, minutes, seconds, timezone) as defined by the functions fn:year-from-dateTime, fn:month-from-dateTime, and so on.
Any absent components, other than the timezone, are substituted with the corresponding components of the xs:dateTime value 1972-01-01T00:00:00 to produce an xs:dateTime value.
If the timezone component is absent, it is substituted with the implicit timezone from the dynamic context.
Note:
The xs:dateTime value 1972-01-01T00:00:00 is arbitrary. The only constraint is that the year must be a leap year (so that the xs:gMonthDay value --02-29 expands to a valid date). XSD originally chose this value as being historically the first date on which there was a leap second, but this is irrelevant as leap seconds are not supported in XDM.
The result of the function is then the result of comparing the starting instants of these two xs:dateTime values according to the algorithm defined in section 3.2.7.4 of [XML Schema Part 2: Datatypes Second Edition] ( “Order relation on dateTime” for xs:dateTime values with timezones).
If both $value1 and $value2 are instances of xs:duration, then:
Let $M1 and $M2 be the months components of the two durations, and let $S1 and $S2 be the seconds components of the two durations.
Let $C be fn:compare($M1, $M2).
If $C is non-zero, return $C.
Otherwise, return fn:compare($S1, $S2).
Note:
The result matches the real-world semantics of durations in many cases, for example:
When both values are zero-length durations.
When both values have an equal months component (in particular when both have a zero months component).
When both values have an equal seconds component (in particular when both have a zero seconds component).
When both values have a seconds component that is less than the number of seconds in the shortest month.
In other cases the result is well defined and well behaved (for example it is symmetric and transitive) but may be counter-intuitive. For example, one month (P1M) is considered greater than one hundred days (P100D).
Previous versions of this specification allowed durations to be compared only if both were instances of xs:dayTimeDuration or xs:yearMonthDuration. This requirement has been relaxed in the interests of allowing all atomic items to be sorted; in some applications the actual sort order matters little, so long as it is consistent.
If both $value1 and $value2 are instances of xs:QName, then:
Let $N1 and $N2 be the result of applying the function fn:namespace-uri-from-QName to the two values, and let $L1 and $L2 be the result of applying the function fn:local-name-from-QName to the two values.
Let $CPC be "http://www.w3.org/2005/xpath-functions/collation/codepoint".
Let $C be fn:compare($N1, $N2, $CPC).
If $C is non-zero, return $C.
Otherwise, return fn:compare($L1, $L2, $CPC).
If both $value1 and $value2 are instances of xs:NOTATION, return fn:compare(xs:QName($value1), xs:QName($value2)).
For any other combination of types, a type error [err:XPTY0004]XP is raised. In particular, this means that an error is raised when comparing two atomic items that belong to different [XQuery and XPath Data Model (XDM) 4.0] section .
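Two of the type-specific rules above, for binary values and for durations, can be sketched as follows. This is a Python model and not part of the specification. Binary values are represented as bytes objects, durations as hypothetical (months, seconds) tuples, and the case where only the second octet sequence is exhausted is taken to return +1, an assumption that follows from the symmetry property stated earlier.

```python
import base64


def sign(x, y):
    """Return -1, 0 or +1 according to the order of x and y."""
    return (x > y) - (x < y)


def compare_octets(a: bytes, b: bytes) -> int:
    """Octet-by-octet comparison, equivalent to the recursive rules."""
    for x, y in zip(a, b):
        c = sign(x, y)
        if c != 0:
            return c
    # One sequence is a prefix of the other: the shorter one sorts first.
    return sign(len(a), len(b))


def compare_durations(d1, d2) -> int:
    """Compare (months, seconds) pairs: months first, then seconds."""
    (m1, s1), (m2, s2) = d1, d2
    c = sign(m1, m2)
    return c if c != 0 else sign(s1, s2)


# xs:hexBinary("AA") and xs:base64Binary("qg==") hold the same single octet.
print(compare_octets(bytes.fromhex("AA"), base64.b64decode("qg==")))  # 0
print(compare_octets(bytes.fromhex("AA"), bytes.fromhex("AABB")))     # -1
# One month versus one hundred days: the months component decides.
print(compare_durations((1, 0), (0, 100 * 86400)))                    # 1
```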
For numeric values, consider the xs:double value written as 0.1e0 and the xs:decimal value written as 0.1: The mathematical magnitude of this xs:double value is 0.1000000000000000055511151231257827021181583404541015625. Therefore, compare(0.1e0, 0.1) returns +1. By contrast, 0.1e0 lt 0.1 is false and 0.1e0 eq 0.1 is true, because those expressions convert the xs:decimal value 0.1 to the xs:double value 0.1e0 before the comparison.
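The exact comparison described here can be reproduced with arbitrary-precision rationals. The following Python sketch is an illustration, not the specification's algorithm: xs:double is modelled by float, xs:decimal by decimal.Decimal, and the helper name compare_numeric is hypothetical.

```python
import math
from decimal import Decimal
from fractions import Fraction


def compare_numeric(a, b):
    """Total order over float (xs:double) and Decimal (xs:decimal) values."""
    def rank(v):
        # The first tuple field orders NaN < -INF < finite values < +INF.
        if isinstance(v, float):
            if math.isnan(v):
                return (0, Fraction(0))
            if math.isinf(v):
                return (1, Fraction(0)) if v < 0 else (3, Fraction(0))
        # Fraction converts floats and Decimals exactly, without rounding.
        return (2, Fraction(v))

    ra, rb = rank(a), rank(b)
    return (ra > rb) - (ra < rb)


print(compare_numeric(0.1, Decimal("0.1")))            # 1
print(compare_numeric(float("nan"), float("nan")))     # 0
print(compare_numeric(float("nan"), float("-inf")))    # -1
print(compare_numeric(-0.0, 0.0))                      # 0
```

The first call returns 1 because the double nearest to 0.1 is slightly larger than the decimal 0.1, as described above.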
Although operations such as sorting and the fn:min and fn:max functions invoke fn:compare to perform numeric comparison, these functions in some cases treat NaN differently.
| Expression: | compare('Strasse', 'Straße', collation({ 'lang': 'de', 'strength': 'primary' })) |
|---|---|
| Result: | 0 (The specified collation equates “ss” and the German letter “ß”.) |
Changes in 4.0 (next | previous)
When comments and processing instructions are ignored, any text nodes either side of the comment or processing instruction are now merged prior to comparison. [Issue 930 PR 933 16 January 2024]
The $options parameter has been added, absorbing the $collation parameter. [Issues 934 1167 PR 1191 21 May 2024]
A callback function can be supplied for comparing individual items. [Issues 99 1142 PRs 1120 1150 9 April 2024]
Atomic items of types xs:hexBinary and xs:base64Binary are now mutually comparable. In rare cases, where an application uses both types and assumes they are distinct, this can represent a backwards incompatibility. [Issue 2139 PR 2168 19 August 2025]
This function assesses whether two sequences are deep-equal to each other. To be deep-equal, they must contain items that are pairwise deep-equal; and for two items to be deep-equal, they must either be atomic items that compare equal, or nodes of the same kind, with the same name, whose children are deep-equal, or maps with matching entries, or arrays with matching members.
fn:deep-equal( | ||
$input1 | as , | |
$input2 | as , | |
$options | as | := {} |
) as | ||
The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and implicit timezone.
The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and static base URI, and implicit timezone.
The $options argument, if present, defines additional parameters controlling how the comparison is done. If it is supplied as a map, then the option parameter conventions apply.
For backwards compatibility reasons, the $options argument can also be set to a string containing a collation name. Supplying a string $S for this argument is equivalent to supplying the map { 'collation': $S }. Omitting the argument, or supplying the empty sequence, is equivalent to supplying the empty map.
If the two sequences ($input1 and $input2) are both empty, the function returns true.
If the two sequences are of different lengths, the function returns false.
If the two sequences are of the same length, the comparison is controlled by the ordered option:
By default, the option is true: The function returns true if and only if every item in the sequence $input1 is deep-equal to the item at the same position in the sequence $input2.
If the option is set to false, the function returns true if and only if every item in the sequence $input1 is deep-equal to an item at some position in the sequence $input2, and vice versa.
The rules for deciding whether two items are deep-equal appear below.
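The sequence-level rules can be sketched independently of the item-level rules. In the following Python model, which is not part of the specification, item equality is stubbed with Python's == operator, and the unordered case is read as returning true when every item has a match in the other sequence and vice versa. The name sequences_deep_equal and its parameters are hypothetical.

```python
def sequences_deep_equal(input1, input2, ordered=True,
                         items_equal=lambda a, b: a == b):
    """Sequence-level comparison; items_equal stands in for item deep-equality."""
    if len(input1) != len(input2):
        return False
    if ordered:
        # Pairwise comparison by position; two empty sequences are equal.
        return all(items_equal(a, b) for a, b in zip(input1, input2))
    # Unordered: every item needs a match on the other side, in both directions.
    return (all(any(items_equal(a, b) for b in input2) for a in input1)
            and all(any(items_equal(b, a) for a in input1) for b in input2))


print(sequences_deep_equal([1, 2, 3], [3, 2, 1]))                 # False
print(sequences_deep_equal([1, 2, 3], [3, 2, 1], ordered=False))  # True
print(sequences_deep_equal([1, 1, 2], [1, 2, 2], ordered=False))  # True
```

The last call shows a consequence of the rule as worded: the unordered comparison does not require a one-to-one pairing of items, only that each item has some match.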
The entries that may appear in the $options map are as follows. The detailed rules for the interpretation of each option appear later.
record(
  base-uri? as xs:boolean,
  collation? as xs:string,
  comments? as xs:boolean,
  debug? as xs:boolean,
  id-property? as xs:boolean,
  idrefs-property? as xs:boolean,
  in-scope-namespaces? as xs:boolean,
  items-equal? as fn(item(), item()) as xs:boolean?,
  map-order? as xs:boolean,
  namespace-prefixes? as xs:boolean,
  nilled-property? as xs:boolean,
  normalization-form? as xs:string?,
  ordered? as xs:boolean,
  processing-instructions? as xs:boolean,
  timezones? as xs:boolean,
  type-annotations? as xs:boolean,
  type-variety? as xs:boolean,
  typed-values? as xs:boolean,
  unordered-elements? as xs:QName*,
  whitespace? as enum("preserve", "strip", "normalize")
)
| Key | Meaning |
|---|---|
| base-uri | Determines whether the base-uri of a node is significant. |
| collation | Identifies a collation which is used at all levels of recursion when strings are compared (but not when names are compared), according to the rules in 5.3.7 Choosing a collation. If the argument is not supplied, or if it is empty, then the default collation from the dynamic context of the caller is used. |
| comments | Determines whether comments are significant. |
| debug | Requests diagnostics in the case where the function returns false. When this option is set and the two inputs are found to be not equal, the implementation should output messages (in an implementation-dependent format and to an implementation-dependent destination) indicating the nature of the differences that were found. |
| id-property | Determines whether the id property of elements and attributes is significant. |
| idrefs-property | Determines whether the idrefs property of elements and attributes is significant. |
| in-scope-namespaces | Determines whether the in-scope namespaces of elements are significant. |
| items-equal | A user-supplied function to test whether two items are considered equal. The function can return true or false to indicate that two items are or are not equal, overriding the normal rules that would apply to those items; or it can return the empty sequence, to indicate that the normal rules should be followed. Note that returning () is not equivalent to returning false. |
| map-order | Determines whether the order of entries in maps is significant. |
| namespace-prefixes | Determines whether namespace prefixes in xs:QName values (particularly the names of elements and attributes) are significant. |
| nilled-property | Determines whether the nilled property of elements and attributes is significant. |
| normalization-form | If present, indicates that text and attributes are converted to the specified Unicode normalization form prior to comparison. The value is as for the corresponding argument of fn:normalize-unicode. |
| ordered | Controls whether the top-level order of the items of the input sequences is considered. |
| processing-instructions | Determines whether processing instructions are significant. |
| timezones | Determines whether timezones in date/time values are significant. |
| type-annotations | Determines whether type annotations are significant. |
| type-variety | Determines whether the variety of the type annotation of an element (whether it has complex content or simple content) is significant. |
| typed-values | Determines whether nodes are compared using their typed values rather than their string values. |
| unordered-elements | A list of QNames of elements considered to be unordered: that is, their child elements may appear in any order. |
| whitespace | Determines the extent to which whitespace is treated as significant. The value preserve retains all whitespace. The value strip ignores text nodes consisting entirely of whitespace. The value normalize ignores whitespace text nodes in the same way as the strip option, and additionally compares text and attribute nodes after normalizing whitespace in accordance with the rules of the fn:normalize-space function. The detailed rules, given below, also take into account type annotations and xml:space attributes. |
Note:
As a general rule for boolean options (but not invariably), the value true indicates that the comparison is more strict.
In the following rules, where a recursive call on fn:deep-equal is made, this is assumed to use the same values of $options as the original call.
The rules reference a function equal-strings which compares two xs:string or xs:anyURI values as follows:
If the whitespace option is set to normalize, then each string is processed by calling the fn:normalize-space function.
If the normalization-form option is present, each string is then normalized by calling the fn:normalize-unicode function, supplying the specified normalization form.
The two strings are then compared for equality under the requested collation.
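The three steps above can be illustrated with a short sketch. The variable names $composed and $decomposed are introduced here for illustration, and the first result assumes that the default collation is the Unicode codepoint collation; a different default collation might treat the two strings as equal even without normalization.

```xquery
(: The letter e-acute as one precomposed codepoint (U+00E9), and as
   "e" followed by a combining acute accent (U+0301). :)
let $composed   := codepoints-to-string(233)
let $decomposed := codepoints-to-string((101, 769))
return (
  (: No normalization: the codepoint sequences differ. :)
  deep-equal($composed, $decomposed),

  (: Both strings are converted to NFC before the collation is applied. :)
  deep-equal($composed, $decomposed, { 'normalization-form': 'NFC' }),

  (: Whitespace is normalized by fn:normalize-space before comparison. :)
  deep-equal("a  b ", "a b", { 'whitespace': 'normalize' })
)
```

Following the rules as written, the expected results are false, true, and true.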
More formally, the equal-strings function is equivalent to the following implementation in XQuery:
declare function equal-strings(
  $string1 as xs:string,
  $string2 as xs:string,
  $options as map(*)
) as xs:boolean {
  let $n1 := if ($options?normalization-form)
             then normalize-unicode(?, $options?normalization-form)
             else identity#1
  let $n2 := if ($options?whitespace = "normalize")
             then normalize-space#1
             else identity#1
  return compare($n1($n2($string1)), $n1($n2($string2)), $options?collation) eq 0
}
The rules for deciding whether two items $i1 and $i2 are deep-equal are as follows.
The two items are first compared using the function supplied in the items-equal option. If this returns true then the items are deep-equal. If it returns false then the items are not deep-equal. If it returns the empty sequence (which is always the case if the option is not explicitly specified) then the two items are deep-equal if one or more of the following conditions are true:
All of the following conditions are true:
$i1 is an atomic item.
$i2 is an atomic item.
Either the type-annotations option is false, or both atomic items have the same type annotation.
One of the following conditions is true:
If both $i1 and $i2 are instances of xs:string, xs:untypedAtomic, or xs:anyURI, equal-strings($i1, $i2, $options) returns true.
Otherwise, fn:compare($i1, $i2) returns zero.
If $i1 and $i2 are not comparable, that is, if the expression compare($i1, $i2) raises an error, then the function returns false; it does not report an error.
One of the following conditions is true:
Option namespace-prefixes is false.
Neither $i1 nor $i2 is of type xs:QName or xs:NOTATION.
$i1 and $i2 are qualified names with the same namespace prefix.
One of the following conditions is true:
Option timezones is false.
Neither $i1 nor $i2 is of type xs:date, xs:time, xs:dateTime, xs:gYear, xs:gYearMonth, xs:gMonth, xs:gMonthDay, or xs:gDay.
Neither $i1 nor $i2 has a timezone component.
Both $i1 and $i2 have a timezone component and the timezone components are equal.
All of the following conditions are true:
$i1 is a map.
$i2 is a map.
Both maps have the same number of entries.
For every entry in the first map, there is an entry in the second map that:
has the same key (note that the collation is not used when comparing keys), and
has the same associated value (compared using the fn:deep-equal function, recursively).
Either map-order is false, or the entries in both maps appear in the same order, that is, the Nth key in the first map is the same key as the Nth key in the second map, for all N.
All the following conditions are true:
$i1 is an array.
$i2 is an array.
Both arrays have the same number of members (array:size($i1) eq array:size($i2)).
Members in the same position of both arrays are deep-equal to each other: that is, every $p in 1 to array:size($i1) satisfies deep-equal($i1($p), $i2($p), $options).
All the following conditions are true:
$i1 is a function item and is not a map or array.
$i2 is a function item and is not a map or array.
$i1 and $i2 have the same function identity. The concept of function identity is explained in [XQuery and XPath Data Model (XDM) 4.0] section 8.1 Function Items.
All the following conditions are true:
$i1 is a node (specifically, an XNode).
$i2 is a node (specifically, an XNode).
Both nodes have the same node kind.
Either the base-uri option is false, or both nodes have the same value for their base URI property, or both nodes have an absent base URI.
Let significant-children($parent) be the sequence of nodes obtained by applying the following steps to the children of $parent, in turn:
Comment nodes are discarded if the option comments is false.
Processing instruction nodes are discarded if the option processing-instructions is false.
Adjacent text nodes are merged.
Whitespace-only text nodes are discarded if both the following conditions are true:
The option whitespace is set to strip or normalize; and
The text node is not within the scope of an element that has the attribute xml:space="preserve".
Note:
Whitespace text nodes will already have been discarded if $parent is a schema-validated element node whose type annotation is a complex type with an element-only or empty content model.
One of the following conditions is true.
Both nodes are document nodes, and the sequence significant-children($i1) is deep-equal to the sequence significant-children($i2).
Both nodes are element nodes, and all the following conditions are true:
The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).
Either the option namespace-prefixes is false, or both element names have the same prefix.
Either the option in-scope-namespaces is false, or both element nodes have the same in-scope namespace bindings.
Either the option type-annotations is false, or both element nodes have the same type annotation.
Either the option id-property is false, or both element nodes have the same value for their is-id property.
Either the option idrefs-property is false, or both element nodes have the same value for their is-idrefs property.
Either the option nilled-property is false, or both element nodes have the same value for their nilled property.
One of the following conditions is true:
The option type-variety is false.
Both nodes are annotated as having simple content. For this purpose simple content means either a simple type or a complex type with simple content.
Both nodes are annotated as having complex content. For this purpose complex content means a complex type whose variety is mixed, element-only, or empty.
Note:
It is a consequence of this rule that, by default, validating a document D against a schema will usually (but not necessarily) result in a document that is not deep-equal to D. The exception is when the schema allows all elements to have mixed content.
The two nodes have the same number of attributes, and for every attribute $a1 in $i1/@* there exists an attribute $a2 in $i2/@* such that node-name($a1) eq node-name($a2) and $a1 and $a2 are deep-equal.
Note:
Attributes, like other items, may be compared using the supplied items-equal function. However, this function will not be called to compare two attribute nodes unless they have the same name.
One of the following conditions holds:
Both element nodes are annotated as having simple content (as defined above), the typed-values option is true, and the typed value of $i1 is deep-equal to the typed value of $i2.
Note:
The typed value of an element node is used only when the element has simple content, which means that no error can occur as a result of atomizing a node with no typed value.
Both element nodes are annotated as having simple content (as defined above), the typed-values option is false, and the equal-strings function returns true when applied to the string value of $i1 and the string value of $i2.
Both element nodes have a type annotation that is a complex type with element-only, mixed, or empty content, the (common) element name is not present in the unordered-elements option, and the sequence significant-children($i1) is deep-equal to the sequence significant-children($i2).
Both element nodes have a type annotation that is a complex type with element-only, mixed, or empty content, the (common) element name is present in the unordered-elements option, and the sequence significant-children($i1) is deep-equal to some permutation of the sequence significant-children($i2).
Note:
Elements annotated as xs:untyped fall into this category.
Including an element name in the unordered-elements list is unlikely to be useful except when the relevant elements have element-only content, but this is not a requirement: the rules apply equally to elements with mixed content, or even (trivially) to elements with empty content.
Both nodes are attribute nodes, and all the following conditions are true:
The two attribute nodes have the same name, that is (node-name($i1) eq node-name($i2)).
Either the option namespace-prefixes is false, or both attribute names have the same prefix.
Either the option type-annotations is false, or both attribute nodes have the same type annotation.
Either the option id-property is false, or both attribute nodes have the same value for their is-id property.
Either the option idrefs-property is false, or both attribute nodes have the same value for their is-idrefs property.
Let T be true if the option typed-values is true and both attributes $i1 and $i2 have a type annotation other than xs:untypedAtomic.
Then either T is true and the typed value of $i1 is deep-equal to the typed value of $i2, or T is false and the equal-strings function returns true when applied to the string value of $i1 and the string value of $i2.
Both nodes are processing instruction nodes, and all the following conditions are true:
The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).
The equal-strings function returns true when applied to the string value of $i1 and the string value of $i2.
Both nodes are namespace nodes, and all the following conditions are true:
The two nodes either have the same name or are both nameless, that is fn:deep-equal(node-name($i1), node-name($i2)).
The string value of $i1 is equal to the string value of $i2 when compared using the Unicode codepoint collation.
Note:
Namespace nodes are not considered directly unless they appear in the top-level sequences passed explicitly to the fn:deep-equal function.
Both nodes are comment nodes, and the equal-strings function returns true when applied to their string values.
Both nodes are text nodes, and the equal-strings function returns true when applied to their string values.
All the following conditions are true:
$i1 is a JNode.
$i2 is a JNode.
The ·content· property of $i1 is deep-equal to the ·content· property of $i2.
Note:
The other properties of the two JNodes, such as ·parent· and ·selector·, are ignored. As with XNodes, deep equality considers only the subtree rooted at the node, and not its position within a containing tree.
In all other cases the result is false.
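A few of the rules for atomic items, maps, and arrays can be sketched as follows. The example assumes that the type-annotations, map-order, and timezones options all default to false, which is consistent with the note that true generally makes the comparison stricter.

```xquery
(
  (: Arrays: members are compared pairwise; 1 and 1.0 compare equal. :)
  deep-equal([1, [2, 3]], [1.0, [2, 3e0]]),

  (: Maps: the same entries in a different order are rejected
     once map-order is set to true. :)
  deep-equal({ 'a': 1, 'b': 2 }, { 'b': 2, 'a': 1 }, { 'map-order': true() }),

  (: The same instant written with different timezone components
     is rejected once timezones is set to true. :)
  deep-equal(
    xs:dateTime("2024-01-01T12:00:00Z"),
    xs:dateTime("2024-01-01T13:00:00+01:00"),
    { 'timezones': true() }
  ),

  (: An atomic item is never deep-equal to an array; no error is raised. :)
  deep-equal(1, [1])
)
```

Under these assumptions the expected results are true, false, false, and false.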
A type error is raised [err:XPTY0004] if the value of $options includes an entry whose key is defined in this specification, and whose value is not of the permitted type for that key.
A dynamic error is raised [err:FOJS0005] if the value of $options includes an entry whose key is defined in this specification, and whose value is not a permitted value for that key.
By default, whitespace in text nodes and attributes is considered significant. There are various ways whitespace differences can be ignored:
If nodes have been schema-validated, setting the typed-values option to true causes the typed values rather than the string values to be compared. This will typically cause whitespace to be ignored except where the type of the value is xs:string.
Setting the whitespace option to normalize causes all text and attribute nodes to have leading and trailing whitespace removed, and intermediate whitespace reduced to a single character.
By default, two nodes are not required to have the same type annotation, and they are not required to have the same in-scope namespaces. They may also differ in their parent, their base URI, and the values returned by the is-id and is-idrefs accessors (see [XQuery and XPath Data Model (XDM) 4.0] section 7.6.5 is-id Accessor and [XQuery and XPath Data Model (XDM) 4.0] section 7.6.6 is-idrefs Accessor). The order of children is significant, but the order of attributes is insignificant.
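The default treatment of attribute order and in-scope namespaces might be illustrated as follows. This is a sketch using documents built with fn:parse-xml; the element and attribute names are arbitrary.

```xquery
(
  (: The order of attributes is not significant. :)
  deep-equal(
    parse-xml("<a x='1' y='2'/>"),
    parse-xml("<a y='2' x='1'/>")
  ),

  (: An unused namespace declaration changes the in-scope namespaces,
     which are ignored unless in-scope-namespaces is set to true. :)
  deep-equal(
    parse-xml("<a xmlns:p='P'/>"),
    parse-xml("<a/>")
  )
)
```

Both calls are expected to return true under the default options.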
By default, the contents of comments and processing instructions are significant only if these nodes appear directly as items in the two sequences being compared. The content of a comment or processing instruction that appears as a descendant of an item in one of the sequences being compared does not affect the result. In previous versions of this specification, the presence of a comment or processing instruction, if it caused text to be split across two text nodes, might affect the result; this has been changed in 4.0 so that adjacent text nodes are merged after comments and processing instructions have been stripped.
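The effect of comments on the comparison can be sketched as follows, assuming that the comments option defaults to false as described above.

```xquery
(
  (: The comment is stripped and the two text nodes around it are merged,
     so both elements have the single text child "abcdef". :)
  deep-equal(
    parse-xml("<a>abc<!--note-->def</a>"),
    parse-xml("<a>abcdef</a>")
  ),

  (: With comments treated as significant, the child sequences differ. :)
  deep-equal(
    parse-xml("<a>abc<!--note-->def</a>"),
    parse-xml("<a>abcdef</a>"),
    { 'comments': true() }
  ),

  (: Comment nodes supplied directly as items are always compared. :)
  deep-equal(
    parse-xml("<a><!--x--></a>")//comment(),
    parse-xml("<a><!--y--></a>")//comment()
  )
)
```

The expected results are true, false, and false.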
Comparing items of different kinds (for example, comparing an atomic item to a node, or a map to an array, or an integer to an xs:date) returns false; it does not raise an error. So the result of fn:deep-equal(1, current-dateTime()) is false.
The items-equal callback function may be used to override the default rules for comparing individual items. For example, it might return true unconditionally when comparing two @timestamp attributes, if there is no expectation that the two trees will have identical timestamps. Given two nodes $n1 and $n2, it might compare them using the is operator, so that instead of comparing the descendants of the two nodes, the function simply checks whether they are the same node. Given two function items $f1 and $f2 it might return true unconditionally, knowing that there is no effective way to test if the functions are equivalent. Given two numeric values, it might return true if they are equal to six decimal places.
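The @timestamp case mentioned above might be written along the following lines. This is a sketch only: the order element and its attributes are invented for illustration, and the callback returns the empty sequence for every other pair of items so that the normal rules apply.

```xquery
deep-equal(
  parse-xml("<order id='7' timestamp='2024-01-01T10:00:00'/>"),
  parse-xml("<order id='7' timestamp='2024-06-30T18:30:00'/>"),
  {
    'items-equal': fn($a, $b) {
      (: Treat any two timestamp attributes as equal. With no else branch,
         the result is the empty sequence for all other items. :)
      if ($a instance of attribute(timestamp)
          and $b instance of attribute(timestamp)) {
        true()
      }
    }
  }
)
```

The expected result is true. Without the callback the two documents would not be deep-equal, because the timestamp attributes have different string values.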
It is good practice for the items-equal callback function to be reflexive, symmetric, and transitive; if it is not, then the fn:deep-equal function itself will lack these qualities. Reflexive means that every item (including NaN) should be equal to itself; symmetric means that items-equal(A, B) should return the same result as items-equal(B, A), and transitive means that items-equal(A, B) and items-equal(B, C) should imply items-equal(A, C).
Setting the ordered option to false or supplying the unordered-elements option may result in poor performance when comparing long sequences, especially if the items-equal callback function is supplied.
| Variables | |
|---|---|
let $at := <attendees> <name last="Parker" first="Peter"/> <name last="Barker" first="Bob"/> <name last="Parker" first="Peter"/> </attendees> | |
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | deep-equal(
$at//name[@first="Bob"],
$at//name[@last="Barker"],
options := { 'items-equal': op('is') }
) |
| Result: | true() (Tests whether the two input sequences contain exactly the same nodes.) |
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | deep-equal(
{ 1: 'a', 2: 'b' },
{ 2: 'b', 1: 'a' }
) |
| Result: | true() |
| Expression: | deep-equal(
(1, 2, 3, 4),
(1, 4, 3, 2),
options := { 'ordered': false() }
) |
| Result: | true() |
| Expression: | deep-equal(
(1, 1, 2, 3),
(1, 2, 3, 3),
options := { 'ordered': false() }
) |
| Result: | false() |
| Expression: | deep-equal(
parse-xml("<a xmlns='AA'/>"),
parse-xml("<p:a xmlns:p='AA'/>")
) |
| Result: | true() (By default, namespace prefixes are ignored). |
| Expression: | deep-equal(
parse-xml("<a xmlns='AA'/>"),
parse-xml("<p:a xmlns:p='AA'/>"),
options := { 'namespace-prefixes': true() }
) |
| Result: | false() (False because the namespace prefixes differ). |
| Expression: | deep-equal(
parse-xml("<a xmlns='AA'/>"),
parse-xml("<p:a xmlns:p='AA'/>"),
options := { 'in-scope-namespaces': true() }
) |
| Result: | false() (False because the in-scope namespace bindings differ). |
| Expression: | deep-equal(
parse-xml("<a><b/><c/></a>"),
parse-xml("<a><c/><b/></a>")
) |
| Result: | false() (By default, order of elements is significant). |
| Expression: | deep-equal(
parse-xml("<a><b/><c/></a>"),
parse-xml("<a><c/><b/></a>"),
options := { 'unordered-elements': #a }
) |
| Result: | true() (The |
| Expression: | deep-equal(
parse-xml("<para style='bold'><span>x</span></para>"),
parse-xml("<para style=' bold'> <span>x</span></para>")
) |
| Result: | false() (By default, both the leading whitespace in the |
| Expression: | deep-equal(
parse-xml("<para style='bold'><span>x</span></para>"),
parse-xml("<para style=' bold'> <span>x</span></para>"),
options := { 'whitespace': 'normalize' }
) |
| Result: | true() (The |
| Expression: | deep-equal(
(1, 2, 3),
(1.0007, 1.9998, 3.0005),
options := { 'items-equal': fn($x, $y) {
if (($x, $y) instance of xs:numeric+) {
abs($x - $y) lt 0.001
}
} }
) |
| Result: | true() (For numeric values, the callback function tests whether they are approximately equal. For any other items, it returns the empty sequence, so the normal comparison rules apply.) |
| Expression: | deep-equal(
(1, 2, 3, 4, 5),
(1, 2, 3, 8, 5),
options := { 'items-equal': fn($x, $y) {
trace((), `comparing { $x } and { $y }`)
} }
) |
| Result: | false() (The callback function traces which items are being compared, without changing the result of the comparison.) |
Returns the values that appear in a sequence more than once.
fn:duplicate-values( | ||
$values | as , | |
$collation | as | := fn:default-collation() |
) as | ||
The one-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations and the implicit timezone.
The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, the static base URI, and the implicit timezone.
The items of $values are compared against each other, according to the rules of fn:distinct-values and with $collation as the collation selected according to the rules in 5.3.7 Choosing a collation.
From each resulting set of values that are considered equal, one value will be returned if the set contains more than one value.
Specifically, the function returns those items in $values that are contextually equal to exactly one item appearing earlier in the sequence.
This means that the ordering of the result is as follows:
For any set of values that compare equal, the one that is returned is the one that appears second in $values.
The items that are returned appear in the order of their second appearance within $values.
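The ordering rules above can be checked against a few small inputs. The following sketch uses arbitrary values chosen for illustration.

```xquery
(
  (: "a" recurs at position 3 and "b" at position 5; the third "a"
     already has two earlier matches, so it is not returned again. :)
  duplicate-values(("a", "b", "a", "c", "b", "a")),

  (: 1, 1.0 and 1e0 are all equal; only the second occurrence, 1.0, is returned. :)
  duplicate-values((1, 2, 3, 1.0, 1e0)),

  (: No duplicates: the result is the empty sequence. :)
  duplicate-values((1, 2, 3))
)
```

The three calls are expected to return ("a", "b"), 1.0, and the empty sequence respectively.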
The effect of the function is equivalent to the result of the following XPath expression.
filter(
  $values,
  fn($item, $pos) {
    count(
      filter(
        subsequence($values, 1, $pos - 1),
        deep-equal(?, $item, $collation)
      )
    ) eq 1
  }
)
The comparison rules are exactly the same as the fn:distinct-values function.
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
(The string |
Raise an error for duplicates in an ID sequence: | |
let $ids := duplicate-values(//@id) where exists($ids) return error((), 'Duplicate IDs found: ' || string-join($ids, ', ')) | |
The following functions take function items as an argument.
| Function | Meaning |
|---|---|
| fn:apply | Makes a dynamic call on a function with an argument list supplied in the form of an array. |
| fn:do-until | Processes a supplied value repeatedly, continuing when some condition is false, and returning the value that satisfies the condition. |
| fn:every | Returns true if every item in the input sequence matches a supplied predicate. |
| fn:filter | Returns those items from the sequence $input for which the supplied function $predicate returns true. |
| fn:fold-left | Processes the supplied sequence from left to right, applying the supplied function repeatedly to each item in turn, together with an accumulated result value. |
| fn:fold-right | Processes the supplied sequence from right to left, applying the supplied function repeatedly to each item in turn, together with an accumulated result value. |
| fn:for-each | Applies the function item $action to every item from the sequence $input in turn, returning the concatenation of the resulting sequences in order. |
| fn:for-each-pair | Applies the function item $action to successive pairs of items taken one from $input1 and one from $input2, returning the concatenation of the resulting sequences in order. |
| fn:highest | Returns a value that is greater than or equal to every other value appearing in the input sequence. |
| fn:index-where | Returns the positions in an input sequence of items that match a supplied predicate. |
| fn:lowest | Returns those items from a supplied sequence that have the lowest value of a sort key, where the sort key can be computed using a caller-supplied function. |
| fn:partial-apply | Performs partial application of a function item by binding values to selected arguments. |
| fn:partition | Partitions a sequence of items into a sequence of non-empty arrays containing the same items, starting a new partition when a supplied condition is true. |
| fn:scan-left | Produces the sequence of successive partial results from the evaluation of fn:fold-left with the same arguments. |
| fn:scan-right | Produces the sequence of successive partial results from the evaluation of fn:fold-right with the same arguments. |
| fn:some | Returns true if at least one item in the input sequence matches a supplied predicate. |
| fn:sort | Sorts a supplied sequence, based on the value of a sort key supplied as a function. |
| fn:sort-by | Sorts a supplied sequence, based on the value of a number of sort keys supplied as functions. |
| fn:sort-with | Sorts a supplied sequence, according to the order induced by the supplied comparator functions. |
| fn:subsequence-where | Returns a contiguous sequence of items from $input, with the start and end points located by applying predicates. |
| fn:take-while | Returns items from the input sequence prior to the first one that fails to match a supplied predicate. |
| fn:transitive-closure | Returns all the GNodes reachable from a given start GNode by applying a supplied function repeatedly. |
| fn:while-do | Processes a supplied value repeatedly, continuing while some condition remains true, and returning the first value that does not satisfy the condition. |
With all these functions, if the caller-supplied function fails with a dynamic error, this error is propagated as an error from the higher-order function itself.
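The propagation of errors can be observed with a sketch such as the following. It uses the XQuery try/catch expression, with a wildcard catch clause so that no namespace prefix binding is assumed; the input values are arbitrary.

```xquery
(: The predicate divides by each item; the third item is zero, so the
   predicate fails with a dynamic error, which fn:filter propagates. :)
try {
  filter((4, 2, 0), fn($n) { 8 idiv $n gt 1 })
} catch * {
  "error propagated from fn:filter"
}
```

The expected result is the string returned by the catch clause, since the division by zero inside the callback surfaces as an error from the call to fn:filter.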
Processes a supplied value repeatedly, continuing when some condition is false, and returning the value that satisfies the condition.
fn:do-until( | ||
$input | as , | |
$action | as , | |
$predicate | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
Informally, the function behaves as follows:
$pos is initially set to 1.
$action($input, $pos) is evaluated, and the resulting value is used as a new $input.
$predicate($input, $pos) is evaluated. If the result is true, the function returns the value of $input. Otherwise, the process repeats from step 2 with $pos incremented by 1.
When the predicate returns the empty sequence, this is treated as false.
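The steps above can be traced with two small calls. This is an illustrative sketch; the starting values and the callbacks are arbitrary.

```xquery
(
  (: The value is doubled (2, 4, 8, ...) until it exceeds 100. :)
  do-until(1, fn($n) { $n * 2 }, fn($n) { $n gt 100 }),

  (: Both callbacks use $pos: the position is appended on each iteration,
     and the loop stops after the third iteration. :)
  do-until((), fn($seq, $pos) { $seq, $pos }, fn($seq, $pos) { $pos ge 3 })
)
```

The first call is expected to return 128, and the second the sequence 1, 2, 3.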
The function delivers the same result as the following XQuery implementation.
declare %private function do-until-helper(
  $input as item()*,
  $action as fn(item()*, xs:integer) as item()*,
  $predicate as fn(item()*, xs:integer) as xs:boolean?,
  $pos as xs:integer
) as item()* {
  let $result := $action($input, $pos)
  return if ($predicate($result, $pos)) then (
    $result
  ) else (
    do-until-helper($result, $action, $predicate, $pos + 1)
  )
};
declare function do-until(
  $input as item()*,
  $action as fn(item()*, xs:integer) as item()*,
  $predicate as fn(item()*, xs:integer) as xs:boolean?
) as item()* {
  do-until-helper($input, $action, $predicate, 1)
};
Do-until loops are very common in procedural programming languages, and this function provides a way to write functionally clean and interruptible iterations without side-effects. A new value is computed and tested until a given condition is satisfied. Depending on the use case, the value can be a simple atomic item or an arbitrarily complex data structure.
The function fn:while-do can be used to perform the action after the first predicate test.
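The difference between the two functions shows up when the condition is already met by the initial value. The following sketch assumes the argument order ($input, $predicate, $action) for fn:while-do, as given in that function's own definition.

```xquery
(
  (: do-until: the action runs before the first test, so 200 is doubled once. :)
  do-until(200, fn($n) { $n * 2 }, fn($n) { $n gt 100 }),

  (: while-do: the test runs first and fails immediately, so 200 is
     returned unchanged. :)
  while-do(200, fn($n) { $n le 100 }, fn($n) { $n * 2 })
)
```

The expected results are 400 and 200.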
Note that, just as when writing recursive functions, it is easy to construct infinite loops.
| Expression: | do-until(
(),
fn($value, $pos) { $value, $pos * $pos },
fn($value) { foot($value) > 50 }
) |
|---|---|
| Result: | 1, 4, 9, 16, 25, 36, 49, 64 (The loop is interrupted once the last value of the generated sequence is greater than 50.) |
| Expression: | do-until(
(1, 0),
fn($value) { $value[1] + $value[2], $value },
fn($value) { avg($value) > 10 }
) |
| Result: | 55, 34, 21, 13, 8, 5, 3, 2, 1, 1, 0 (The computation is continued as long as the average of the first Fibonacci numbers is smaller than 10.) |
Returns true if every item in the input sequence matches a supplied predicate.
fn:every( | ||
$input | as , | |
$predicate | as | := fn:boolean#1 |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns true if $input is empty, or if $predicate($item, $pos) returns true for every item $item at position $pos (1-based) in $input.
The effect of the function is equivalent to the result of the following XPath expression.
count(filter($input, $predicate)) = count($input)
An error is raised if the $predicate function raises an error. In particular, when the default predicate fn:boolean#1 is used, an error is raised if an item has no effective boolean value.
If the second argument is omitted or the empty sequence, the predicate defaults to fn:boolean#1, which takes the effective boolean value of each item.
It is possible for the supplied $predicate to be a function whose arity is less than two. The coercion rules mean that the additional parameters are effectively ignored. Frequently a predicate function will only consider the item itself, and disregard its position in the sequence.
The predicate is required to return either true, false, or the empty sequence (which is treated as false). A predicate such as fn { self::h1 } results in a type error because it returns a node, not a boolean.
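The points above about predicate arity and return type might be illustrated as follows. The input values and the doc and h1 element names are chosen for illustration only.

```xquery
(
  (: Arity 1: the position argument is ignored. :)
  every((1, 3, 7), fn($n) { $n mod 2 eq 1 }),

  (: Arity 2: each item is tested against its 1-based position. :)
  every((1, 2, 3), fn($n, $pos) { $n eq $pos }),

  (: An empty input always yields true. :)
  every(()),

  (: Wrapping the step in fn:exists makes the predicate return a boolean
     rather than a node. :)
  every(parse-xml("<doc><h1/><h1/></doc>")/doc/*, fn { exists(self::h1) })
)
```

All four calls are expected to return true.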
The implementation may deliver a result as soon as one item is found for which the predicate returns false; it is not required to evaluate the predicate for every item, nor is it required to examine items sequentially from left to right.
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | every(
("January", "February", "March", "April",
"September", "October", "November", "December"),
contains(?, "r")
) |
| Result: | true() |
| Expression: | every(
("January", "February", "March", "April",
"September", "October", "November", "December")
=!> contains("r")
) |
| Result: | true() |
| Expression: |
|
| Result: |
(The effective boolean value of NaN is false.) |
| Expression: |
|
| Result: |
|
| Expression: | let $dl := <dl><dt>Morgawr</dt><dd>Sea giant</dd></dl>
return every($dl/*, fn($elem, $pos) {
name($elem) = (
if (($pos mod 2)) then "dt" else "dd"
)
}) |
| Result: | true() |
Returns those items from the sequence $input for which the supplied function $predicate returns true.
fn:filter( | ||
$input | as , | |
$predicate | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns a sequence containing those items from $input for which $predicate($item, $pos) returns true, where $item is the item in question, and $pos is its 1-based ordinal position within $input.
The effect of the function is equivalent to the result of the following XPath expression.
for-each(
  $input,
  fn($item, $pos) { if ($predicate($item, $pos)) { $item } }
)
As a consequence of the function signature and the function calling rules, a type error occurs if the supplied $predicate function returns anything other than a single xs:boolean item or the empty sequence; there is no conversion to an effective boolean value, but the empty sequence is interpreted as false.
If $predicate is an arity-1 function, the function call fn:filter($input, $predicate) has a very similar effect to the expression $input[$predicate(.)]. There are some differences, however. In the case of fn:filter, the function $predicate is required to return an optional boolean; there is no special treatment for numeric predicate values, and no conversion to an effective boolean value. Also, with a filter expression $input[$predicate(.)], the focus within the predicate is different from that outside; this means that the use of a context-sensitive function such as fn:lang#1 will give different results in the two cases.
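The comparison with filter expressions can be sketched as follows; the variable $input and its values are illustrative.

```xquery
let $input := (10, 20, 30)
return (
  (: A boolean test gives the same result in both forms: 20, 30. :)
  filter($input, fn($n) { $n gt 15 }),
  $input[. gt 15],

  (: A numeric predicate in a filter expression selects by position: 20. :)
  $input[2],

  (: fn:filter has no such rule, so a positional test must be written
     explicitly using the second argument of the predicate: 20. :)
  filter($input, fn($n, $pos) { $pos eq 2 })
)
```

The combined result is expected to be 20, 30, 20, 30, 20, 20. A call such as filter($input, fn($n) { 2 }) would instead raise a type error, because the predicate returns an integer rather than a boolean.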
| Expression: |
|
|---|---|
| Result: |
|
| Expression: | filter(parse-xml('<doc><a id="2"/><a/></doc>')//a, fn { @id eq "2" }) |
| Result: | <a id="2"/> (The function returns |
| Expression: |
|
| Result: |
|
| Expression: | let $sequence := (1, 1, 2, 3, 4, 4, 5)
return filter(
$sequence,
fn($item, $pos) { $item = $sequence[$pos - 1] }
) |
| Result: | 1, 4 |
Returns the positions in an input sequence of items that match a supplied predicate.
fn:index-where( | ||
$input | as , | |
$predicate | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The result of the function is a sequence of integers, in monotonic ascending order, representing the 1-based positions in the input sequence of those items for which the supplied predicate function returns true. A return value of () from the predicate function is treated as false.
The effect of the function is equivalent to the result of the following XPath expression.
for-each(
$input,
fn($item, $pos) { if ($predicate($item, $pos)) { $pos } }
)
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | index-where(
("January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"),
contains(?, "r")
) |
| Result: | 1, 2, 3, 4, 9, 10, 11, 12 |
| Expression: | index-where(
( 1, 8, 2, 7, 3 ),
fn($item, $pos) { $item < 5 and $pos > 2 }
) |
| Result: | 3, 5 |
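The behaviour of fn:index-where can be pictured with a short Python sketch. It is not part of the specification: the name index_where is an assumption, and the predicate is assumed to be a two-argument callable returning True, False, or None (standing in for the empty sequence).

```python
def index_where(items, predicate):
    """Return the 1-based positions of the items that satisfy the predicate."""
    return [
        pos
        for pos, item in enumerate(items, start=1)
        if predicate(item, pos) is True  # None (empty) counts as false
    ]


months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
print(index_where(months, lambda name, pos: "r" in name))
print(index_where([1, 8, 2, 7, 3], lambda item, pos: item < 5 and pos > 2))
```

Because positions are generated in input order, the result is always in ascending order, as the rules for the function require.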
Returns those items from a supplied sequence that have the lowest value of a sort key, where the sort key can be computed using a caller-supplied function.
fn:lowest( | ||
$input | as , | |
$collation | as | := fn:default-collation(), |
$key | as | := fn:data#1 |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on collations.
The second argument, $collation, defaults to ().
Supplying the empty sequence as $collation is equivalent to supplying fn:default-collation(). For more information on collations see 5.3.7 Choosing a collation.
The third argument defaults to the function data#1.
Let $modified-key be the function:
fn($item) {
$key($item) => data() ! (
if (. instance of xs:untypedAtomic) then xs:double(.) else .
)
}
That is, the supplied function for computing key values is wrapped in a function that converts any xs:untypedAtomic values in its result to xs:double. This makes the function consistent with the behavior of fn:min and fn:max, but inconsistent with fn:sort, which treats untyped values as strings.
The result of the function is obtained as follows:
If the input is the empty sequence, the result is the empty sequence.
The input sequence is sorted, by applying the function fn:sort($input, $collation, $modified-key).
Let $C be the selected collation, or the default collation where applicable.
Let $B be the first item in the sorted sequence.
The function returns those items $A from the input sequence that are contextually equal to $B, retaining their order.
If the set of computed keys contains xs:untypedAtomic values that are not castable to xs:double then the operation will fail with a dynamic error ([err:FORG0001]FO).
If the set of computed keys contains values that are not comparable using the lt operator then the sort operation will fail with a type error ([err:XPTY0004]XP).
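The following Python sketch gives a rough picture of these rules under simplifying assumptions: each key is a single value rather than a sequence, collations are ignored, and xs:untypedAtomic is represented by a hypothetical Untyped subclass of str. The function name lowest is reused purely for readability.

```python
class Untyped(str):
    """Stand-in for xs:untypedAtomic (an assumption of this sketch)."""


def lowest(items, key=lambda item: item):
    """Return all items whose key is the smallest, in their original order."""

    def modified_key(item):
        value = key(item)
        # untyped key values are converted to doubles, as with fn:min;
        # float() raises ValueError if the conversion is impossible
        return float(value) if isinstance(value, Untyped) else value

    if not items:
        return []
    smallest = min(modified_key(item) for item in items)
    return [item for item in items if modified_key(item) == smallest]


print(lowest(["April", "June", "July", "August"], key=len))
print(lowest([Untyped("10"), Untyped("5"), Untyped("2")]))  # numeric comparison
print(lowest(["10", "5", "2"]))                             # string comparison
```

The last two calls show why the wrapping of the key function matters: the same three values yield a different result depending on whether they are treated as untyped (compared as numbers) or as strings.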
| Variables | |
|---|---|
let $e := <a x="10" y="5" z="2"/> | |
| Expression: |
|
|---|---|
| Result: |
(By default, untyped values are compared as numbers.) |
| Expression: |
|
| Result: |
(Here, the attribute values are compared as strings.) |
| Expression: |
|
| Result: |
|
| Expression: | lowest(
("red", "green", "blue"),
key := {
"red" : xs:hexBinary('FF0000'),
"green": xs:hexBinary('008000'),
"blue" : xs:hexBinary('0000FF')
}
) |
| Result: | "blue" |
| Expression: | lowest(
("April", "June", "July", "August"),
key := string-length#1
) |
| Result: | "June", "July" |
| Expression: |
|
| Result: |
|
To find employees having the lowest salary: | |
lowest($employees, (), fn { xs:decimal(salary) }) | |
Performs partial application of a function item by binding values to selected arguments.
fn:partial-apply( | ||
$function | as , | |
$arguments | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The result is a function obtained by binding values to selected arguments of the function item $function. The arguments to be bound are represented by entries in the $arguments map: an entry with key $i and value $v causes the argument at position $i (1-based) to be bound to $v.
Any entries in $arguments whose keys are greater than the arity of $function are ignored.
If $arguments is the empty map then the function returns $function unchanged.
For example, the effect of calling fn:partial-apply($f, { 2: $x }) is the same as the effect of the partial application $f(?, $x, ?, ?, ....). The coercion rules are applied to the supplied arguments in the usual way.
Unlike a partial application using place-holder arguments:
The arity of $function need not be statically known.
It is possible to bind all the arguments of $function: the effect is to return a zero-arity function.
The result is a partially applied functionXP having the following properties (which are defined in [XQuery and XPath Data Model (XDM) 4.0] section 8.1 Function Items):
name: absent.
identity: A new function identity distinct from the identity of any other function item.
Note:
See also [XML Path Language (XPath) 4.0] section 4.6.7 Function Identity.
arity: The arity of $function minus the number of parameters in $function that map to supplied arguments in $arguments.
parameter names: The names of the parameters of $function that do not map to supplied arguments in $arguments.
signature: The parameters in the returned function are the parameters of $function that do not map to supplied arguments in $arguments, retaining order. The result type of the returned function is the same as the result type of $function.
An implementation that can determine a more specific signature (for example, through use of type analysis) is permitted to do so.
body: The body of $function.
captured context: The static and dynamic context of $function, augmented, for each supplied argument, with a binding of the converted argument value to the corresponding parameter name.
A type error is raised if any of the supplied arguments, after applying the coercion rules, does not match the required type of the corresponding function parameter.
In addition, a dynamic error may be raised if any of the supplied arguments does not match other constraints on the value of that argument (for example, if the value supplied for a parameter expecting a regular expression is not a valid regular expression); or if the processor is able to establish that evaluation of the resulting function will fail for any other reason (for example, if an error is raised while evaluating a subexpression in the function body that depends only on explicitly supplied and defaulted parameters).
See also [XML Path Language (XPath) 4.0] section 4.6.4 Partial Function Application.
The function is useful where the arity of a function item is not known statically, or where all arguments in a function are to be bound, returning a zero-arity function.
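A minimal Python sketch of the binding rule may help. It is an illustration under assumptions, not the specified behaviour: the arity is passed explicitly (Python callables do not always expose it), no coercion or type checking of the bound values is performed, and the name partial_apply is simply borrowed from the function described here.

```python
from datetime import date, datetime, time


def partial_apply(function, arity, arguments):
    """Bind the 1-based positions listed in `arguments`; the returned
    function takes the remaining parameters in their original order."""
    if not arguments:
        return function  # empty map: the function is returned unchanged
    # entries whose keys exceed the arity are ignored
    bound = {pos: value for pos, value in arguments.items() if 1 <= pos <= arity}
    free = [pos for pos in range(1, arity + 1) if pos not in bound]

    def applied(*args):
        if len(args) != len(free):
            raise TypeError(f"expected {len(free)} arguments, got {len(args)}")
        merged = {**bound, **dict(zip(free, args))}
        return function(*(merged[pos] for pos in range(1, arity + 1)))

    return applied


# Bind the second argument, leaving a one-argument function.
at_midnight = partial_apply(datetime.combine, 2, {2: time(0, 0, 0)})
print(at_midnight(date(2025, 3, 1)))

# Bind every argument, leaving a zero-argument function.
fixed = partial_apply(datetime.combine, 2, {1: date(2025, 3, 1), 2: time(12, 0)})
print(fixed())
```

The second call shows the case that placeholder syntax cannot express: binding all arguments yields a zero-arity function.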
| Expression: | let $f := partial-apply(dateTime#2, {2: xs:time('00:00:00') })
return $f(xs:date('2025-03-01')) |
|---|---|
| Result: | xs:dateTime('2025-03-01T00:00:00') |
Returns true if at least one item in the input sequence matches a supplied predicate.
fn:some( | ||
$input | as , | |
$predicate | as | := fn:boolean#1 |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns true if (and only if) there is an item $item at position $pos in the input sequence such that $predicate($item, $pos) returns true.
The effect of the function is equivalent to the result of the following XPath expression.
exists(filter($input, $predicate))
An error is raised if the $predicate function raises an error. In particular, when the default predicate fn:boolean#1 is used, an error is raised if an item has no effective boolean value.
If the second argument is omitted or the empty sequence, the predicate defaults to fn:boolean#1, which takes the effective boolean value of each item.
It is possible for the supplied $predicate to be a function whose arity is less than two. The coercion rules mean that the additional parameters are effectively ignored. Frequently a predicate function will only consider the item itself, and disregard its position in the sequence.
The predicate is required to return either true, false, or the empty sequence (which is treated as false). A predicate such as fn { self::h1 } results in a type error because it returns a node, not a boolean.
The implementation may deliver a result as soon as one item is found for which the predicate returns true; it is not required to evaluate the predicate for every item, nor is it required to examine items sequentially from left to right.
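One permitted evaluation strategy, stopping at the first match while scanning from left to right, can be sketched in Python as follows. The name some is reused for readability; the default predicate uses Python truthiness, which only approximates the effective boolean value and is an assumption of this sketch.

```python
def some(items, predicate=lambda item, pos: bool(item)):
    """Return True if predicate(item, pos) is true for at least one item."""
    for pos, item in enumerate(items, start=1):
        if predicate(item, pos) is True:
            return True  # later items are never examined
    return False


seen = []
# list.append returns None, so the lambda records n and then tests n > 4
found = some([1, 5, 9, 13], lambda n, pos: seen.append(n) or n > 4)
print(found, seen)
```

An implementation is free to examine the items in a different order or in parallel; this sketch shows only the simplest sequential choice.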
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | some(
("January", "February", "March", "April",
"September", "October", "November", "December"),
contains(?, "z")
) |
| Result: | false() |
| Expression: | some(
("January", "February", "March", "April",
"September", "October", "November", "December")
=!> contains("r")
) |
| Result: | true() |
| Expression: |
|
| Result: |
(The effective boolean value in each case is false.) |
| Expression: |
|
| Result: |
|
Sorts a supplied sequence, based on the value of a number of sort keys supplied as functions.
fn:sort-by( | ||
$input | as , | |
$keys | as | |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on collations.
The result of the function is a sequence that contains all the items from $input, typically in a different order, the order being defined by the supplied sort key definitions.
A sort key definition is a record with three parts:
key: A sort key function, which is applied to each item in the input sequence to determine a sort key value. If no function is supplied, the default is fn:data#1, which atomizes the item.
collation: A collation, which is used when comparing sort key values that are of type xs:string or xs:untypedAtomic. If no collation is supplied, the default collation from the static context is used.
When comparing values of types other than xs:string or xs:untypedAtomic, the collation is ignored (but an error may be reported if it is invalid). For more information see 5.3.7 Choosing a collation.
order: An order direction, either "ascending" or "descending". The default is "ascending".
The number of sort key definitions is determined by the number of records supplied in the $keys argument. If the argument is absent or empty, the default is a single sort key definition using the function data#1, using the default collation from the static context, and with order ascending.
The result of the fn:sort-by function is obtained as follows:
The result sequence contains the same items as the input sequence $input, but generally in a different order.
The sort key definitions are established as described above. The sort key definitions are in major-to-minor order. That is, the position of two items $A and $B in the result sequence is determined first by the relative magnitude of their primary sort key values, which are computed by evaluating the sort key function in the first sort key definition. If those two sort key values are equal, then the position is determined by the relative magnitude of their secondary sort key values, computed by evaluating the sort key function in the second sort key definition, and so on.
When a pair of corresponding sort key values of $A and $B are found to be not equal, then $A precedes $B in the result sequence if both the following conditions are true, or if both conditions are false:
The sort key value for $A is less than the sort key value for $B, as defined below.
The order direction in the corresponding sort key definition is "ascending".
If all the sort key values for $A and $B are pairwise equal, then $A precedes $B in the result sequence if and only if $A precedes $B in the input sequence.
Note:
That is, the sort is stable.
Each sort key value for a given item is obtained by applying the sort key function of the corresponding sort key definition to that item. The result of this function is in the general case a sequence of atomic items. Two sort key values $a and $b are compared as follows:
Let $C be the collation in the corresponding sort key definition.
Let $REL be the result of evaluating op:lexicographic-compare($key($A), $key($B), $C) where op:lexicographic-compare($a, $b, $C) is defined as follows:
if (empty($a) and empty($b)) then 0
else if (empty($a)) then -1
else if (empty($b)) then +1
else let $rel := op:simple-compare(head($a), head($b), $C)
return if ($rel eq 0)
then op:lexicographic-compare(tail($a), tail($b), $C)
else $rel
Here op:simple-compare($k1, $k2, $C) is defined as follows:
if ($k1 instance of (xs:string | xs:anyURI | xs:untypedAtomic)
and $k2 instance of (xs:string | xs:anyURI | xs:untypedAtomic))
then compare($k1, $k2, $C)
else if ($k1 instance of xs:numeric and $k2 instance of xs:numeric)
then compare($k1, $k2)
else if ($k1 eq $k2) then 0
else if ($k1 lt $k2) then -1
else +1
Note:
This raises an error if two keys are not comparable, for example if one is a string and the other is a number, or if both belong to a non-ordered type such as xs:QName.
If $REL is zero, then the two sort key values are deemed equal; if $REL is -1 then $a is deemed less than $b, and if $REL is +1 then $a is deemed greater than $b.
If the set of computed sort keys contains values that are not comparable using the lt operator then the sort operation will fail with a type error ([err:XPTY0004]XP).
The function is a generalization of the fn:sort function available in 3.1, which is retained for compatibility. The enhancements allow multiple sort keys to be defined, each potentially with a different collation, and allow sorting in descending order.
If the sort key for an item evaluates to the empty sequence, the effect of the rules is that this item precedes any value for which the key is non-empty. This is equivalent to the effect of the XQuery option empty least. The effect of the option empty greatest can be achieved by adding an extra sort key definition with { 'key': fn { empty(K(.)) } }: when comparing boolean sort keys, false precedes true.
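The comparison rules above can be modelled compactly in Python. This is a sketch under stated assumptions rather than a rendering of the normative rules: collations are ignored (strings compare by code point), each key function returns a list standing in for a sequence of atomic items, sort key definitions are plain dicts, and the names simple_compare, lexicographic_compare and sort_by are chosen for illustration.

```python
from functools import cmp_to_key


def simple_compare(k1, k2):
    """Compare two atomic key values; raises TypeError if not comparable."""
    if k1 == k2:
        return 0
    return -1 if k1 < k2 else 1


def lexicographic_compare(a, b):
    """Compare two key sequences item by item; a shorter prefix sorts first."""
    for x, y in zip(a, b):
        rel = simple_compare(x, y)
        if rel != 0:
            return rel
    return (len(a) > len(b)) - (len(a) < len(b))


def sort_by(items, keys=None):
    """Stable multi-key sort driven by a list of sort key definitions."""
    definitions = keys or [{}]  # default: one ascending key on the item itself

    def compare(a, b):
        for definition in definitions:  # major-to-minor order
            key = definition.get("key", lambda item: [item])
            rel = lexicographic_compare(key(a), key(b))
            if rel != 0:
                ascending = definition.get("order", "ascending") == "ascending"
                return rel if ascending else -rel
        return 0  # all keys equal: sorted() preserves input order

    return sorted(items, key=cmp_to_key(compare))


employees = [("Berg", 300), ("Ahl", 100), ("Berg", 500)]
ranked = sort_by(employees, [
    {"key": lambda e: [e[0]]},                          # last name, ascending
    {"key": lambda e: [e[1]], "order": "descending"},   # salary, descending
])
print(ranked)
```

Because an empty key sequence compares as less than any non-empty one, items with no key value sort first in ascending order, matching the empty least behaviour described above.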
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
To sort a set of strings | |
let $SWEDISH := collation({ 'lang': 'sv' })
return sort-by($in, { 'collation': $SWEDISH }) | |
To sort a sequence of employees by last name as the major sort key and first name as the minor sort key, using the default collation: | |
sort-by($employees, { 'key': fn { name ! (last, first) } }) | |
To sort a sequence of employees first by increasing last name (using Swedish collation order) and then by decreasing salary: | |
sort-by(
$employees,
({ 'key': fn { name/last }, 'collation': collation({ 'lang': 'sv' }) },
{ 'key': fn { xs:decimal(salary) }, 'order': 'descending' })) | |
Returns items from the input sequence prior to the first one that fails to match a supplied predicate.
fn:take-while( | ||
$input | as , | |
$predicate | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns all items in the sequence prior to the first one where the result of calling the supplied $predicate function, with the current item and its position as arguments, returns the value false or ().
If every item in the sequence satisfies the predicate, then $input is returned in its entirety.
The effect of the function is equivalent to the result of the following XQuery expression.
for $item at $pos in $input while $predicate($item, $pos) return $item
There is no analogous drop-while or skip-while function, as found in some functional programming languages. The effect of drop-while($input, $predicate) can be achieved by calling fn:subsequence-where($input, fn { not($predicate(.)) }).
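For readers more familiar with procedural code, the following Python sketch models fn:take-while together with a hypothetical drop_while counterpart. Both names are assumptions of the sketch, and drop_while is written as a direct loop rather than through fn:subsequence-where. The predicate is assumed to take the item and its 1-based position.

```python
def take_while(items, predicate):
    """Return the leading items for which predicate(item, pos) is true."""
    result = []
    for pos, item in enumerate(items, start=1):
        if predicate(item, pos) is not True:  # false or empty ends the scan
            break
        result.append(item)
    return result


def drop_while(items, predicate):
    """Hypothetical complement: everything from the first failing item on."""
    items = list(items)
    for pos, item in enumerate(items, start=1):
        if predicate(item, pos) is not True:
            return items[pos - 1:]
    return []


limit = lambda num, pos: num < 18 and pos < 4
print(take_while(range(10, 21), limit))
print(drop_while(range(10, 21), limit))
```

Concatenating the two results always reproduces the input, which is a convenient sanity check for any implementation of the pair.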
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | take-while(
("A", "B", "C", " ", "E"),
fn { boolean(normalize-space()) }
) |
| Result: | "A", "B", "C" |
| Expression: | parse-xml("<doc><p/><p/><h2/><img/><p/></doc>")/doc/*
=> take-while(fn { boolean(self::p) })
=> count() |
| Result: | 2 |
| Expression: | ("Aardvark", "Antelope", "Bison", "Buffalo", "Camel", "Dingo")
=> take-while(starts-with(?, "A")) |
| Result: | "Aardvark", "Antelope" |
| Expression: | take-while(10 to 20, fn($num, $pos) { $num lt 18 and $pos lt 4 }) |
| Result: | 10, 11, 12 |
| Expression: | take-while(
characters("ABCD-123"),
fn($ch, $pos) { $pos lt 4 and $ch ne '-' }
) => string-join() |
| Result: | "ABC" |
| Expression: | take-while(
("A", "a", "B", "b", "C", "D", "d"),
fn($ch, $pos) {
matches($ch, if ($pos mod 2 eq 1) then "\p{Lu}" else "\p{Ll}")
}
) |
| Result: | "A", "a", "B", "b", "C" |
Returns all the GNodes reachable from a given start GNode by applying a supplied function repeatedly.
fn:transitive-closure( | ||
$node | as , | |
$step | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function works with both XNodes and JNodes.
The value of $node is a node from which navigation starts. If $node is the empty sequence, the function returns the empty sequence.
The value of $step is a function that takes a single GNode as input, and returns a set of GNodes as its result.
The result of the fn:transitive-closure function is the set of GNodes that are reachable from $node by applying the $step function one or more times.
Although $step may return any sequence of GNodes, the result is treated as a set: the order of GNodes in the sequence is ignored, and duplicates are ignored. The result of the transitive-closure function will always be a sequence of GNodes in document order with no duplicates.
The function delivers the same result as the following XQuery implementation.
declare %private function tc-inclusive(
$nodes as gnode()*,
$step as fn(gnode()) as gnode()*
) as gnode()* {
let $nextStep := $nodes/$step(.)
let $newNodes := $nextStep except $nodes
return if (exists($newNodes))
then $nodes union tc-inclusive($newNodes, $step)
else $nodes
};
declare function transitive-closure (
$node as gnode(),
$step as fn(gnode()) as gnode()*
) as gnode()* {
tc-inclusive($node/$step(.), $step)
};
(: Explanation:
The private helper function tc-inclusive takes a set of GNodes as input,
and calls the $step function on each one of those GNodes; if the result
includes GNodes that are not already present in the input, then it makes
a recursive call to find GNodes reachable from these new GNodes, and returns
the union of the supplied GNodes and the GNodes returned from the recursive
call (which will always include the new GNodes selected in the first step).
If there are no new GNodes, the recursion ends, returning the GNodes that
have been found up to this point.
The main function fn:transitive-closure finds the nodes that are reachable
from the start GNode in a single step, and then invokes the helper function
tc-inclusive to add GNodes that are reachable in multiple steps.
:)
Cycles in the data are not a problem; the function stops searching when it finds no new GNodes.
The function may fail to terminate if the supplied $step function constructs and returns new GNodes. A processor may detect this condition but is not required to do so.
The $node GNode is not included in the result, unless it is reachable by applying the $step function one or more times. If a result is required that does include $node, it can be readily added to the result using the union operator: $node | transitive-closure($node, $step).
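The same fixed-point search can be sketched iteratively in Python. This is an informal model under assumptions: GNodes are represented by arbitrary hashable values, the result is an unordered set (document order is not modelled), and the names transitive_closure, manager_of and direct_reports are illustrative only.

```python
def transitive_closure(start, step):
    """Return every node reachable from start by one or more applications of step."""
    found = set(step(start))        # nodes reachable in exactly one step
    frontier = set(found)
    while frontier:
        reached = set()
        for node in frontier:
            reached.update(step(node))
        frontier = reached - found  # only genuinely new nodes are explored
        found |= frontier
    return found


# Hypothetical reporting structure: employee id -> manager id.
manager_of = {1: 0, 2: 0, 3: 2, 4: 2, 5: 1, 6: 3, 7: 6, 8: 6}


def direct_reports(person):
    return [emp for emp, boss in manager_of.items() if boss == person]


print(sorted(transitive_closure(2, direct_reports)))

# A cycle terminates, and the start node appears because it is reachable.
print(sorted(transitive_closure(0, lambda n: [(n + 1) % 3])))
```

The loop ends as soon as a round of stepping produces no unseen nodes, which is why cyclic data does not cause non-termination.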
| Variables | |
|---|---|
let $data := document { <doc>
<person id="0"/>
<person id="1" manager="0"/>
<person id="2" manager="0"/>
<person id="3" manager="2"/>
<person id="4" manager="2"/>
<person id="5" manager="1"/>
<person id="6" manager="3"/>
<person id="7" manager="6"/>
<person id="8" manager="6"/>
</doc> } | |
let $direct-reports := fn($p as element(person)) as element(person)* {
$p/../person[@manager = $p/@id]
} | |
| Expression: | transitive-closure( $data//person[@id = "2"], $direct-reports )/string(@id) |
|---|---|
| Result: | "3", "4", "6", "7", "8" |
| Expression: | transitive-closure(
$data,
function { child::* }
)/@id ! string() |
| Result: | "0", "1", "2", "3", "4", "5", "6", "7", "8" |
The following example, given | |
transitive-closure($root, fn { document(//(xsl:import|xsl:include)/@href) })
=!> document-uri() | |
This example uses the XSLT-defined | |
Processes a supplied value repeatedly, continuing while some condition remains true, and returning the first value that does not satisfy the condition.
fn:while-do( | ||
$input | as , | |
$predicate | as , | |
$action | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
Informally, the function behaves as follows:
1. $pos is initially set to 1.
2. $predicate($input, $pos) is evaluated. If the result is false or (), the function returns the value of $input.
3. Otherwise, $action($input, $pos) is evaluated, the resulting value is used as a new $input, and the process repeats from step 2 with $pos incremented by 1.
The function delivers the same result as the following XQuery implementation.
declare %private function while-do-helper(
$input as item()*,
$predicate as fn(item()*, xs:integer) as xs:boolean?,
$action as fn(item()*, xs:integer) as item()*,
$pos as xs:integer
) as item()* {
if ($predicate($input, $pos))
then while-do-helper($action($input, $pos), $predicate, $action, $pos + 1)
else $input
};
declare function while-do(
$input as item()*,
$predicate as fn(item()*, xs:integer) as xs:boolean?,
$action as fn(item()*, xs:integer) as item()*
) as item()* {
while-do-helper($input, $predicate, $action, 1)
};
While-do loops are very common in procedural programming languages, and this function provides a way to write functionally clean and interruptible iterations without side-effects. As long as a given condition is met, a new value is computed and tested again. Depending on the use case, the value can be a simple atomic item or an arbitrarily complex data structure.
The function fn:do-until can be used to perform the action before the first predicate test.
Note that, just as when writing recursive functions, it is easy to construct infinite loops.
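For comparison with procedural languages, the loop that fn:while-do abstracts can be written directly in Python. The sketch below is illustrative only; the name while_do is an assumption, and both callables are assumed to accept the current value and the 1-based iteration counter.

```python
def while_do(value, predicate, action):
    """Apply action repeatedly while predicate(value, pos) is true."""
    pos = 1
    while predicate(value, pos) is True:  # false or empty ends the loop
        value = action(value, pos)
        pos += 1
    return value


# Square repeatedly until the value exceeds 100.
print(while_do(2, lambda n, pos: n <= 100, lambda n, pos: n * n))

# Multiply by the iteration counter for ten rounds.
print(while_do(1, lambda n, pos: pos <= 10, lambda n, pos: n * pos))
```

A predicate that never becomes false makes this loop run forever, which is the procedural counterpart of the infinite recursion mentioned above.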
| Expression: | while-do(2, fn { . <= 100 }, fn { . * . }) |
|---|---|
| Result: | 256 (The loop is interrupted as soon as the computed product is greater than 100.) |
| Expression: | while-do(
1,
fn($num, $pos) { $pos <= 10 },
fn($num, $pos) { $num * $pos }
) |
| Result: | 3628800 (This returns the factorial of 10, i.e., the product of all integers from 1 to 10.) |
| Expression: | let $input := (0 to 4, 6 to 10)
return while-do(
0,
fn($n) { $n = $input },
fn($n) { $n + 1 }
) |
| Result: | 5 (This returns the first positive number missing in a sequence.) |
| Expression: | while-do(
1 to 9,
fn($value) { head($value) < 5 },
fn($value) { tail($value) }
) |
| Result: | 5, 6, 7, 8, 9 (The first number of a sequence is removed as long as it is smaller than 5.) |
| Expression: | let $input := 3936256
return while-do(
$input,
fn($result) { abs($result * $result - $input) >= 0.0000000001 },
fn($guess) { ($guess + $input div $guess) div 2 }
) => round(5) |
| Result: | 1984 (This computes the square root of a number.) |
The following example generates random doubles. It is interrupted once a number exceeds a given limit: | |
let $result := while-do(
random-number-generator(),
fn($random) {
$random?number < 0.8
},
fn($random) {
map:put($random?next(), 'numbers', ($random?numbers, $random?number))
}
)
return $result?numbers | |
This section specifies arithmetic operators on the numeric datatypes defined in [XML Schema Part 2: Datatypes Second Edition].
The following functions are defined on numeric types. Each function returns a value of the same type as the type of its argument.
If the argument is the empty sequence, the empty sequence is returned.
For xs:float and xs:double arguments, if the argument is NaN, NaN is returned.
With the exception of fn:abs, functions with arguments of type xs:float and xs:double that are positive or negative infinity return positive or negative infinity.
| Function | Meaning |
|---|---|
fn:abs | Returns the absolute value of $value. |
fn:ceiling | Rounds $value upwards to a whole number. |
fn:divide-decimals | Divides one xs:decimal by another to a defined precision, returning both the quotient and the remainder. |
fn:floor | Rounds $value downwards to a whole number. |
fn:is-NaN | Returns true if the argument is the xs:float or xs:double value NaN. |
fn:round | Rounds a value to a specified number of decimal places, with control over how the rounding takes place. |
fn:round-half-to-even | Rounds a value to a specified number of decimal places, rounding to make the last digit even if two such values are equally near. |
Note:
The fn:round function has been extended with a third argument in version 4.0 of this specification; this means that the fn:ceiling, fn:floor, and fn:round-half-to-even functions are now technically redundant. They are retained, however, both for backwards compatibility and for convenience.
Rounds a value to a specified number of decimal places, with control over how the rounding takes place.
fn:round( | ||
$value | as , | |
$precision | as | := 0, |
$mode | as | := 'half-to-ceiling' |
) as | ||
This function is deterministic, context-independent, and focus-independent.
General rules: see 4.4 Functions on numeric values.
The function returns a value that is close to $value and that is a multiple of ten to the power of minus $precision. The default value of $precision is zero, in which case the function returns a whole number (but not necessarily an xs:integer).
The detailed way in which rounding is performed depends on the value of $mode, as follows. Here L means the highest multiple of ten to the power of minus $precision that is less than or equal to $value, U means the lowest multiple of ten to the power of minus $precision that is greater than or equal to $value, N means the multiple of ten to the power of minus $precision that is numerically closest to $value, and midway means that $value is equal to the arithmetic mean of L and U.
| Rounding Mode | Meaning |
|---|---|
| Returns L. |
| Returns U. |
| Returns L if |
| Returns U if |
| Returns N, unless midway, in which case L. |
| Returns N, unless midway, in which case U. This is the default. |
| Returns N, unless midway, in which case it returns L if |
| Returns N, unless midway, in which case it returns U if |
| Returns N, unless midway, in which case it returns whichever of L and U has a last significant digit that is even. |
For the four types xs:float, xs:double, xs:decimal and xs:integer, it is guaranteed that if the type of $value is an instance of type T then the result will also be an instance of T. The result may also be an instance of a type derived from one of these four by restriction. For example, if $value is an instance of xs:decimal and $precision is less than one, then the result may be an instance of xs:integer.
If the second argument is omitted or is the empty sequence, the function produces the same result as when $precision = 0 (that is, it rounds to a whole number).
When $value is of type xs:float or xs:double:
If $value is NaN, positive or negative zero, or positive or negative infinity, then the result is the same as the argument.
For other values, the argument is cast to xs:decimal using an implementation of xs:decimal that imposes no limits on the number of digits that can be represented. The function is applied to this xs:decimal value, and the resulting xs:decimal is cast back to xs:float or xs:double as appropriate to form the function result. If the resulting xs:decimal value is zero, then positive or negative zero is returned according to the sign of $value.
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation.
This function is typically used with a non-zero $precision in financial applications where the argument is of type xs:decimal. For arguments of type xs:float and xs:double the results may be counter-intuitive. For example, consider round(35.425e0, 2). The result is not 35.43, as might be expected, but 35.42. This is because the xs:double written as 35.425e0 has an exact value equal to 35.42499999999..., which is closer to 35.42 than to 35.43.
The call round($v, 0, "floor") is equivalent to floor($v).
The call round($v, 0, "ceiling") is equivalent to ceiling($v).
The call round($v, $p, "half-to-even") is equivalent to round-half-to-even($v, $p).
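The decimal-based procedure can be tried out with Python's decimal module. The sketch below is an approximation under assumptions: it covers only the four modes named in the surrounding text ("floor", "ceiling", "half-to-ceiling" and "half-to-even"), it always returns a Decimal rather than casting back to the input type, and it does not model NaN, infinities or negative zero. The name xpath_round is hypothetical.

```python
from decimal import (Decimal, localcontext, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP)


def xpath_round(value, precision=0, mode="half-to-ceiling"):
    """Round value to a multiple of ten to the power of minus precision."""
    exact = Decimal(value)  # floats are converted exactly, without rounding
    if mode == "floor":
        rounding = ROUND_FLOOR
    elif mode == "ceiling":
        rounding = ROUND_CEILING
    elif mode == "half-to-even":
        rounding = ROUND_HALF_EVEN
    elif mode == "half-to-ceiling":
        # a tie goes upwards: away from zero if positive, toward zero if negative
        rounding = ROUND_HALF_UP if exact >= 0 else ROUND_HALF_DOWN
    else:
        raise ValueError(f"mode not covered by this sketch: {mode}")
    unit = Decimal(1).scaleb(-precision)  # e.g. 0.01 when precision is 2
    with localcontext() as ctx:
        ctx.prec = 1000  # generous working precision for long float expansions
        return exact.quantize(unit, rounding=rounding)


print(xpath_round(Decimal("2.5")), xpath_round(Decimal("-2.5")))
print(xpath_round(35.425, 2))             # binary double: just below 35.425
print(xpath_round(Decimal("35.425"), 2))  # exact decimal: a true midpoint
```

The last two calls reproduce the counter-intuitive case discussed above: the double written as 35.425e0 is slightly smaller than the decimal 35.425, so it is not a midpoint and rounds down.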
Rounds a value to a specified number of decimal places, rounding to make the last digit even if two such values are equally near.
fn:round-half-to-even( | ||
$value | as , | |
$precision | as | := 0 |
) as | ||
This function is deterministic, context-independent, and focus-independent.
General rules: see 4.4 Functions on numeric values.
The function returns the nearest (that is, numerically closest) value to $value that is a multiple of ten to the power of minus $precision. If two such values are equally near (e.g. if the fractional part in $value is exactly .500...), the function returns the one whose least significant digit is even.
For the four types xs:float, xs:double, xs:decimal and xs:integer, it is guaranteed that if the type of $value is an instance of type T then the result will also be an instance of T. The result may also be an instance of a type derived from one of these four by restriction. For example, if $value is an instance of xs:decimal and $precision is less than one, then the result may be an instance of xs:integer.
If the second argument is omitted or the empty sequence, the function produces the same result as the two-argument version with $precision = 0.
For arguments of type xs:float and xs:double:
If the argument is NaN, positive or negative zero, or positive or negative infinity, then the result is the same as the argument.
In all other cases, the argument is cast to xs:decimal using an implementation of xs:decimal that imposes no limits on the number of digits that can be represented. The function is applied to this xs:decimal value, and the resulting xs:decimal is cast back to xs:float or xs:double as appropriate to form the function result. If the resulting xs:decimal value is zero, then positive or negative zero is returned according to the sign of the original argument.
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation.
This function is typically used in financial applications where the argument is of type xs:decimal. For arguments of type xs:float and xs:double the results may be counter-intuitive. For example, consider round-half-to-even(xs:float(150.015), 2). The result is not 150.02 as might be expected, but 150.01. This is because the conversion of the xs:float value represented by the literal 150.015 to an xs:decimal produces the xs:decimal value 150.014999389..., which is closer to 150.01 than to 150.02.
From 4.0, the effect of this function can also be achieved by calling fn:round with the third argument set to "half-to-even".
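The xs:float example above can be checked with a few lines of Python. This is a sketch, not a conformance test: the helper as_float32 is hypothetical and uses the struct module to obtain the nearest single-precision value, which is then converted exactly to a decimal before rounding.

```python
import struct
from decimal import Decimal, ROUND_HALF_EVEN


def as_float32(x):
    """Return the single-precision value nearest to x, as a Python float."""
    return struct.unpack("f", struct.pack("f", x))[0]


exact = Decimal(as_float32(150.015))  # exact decimal expansion of the float
rounded = exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(exact, rounded)

# The same literal treated as an exact decimal is a true midpoint.
midpoint = Decimal("150.015").quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(midpoint)
```

The float value lies slightly below 150.015, so it is rounded down, whereas the exact decimal is a tie and is resolved to the neighbour with an even last digit.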
It is possible to convert strings to values of type xs:integer, xs:float, xs:decimal, or xs:double using the constructor functions described in 22 Constructor functions or using cast expressions as described in 23 Casting.
In addition the fn:number function is available to convert strings to values of type xs:double. It differs from the xs:double constructor function in that any value outside the lexical space of the xs:double datatype is converted to the xs:double value NaN.
| Function | Meaning |
|---|---|
| fn:number | Returns the value indicated by $value or, if $value is not specified, the context value after atomization, converted to an xs:double. |
| fn:parse-integer | Converts a string to an integer, recognizing any radix in the range 2 to 36. |
Converts a string to an integer, recognizing any radix in the range 2 to 36.
fn:parse-integer( | ||
$value | as , | |
$radix | as | := 10 |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the result is the empty sequence.
The supplied $radix must be in the range 2 to 36 inclusive.
The string $value is preprocessed by stripping all whitespace characters (including internal whitespace) and underscore characters.
After this process, the supplied value must consist of an optional sign (+ or -) followed by a sequence of one or more generalized digits drawn from the first $radix characters in the alphabet 0123456789abcdefghijklmnopqrstuvwxyz; upper-case alphabetics A-Z may be used in place of their lower-case equivalents.
The value of a generalized digit corresponds to its position in this alphabet.
The effect of the function is equivalent to the result of the following XPath expression, except in error cases.
let $alphabet := characters("0123456789abcdefghijklmnopqrstuvwxyz")
let $preprocessed := translate(
  $value,
  codepoints-to-string((9, 10, 13, 32, 95)),
  ""
)
let $digits := translate($preprocessed, "+-", "")
let $abs := sum(
  for $char at $p in reverse(characters(lower-case($digits)))
  return (index-of($alphabet, $char) - 1) * xs:integer(math:pow($radix, $p - 1))
)
return if (starts-with($preprocessed, "-")) then -$abs else +$abs
A dynamic error is raised [err:FORG0011] if $radix is not in the range 2 to 36.
A dynamic error is raised [err:FORG0012] if, after stripping whitespace and underscores and the optional leading sign, $value is a zero-length string, or if it contains a character that is not among the first $radix characters in the alphabet 0123456789abcdefghijklmnopqrstuvwxyz, or the upper-case equivalent of such a character.
A dynamic error is raised [err:FOCA0003] if the value of the resulting integer exceeds the implementation-dependent limit on the size of an xs:integer.
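The rules and error conditions above can be modelled in Python as a rough sketch. The function name `parse_integer`, the use of `None` for the empty sequence, and the use of `ValueError` to carry the error codes are all assumptions made for this illustration. Python integers are unbounded, so the FOCA0003 condition does not arise here.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
# Tab, newline, carriage return, space and underscore are stripped.
STRIPPED = {ord(c): None for c in "\t\n\r _"}


def parse_integer(value, radix=10):
    """Illustrative model of fn:parse-integer (hypothetical helper)."""
    if value is None:  # stands in for the empty sequence
        return None
    if not 2 <= radix <= 36:
        raise ValueError("FORG0011: radix must be in the range 2 to 36")
    text = value.translate(STRIPPED)
    negative = text.startswith("-")
    digits = text[1:] if text[:1] in ("+", "-") else text
    # Map each permitted generalized digit (either case) to its value.
    digit_values = {}
    for position, char in enumerate(ALPHABET[:radix]):
        digit_values[char] = position
        digit_values[char.upper()] = position
    if not digits or any(c not in digit_values for c in digits):
        raise ValueError("FORG0012: invalid digits for the requested radix")
    total = 0
    for c in digits:
        total = total * radix + digit_values[c]
    return -total if negative else total
```

For example, `parse_integer("ff", 16)` gives `255` and `parse_integer("1111 1111", 2)` gives `255` in this sketch.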
When $radix takes its default value of 10, the function delivers the same result as casting $value (after removal of whitespace and underscores) to xs:integer.
If underscores or whitespace in the input need to be rejected, then the string should first be validated, perhaps using fn:matches.
If other characters may legitimately appear in the input, for example a leading 0x, then this must first be removed by pre-processing the input.
If the input uses a different family of digits, then the value should first be converted to the required digits using fn:translate.
A string in the lexical space of xs:hexBinary will always be an acceptable input, provided it is not too long. So, for example, the expression "1DE=" => xs:base64Binary() => xs:hexBinary() => xs:string() => parse-integer(16) can be used to convert the Base 64 value 1DE= to the integer 54321, via the hexadecimal string D431.
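The pre-processing advice in these notes can be illustrated in Python. In this sketch the built-in `int(text, base)` merely stands in for fn:parse-integer (it is more permissive than the function described here), and the sample inputs `"0x1F"` and `"1_000"` are hypothetical.

```python
import base64
import re

# Base 64 value -> hexadecimal string -> integer.
hex_digits = base64.b64decode("1DE=").hex().upper()
via_hex = int(hex_digits, 16)

# Reject underscores and whitespace by validating before parsing.
strict_ok = re.fullmatch(r"[0-9]+", "1_000") is not None

# Remove a leading 0x before parsing as hexadecimal.
prefixed = "0x1F"
stripped = prefixed[2:] if prefixed.lower().startswith("0x") else prefixed
without_prefix = int(stripped, 16)

# Convert another digit family to ASCII digits before parsing.
arabic_indic = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")
year = int("٢٠٢٣".translate(arabic_indic))

print(hex_digits, via_hex)  # D431 54321
```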
| Alphabetic base-26 numbering systems (hexavigesimal) can be parsed via translation. Note, enumerating systems that do not assign a symbol to zero (e.g., spreadsheet columns) must be preprocessed in a different fashion. | |
|---|---|
| Expression: | lower-case("AAB") => translate("abcdefghijklmnopqrstuvwxyz", "0123456789abcdefghijklmnop") => parse-integer(26) |
| Result: | 1 |
| Digit-based numeration systems comparable to the Arabic numbers 0 through 9 can be parsed via translation. | |
| Expression: | translate(value := '٢٠٢٣', replace := '٠١٢٣٤٥٦٧٨٩', with := '0123456789') => parse-integer() |
| Result: | 2023 |
| Function | Meaning |
|---|---|
| fn:format-integer | Formats an integer according to a given picture string, using the conventions of a given natural language if specified. |
Formats an integer according to a given picture string, using the conventions of a given natural language if specified.
fn:format-integer( | ||
$value | as , | |
$picture | as , | |
$language | as | := () |
) as | ||
The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on default language.
The three-argument form of this function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the function returns a zero-length string.
In all other cases, the $picture argument describes the format in which $value is output.
The rules that follow describe how non-negative numbers are output. If the value of $value is negative, the rules below are applied to the absolute value of $value, and a minus sign is prepended to the result.
The value of $picture consists of the following, in order:
An optional radix, which is an integer in the range 2 to 36, written using ASCII digits (0-9) without any leading zero;
A circumflex (^), which is present if the radix is present, and absent otherwise.
A circumflex is recognized as marking the presence of a radix only if (a) it is immediately preceded by an integer in the range 2 to 36, and (b) it is followed (somewhere within the primary format token) by an "X" or "x". In other cases, the circumflex is treated as a grouping separator. For example, the picture 9^000 outputs the number 2345 as "2^345", whereas 9^XXX outputs "3185". This rule is to ensure backwards compatibility.
A primary format token. This is always present and must not be zero-length.
An optional format modifier.
If the string contains one or more semicolons then the last semicolon is taken as terminating the primary format token, and everything that follows is taken as the format modifier; if the string contains no semicolon then the format modifier is taken to be absent (which is equivalent to supplying a zero-length string).
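The decomposition of a picture into radix, primary format token and format modifier can be sketched in Python. The function name `split_picture` is hypothetical, and the sketch assumes that a radix can only appear at the very start of the picture; that is one reading of the rule that the circumflex must be immediately preceded by an integer in the range 2 to 36.

```python
import re


def split_picture(picture):
    """Split a picture into (radix, primary token, modifier); illustrative only."""
    # The last semicolon, if any, ends the primary format token.
    head, sep, tail = picture.rpartition(";")
    primary, modifier = (head, tail) if sep else (picture, "")
    radix = None
    # A radix is recognized only if it is in range and an X or x follows.
    m = re.match(r"([1-9][0-9]?)\^(.*)", primary, re.DOTALL)
    if m and 2 <= int(m.group(1)) <= 36 and re.search("[Xx]", m.group(2)):
        radix, primary = int(m.group(1)), m.group(2)
    if primary == "":
        raise ValueError("FODF1310: primary format token must not be zero-length")
    return radix, primary, modifier
```

With this sketch, `split_picture("9^000")` keeps the circumflex as a grouping separator, while `split_picture("9^XXX")` yields radix 9.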
If a radix is present, then the primary format token must follow the rules for a digit-pattern.
The primary format token is classified as one of the following:
A digit-pattern made up of optional-digit-signs, mandatory-digit-signs, and grouping-separator-signs.
The optional-digit-sign is the character #.
If the radix is absent, then a mandatory-digit-sign is a character in Unicode category Nd. All mandatory-digit-signs within the format token must be from the same digit family, where a digit family is a sequence of ten consecutive characters in Unicode category Nd, having digit values 0 through 9. Within the format token, these digits are interchangeable: a three-digit number may thus be indicated equivalently by 000, 001, or 999.
If the primary format token contains at least one Unicode digit, then the primary format token is taken as a decimal digit pattern, and in this case it must match the regular expression ^((\p{Nd}|#|[^\p{N}\p{L}])+?)$. If it contains a digit but does not match this pattern, a dynamic error is raised [err:FODF1310].
If the radix (call it R) is present (including the case where an explicit radix of 10 is used), then the character used as the mandatory-digit-sign is either "x" or "X". If any mandatory-digit-sign is upper-case "X", then all mandatory-digit-signs must be upper-case "X". The digit family used in the output comprises the first R characters of the alphabet 0123456789abcdefghijklmnopqrstuvwxyz, but using upper-case letters in place of lower-case if an upper-case "X" is used as the mandatory-digit-sign.
In this case the primary format token must match the regular expression ^(([Xx#]|[^\p{N}\p{L}])+?)$
A grouping-separator-sign is a non-alphanumeric character, that is, a character whose Unicode category is other than Nd, Nl, No, Lu, Ll, Lt, Lm or Lo.
Note:
If a semicolon is to be used as a grouping separator, then the primary format token as a whole must be followed by another semicolon, to ensure that the grouping separator is not mistaken as a separator between the primary format token and the format modifier.
There must be at least one mandatory-digit-sign. There may be zero or more optional-digit-signs, and (if present) these must precede all mandatory-digit-signs. There may be zero or more grouping-separator-signs. A grouping-separator-sign must not appear at the start or end of the digit-pattern, nor adjacent to another grouping-separator-sign.
The corresponding output is a number in the specified radix, using this digit family, with at least as many digits as there are mandatory-digit-signs in the format token. Thus:
A format token 1 generates the sequence 0 1 2 ... 10 11 12 ...
A format token 01 (or equivalently, 00 or 99) generates the sequence 00 01 02 ... 09 10 11 12 ... 99 100 101
A format token of U+0661 (ARABIC-INDIC DIGIT ONE, ١) generates the sequence ١ then ٢ then ٣ ...
A format token of 16^xx generates the sequence 00 01 02 03 ... 08 09 0a 0b 0c 0d 0e 0f 10 11 ...
A format token of 16^X generates the sequence 0 1 2 3 ... 8 9 A B C D E F 10 11 ...
The grouping-separator-signs are handled as follows:
The position of grouping separators within the format token, counting backwards from the last digit, indicates the position of grouping separators to appear within the formatted number, and the character used as the grouping-separator-sign within the format token indicates the character to be used as the corresponding grouping separator in the formatted number.
More specifically, the position of a grouping separator is the number of optional-digit-signs and mandatory-digit-signs appearing between the grouping separator and the right-hand end of the primary format token.
Grouping separators are defined to be regular if the following conditions apply:
There is at least one grouping separator.
Every grouping separator is the same character (call it C).
There is a positive integer G (the grouping size) such that:
The position of every grouping separator is an integer multiple of G, and
Every positive integer multiple of G that is less than the number of optional-digit-signs and mandatory-digit-signs in the primary format token is the position of a grouping separator.
The grouping separator template is a (possibly infinite) set of (position, character) pairs.
If grouping separators are regular, then the grouping separator template contains one pair of the form (n×G, C) for every positive integer n where G is the grouping size and C is the grouping character.
Otherwise (when grouping separators are not regular), the grouping separator template contains one pair of the form (P, C) for every grouping separator found in the primary formatting token, where C is the grouping separator character and P is its position.
Note:
If there are no grouping separators, then the grouping separator template is the empty set.
The number is formatted as follows:
Let S1 be the result of formatting the supplied number in the appropriate radix: for radix 10 this will be the value obtained by casting it to xs:string.
Let S2 be the result of padding S1 on the left with as many leading zeroes as are needed to ensure that it contains at least as many digits as the number of mandatory-digit-signs in the primary format token.
Let S3 be the result of replacing all decimal digits (0-9) in S2 with the corresponding digits from the selected digit family. (This has no effect when the selected digit family uses ASCII digits (0-9), which will always be the case if a radix is specified.)
Let S4 be the result of inserting grouping separators into S3: for every (position P, character C) pair in the grouping separator template where P is less than the number of digits in S3, insert character C into S3 at position P, counting from the right-hand end.
Let S5 be the result of converting S4 into ordinal form, if an ordinal modifier is present, as described below.
The result of the function is then S5.
The format token A, which generates the sequence A B C ... Z AA AB AC....
The format token a, which generates the sequence a b c ... z aa ab ac....
The format token i, which generates the sequence i ii iii iv v vi vii viii ix x ....
The format token I, which generates the sequence I II III IV V VI VII VIII IX X ....
The format token w, which generates numbers written as lower-case words, for example in English, one two three four ...
The format token W, which generates numbers written as upper-case words, for example in English, ONE TWO THREE FOUR ...
The format token Ww, which generates numbers written as title-case words, for example in English, One Two Three Four ...
Any other format token, which indicates a numbering sequence in which that token represents the number 1 (one) (but see the note below). It is implementation-defined which numbering sequences, additional to those listed above, are supported. If an implementation does not support a numbering sequence represented by the given token, it must use a format token of 1.
Note:
In some traditional numbering sequences additional signs are added to denote that the letters should be interpreted as numbers, for example, in ancient Greek U+0374 (DEXIA KERAIA, ʹ) and sometimes U+0375 (ARISTERI KERAIA, ͵) . These should not be included in the format token.
For all format tokens other than a digit-pattern, there may be implementation-defined lower and upper bounds on the range of numbers that can be formatted using this format token; indeed, for some numbering sequences there may be intrinsic limits. For example, the format token U+2460 (CIRCLED DIGIT ONE, ①) has a range imposed by the Unicode character repertoire — zero to 20 in Unicode versions prior to 3.2, or zero to 50 in subsequent versions. For the numbering sequences described above any upper bound imposed by the implementation must not be less than 1000 (one thousand) and any lower bound must not be greater than 1. Numbers that fall outside this range must be formatted using the format token 1.
The above expansions of numbering sequences for format tokens such as a and i are indicative but not prescriptive. There are various conventions in use for how alphabetic sequences continue when the alphabet is exhausted, and differing conventions for how roman numerals are written (for example, IV versus IIII as the representation of the number 4). Sometimes alphabetic sequences are used that omit letters such as i and o. This specification does not prescribe the detail of any sequence other than those sequences consisting entirely of decimal digits.
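As the text says, the alphabetic and roman sequences are not prescribed in detail. The following Python sketch shows just one common convention for each (bijective base-26 letters and subtractive roman numerals). The function names `to_alpha` and `to_roman` are hypothetical, and an implementation is free to behave differently.

```python
def to_alpha(number, first="A"):
    """One possible reading of the A / a tokens: A..Z, AA, AB, ..."""
    letters = ""
    while number > 0:
        # There is no symbol for zero, so shift by one before dividing.
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord(first) + remainder) + letters
    return letters


ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number):
    """One possible reading of the I token, using subtractive notation."""
    out = ""
    for value, symbol in ROMAN:
        count, number = divmod(number, value)
        out += symbol * count
    return out
```

Neither helper has a representation for zero (both return an empty string), which corresponds to the case where a number falls outside the supported range and the format token 1 is used instead.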
Many numbering sequences are language-sensitive. This applies especially to the sequence selected by the tokens w, W, and Ww. It also applies to other sequences: for example, different languages using the Cyrillic alphabet use different sequences of characters, each starting with the letter U+0410 (CYRILLIC CAPITAL LETTER A, А). In such cases, the $language argument specifies which language conventions are to be used. If the argument is specified, the value should be either the empty sequence or a value that would be valid for the xml:lang attribute (see [Extensible Markup Language (XML) 1.0 (Fifth Edition)]). Note that this permits the identification of sublanguages based on country codes (from ISO 3166-1) as well as identification of dialects and regions within a country.
The set of languages for which numbering is supported is implementation-defined. If the $language argument is absent, or is set to the empty sequence, or is invalid, or is not a language supported by the implementation, then the number is formatted using the default language from the dynamic context.
The format modifier must be a string that matches the regular expression ^([co](\(.+\))?)?[at]?$. That is, if it is present it must consist of one or more of the following, in order:
either c or o, optionally followed by a sequence of characters enclosed between parentheses, to indicate cardinal or ordinal numbering respectively, the default being cardinal numbering
either a or t, to indicate alphabetic or traditional numbering respectively, the default being implementation-defined.
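The format modifier syntax can be checked with a short Python sketch built on the regular expression given above. The function name `parse_format_modifier` and the shape of the returned dictionary are assumptions made for this illustration. Python's regular expression dialect is used as a stand-in, so the treatment of characters such as newlines inside the parentheses may differ from the XPath dialect.

```python
import re

# Same structure as ^([co](\(.+\))?)?[at]?$, with named groups added.
MODIFIER = re.compile(
    r"(?:(?P<kind>[co])(?:\((?P<variant>.+)\))?)?(?P<letters>[at])?"
)


def parse_format_modifier(modifier):
    """Split a format modifier into its parts; illustrative only."""
    m = MODIFIER.fullmatch(modifier)
    if m is None:
        raise ValueError("FODF1310: invalid format modifier")
    return {
        "ordinal": m.group("kind") == "o",  # cardinal is the default
        "variant": m.group("variant"),      # text between the parentheses
        "letters": m.group("letters"),      # "a", "t", or None
    }
```

For example, `parse_format_modifier("o(-er)")` reports ordinal numbering with the variant string `-er`.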
If the o modifier is present, this indicates a request to output ordinal numbers rather than cardinal numbers. For example, in English, when used with the format token 1, this outputs the sequence 1st 2nd 3rd 4th ..., and when used with the format token w outputs the sequence first second third fourth ....
The string of characters between the parentheses, if present, is used to select between other possible variations of cardinal or ordinal numbering sequences. The interpretation of this string is implementation-defined. No error occurs if the implementation does not define any interpretation for the defined string.
It is implementation-defined what combinations of values of the format token, the language, and the cardinal/ordinal modifier are supported. If ordinal numbering is not supported for the combination of the format token, the language, and the string appearing in parentheses, the request is ignored and cardinal numbers are generated instead.
The use of the a or t modifier disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. In the absence of the a or t modifier, the default is implementation-defined.
A dynamic error is raised [err:FODF1310] if the format token is invalid, that is, if it violates any mandatory rules (indicated by an emphasized must or required keyword in the above rules). For example, the error is raised if the primary format token contains a digit but does not match the required regular expression.
Note the careful distinction between conditions that are errors and conditions where fallback occurs. The principle is that an error in the syntax of the format picture will be reported by all processors, while a construct that is recognized by some implementations but not others will never result in an error, but will instead cause a fallback representation of the integer to be used.
The following notes apply when a digit-pattern is used:
If grouping-separator-signs appear at regular intervals within the format token, then the sequence is extrapolated to the left, so grouping separators will be used in the formatted number at every multiple of the grouping size G. For example, if the format token is 0'000 then the number one million will be formatted as 1'000'000, while the number fifteen will be formatted as 0'015.
The only purpose of optional-digit-signs is to mark the position of grouping-separator-signs. For example, if the format token is #'##0 then the number one million will be formatted as 1'000'000, while the number fifteen will be formatted as 15. A grouping separator is included in the formatted number only if there is a digit to its left, which will only be the case if either (a) the number is large enough to require that digit, or (b) the number of mandatory-digit-signs in the format token requires insignificant leading zeros to be present.
Grouping separators are not designed for effects such as formatting a US telephone number as (365)123-9876. In general they are not suitable for such purposes because (a) only single characters are allowed, and (b) they cannot appear at the beginning or end of the number.
Numbers will never be truncated. Given the digit-pattern 01, the number three hundred will be output as 300, despite the absence of any optional-digit-sign.
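The digit-pattern behaviour described in these notes (the grouping separator template together with formatting steps S1 to S4) can be sketched in Python. This is an illustration under stated assumptions: the names `to_radix` and `format_digit_pattern` are hypothetical, the pattern is assumed to be already validated, and ordinal conversion (S5) is omitted.

```python
import unicodedata

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_radix(number, radix):
    """Render a non-negative integer in the given radix (step S1)."""
    digits = ""
    while True:
        number, remainder = divmod(number, radix)
        digits = ALPHABET[remainder] + digits
        if number == 0:
            return digits


def format_digit_pattern(value, primary, radix=None):
    """Apply steps S1 to S4 to a digit-pattern assumed to be valid."""
    separators = {}  # position counted from the right -> character
    digit_signs = mandatory = 0
    zero, upper = "0", False
    for char in reversed(primary):
        if char == "#":
            digit_signs += 1
        elif radix is None and unicodedata.category(char) == "Nd":
            digit_signs += 1
            mandatory += 1
            zero = chr(ord(char) - unicodedata.digit(char))
        elif radix is not None and char in "Xx":
            digit_signs += 1
            mandatory += 1
            upper = upper or char == "X"
        else:
            separators[digit_signs] = char
    # Regular separators: one character at every multiple of the size G.
    regular, size = False, 0
    if separators:
        size = min(separators)
        regular = (len(set(separators.values())) == 1
                   and set(separators) == set(range(size, digit_signs, size)))
    s = to_radix(abs(value), radix or 10).zfill(mandatory)  # S1 and S2
    if upper:
        s = s.upper()
    elif zero != "0":
        s = "".join(chr(ord(zero) + int(d)) for d in s)      # S3
    out = []
    for index, digit in enumerate(reversed(s)):              # S4
        if index > 0:
            if regular and index % size == 0:
                out.append(separators[size])
            elif not regular and index in separators:
                out.append(separators[index])
        out.append(digit)
    text = "".join(reversed(out))
    return "-" + text if value < 0 else text
```

With this sketch, `format_digit_pattern(1000000, "#'##0")` gives `1'000'000`, `format_digit_pattern(15, "0'000")` gives `0'015`, and `format_digit_pattern(300, "01")` gives `300`.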
The following notes apply when ordinal numbering is selected using the o modifier.
In some languages, the form of numbers (especially ordinal numbers) varies depending on the grammatical context: they may have different genders and may decline with the noun that they qualify. In such cases the string appearing in parentheses after the letter c or o may be used to indicate the variation of the cardinal or ordinal number required.
The way in which the variation is indicated will depend on the conventions of the language.
For inflected languages that vary the ending of the word, the approach recommended in the previous version of this specification was to indicate the required ending, preceded by a hyphen: for example in German, appropriate values might be o(-e), o(-er), o(-es), o(-en).
Another approach, which might usefully be adopted by an implementation based on the open-source ICU localization library [ICU], or any other library making use of the Unicode Common Locale Data Repository [Unicode CLDR], is to allow the value in parentheses to be the name of a registered numbering rule set for the language in question, conventionally prefixed with a percent sign: for example, o(%spellout-ordinal-masculine), or c(%spellout-cardinal-year).
The following notes apply when the primary format token is neither a digit-pattern nor one of the seven other defined format tokens (A, a, i, I, w, W, Ww), but is an arbitrary token representing the number 1:
Unexpected results may occur for traditional numbering. For example, in an implementation that supports a traditional numbering system in Greek, the example format-integer(19, "α;t") might return δπιιιι or ιθ, depending upon whether the ancient acrophonic or late antique alphabetic system is supported.
Unexpected results may also occur for alphabetic numbering. For example, in an implementation that supports an alphabetic numbering system in Greek, someone writing format-integer(19, "α;a") might expect the nineteenth Greek letter, U+03C4 (GREEK SMALL LETTER TAU, τ), but the implementation might return the eighteenth one, U+03C3 (GREEK SMALL LETTER SIGMA, σ), because the latter is the nineteenth item in the sequence of lowercase Greek letters in Unicode (the sequence is interrupted because of the final form of the sigma, U+03C2 (GREEK SMALL LETTER FINAL SIGMA, ς)). Because Greek never had a final capital sigma, Unicode has marked U+03A2, the eighteenth codepoint in the sequence of Greek capital letters, as reserved, to ensure that every Greek uppercase letter is always 32 codepoints less than its lowercase counterpart. Therefore, someone writing format-integer(18, "Α;a") might expect the eighteenth Greek capital letter, U+03A3 (GREEK CAPITAL LETTER SIGMA, Σ), but an implementation might return U+03A2, which is the eighteenth position in the sequence of Greek capital letters but is not assigned to any character.
| Expression: |
|
|---|---|
| Result: |
|
| |
Ordinal numbering in Italian: The specification | |
1º 2º 3º 4º ... | |
The specification | |
Primo Secondo Terzo Quarto Quinto ... | |
This section defines a function for formatting decimal and floating point numbers.
| Function | Meaning |
|---|---|
| fn:format-number | Returns a string containing a number formatted according to a given picture string and decimal format. |
Note:
This function can be used to format any numeric quantity, including an integer. For integers, however, the fn:format-integer function offers additional possibilities. Note also that the picture strings used by the two functions are not 100% compatible, though they share some options in common.
Changes in 4.0
The decimal format name can now be supplied as a value of type xs:QName, as an alternative to supplying a lexical QName as an instance of xs:string. [Issue 780 PR 925 9 January 2024]
Decimal format parameters can now be supplied directly as a map in the third argument, rather than referencing a format defined in the static context. [Issues 340 1138 PRs 1049 1151 5 March 2024]
For selected properties including percent and exponent-separator, it is now possible to specify a single-character marker to be used in the picture string, together with a multi-character rendition to be used in the formatted output. [Issue 1048 PR 1250 11 June 2024]
Returns a string containing a number formatted according to a given picture string and decimal format.
fn:format-number( | ||
$value | as , | |
$picture | as , | |
$options | as | := () |
) as | ||
The two-argument form of this function is deterministic, context-independent, and focus-independent.
The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on decimal formats, and namespaces.
The function formats $value as a string using the picture string specified by the $picture argument and a decimal format.
The $value argument may be of any numeric data type (xs:double, xs:float, xs:decimal, or their subtypes including xs:integer). Note that if an xs:decimal is supplied, it is not automatically converted to an xs:double, as such conversion can involve a loss of precision.
If the supplied value of the $value argument is the empty sequence, the function behaves as if the supplied value were the xs:double value NaN.
If $options is absent, or if it is supplied as the empty sequence or the empty map, then the number is formatted using the properties of the unnamed decimal format in the static context.
For backwards compatibility reasons, the decimal format can be supplied as an instance of xs:string. If the value of the $options argument is an xs:string, then its value must be a string which after removal of leading and trailing whitespace is in the form of an EQName as defined in the XPath 4.0 grammar, that is one of the following:
A lexical QName, which is expanded using the statically known namespaces. The default namespace is not used (no prefix means no namespace).
A URIQualifiedName using the syntax Q{uri}local, where the URI can be zero-length to indicate a name in no namespace.
The effective value of the $options argument is then the map { 'format-name': $FN } where $FN is the xs:QName result of expanding this EQName.
The entries that may appear in the $options map are as follows. The option parameter conventions apply. The detailed rules for the interpretation of each option appear later.
In the table, the type xs:string (: matching '.' :) represents a single-character string, that is, a restriction of xs:string with the facet pattern=".", while the type xs:string (: matching '.|.:.*' :) indicates a string that is either a single character, or a single character followed by U+003A (COLON, :) followed by an arbitrary string. Such a property identifies two values: a single character called the marker, which is used to represent the property in the picture string; and an arbitrary string called the rendition, which is used to represent the property in the result string. In the absence of the colon, the single character value is used both as the marker and the rendition.
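The marker and rendition convention can be illustrated with a small Python sketch. The function name `split_marker_rendition` is hypothetical, and a Python string index is assumed to correspond to one character (one Unicode code point).

```python
def split_marker_rendition(value):
    """Split a property value of the form '.' or '.:.*'; illustrative only."""
    if len(value) == 1:
        # No colon: the single character is both marker and rendition.
        return value, value
    if len(value) >= 2 and value[1] == ":":
        # Marker, colon, then an arbitrary (possibly empty) rendition.
        return value[0], value[2:]
    raise ValueError("FODF1290: value is not valid for this property")
```

For example, the value `^:×10^` yields the marker `^` for use in the picture string and the rendition `×10^` for use in the result.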
The default value for absent options (other than format-name) is taken from a decimal format in the static context; the default values shown in the table are the values used if no specific value is assigned in the static context.
record(
  format-name? as (xs:NCName | xs:QName)?,
  decimal-separator? as xs:string (: matching '.|.:.*' :),
  grouping-separator? as xs:string (: matching '.|.:.*' :),
  exponent-separator? as xs:string (: matching '.|.:.*' :),
  infinity? as xs:string,
  minus-sign? as xs:string,
  NaN? as xs:string,
  percent? as xs:string (: matching '.|.:.*' :),
  per-mille? as xs:string (: matching '.|.:.*' :),
  zero-digit? as xs:string (: matching '.' :),
  digit? as xs:string (: matching '.' :),
  pattern-separator? as xs:string (: matching '.' :)
)
| Key | Meaning |
|---|---|
| format-name | The name of a decimal format in the static context; if absent, the unnamed decimal format in the static context is used. An xs:NCName represents the local part of an xs:QName in no namespace. |
| decimal-separator | The marker used to represent the decimal point in the picture string, and the rendition of the decimal point in the formatted number. |
| grouping-separator | The marker used to separate groups of digits in the picture string, and the rendition of the grouping separator in the formatted number. |
| exponent-separator | The marker used to separate the mantissa from the exponent in scientific notation in the picture string, and the rendition of the exponent separator in the formatted number. |
| infinity | The string used to represent the value positive or negative infinity in the formatted number. |
| minus-sign | The string used as a minus sign in the formatted number if there is no subpicture for formatting negative numbers. |
| NaN | The string used to represent the value NaN in the formatted number. |
| percent | The marker used to indicate the presence of a percent sign in the picture string, and the rendition of the percent sign in the formatted number. |
| per-mille | The marker used to indicate the presence of a per-mille sign in the picture string, and the rendition of the per-mille sign in the formatted number. |
| zero-digit | Defines the characters used in the picture string to represent a mandatory digit: for example, if the zero-digit is 0 then any of the digits 0 to 9 may be used (interchangeably) in the picture string to represent a mandatory digit, and in the formatted number the characters 0 to 9 will be used to represent the digits zero to nine. The value must be a character in Unicode category Nd with decimal digit value 0 (zero). |
| digit | The character used in the picture string to represent an optional digit. |
| pattern-separator | The character used in the picture string to separate the positive and negative subpictures. |
A base decimal format is established as follows:
If the format-name option is present, then the decimal format in the static context identified by this name.
Otherwise, the unnamed decimal format in the static context.
The base decimal format is then modified using the other entries in the supplied $options map.
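How the base decimal format is chosen and then overridden can be sketched in Python. Several things here are assumptions made for the illustration: the name `effective_decimal_format`, the representation of the static context as a dictionary keyed by format name (with `None` for the unnamed format), the default property values in `UNNAMED`, and the choice of which properties are checked for distinctness (the picture-string markers and the ten digits of the digit family).

```python
# Assumed properties of the unnamed decimal format when nothing is declared.
UNNAMED = {
    "decimal-separator": ".", "grouping-separator": ",",
    "exponent-separator": "e", "infinity": "Infinity",
    "minus-sign": "-", "NaN": "NaN", "percent": "%",
    "per-mille": "\u2030", "zero-digit": "0", "digit": "#",
    "pattern-separator": ";",
}
# Properties whose marker character appears in the picture string.
MARKER_KEYS = [
    "decimal-separator", "grouping-separator", "exponent-separator",
    "percent", "per-mille", "digit", "pattern-separator",
]


def effective_decimal_format(options, static_formats):
    """Pick the base format, then apply the other options (illustrative)."""
    options = options or {}
    name = options.get("format-name")  # None selects the unnamed format
    if name not in static_formats:
        raise LookupError("FODF1280: no decimal format with this name")
    fmt = dict(static_formats[name])   # copy, so the static context is untouched
    fmt.update({k: v for k, v in options.items() if k != "format-name"})
    # The marker is the first character (values assumed non-empty here).
    markers = [fmt[key][0] for key in MARKER_KEYS]
    markers += [chr(ord(fmt["zero-digit"]) + i) for i in range(10)]
    if len(set(markers)) != len(markers):
        raise ValueError("FODF1290: picture-string characters must be distinct")
    return fmt
```

For instance, `effective_decimal_format({"exponent-separator": "^:×10^"}, {None: UNNAMED})` keeps every default except the exponent separator.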
The evaluation of the fn:format-number function takes place in two phases, an analysis phase described in 4.7.4 Analyzing the picture string and a formatting phase described in 4.7.5 Formatting the number.
The analysis phase takes as its inputs the picture string and the variables derived from the relevant decimal format in the static context, and produces as its output a number of variables with defined values. The formatting phase takes as its inputs the number to be formatted and the variables produced by the analysis phase, and produces as its output a string containing a formatted representation of the number.
The result of the function is the formatted string representation of the supplied number.
A dynamic error is raised [err:FODF1280] if the $options argument is supplied as an xs:string that is neither a valid lexical QName nor a valid URIQualifiedName, or if it uses a prefix that is not found in the statically known namespaces; or if the static context does not contain a declaration of a decimal format with a matching expanded QName; or if $options?format-name is present and the static context does not contain a declaration of a decimal format whose name matches $options?format-name. If the processor is able to detect the error statically (for example, when the argument is supplied as a string literal), then the processor may optionally signal this as a static error.
A dynamic error is raised [err:FODF1290] if a value of $format is not valid for the associated property, or if the properties of the decimal format resulting from a supplied $options map do not have distinct values.
A string is an ordered sequence of characters, and this specification uses terms such as “left” and “right”, “preceding” and “following” in relation to this ordering, irrespective of the position of the characters when visually rendered on some output medium. Both in the picture string and in the result string, digits with higher significance (that is, representing higher powers of ten) always precede digits with lower significance, even when the rendered text flow is from right to left.
In previous versions of XSLT and XQuery, decimal formats were typically defined in the static context using custom declarations (<xsl:decimal-format> in XSLT, declare decimal-format in XQuery) and then selected by name in a call on fn:format-number. This mechanism remains available, but in 4.0, it may be more convenient to dispense with these declarations, and instead to define a decimal format as a map bound to a global variable, which can be referenced in the $options argument of the fn:format-number call.
Alternative ways to format an xs:double as a string include:
Direct casting to string, for example using the constructor function xs:string($number)
JSON serialization, for example fn:serialize($number, {'method':'json', 'canonical':true()})
In general these will produce different results, for example in the amount of precision that is retained, and in the use of exponential (scientific) notation.
The following examples assume a default decimal format in which the chosen digits are the ASCII digits 0-9, the decimal separator is | |
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | format-number(12345, '0.0###^0', {
'exponent-separator': '^:×10^'
}) |
| Result: | "1.2345×10^4" |
| Expression: |
|
| Result: |
|
| Expression: | format-number(1234567.8, '0.000,0', {
'grouping-separator': '.',
'decimal-separator': ','
}) |
| Result: | "1.234.567,8" |
The following examples assume the existence of a decimal format named | |
| Expression: |
|
| Result: |
|
| Expression: | format-number(12345, '0,###^0', {
'format-name': 'de',
'exponent-separator': '^'
}) |
| Result: | "1,234^4" |
| Expression: | format-number(12345, '0,###^0', {
'format-name': 'de',
'exponent-separator': '^:×10^'
}) |
| Result: | "1,234×10^4" |
The following examples assume that the exponent separator in decimal format | |
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
The functions in this section perform trigonometric and other mathematical calculations on xs:double values. They are provided primarily for use in applications performing geometrical computation, for example when generating SVG graphics.
Functions are provided to support the six most commonly used trigonometric calculations: sine, cosine, and tangent, and their inverses arc sine, arc cosine, and arc tangent. Other functions such as secant, cosecant, and cotangent are not provided because they are easily computed in terms of these six.
The functions in this section (with the exception of math:pi) are specified by reference to [IEEE 754-2019], where they appear as Recommended operations in section 9. IEEE defines these functions for a variety of floating point formats; this specification defines them only for xs:double values. The IEEE specification applies with the following caveats:
IEEE states that the preferred quantum is language-defined. In this specification, it is implementation-defined.
IEEE states that certain functions should raise the inexact exception if the result is inexact. In this specification, this exception, if it occurs, does not result in an error. Any diagnostic information is outside the scope of this specification.
IEEE defines various rounding algorithms for inexact results, and states that the choice of rounding direction, and the mechanisms for influencing this choice, are language-defined. In this specification, the rounding direction and any mechanisms for influencing it are implementation-defined.
Certain operations (such as taking the square root of a negative number) are defined in IEEE to signal the invalid operation exception and return a quiet NaN. In this specification, such operations return NaN and do not raise an error. The same policy applies to operations (such as taking the logarithm of zero) that raise a divide-by-zero exception. Any diagnostic information is outside the scope of this specification.
Operations whose mathematical result is greater than the largest finite xs:double value are defined in IEEE to signal the overflow exception; operations whose mathematical result is closer to zero than the smallest non-zero xs:double value are similarly defined in IEEE to signal the underflow exception. The treatment of these exceptions in this specification is defined in 4.2 Arithmetic operators on numeric values.
| Function | Meaning |
|---|---|
| math:pi | Returns an approximation to the mathematical constant π. |
| math:e | Returns an approximation to the mathematical constant e. |
| math:acos | Returns the arc cosine of the argument. |
| math:asin | Returns the arc sine of the argument. |
| math:atan | Returns the arc tangent of the argument. |
| math:atan2 | Returns the angle in radians subtended at the origin by the point on a plane with coordinates (x, y) and the positive x-axis. |
| math:cos | Returns the cosine of the argument. The argument is an angle in radians. |
| math:cosh | Returns the hyperbolic cosine of the argument. |
| math:exp | Returns the value of e^x, where x is the argument value. |
| math:exp10 | Returns the value of 10^x, where x is the supplied argument value. |
| math:log | Returns the natural logarithm of the argument. |
| math:log10 | Returns the base-ten logarithm of the argument. |
| math:pow | Returns the result of raising the first argument to the power of the second. |
| math:sin | Returns the sine of the argument. The argument is an angle in radians. |
| math:sinh | Returns the hyperbolic sine of the argument. |
| math:sqrt | Returns the non-negative square root of the argument. |
| math:tan | Returns the tangent of the argument. The argument is an angle in radians. |
| math:tanh | Returns the hyperbolic tangent of the argument. |
Returns the angle in radians subtended at the origin by the point on a plane with coordinates (x, y) and the positive x-axis.
math:atan2( | ||
$y | as , | |
$x | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The result is the value of atan2(y, x) as defined in the [IEEE 754-2019] specification of the atan2 function applied to 64-bit binary floating point values. The result is in the range -π to +π radians.
The treatment of the underflow exception is defined in 4.2 Arithmetic operators on numeric values. The following rules apply when the values are finite and non-zero (subject to rules for overflow, underflow, and approximation).
If either argument is NaN then the result is NaN.
If $x is positive, then the value of atan2($y, $x) is atan($y div $x).
If $x is negative, then:
If $y is positive, then the value of atan2($y, $x) is atan($y div $x) + π.
If $y is negative, then the value of atan2($y, $x) is atan($y div $x) - π.
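The quadrant rules above can be checked against a library implementation of atan2. The following Python sketch is an illustration under stated assumptions rather than a normative definition: the function name `atan2_by_quadrant` is hypothetical, and the sketch deliberately rejects zero and infinite arguments, which the rules above do not cover.

```python
import math


def atan2_by_quadrant(y: float, x: float) -> float:
    """Apply the quadrant rules for finite, non-zero arguments."""
    if math.isnan(y) or math.isnan(x):
        return math.nan
    if x == 0 or y == 0 or math.isinf(x) or math.isinf(y):
        raise ValueError("this sketch covers only finite, non-zero arguments")
    if x > 0:
        return math.atan(y / x)
    if y > 0:
        # x negative, y positive: second quadrant
        return math.atan(y / x) + math.pi
    # x negative, y negative: third quadrant
    return math.atan(y / x) - math.pi
```

For ordinary inputs the result agrees with Python's `math.atan2(y, x)` to within floating-point approximation; extreme magnitudes, where `y / x` overflows or underflows, are outside the scope of the sketch.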
Some results for special values of the arguments are shown in the examples below.
| Expression | Result |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Function | Meaning |
|---|---|
| fn:random-number-generator | Returns a random number generator, which can be used to generate sequences of random numbers. |
The function makes use of the record structure defined in the next section.
Changes in 4.0
The 3.1 specification suggested that every value in the result range should have the same chance of being chosen. This has been corrected to say that the distribution should be arithmetically uniform (because there are as many xs:double values between 0.01 and 0.1 as there are between 0.1 and 1.0).
Returns a random number generator, which can be used to generate sequences of random numbers.
fn:random-number-generator( | ||
$seed | as | := () |
) as random-number-generator-record | ||
This function is deterministic, context-independent, and focus-independent.
The function returns a random number generator. A random number generator is represented as a value of type random-number-generator-record, defined in 4.9.1 Record fn:random-number-generator-record.
Calling the fn:random-number-generator function with no arguments is equivalent to calling the single-argument form of the function with an implementation-dependent seed.
Calling the fn:random-number-generator function with the empty sequence as $seed is equivalent to calling the single-argument form of the function with an implementation-dependent seed.
If a $seed is supplied, it may be an atomic item of any type.
Both forms of the function are deterministic: calling the function twice with the same arguments, within a single execution scope, produces the same results.
The value of the number entry should be such that the distribution of numbers is uniform: for example, the probability of the number being in the range 0.1e0 to 0.2e0 is the same as the probability of its being in the range 0.8e0 to 0.9e0.
The function returned in the permute entry should be such that all permutations of the supplied sequence are equally likely to be chosen.
The map returned by the fn:random-number-generator function may contain additional entries beyond those specified here, but it must match the record type defined above. The meaning of any additional entries is implementation-defined. To avoid conflict with any future version of this specification, the keys of any such entries should start with an underscore character.
It is not meaningful to ask whether the functions returned in the next and permute entries resulting from two separate calls with the same seed are “the same function”, but the functions must be equivalent in the sense that calling them produces the same sequence of random numbers.
The repeatability of the results of function calls in different execution scopes is outside the scope of this specification. It is recommended that when the same seed is provided explicitly, the same random number sequence should be delivered even in different execution scopes; while if no seed is provided, the processor should choose a seed that is likely to be different from one execution scope to another. (The same effect can be achieved explicitly by using fn:current-dateTime() as a seed.)
The specification does not place strong conformance requirements on the actual randomness of the result; this is left to the implementation. It is desirable, for example, when generating a sequence of random numbers that the sequence should not get into a repeating loop; but the specification does not attempt to dictate this.
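The shape of the returned record, and the way repeated calls on its next entry yield a repeatable sequence, can be modelled in Python. This is a minimal sketch under stated assumptions, not a description of any conforming implementation: the dictionary keys mirror the number, next, and permute entries, while the use of Python's `random.Random`, the per-process default seed, and the helper `random_sequence` are choices made for this illustration only.

```python
import random

# One implementation-dependent seed per process, so that calls made without a
# seed are repeatable within this (assumed) execution scope.
_DEFAULT_SEED = random.SystemRandom().getrandbits(64)


def random_number_generator(seed=None) -> dict:
    """Return a record-like dict with 'number', 'next' and 'permute' entries."""
    rng = random.Random(_DEFAULT_SEED if seed is None else seed)
    number = rng.random()               # a float in the range [0.0, 1.0)
    next_seed = rng.getrandbits(64)     # seed for the next generator in the chain
    permute_seed = rng.getrandbits(64)  # seed used only by permute

    def permute(items):
        items = list(items)
        return random.Random(permute_seed).sample(items, len(items))

    return {
        "number": number,
        "next": lambda: random_number_generator(next_seed),
        "permute": permute,
    }


def random_sequence(length: int, seed=None) -> list:
    """Collect `length` numbers by following the chain of 'next' entries."""
    record, numbers = random_number_generator(seed), []
    for _ in range(length):
        numbers.append(record["number"])
        record = record["next"]()
    return numbers
```

Because every record is derived purely from its seed, two calls with the same seed produce equivalent records, which is the sense of determinism described above.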
The following example returns a random permutation of the integers in the range | |
| |
The following example returns a 10% sample of the items in an input sequence | |
| |
The following XQuery code produces a random sequence of 200 | |
declare %public function local:random-sequence($length as xs:integer) as xs:double* {
local:random-sequence($length, random-number-generator())
};
declare %private function local:random-sequence(
$length as xs:integer,
$record as record(number as xs:double, next as fn(*), *)
) as xs:double* {
if ($length != 0) {
$record?number,
local:random-sequence($length - 1, $record?next())
}
};
local:random-sequence(200) | |
An equivalent result can be achieved with | |
tail(fold-left(
(1 to 200),
random-number-generator(),
fn($result) { head($result) ! (?next(), ?number), tail($result) }
)) |
This section specifies functions and operators on the [XML Schema Part 2: Datatypes Second Edition] xs:string datatype and the datatypes derived from it.
| Function | Meaning |
|---|---|
| fn:codepoint-equal | Returns true if two strings are equal, considered codepoint-by-codepoint. |
| fn:collation | Constructs a collation URI with requested properties. |
| fn:collation-available | Asks whether a collation URI is recognized by the implementation, and whether it has required properties. |
| fn:collation-key | Given a string value and a collation, generates an internal value called a collation key, with the property that the matching and ordering of collation keys reflects the matching and ordering of strings under the specified collation. |
| fn:contains-token | Determines whether or not any of the supplied strings, when tokenized at whitespace boundaries, contains the supplied token, under the rules of the supplied collation. |
Asks whether a collation URI is recognized by the implementation, and whether it has required properties.
fn:collation-available( | ||
$collation | as , | |
$usage | as | := () |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on collations.
The first argument is a candidate collation URI.
The second argument establishes the intended usage of the collation URI. The value is a sequence containing zero or more of the following:
compare indicates that the intended purpose of the collation URI is to compare strings for equality or ordering, for example in functions such as fn:index-of, fn:deep-equal, fn:compare, and fn:sort.
key indicates that the intended purpose of the collation URI is to obtain collation keys for strings using the fn:collation-key function.
substring indicates that the intended purpose of the collation URI is to establish whether one string is a substring of another, for example in functions such as fn:contains or fn:starts-with.
The function returns true if and only if the implementation recognizes the candidate collation URI as one that can be used for each of the purposes listed in the $usage argument. If the $usage argument is absent or set to the empty sequence, the function returns true only if the collation is available for all purposes.
If the candidate collation is a UCA collation specifying fallback=yes, then this function will always return true: implementations are required to recognize such a collation and use fallback behavior if there is no direct equivalent available.
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
The expression | |
The following functions are defined on values of type xs:string and types derived from it.
| Function | Meaning |
|---|---|
| fn:char | Returns a string containing a particular character or glyph. |
| fn:characters | Splits the supplied string into a sequence of single-character strings. |
| fn:graphemes | Splits the supplied string into a sequence of single-grapheme strings. |
| fn:concat | Returns the concatenation of the arguments, treated as sequences of strings. |
| fn:string-join | Returns a string created by concatenating the items in a sequence, with a defined separator between adjacent items. |
| fn:substring | Returns the part of $value beginning at the position indicated by $start and continuing for the number of characters indicated by $length. |
| fn:string-length | Returns the number of characters in a string. |
| fn:normalize-space | Returns $value with leading and trailing whitespace removed, and sequences of internal whitespace reduced to a single space character. |
| fn:normalize-unicode | Returns $value after applying Unicode normalization. |
| fn:upper-case | Converts a string to upper case. |
| fn:lower-case | Converts a string to lower case. |
| fn:translate | Returns $value modified by replacing or removing individual characters. |
| fn:hash | Returns the results of a specified hash, checksum, or cyclic redundancy check function applied to the input. |
Notes:
When the above operators and functions are applied to datatypes derived from xs:string, they are guaranteed to return values that are instances of xs:string, but the value might or might not be an instance of the particular subtype of xs:string to which they were applied.
The strings returned by fn:concat and fn:string-join are not guaranteed to be normalized. But see note in fn:concat.
Returns a string created by concatenating the items in a sequence, with a defined separator between adjacent items.
fn:string-join( | ||
$values | as , | |
$separator | as | := "" |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If the second argument is omitted or is the empty sequence, the effect is the same as calling the two-argument version with $separator set to a zero-length string.
The coercion rules ensure that the supplied $values argument is first converted to a sequence of atomic items by applying atomization.
The function then returns an xs:string created by casting each item in the atomized sequence to an xs:string, and then concatenating the result strings in order, using the value of $separator as a separator between adjacent strings. If $separator is the zero-length string, then the items in $values are concatenated without a separator.
If $values is the empty sequence, the function returns the zero-length string.
| Variables | |
|---|---|
let $doc := <doc><chap><section xml:id="xyz"/></chap></doc> | |
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | string-join(
('Blow, ', 'blow, ', 'thou ', 'winter ', 'wind!'),
''
) |
| Result: | "Blow, blow, thou winter wind!" |
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: | $doc//@xml:id ! string-join((node-name(), '="', ., '"')) |
| Result: | 'xml:id="xyz"' |
| Expression: | $doc//section ! string-join(ancestor-or-self::*/name(), '/') |
| Result: | "doc/chap/section" |
Returns the part of $value beginning at the position indicated by $start and continuing for the number of characters indicated by $length.
fn:substring( | ||
$value | as , | |
$start | as , | |
$length | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the function returns the zero-length string.
Otherwise, the function returns a string comprising those characters of $value whose index position (counting from one) is greater than or equal to $start (rounded to an integer), and (if $length is specified and non-empty) less than the sum of $start and $length (both rounded to integers).
The characters returned do not extend beyond $value. If $start is zero or negative, only those characters in positions greater than zero are returned.
More specifically, the three-argument version of the function returns the characters in $value whose position $p satisfies:
fn:round($start) <= $p and $p < fn:round($start) + fn:round($length)
The two-argument version of the function assumes that $length is infinite and thus returns the characters in $value whose position $p satisfies:
fn:round($start) <= $p
In the above computations, the operators such as <= and + are evaluated according to the rules of the XPath 4.0 specification.
The first character of a string is located at position 1, not position 0.
The second and third arguments allow xs:double values (rather than requiring xs:integer) in order to achieve compatibility with XPath 1.0.
A surrogate pair counts as one character, not two.
The consequences of supplying values such as NaN or positive or negative infinity for the $start or $length arguments follow from the above rules, and are not always intuitive.
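The position test, including its behavior for NaN and infinite arguments, can be traced with a short Python sketch. It is an illustration under stated assumptions: the names `xpath_round` and `substring` are local to this example, `xpath_round` approximates fn:round for xs:double values as floor(x + 0.5), and Python strings are sequences of codepoints, so a character outside the Basic Multilingual Plane counts as one character.

```python
import math


def xpath_round(x: float) -> float:
    """Round half towards positive infinity; NaN and infinities pass through."""
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))


def substring(value, start, length=None) -> str:
    """Select the characters whose 1-based position satisfies the rules above."""
    if value is None:  # models the empty sequence
        return ""
    first = xpath_round(float(start))
    if length is None:
        return "".join(ch for p, ch in enumerate(value, 1) if first <= p)
    # Any comparison involving NaN is false, so a NaN bound selects nothing.
    limit = first + xpath_round(float(length))
    return "".join(ch for p, ch in enumerate(value, 1) if first <= p < limit)
```

With this sketch, `substring("12345", 1.5, 2.6)` selects positions 2 to 4, and a start of negative infinity combined with a length of positive infinity selects nothing, because their sum is NaN.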
| Expression | Result |
|---|---|
|
(Characters starting at position 6 to the end of |
|
(Characters at positions greater than or equal to 4 and less than 7 are selected.) |
|
(Characters at positions greater than or equal to 2 and less than 5 are selected.) |
|
(Characters at positions greater than or equal to 0 and less than 3 are selected. Since the first position is 1, these are the characters at positions 1 and 2.) |
|
(Characters at positions greater than or equal to 5 and less than 2 are selected.) |
|
(Characters at positions greater than or equal to -3 and less than 2 are selected. Since the first position is 1, this is the character at position 1.) |
|
(Since |
|
(As above.) |
|
|
|
(Characters at positions greater than or equal to -42 and less than |
|
(Since the value of |
Returns $value after applying Unicode normalization.
fn:normalize-unicode( | ||
$value | as , | |
$form | as | := "NFC" |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the function returns the zero-length string.
If the second argument is omitted or is the empty sequence, the result is the same as calling the two-argument version with $form set to the string "NFC".
Otherwise, the function returns $value normalized according to the rules of the normalization form identified by the value of $form.
The effective value of $form is the value of the expression fn:upper-case(fn:normalize-space($form)).
If the effective value of $form is “NFC”, then the function returns $value converted to Unicode Normalization Form C (NFC).
If the effective value of $form is “NFD”, then the function returns $value converted to Unicode Normalization Form D (NFD).
If the effective value of $form is “NFKC”, then the function returns $value converted to Unicode Normalization Form KC (NFKC).
If the effective value of $form is “NFKD”, then the function returns $value converted to Unicode Normalization Form KD (NFKD).
If the effective value of $form is “FULLY-NORMALIZED”, then the function returns $value converted to fully normalized form.
If the effective value of $form is the zero-length string, no normalization is performed and $value is returned.
Normalization forms NFC, NFD, NFKC, and NFKD, and the algorithms to be used for converting a string to each of these forms, are defined in [UAX #15].
The motivation for normalization form FULLY-NORMALIZED is explained in [Character Model for the World Wide Web 1.0: Normalization]. However, as that specification did not progress beyond working draft status, the normative specification is as follows:
A string is fully-normalized if (a) it is in normalization form NFC as defined in [UAX #15], and (b) it does not start with a composing character.
A composing character is a character that is one or both of the following:
the second character in the canonical decomposition mapping of some character that is not listed in the Composition Exclusion Table defined in [UAX #15];
of non-zero canonical combining class (as defined in [The Unicode Standard]).
A string is converted to FULLY-NORMALIZED form as follows:
if the first character in the string is a composing character, prepend a single space (x20);
convert the resulting string to normalization form NFC.
Conforming implementations must support normalization form NFC and may support normalization forms NFD, NFKC, NFKD, and FULLY-NORMALIZED. They may also support other normalization forms with implementation-defined semantics.
It is implementation-defined which version of Unicode (and therefore, of the normalization algorithms and their underlying data) is supported by the implementation. See [UAX #15] for details of the stability policy regarding changes to the normalization rules in future versions of Unicode. If the input string contains codepoints that are unassigned in the relevant version of Unicode, or for which no normalization rules are defined, the fn:normalize-unicode function leaves such codepoints unchanged. If the implementation supports the requested normalization form then it must be able to handle every input string without raising an error.
A dynamic error is raised [err:FOCH0003] if the effective value of the $form argument is not one of the values supported by the implementation.
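The handling of the $form argument can be sketched in Python using the standard `unicodedata` module. This is a minimal illustration under stated assumptions: the function name `normalize_unicode` is local to the example, the sketch models an implementation that supports NFC, NFD, NFKC, and NFKD but not FULLY-NORMALIZED, and a Python `ValueError` stands in for the dynamic error.

```python
import re
import unicodedata

# Forms supported by this hypothetical implementation.
SUPPORTED_FORMS = {"NFC", "NFD", "NFKC", "NFKD"}


def normalize_unicode(value, form="NFC") -> str:
    """Normalize `value` according to the effective value of `form`."""
    if value is None:  # models the empty sequence
        return ""
    if form is None:   # omitted or empty $form defaults to NFC
        form = "NFC"
    # Effective value: upper-case(normalize-space($form)), using XML whitespace.
    tokens = [t for t in re.split(r"[ \t\r\n]+", form) if t]
    effective = " ".join(tokens).upper()
    if effective == "":
        return value   # zero-length form: no normalization is performed
    if effective not in SUPPORTED_FORMS:
        raise ValueError(f"FOCH0003: unsupported normalization form {effective!r}")
    return unicodedata.normalize(effective, value)
```

The version of Unicode applied is whichever one the Python runtime ships with, which corresponds to the implementation-defined choice described above.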
The functions described in this section make use of a regular expression syntax for pattern matching. The syntax and semantics of regular expressions are defined in this section.
| Function | Meaning |
|---|---|
| fn:matches | Returns true if the supplied string matches a given regular expression. |
| fn:replace | Returns a string produced from the input string by replacing any segments that match a given regular expression with a supplied replacement string, provided either literally, or by invoking a supplied function. |
| fn:tokenize | Returns a sequence of strings constructed by splitting the input wherever a separator is found; the separator is any substring that matches a given regular expression. |
| fn:analyze-string | Analyzes a string using a regular expression, returning an XML structure that identifies which parts of the input string matched or failed to match the regular expression, and in the case of matched substrings, which substrings matched each capturing group in the regular expression. |
Returns true if the supplied string matches a given regular expression.
fn:matches( | ||
$value | as , | |
$pattern | as , | |
$flags | as | := "" |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, it is interpreted as the zero-length string.
If the $flags argument is omitted or if it is the empty sequence, the effect is the same as setting $flags to a zero-length string. Flags are defined in 6.2 Flags.
The function returns true if the set of disjoint matching segments obtained by matching $value against the regular expression $pattern, with the associated $flags, is non-empty. Otherwise, the function returns false.
A dynamic error is raised [err:FORX0002] if $pattern is invalid according to the rules described in 6.1 Regular expression syntax.
A dynamic error is raised [err:FORX0001] if $flags is invalid according to the rules described in 6.2 Flags.
Unless the metacharacters ^ and $ are used as anchors, the string is considered to match the pattern if any substring matches the pattern. But if anchors are used, the anchors must match the start/end of the string (in string mode), or the start/end of a line (in multi-line mode).
This is different from the behavior of patterns in [XML Schema Part 2: Datatypes Second Edition], where regular expressions are implicitly anchored.
Regular expression matching is defined on the basis of Unicode codepoints; it takes no account of collations.
It is valid for the regular expression to match a zero-length segment of $value. For example, the result of the expression matches($s, "") is always true, regardless of the value of $s.
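The difference between unanchored and anchored matching can be illustrated with Python's `re` module. This is a sketch under stated assumptions and not an implementation of the regular expression dialect defined in this specification: Python's syntax differs in a number of details, only the i, m, and s flags are mapped here, and the function name `matches` is local to the example.

```python
import re

# Only these flags are mapped in this sketch; any other flag is rejected.
_FLAG_MAP = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def matches(value, pattern: str, flags: str = "") -> bool:
    """Return True if any substring of `value` matches `pattern`."""
    value = "" if value is None else value  # empty sequence -> zero-length string
    re_flags = 0
    for flag in flags or "":
        if flag not in _FLAG_MAP:
            raise ValueError(f"flag {flag!r} is not handled by this sketch")
        re_flags |= _FLAG_MAP[flag]
    # re.search looks for a match anywhere in the string, so the pattern is
    # anchored only where it uses ^ or $ explicitly.
    return re.search(pattern, value, re_flags) is not None
```

For example, `matches("abracadabra", "bra")` is true because a substring matches, whereas `matches("abracadabra", "^bra")` is false because the anchor ties the match to the start of the string.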
| Variables | |
|---|---|
let $poem := <poem author="Wilhelm Busch"> Kaum hat dies der Hahn gesehen, Fängt er auch schon an zu krähen: Kikeriki! Kikikerikih!! Tak, tak, tak! - da kommen sie. </poem> | |
| Expression | Result |
|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Returns a string produced from the input string by replacing any segments that match a given regular expression with a supplied replacement string, provided either literally, or by invoking a supplied function.
fn:replace( | ||
$value | as , | |
$pattern | as , | |
$replacement | as | := (), |
$flags | as | := '' |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, it is interpreted as the zero-length string.
If the $flags argument is omitted or if it is the empty sequence, the effect is the same as setting $flags to a zero-length string. Flags are defined in 6.2 Flags.
The string $value is matched against the regular expression $pattern, using the supplied $flags, to obtain a set of disjoint matching segments. A replacement string R for each of these segments (say M) is determined by the value of the $replacement argument, by applying the first of the following rules that applies:
If $replacement is absent or empty, R is a zero-length string.
If $replacement is a function item F, then R is obtained by calling F, and then applying the function fn:string to the result.
The first argument to F is the string to be replaced, provided as xs:untypedAtomic.
The second argument to F provides the captured groups as an xs:untypedAtomic sequence. The Nth item in this sequence is the string value of the segment captured by the Nth capturing subexpression. If the Nth capturing subexpression was not matched, the Nth item will be the zero-length string.
Note that the rules for function coercion mean that the function actually supplied for F may be an arity-1 function: the second argument does not need to be declared if it is not used.
If $replacement is a string and the q flag is present, R is the value of $replacement.
Otherwise, the value of $replacement is processed as follows.
Within the supplied $replacement string, a variable marker $N (where N is an unsigned integer) may be used to refer to the Nth captured group associated with M. The replacement string R is obtained by replacing each of these variable markers with the string value of the relevant captured group. The variable marker $0 refers to the substring captured by the regular expression as a whole.
A literal $ character within the replacement string must be written as \$, and a literal \ character must be written as \\.
More specifically, the rules are as follows, where S is the number of capturing subexpressions in the regular expression, and N is the decimal number formed by taking all the digits that consecutively follow the $ character in $replacement:
If N=0, then the variable is replaced by the string value of M.
If 1<=N<=S, then the variable marker is replaced by the string value of the Nth captured group associated with M. If the Nth parenthesized sub-expression was not matched, then the variable marker is replaced by the zero-length string.
If S<N<=9, then the variable marker is replaced by the zero-length string.
Otherwise (if N>S and N>9), the last digit of N is taken to be a literal character to be included “as is” in the replacement string, and the rules are reapplied using the number N formed by stripping off this last digit.
For example, if the replacement string is "$23" and there are 5 capturing subexpressions, the result contains the value of the substring that matches the second capturing subexpression, followed by the digit 3.
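The rules for interpreting variable markers and escapes in a replacement string can be expressed as a small Python routine. This is a minimal sketch under stated assumptions: the name `expand_replacement` and its parameters are hypothetical, the captured groups are passed as a list in which an unmatched group is represented by a zero-length string, and a Python `ValueError` stands in for the dynamic error err:FORX0004.

```python
DIGITS = "0123456789"


def expand_replacement(replacement: str, match: str, groups: list) -> str:
    """Build the replacement string R for one matching segment M (`match`)."""
    s_count = len(groups)  # S: number of capturing subexpressions
    out, i, n = [], 0, len(replacement)
    while i < n:
        ch = replacement[i]
        if ch == "\\":
            # Only \\ and \$ are permitted escapes.
            if i + 1 < n and replacement[i + 1] in "\\$":
                out.append(replacement[i + 1])
                i += 2
            else:
                raise ValueError("FORX0004: invalid backslash in replacement")
        elif ch == "$":
            j = i + 1
            while j < n and replacement[j] in DIGITS:
                j += 1
            digits = replacement[i + 1:j]
            if not digits:
                raise ValueError("FORX0004: '$' not followed by a digit")
            # While N > S and N > 9, treat the last digit as a literal.
            k = len(digits)
            while int(digits[:k]) > s_count and int(digits[:k]) > 9:
                k -= 1
            number = int(digits[:k])
            if number == 0:
                out.append(match)
            elif number <= s_count:
                out.append(groups[number - 1])
            # Otherwise S < N <= 9: the marker becomes a zero-length string.
            out.append(digits[k:])  # digits stripped off are kept literally
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

With five captured groups, `expand_replacement("$23", ...)` yields the second group followed by the digit 3, as in the example above.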
The function returns the xs:string that is obtained by replacing each of the disjoint matching segments of $value with the corresponding value of R.
A dynamic error is raised [err:FORX0002] if the value of $pattern is invalid according to the rules described in section 6.1 Regular expression syntax.
A dynamic error is raised [err:FORX0001] if the value of $flags is invalid according to the rules described in section 6.2 Flags.
In the absence of the q flag, a dynamic error is raised [err:FORX0004] if the value of $replacement contains a dollar sign ($) character that is not immediately followed by a digit 0-9 and not immediately preceded by a backslash (\).
In the absence of the q flag, a dynamic error is raised [err:FORX0004] if the value of $replacement contains a backslash (\) character that is not part of a \\ pair, unless it is immediately followed by a dollar sign ($) character.
A dynamic error is raised [err:FORX0005] if both the $replacement and $action arguments are supplied, and neither is the empty sequence.
If the input string contains no substring that matches the regular expression, the result of the function is a single string identical to the input string.
If two overlapping substrings of $value both match the $pattern, then only the first one (that is, the one whose first character comes first in the $value string) is replaced.
If two alternatives within the pattern both match at the same position in the $value, then the match that is chosen is the one matched by the first alternative. For example:
replace("abcd", "(ab)|(a)", "[1=$1][2=$2]") returns "[1=ab][2=]cd"
The rules for disjoint matching segments allow a zero-length matching segment to immediately follow a non-zero-length matching segment (they are not considered to overlap). This means, for example, that the regular expression .* will typically produce two matches: one matching segment containing all the characters in the input string, and a second zero-length matching segment at the end position of the string.
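Both points, the preference for the first alternative and the zero-length segment that can follow a non-zero-length one, can be observed with Python's `re` module. This is offered only as an analogy: Python's regular expression dialect is not the one defined by this specification, although it behaves the same way in these two cases.

```python
import re

# The first alternative wins when both alternatives match at the same position:
# group 1 captures "ab" and group 2 does not participate in the match.
first = re.search("(ab)|(a)", "abcd")
captured = (first.group(1), first.group(2))

# The pattern .* yields two disjoint segments for "abc": the whole string,
# then a zero-length segment at the end position.
segments = [m.span() for m in re.finditer(".*", "abc")]
```

Here `captured` is `("ab", None)` and `segments` is `[(0, 3), (3, 3)]`.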
| Expression: | replace("abracadabra", "bra", upper-case#1) |
|---|---|
| Result: | "aBRAcadaBRA" |
| Expression: | replace("Chapter 9", "[0-9]+", fn { . + 1 }) |
| Result: | "Chapter 10" |
| Expression: | replace(
"LHR to LAX",
"\b[A-Z]{3}\b",
{ 'LAX': 'Los Angeles', 'LHR': 'London' }
) |
| Result: | "London to Los Angeles" |
| Expression: | replace(
"57°43′30″",
"([0-9]+)°([0-9]+)′([0-9]+)″",
fn($s, $groups) {
string($groups[1] + $groups[2] ÷ 60 + $groups[3] ÷ 3600) || '°'
}
) |
| Result: | "57.725°" |
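The function-valued and map-valued replacements shown in the table have a close analogue in engines that accept a callback as the replacement. The sketch below is non-normative and uses Python's re.sub, which passes a match object rather than the matched string; the names increment and airports are hypothetical.

```python
import re

# Callback replacement: the matched digits are incremented.
def increment(m):
    return str(int(m.group(0)) + 1)

chapter = re.sub(r"[0-9]+", increment, "Chapter 9")
print(chapter)  # Chapter 10

# Map-style replacement: each matched code is looked up in a dictionary.
airports = {"LAX": "Los Angeles", "LHR": "London"}
route = re.sub(r"\b[A-Z]{3}\b", lambda m: airports[m.group(0)], "LHR to LAX")
print(route)  # London to Los Angeles

# Equivalent of passing upper-case#1 as the action.
shout = re.sub(r"bra", lambda m: m.group(0).upper(), "abracadabra")
print(shout)  # aBRAcadaBRA
```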
Returns a sequence of strings constructed by splitting the input wherever a separator is found; the separator is any substring that matches a given regular expression.
fn:tokenize( | ||
$value | as , | |
$pattern | as | := (), |
$flags | as | := "" |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The following rules apply when the $pattern argument is omitted, or is set to the empty sequence:
The function splits the supplied string at whitespace boundaries.
More specifically, calling fn:tokenize($value) or fn:tokenize($value, ()) is equivalent to calling fn:tokenize(fn:normalize-space($value), ' ') where the second argument is a single space character (x20).
The $flags argument is ignored.
The following rules apply when the $pattern argument is supplied as a single string:
If the $flags argument is omitted or if it is the empty sequence, the effect is the same as setting $flags to a zero-length string. Flags are defined in 6.2 Flags.
If $value is the empty sequence, or if $value is the zero-length string, the function returns the empty sequence.
The function returns a sequence of strings formed by breaking the $value string into a sequence of strings, treating any substring that matches $pattern as a separator. The separators themselves are not returned.
More specifically:
Let M0 be the sequence of disjoint matching segments that results from matching $value against $pattern in the presence of $flags.
Unless the first segment in M0 is zero-length and starts at the first character position of $value, prepend a zero-length segment that starts at the start of $value: call the result M1.
Unless the last segment in M1 is zero-length and starts at the last character position of $value (that is, the character position after the last character), append a zero-length segment that starts at the last character position of $value. Call the result M2.
For each pair of adjacent segments in M2 (say, Sn and Sn+1), construct a string (possibly zero-length) that is the substring of $value containing all characters that follow Sn and that precede Sn+1. Return this sequence of strings, in order.
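The construction of M0, M1 and M2 can be sketched as follows. This is a non-normative illustration in Python, assuming that re.finditer yields the same disjoint matching segments as the rules of this specification (true for the simple patterns shown, not guaranteed in general). The function name tokenize and the use of (start, end) offsets are choices made for the sketch.

```python
import re

def tokenize(value, pattern, flags=0):
    # The empty sequence (modelled as None) and the zero-length string
    # both yield the empty sequence.
    if not value:
        return []
    end = len(value)
    # M0: disjoint matching segments as (start, end) offsets.
    segments = [m.span() for m in re.finditer(pattern, value, flags)]
    # M1: prepend a zero-length segment at the start unless one is present.
    if not segments or segments[0] != (0, 0):
        segments.insert(0, (0, 0))
    # M2: append a zero-length segment at the end unless one is present.
    if segments[-1] != (end, end):
        segments.append((end, end))
    # Each token is the text between the end of one segment and the
    # start of the next.
    return [value[a[1]:b[0]] for a, b in zip(segments, segments[1:])]

print(tokenize("abracadabra", "(ab)|(a)"))  # ['', 'r', 'c', 'd', 'r', '']
print(tokenize("Do not eat", r"\b"))        # ['Do', ' ', 'not', ' ', 'eat']
print(tokenize("xyz", ""))                  # ['x', 'y', 'z']
```

Because M1 and M2 always bracket the input with zero-length segments, a non-zero-length separator at either end of the input produces a zero-length token, while a zero-length match at either end does not.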
A dynamic error is raised [err:FORX0002] if the value of $pattern is invalid according to the rules described in section 6.1 Regular expression syntax.
A dynamic error is raised [err:FORX0001] if the value of $flags is invalid according to the rules described in section 6.2 Flags.
If the input string is not zero length, and no separators are found in the input string, the result of the function is a single string identical to the input string.
For the one-argument form of the function:
The function has a similar effect to the two-argument form with \s+ as the separator pattern, except that the one-argument form strips leading and trailing whitespace, whereas the two-argument form delivers an extra zero-length token if leading or trailing whitespace is present.
The separator used is any sequence of tab (U+0009 (TAB)), newline (U+000A (NEWLINE)), carriage return (U+000D (CARRIAGE RETURN)) or space (U+0020 (SPACE)) characters. This is the same as the separator recognized by list-valued attributes as defined in XSD. It is not the same as the separator recognized by list-valued attributes in HTML5, which also treats form-feed (U+000C (FORM FEED)) as whitespace. If it is necessary to treat form-feed as a separator, an explicit separator pattern should be used.
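The difference between the one-argument form and an explicit whitespace pattern can be illustrated with a short non-normative Python sketch. The names XSD_WHITESPACE and tokenize_whitespace are hypothetical, and None stands in for the empty sequence.

```python
import re

# The four characters treated as whitespace by the one-argument form.
XSD_WHITESPACE = "\t\n\r "

def tokenize_whitespace(value):
    if value is None:
        return []
    # Stripping first is what removes the leading and trailing empty tokens.
    trimmed = value.strip(XSD_WHITESPACE)
    if trimmed == "":
        return []
    return re.split(r"[\t\n\r ]+", trimmed)

print(tokenize_whitespace("  red green\tblue\n"))  # ['red', 'green', 'blue']

# Splitting on the same separator without stripping keeps the empty tokens.
print(re.split(r"[\t\n\r ]+", " red green "))      # ['', 'red', 'green', '']

# Form feed is not a separator here, so the string stays in one piece.
print(tokenize_whitespace("a\fb"))                 # ['a\x0cb']
```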
For the two-argument form of the function:
The function returns no information about the separators that were found in the string. If this information is required, the fn:analyze-string function can be used instead. Alternatively, zero-width assertions can be used to identify separators. For example, using the regular expression (?<=,) will start a new token after every comma, including the comma as part of the previous token.
If a separator occurs at the start of $value, and is not zero-length, the result sequence will start with a zero-length string. Similarly, zero-length strings will also occur in the result sequence if a non-zero-length separator occurs at the end of $value, or if two adjacent substrings match the supplied $pattern.
If two alternatives within the supplied $pattern both match at the same position in the $value string, then the match that is chosen is the first. For example:
tokenize("abracadabra", "(ab)|(a)") returns ("", "r", "c", "d", "r", "")
The pattern may match zero-length segments of the input string. For example, the expression tokenize("Do not eat", "\b") returns the sequence "Do", " ", "not", " ", "eat".
A string may be split into individual characters (producing the same effect as the fn:characters function) by using the empty regular expression (for example, tokenize("xyz", "")), or any other regular expression such as .?? that matches every zero-length string, regardless of position.
Unlike the split method in some other popular languages, however, not every regular expression that matches a zero-length string produces this behavior: for example the regular expression \b splits the string before and after every word.
| Expression: | tokenize( "Some unparsed <br> HTML <BR> text", "\s*<br>\s*", "i" ) |
|---|---|
| Result: | "Some unparsed", "HTML", "text" |
Analyzes a string using a regular expression, returning an XML structure that identifies which parts of the input string matched or failed to match the regular expression, and in the case of matched substrings, which substrings matched each capturing group in the regular expression.
fn:analyze-string( | ||
$value | as , | |
$pattern | as , | |
$flags | as | := "" |
) as | ||
This function is nondeterministic, context-independent, and focus-independent.
If the $flags argument is omitted or if it is the empty sequence, the effect is the same as setting $flags to a zero-length string. Flags are defined in 6.2 Flags.
If $value is the empty sequence the function behaves as if $value were the zero-length string.
The function returns an element node whose local name is analyze-string-result. This element and all its descendant elements have the namespace URI http://www.w3.org/2005/xpath-functions. The namespace prefix is implementation-dependent. The children of this element are a sequence of fn:match and fn:non-match elements. This sequence is formed by breaking the $value string into a sequence of strings, returning any substring that matches $pattern as the content of an fn:match element, and any intervening substring as the content of an fn:non-match element.
More specifically, the function starts by matching the regular expression against the string, using the supplied $flags, to obtain the disjoint matching segments. For each such segment it constructs an fn:match child, whose string value is the string value of the segment. Before, between, or after these fn:match elements, as required to ensure that the string value of the fn:analyze-string-result element is the same as $value, it inserts fn:non-match elements. The content of an fn:non-match element is always a single (non-empty) text node, and two fn:non-match elements never appear as adjacent siblings.
The captured groups for each disjoint matching segment are represented using fn:group or fn:lookahead-group children of the corresponding fn:match element. Groups captured by a subexpression within a lookahead assertion are referred to as lookahead groups; those not within a lookahead assertion are called ordinary groups.
The content of an fn:match element is in general:
A sequence of text nodes and fn:group element children, whose string-values when concatenated comprise the string value of the matching segment, followed by
A sequence of zero or more fn:lookahead-group elements, representing the lookahead groups
The string value of an fn:match element may be empty.
An fn:group element with a nr attribute having the integer value N identifies the substring captured by an ordinary group, specifically the string value of the Nth captured group. For each ordinary capturing subexpression there will be at most one corresponding fn:group element in each fn:match element in the result.
By contrast, lookahead groups are represented by fn:lookahead-group elements, which (if they appear at all) must follow all text node and fn:group element children of the fn:match element. These groups may overlap the matching and non-matching substrings, and indeed may overlap each other. They must appear in ascending numerical order of group number. The attributes of the fn:lookahead-group element are as follows:
nr: the group number, based on the position of the capturing subexpression that captured the group;
value: the string value of the segment that was captured;
position: the one-based start position of the segment within the input string.
If the function is called twice with the same arguments, it is implementation-dependent whether the two calls return the same element node or distinct (but deep equal) element nodes. In this respect it is nondeterministic with respect to node identity.
The base URI of the element nodes in the result is implementation-dependent.
A schema is defined for the structure of the returned element: see D.1 Schema for the result of fn:analyze-string.
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema.
A dynamic error is raised [err:FORX0002] if the value of $pattern is invalid according to the rules described in section 6.1 Regular expression syntax.
A dynamic error is raised [err:FORX0001] if the value of $flags is invalid according to the rules described in section 6.2 Flags.
It is recommended that a processor that implements schema awareness should return typed nodes. The concept of “schema awareness”, however, is a matter for host languages to define and is outside the scope of the function library specification.
The declarations and definitions in the schema are not automatically available in the static context of the fn:analyze-string call (or of any other expression). The contents of the static context are host-language defined, and in some host languages are implementation-defined.
The schema defines the outermost element, analyze-string-result, in such a way that mixed content is permitted. In fact the element will only have element nodes (match and non-match) as its children, never text nodes. Although this might have originally been an oversight, defining the analyze-string-result element with mixed="true" allows it to be atomized, which is potentially useful (the atomized value will be the original input string), and the capability has therefore been retained for compatibility with the 3.0 version of this specification.
The rules for disjoint matching segments allow a zero-length matching segment to immediately follow a non-zero-length matching segment (they are not considered to overlap). This means, for example, that the regular expression .* will typically produce two matches: one matching segment containing all the characters in the input string, and a second zero-length matching segment at the end position of the string.
In the following examples, the result document is shown in serialized form, with whitespace between the element nodes. This whitespace is not actually present in the result.
| Expression: |
|
|---|---|
| Result: | <analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions"> <match>The</match> <non-match> </non-match> <match>cat</match> <non-match> </non-match> <match>sat</match> <non-match> </non-match> <match>on</match> <non-match> </non-match> <match>the</match> <non-match> </non-match> <match>mat</match> <non-match>.</non-match> </analyze-string-result> (with whitespace added for legibility) |
| Expression: | analyze-string("08-12-03", "^(\d+)\-(\d+)\-(\d+)$") |
| Result: | <analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
<match>
<group nr="1">08</group>-<group nr="2">12</group>-<group nr="3">03</group>
</match>
</analyze-string-result>(with whitespace added for legibility) |
| Expression: | analyze-string("A1,C15,,D24, X50,", "([A-Z])([0-9]+)") |
| Result: | <analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
<match>
<group nr="1">A</group>
<group nr="2">1</group>
</match>
<non-match>,</non-match>
<match>
<group nr="1">C</group>
<group nr="2">15</group>
</match>
<non-match>,,</non-match>
<match>
<group nr="1">D</group>
<group nr="2">24</group>
</match>
<non-match>, </non-match>
<match>
<group nr="1">X</group>
<group nr="2">50</group>
</match>
<non-match>,</non-match>
</analyze-string-result>(with whitespace added for legibility) |
| Expression: | analyze-string("Chapter 5", "(Chapter|Appendix)(?=\s+([0-9]+))") |
| Result: | <analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
<match>
<group nr="1">Chapter</group>
<lookahead-group nr="2" value="5" position="9"/>
</match>
<non-match> 5</non-match>
</analyze-string-result>(with whitespace added for legibility) |
| Expression: | analyze-string("There we go", "\b(?=(\w+))") |
| Result: | <analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions"> <match><lookahead-group nr="1" value="There" position="1"/></match> <non-match>There </non-match> <match><lookahead-group nr="1" value="we" position="7"/></match> <non-match>we </non-match> <match><lookahead-group nr="1" value="go" position="10"/></match> <non-match>go</non-match> </analyze-string-result> (with whitespace added for legibility) |
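The partitioning of the input into match and non-match pieces, shown in the examples above, can be sketched without building any XML. The following non-normative Python sketch returns a list of (kind, text) pairs instead of element nodes; the function name analyze is hypothetical and Python's re engine is assumed to find the same segments for these patterns.

```python
import re

def analyze(value, pattern):
    parts, pos = [], 0
    for m in re.finditer(pattern, value):
        # Text between the previous segment and this one is a non-match;
        # zero-length non-matches are never emitted.
        if m.start() > pos:
            parts.append(("non-match", value[pos:m.start()]))
        # A match may itself be zero-length.
        parts.append(("match", m.group(0)))
        pos = m.end()
    if pos < len(value):
        parts.append(("non-match", value[pos:]))
    return parts

parts = analyze("A1,C15,,D24, X50,", "([A-Z])([0-9]+)")
print(parts[:3])  # [('match', 'A1'), ('non-match', ','), ('match', 'C15')]

# Lookahead groups: captured value and one-based start position.
lookahead = [(m.group(1), m.start(1) + 1)
             for m in re.finditer(r"\b(?=(\w+))", "There we go")]
print(lookahead)  # [('There', 1), ('we', 7), ('go', 10)]
```

Concatenating the text of all pieces always reproduces the input, which mirrors the requirement that the string value of the result element equals $value.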
This section specifies functions that manipulate URI values, either as instances of xs:anyURI or as strings.
| Function | Meaning |
|---|---|
| fn:decode-from-uri | Decodes URI-escaped characters in a string. |
| fn:encode-for-uri | Encodes reserved characters in a string that is intended to be used in the path segment of a URI. |
| fn:escape-html-uri | Escapes a URI in the same way that HTML user agents handle attribute values expected to contain URIs. |
| fn:iri-to-uri | Converts a string containing an IRI into a URI according to the rules of [RFC 3987]. |
| fn:resolve-uri | Resolves a relative IRI reference against an absolute IRI. |
Resolves a relative IRI reference against an absolute IRI.
fn:resolve-uri( | ||
$href | as , | |
$base | as | := () |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
The function is defined to operate on IRI references as defined in [RFC 3987], and the implementation must permit all arguments that are valid according to that specification. In addition, the implementation may accept some or all strings that conform to the rules for (absolute or relative) Legacy Extended IRI references as defined in [Legacy extended IRIs for XML resource identification]. For the purposes of this section, the terms IRI and IRI reference include these extensions, insofar as the implementation chooses to support them.
The following rules apply in order:
If $href is the empty sequence, the function returns the empty sequence.
If $href is an absolute IRI (as defined above), then it is returned unchanged.
If the $base argument is not supplied, or is supplied as the empty sequence, then:
If the executable base URI in the dynamic context is not absent, it is used as the effective value of $base.
Otherwise, a dynamic error is raised: [err:FONS0005].
The function resolves the relative IRI reference $href against the base IRI $base using the algorithm defined in [RFC 3986], adapted by treating any character that would not be valid in an RFC3986 URI or relative reference in the same way that RFC3986 treats unreserved characters. No percent-encoding takes place.
The first form of this function resolves $href against the value of the base-uri property from the static context. A dynamic error is raised [err:FONS0005] if the base-uri property is not initialized in the static context.
A dynamic error is raised [err:FORG0002] if $href is not a valid IRI according to the rules of RFC3987, extended with an implementation-defined subset of the extensions permitted in LEIRI, or if it is not a suitable relative reference to use as input to the RFC3986 resolution algorithm extended to handle additional unreserved characters.
A dynamic error is raised [err:FORG0002] if $base is not a valid IRI according to the rules of RFC3987, extended with an implementation-defined subset of the extensions permitted in LEIRI, or if it is not a suitable IRI to use as input to the chosen resolution algorithm (for example, if it is a relative IRI reference or if it is a non-hierarchic URI). In XPath 4.0, attempting to resolve against an absolute URI that includes a fragment identifier is no longer an error; the fragment identifier is simply ignored. A narrow reading of RFC 3986 might seem to forbid this, but in practice the interpretation is non-controversial and the practice is widely supported.
A dynamic error is raised [err:FORG0009] if the chosen resolution algorithm fails for any other reason.
Resolving a URI does not dereference it. This is merely a syntactic operation on two strings.
The algorithms in the cited RFCs include some variations that are optional or recommended rather than mandatory; they also describe some common practices that are not recommended, but which are permitted for backwards compatibility. Where the cited RFCs permit variations in behavior, so does this specification.
Throughout this family of specifications, the phrase "resolving a relative URI (or IRI) reference" should be understood as using the rules of this function, unless otherwise stated.
RFC3986 defines an algorithm for resolving relative references in the context of the URI syntax defined in that RFC. RFC3987 describes a modification to that algorithm to make it applicable to IRIs (specifically: additional characters permitted in an IRI are handled the same way that RFC3986 handles unreserved characters). The LEIRI specification does not explicitly define a resolution algorithm, but suggests that it should not be done by converting the LEIRI to a URI, and should not involve percent-encoding. This specification fills this gap by defining resolution for LEIRIs in the same way that RFC3987 defines resolution for IRIs, that is by specifying that additional characters are handled as unreserved characters.
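The purely syntactic nature of resolution can be illustrated with any implementation of the RFC 3986 algorithm. The non-normative sketch below uses Python's urllib.parse.urljoin as a stand-in; it is not an implementation of fn:resolve-uri (for instance, it raises none of the errors described above), and the example URIs are hypothetical.

```python
from urllib.parse import urljoin

base = "http://example.com/a/b/c?x=1#frag"

# Dot segments are removed; the query and fragment of the base are ignored.
up_one = urljoin(base, "../d")
print(up_one)   # http://example.com/a/d

# A relative reference replaces the last path segment of the base.
sibling = urljoin(base, "e")
print(sibling)  # http://example.com/a/b/e

# An absolute reference is returned unchanged.
absolute = urljoin(base, "https://other.example/x")
print(absolute)  # https://other.example/x

# Characters outside the URI repertoire pass through without percent-encoding.
iri = urljoin("http://example.com/a/b", "caf\u00e9")
print(iri)      # http://example.com/a/café
```

No network access takes place in any of these calls: the result is computed from the two strings alone.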
This section specifies functions that parse strings as URIs, to identify their structure, and construct URI strings from their structured representation.
Some URI schemes are hierarchical and some are non-hierarchical. Implementations must treat the following schemes as non-hierarchical: jar, mailto, news, tag, tel, and urn. Whether additional schemes are known to be non-hierarchical is implementation-defined. If a scheme is not known to be non-hierarchical, it must be treated as hierarchical.
| Function | Meaning |
|---|---|
| fn:parse-uri | Parses the URI provided and returns a map of its parts. |
| fn:build-uri | Constructs a URI from the parts provided. |
Both functions use a structured representation of a URI as defined in the next section.
This record type represents the components of a URI.
| Name | Meaning |
|---|---|
| The original URI. This element is returned by
|
| The URI scheme (e.g., “https” or “file”).
|
| The URI is an absolute URI.
|
| Whether the URI is hierarchical or not.
|
| The authority portion of the URI (e.g., “example.com:8080”).
|
| Any userinfo that was passed as part of the authority.
|
| The host passed as part of the authority (e.g., “example.com”).
|
| The port passed as part of the authority (e.g., “8080”).
|
| The path portion of the URI.
|
| Any query string.
|
| Any fragment identifier.
|
| Parsed and unescaped path segments.
|
| Parsed and unescaped query key-value pairs.
|
| The path of the URI, treated as a filepath.
|
| The record type is extensible (it may contain additional fields beyond those listed). |
The segmented forms of the path and query parameters provide convenient access to commonly used information.
The path, if there is one, is tokenized on “/” characters and each segment is unescaped (as per the fn:decode-from-uri function). Consider the URI http://example.com/path/to/a%2fb. The path portion has to be returned as /path/to/a%2fb because decoding the %2f would change the nature of the path. The unescaped form is easily accessible from path-segments:
("", "path", "to", "a/b")
Note that the presence or absence of a leading slash on the path will affect whether or not the sequence begins with a zero-length string.
The query parameters are decoded into a map. Consider the URI: http://example.com/path?a=1&b=2%264&a=3. The decoded form in the query-parameters is the following map:
{ "a": ("1", "3"), "b": "2&4" }
Note that both keys and values are unescaped. If a key is repeated in the query string, the map will contain a sequence of values for that key, as seen for a in this example.
Parses the URI provided and returns a map of its parts.
fn:parse-uri( | ||
$value | as , | |
$options | as | := {} |
) as uri-structure-record? | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the result is the empty sequence.
The function parses the $value provided, returning a map containing its constituent parts: scheme, authority components, path, etc. In addition to parsing URIs as defined by [RFC 3986] (and [RFC 3987]), this function also attempts to account for strings that are not valid URIs but that often appear in URI-adjacent spaces, such as file names. Not all such strings can be successfully parsed as URIs.
The following options are available:
record( | |
allow-deprecated-features? | as xs:boolean, |
omit-default-ports? | as xs:boolean, |
unc-path? | as xs:boolean |
) | |
| Key | Meaning |
|---|---|
| Indicates that deprecated URI features should be returned
|
| Indicates that a port number that is the same as the default port for a given scheme should be omitted.
|
| Indicates that an input URI that begins with two or more leading slashes should be interpreted as a Windows Universal Naming Convention Path. (Specifically: that it has the file: scheme.)
|
This function is described as a series of transformations over the input string to identify the parts of a URI that are present. Some portions of the URI are identified by matching with a regular expression. This approach is designed to make the description clear and unambiguous; it is not implementation advice. Comparison of scheme and authority components is case insensitive.
Processing begins with a string that is equal to the $value. If the string contains any backslashes (\), replace them with forward slashes (/).
Strip off the fragment identifier and any query:
If the string matches ^(.*?)#(.*)$, the string is the first match group and the fragment is the second match group. Otherwise, the string is unchanged and the fragment is the empty sequence. If a fragment is present, it is URI decoded. If the fragment is a zero-length string, it is discarded and the fragment is the empty sequence.
If the string matches ^(.*?)\?(.*)$, the string is the first match group and the query is the second match group. Otherwise, the string is unchanged and the query is the empty sequence. If the query is a zero-length string, it is discarded and the query is the empty sequence.
Attempt to identify the scheme:
If the string matches ^([a-zA-Z][A-Za-z0-9\+\-\.]+):(.*)$:
the scheme is the first match group and
the string is the second match group.
Otherwise, the scheme is the empty sequence and the string is unchanged.
If the scheme is not empty and the fragment is empty, absolute is true. Otherwise, absolute is the empty sequence. (But see the discussion of hierarchical URIs, below.)
If scheme is the empty sequence or file:
If the string matches ^/*([a-zA-Z][:|].*)$:
the scheme is file and
the string is a single slash / followed by the first match group with the second character changed to :, if necessary.
Otherwise, if unc-path is true:
the scheme is file and
the string is unchanged.
Finally, if neither of the preceding cases applies:
the scheme remains the empty sequence and
the string is unchanged.
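The steps up to this point (backslash replacement, removal of the fragment and query, and identification of the scheme) can be sketched as follows. This is a non-normative Python illustration that reuses the regular expressions given above; the function name identify_scheme and the shape of the returned dictionary are assumptions of the sketch, None stands in for the empty sequence, and urllib.parse.unquote stands in for URI decoding.

```python
import re
from urllib.parse import unquote

def identify_scheme(value, unc_path=False):
    string = value.replace("\\", "/")
    fragment = query = scheme = None

    m = re.match(r"^(.*?)#(.*)$", string)
    if m:
        # A zero-length fragment is discarded.
        string, fragment = m.group(1), unquote(m.group(2)) or None

    m = re.match(r"^(.*?)\?(.*)$", string)
    if m:
        # A zero-length query is discarded.
        string, query = m.group(1), m.group(2) or None

    # The "+" quantifier means a one-letter prefix such as "C:" is not a scheme.
    m = re.match(r"^([a-zA-Z][A-Za-z0-9+\-.]+):(.*)$", string)
    if m:
        scheme, string = m.group(1), m.group(2)

    if scheme is None or scheme.lower() == "file":
        m = re.match(r"^/*([a-zA-Z][:|].*)$", string)
        if m:
            # Windows drive letter: normalize "c|" to "c:" and add one slash.
            scheme = "file"
            rest = m.group(1)
            string = "/" + rest[0] + ":" + rest[2:]
        elif unc_path:
            scheme = "file"

    return {"scheme": scheme, "string": string,
            "query": query, "fragment": fragment}

print(identify_scheme("C:\\Users\\me\\file.txt")["string"])  # /C:/Users/me/file.txt
```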
Now that the scheme, if there is one, has been identified, determine if the URI is hierarchical:
If the scheme is known to be hierarchical, or known not to be hierarchical, then hierarchical is set accordingly. If the implementation does not know if a scheme is or is not hierarchical, the hierarchical setting depends on the string: if the string is the zero-length string, hierarchical is the empty sequence (i.e. not known), otherwise hierarchical is true if string begins with / and false otherwise.
If the URI is not hierarchical, absolute is the empty sequence.
Identify the remaining components according to the scheme and whether or not the URI is hierarchical.
If the scheme is file:
The authority is the empty sequence.
If unc-path is true and the string matches ^/*(//[^/].*)$, then filepath and string are both the first match group.
If the string begins ^//*[A-Za-z]:/ then all but one leading slash is removed from string and the filepath is the string with all leading slashes removed.
Otherwise, the filepath and string are the string with any sequence of leading slashes replaced by a single slash.
If the scheme is hierarchical:
If the string matches ^//([^/]+)$, the authority is the first match group and the string is empty.
If the string matches ^//([^/]*)(/.*)$, the authority is the first match group and the string is the second match group.
Otherwise, the authority is the empty sequence and the string is unchanged.
If the scheme is not hierarchical:
The authority is the empty sequence and the string is unchanged.
If the authority matches ^(([^@]*)@)(.*)(:([^:]*))?$, then the userinfo is match group 2, otherwise userinfo is the empty sequence. If userinfo is present and contains a non-empty password, then userinfo is discarded and set to the empty sequence unless the allow-deprecated-features option is true.
When parsing the authority to find the host, there are four possibilities: the host can be a registered name (e.g., example.com), an IPv4 address (e.g., 127.0.0.1), an IPv6 (or IPvFuture) address (e.g., [::1]), or an error if there is an open square bracket ([) not matched by a close square bracket (]). In a properly constructed RFC 3986 URI, the only place where square brackets may occur is around the IPv6/IPvFuture IP address.
If the authority matches ^(([^@]*)@)?(\[[^\]]*\])(:([^:]*))?$, then the host is match group 3, otherwise
If the authority matches ^(([^@]*)@)?\[.*$ then [err:FOUR0001] is raised, otherwise
If the authority matches ^(([^@]*)@)?([^:]+)(:([^:]*))?$, then the host is match group 3, otherwise
the host is the empty sequence.
This function does not attempt to decode the components of the host.
Similar care must be taken to match the port because an IPv6/IPvFuture address may contain a colon.
If the authority matches ^(([^@]*)@)?(\[[^\]]*\])(:([^:]*))?$, then the port is match group 5.
Otherwise, if the authority matches ^(([^@]*)@)?([^:]+)(:([^:]*))?$, then the port is match group 5.
Otherwise, the port is the empty sequence.
If the omit-default-ports option is true, the port is discarded and set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined.
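The host and port rules can be sketched by applying the regular expressions above in the stated order. The following Python illustration is non-normative: the names host_and_port, effective_port and DEFAULT_PORTS are hypothetical, ports are kept as strings, None stands in for the empty sequence, and a ValueError stands in for [err:FOUR0001].

```python
import re

BRACKETED = r"^(([^@]*)@)?(\[[^\]]*\])(:([^:]*))?$"   # IPv6 / IPvFuture host
UNBALANCED = r"^(([^@]*)@)?\[.*$"                      # "[" with no matching "]"
PLAIN = r"^(([^@]*)@)?([^:]+)(:([^:]*))?$"             # registered name or IPv4

def host_and_port(authority):
    m = re.match(BRACKETED, authority)
    if m:
        return m.group(3), m.group(5)
    if re.match(UNBALANCED, authority):
        raise ValueError("FOUR0001: unmatched square bracket in authority")
    m = re.match(PLAIN, authority)
    if m:
        return m.group(3), m.group(5)
    return None, None

# Default ports that implementations should recognize.
DEFAULT_PORTS = {"http": "80", "https": "443", "ftp": "21", "ssh": "22"}

def effective_port(scheme, port, omit_default_ports=False):
    if omit_default_ports and port is not None:
        if DEFAULT_PORTS.get((scheme or "").lower()) == port:
            return None
    return port

print(host_and_port("example.com:8080"))  # ('example.com', '8080')
print(host_and_port("[::1]:8080"))        # ('[::1]', '8080')
```

Testing the bracketed form first is what keeps the colons inside an IPv6 address from being mistaken for the port separator.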
If the string is a zero-length string, then path is the empty sequence, otherwise path is the whole string. If the scheme is the empty sequence, filepath is also the whole string.
A path-segments sequence is constructed by tokenizing the string on / (solidus) and applying uri decoding on each token.
Note:
The path and path-segments properties both contain the path portion of the URI. The different formats only become important when the path contains encoded delimiters.
Consider /path%2Fsegment. An application may want to decode that, using /path/segment in a database query, for example. At the same time, an application may wish to modify the URI and then reconstruct it.
In the string form, decoding %2F to / is not reversible. In the path-segments form, the path is broken into discrete segments where the syntactic delimiters occur. This means the encoded delimiters can be decoded without introducing ambiguity: ("", "path/segment"). In this format, the decoding is reversible: escape the non-syntactic delimiters before reconstructing the path with the syntactic ones.
A consequence of constructing the path-segments this way is that a zero-length string appears before the first /, if the path begins with a /, after the last /, if the path ends with a /, and between consecutive / characters. (If the path consists of a single /, that / counts as both the first and last /, producing a segment list containing two empty strings.)
The empty strings may seem unnecessary at first glance, but they ensure that the path can be reconstructed by joining the segments together again without having to handle the presence or absence of a leading or trailing / as special cases.
Applying uri decoding is equivalent to calling fn:decode-from-uri on the string.
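The reversibility described in the note can be demonstrated with a short non-normative Python sketch. The names path_segments and rebuild_path are hypothetical, and urllib.parse.unquote and quote stand in for URI decoding and encoding (quote with safe="" escapes every reserved character in a segment, which may be more than an implementation would choose to escape).

```python
from urllib.parse import quote, unquote

def path_segments(path):
    # Split on the syntactic delimiters first, then decode each segment.
    return [unquote(segment) for segment in path.split("/")]

def rebuild_path(segments):
    # Escape delimiters inside each segment before joining with "/".
    return "/".join(quote(segment, safe="") for segment in segments)

segments = path_segments("/path%2Fsegment")
print(segments)                # ['', 'path/segment']
print(rebuild_path(segments))  # /path%2Fsegment
print(path_segments("/"))      # ['', '']
```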
The query-parameters value is constructed as follows. Start with an empty map. Tokenize the query on the & (ampersand). For each token, identify the key and the value. If the token contains an equal sign (=), the key is the string that precedes the first equal sign, uri decoded, and the value is the remainder of the token, after the first equal sign, uri decoded. If the token does not contain an equal sign, key is the empty string and the value is equal to the token, uri decoded. Add the key/value pair to the map. If the key already exists in the map, add the value to a list of values associated with that key. The resulting map, when all tokens have been processed, is the query-parameters map.
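The construction of the query-parameters map can be sketched as follows. This non-normative Python illustration always stores a list of values per key (an assumption of the sketch, standing in for a sequence); the function name query_parameters is hypothetical and urllib.parse.unquote stands in for URI decoding.

```python
from urllib.parse import unquote

def query_parameters(query):
    params = {}
    for token in query.split("&"):
        # Split on the first "=" only; later "=" characters stay in the value.
        key, separator, value = token.partition("=")
        if not separator:
            # No "=" in the token: the key is the empty string.
            key, value = "", token
        params.setdefault(unquote(key), []).append(unquote(value))
    return params

print(query_parameters("a=1&b=2%264&a=3"))  # {'a': ['1', '3'], 'b': ['2&4']}
print(query_parameters("debug&x=a=b"))      # {'': ['debug'], 'x': ['a=b']}
```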
If the filepath is not the empty sequence, it is uri decoded. On a Windows system, any forward slashes in the path may be replaced with backslashes.
A uri-structure-record is returned. The record should be populated with only those keys that have a non-empty value (keys whose value is the empty sequence should be omitted).
Implementations may implement additional or different rules for URIs that have a scheme or pattern that they recognize. An implementation might choose to parse jar: URIs with special rules, for example, since they extend the syntax in ways not defined by [RFC 3986]. Implementations may add additional keys to the map. The meaning of those keys is implementation-defined.
A dynamic error is raised [err:FOUR0001] if the URI contains an open square bracket in the authority component that is not followed by a close square bracket.
Like fn:resolve-uri, this function handles the additional characters allowed in [RFC 3987] IRIs in the same way that other unreserved characters are handled.
Unlike fn:resolve-uri, this function is not attempting to resolve one URI against another and consequently, the errors that can arise under those circumstances do not apply here. The fn:parse-uri function will accept strings that would raise errors if resolution was attempted; see fn:build-uri.
In the examples that follow, keys with values that are null or the empty sequence are elided for editorial clarity. String literals that include an ampersand character are written as string templates (for example | |
| Expression: | parse-uri("http://qt4cg.org/specifications/xpath-functions-40/Overview.html#parse-uri") |
|---|---|
| Result: | {
"authority": "qt4cg.org",
"fragment": "parse-uri",
"hierarchical": true(),
"host": "qt4cg.org",
"path": "/specifications/xpath-functions-40/Overview.html",
"path-segments": ("", "specifications", "xpath-functions-40", "Overview.html"),
"scheme": "http",
"uri": "http://qt4cg.org/specifications/xpath-functions-40/Overview.html#parse-uri"
} |
| Expression: | parse-uri("http://www.ietf.org/rfc/rfc2396.txt") |
| Result: | {
"authority": "www.ietf.org",
"hierarchical": true(),
"absolute": true(),
"host": "www.ietf.org",
"path": "/rfc/rfc2396.txt",
"path-segments": ("", "rfc", "rfc2396.txt"),
"scheme": "http",
"uri": "http://www.ietf.org/rfc/rfc2396.txt"
} |
| Expression: | parse-uri("https://example.com/path/to/file") |
| Result: | {
"authority": "example.com",
"path": "/path/to/file",
"scheme": "https",
"path-segments": ("", "path", "to", "file"),
"host": "example.com",
"hierarchical": true(),
"absolute": true(),
"uri": "https://example.com/path/to/file"
} |
| Expression: | parse-uri( `https://example.com:8080/path?s=%22hello world%22&sort=relevance` ) |
| Result: | {
"authority": "example.com:8080",
"hierarchical": true(),
"absolute": true(),
"host": "example.com",
"path": "/path",
"path-segments": ("", "path"),
"port": 8080,
"query": `s=%22hello world%22&sort=relevance`,
"query-parameters": {
"s": """hello world""",
"sort": "relevance"
},
"scheme": "https",
"uri": `https://example.com:8080/path?s=%22hello world%22&sort=relevance`
} |
| Expression: | parse-uri("https://user@example.com/path/to/file") |
| Result: | {
"authority": "user@example.com",
"hierarchical": true(),
"absolute": true(),
"host": "example.com",
"path": "/path/to/file",
"path-segments": ("", "path", "to", "file"),
"scheme": "https",
"uri": "https://user@example.com/path/to/file",
"userinfo": "user"
} |
| Expression: | parse-uri("ftp://ftp.is.co.za/rfc/rfc1808.txt") |
| Result: | {
"authority": "ftp.is.co.za",
"hierarchical": true(),
"absolute": true(),
"host": "ftp.is.co.za",
"path": "/rfc/rfc1808.txt",
"path-segments": ("", "rfc", "rfc1808.txt"),
"scheme": "ftp",
"uri": "ftp://ftp.is.co.za/rfc/rfc1808.txt"
} |
| Expression: | parse-uri("file:////uncname/path/to/file") |
| Result: | {
"filepath": "/uncname/path/to/file",
"hierarchical": true(),
"absolute": true(),
"path": "/uncname/path/to/file",
"path-segments": ("", "uncname", "path", "to", "file"),
"scheme": "file",
"uri": "file:////uncname/path/to/file"
} |
| Expression: | parse-uri("file:///c:/path/to/file") |
| Result: | {
"filepath": "c:/path/to/file",
"hierarchical": true(),
"absolute": true(),
"path": "/c:/path/to/file",
"path-segments": ("", "c:", "path", "to", "file"),
"scheme": "file",
"uri": "file:///c:/path/to/file"
} |
| Expression: | parse-uri("file:/C:/Program%20Files/test.jar") |
| Result: | {
"filepath": "C:/Program Files/test.jar",
"hierarchical": true(),
"absolute": true(),
"path": "/C:/Program%20Files/test.jar",
"path-segments": ("", "C:", "Program Files", "test.jar"),
"scheme": "file",
"uri": "file:/C:/Program%20Files/test.jar"
} |
| Expression: | parse-uri("file:\\c:\path\to\file") |
| Result: | {
"filepath": "c:/path/to/file",
"hierarchical": true(),
"absolute": true(),
"path": "/c:/path/to/file",
"path-segments": ("", "c:", "path", "to", "file"),
"scheme": "file",
"uri": "file:\\c:\path\to\file"
} |
| Expression: | parse-uri("file:\c:\path\to\file") |
| Result: | {
"filepath": "c:/path/to/file",
"hierarchical": true(),
"absolute": true(),
"path": "/c:/path/to/file",
"path-segments": ("", "c:", "path", "to", "file"),
"scheme": "file",
"uri": "file:\c:\path\to\file"
} |
| Expression: | parse-uri("c:\path\to\file") |
| Result: | {
"filepath": "c:/path/to/file",
"hierarchical": true(),
"path": "/c:/path/to/file",
"path-segments": ("", "c:", "path", "to", "file"),
"scheme": "file",
"uri": "c:\path\to\file"
} |
| Expression: | parse-uri("/path/to/file") |
| Result: | {
"filepath": "/path/to/file",
"hierarchical": true(),
"path": "/path/to/file",
"path-segments": ("", "path", "to", "file"),
"uri": "/path/to/file"
} |
| Expression: | parse-uri("#testing") |
| Result: | {
"fragment": "testing",
"uri": "#testing"
} |
| Expression: | parse-uri("?q=1") |
| Result: | {
"query": "q=1",
"query-parameters":{
"q": "1"
},
"uri": "?q=1"
} |
| Expression: | parse-uri("ldap://[2001:db8::7]/c=GB?objectClass?one") |
| Result: | {
"authority": "[2001:db8::7]",
"hierarchical": true(),
"absolute": true(),
"host": "[2001:db8::7]",
"path": "/c=GB",
"path-segments": ("", "c=GB"),
"query": "objectClass?one",
"query-parameters":{
"": "objectClass?one"
},
"scheme": "ldap",
"uri": "ldap://[2001:db8::7]/c=GB?objectClass?one"
} |
| Expression: | parse-uri("mailto:John.Doe@example.com") |
| Result: | {
"hierarchical": false(),
"path": "John.Doe@example.com",
"path-segments": "John.Doe@example.com",
"scheme": "mailto",
"uri": "mailto:John.Doe@example.com"
} |
| Expression: | parse-uri("news:comp.infosystems.www.servers.unix") |
| Result: | {
"hierarchical": false(),
"path": "comp.infosystems.www.servers.unix",
"path-segments": "comp.infosystems.www.servers.unix",
"scheme": "news",
"uri": "news:comp.infosystems.www.servers.unix"
} |
| Expression: | parse-uri("tel:+1-816-555-1212") |
| Result: | {
"hierarchical": false(),
"path": "+1-816-555-1212",
"path-segments": " 1-816-555-1212",
"scheme": "tel",
"uri": "tel:+1-816-555-1212"
} |
| Expression: | parse-uri("telnet://192.0.2.16:80/") |
| Result: | {
"authority": "192.0.2.16:80",
"hierarchical": true(),
"absolute": true(),
"host": "192.0.2.16",
"path": "/",
"path-segments": ("", ""),
"port": 80,
"scheme": "telnet",
"uri": "telnet://192.0.2.16:80/"
} |
| Expression: | parse-uri("urn:oasis:names:specification:docbook:dtd:xml:4.1.2") |
| Result: | {
"hierarchical": false(),
"path": "oasis:names:specification:docbook:dtd:xml:4.1.2",
"path-segments": "oasis:names:specification:docbook:dtd:xml:4.1.2",
"scheme": "urn",
"uri": "urn:oasis:names:specification:docbook:dtd:xml:4.1.2"
} |
| Expression: | parse-uri("tag:textalign.net,2015:ns") |
| Result: | {
"hierarchical": false(),
"path": "textalign.net,2015:ns",
"path-segments": "textalign.net,2015:ns",
"scheme": "tag",
"uri": "tag:textalign.net,2015:ns"
} |
| Expression: | parse-uri("tag:jan@example.com,1999-01-31:my-uri") |
| Result: | {
"hierarchical": false(),
"path": "jan@example.com,1999-01-31:my-uri",
"path-segments": "jan@example.com,1999-01-31:my-uri",
"scheme": "tag",
"uri": "tag:jan@example.com,1999-01-31:my-uri"
} |
This example uses the algorithm described above, not an algorithm that is specifically aware of the | |
| Expression: | parse-uri("jar:file:/C:/Program%20Files/test.jar!/foo/bar") |
| Result: | {
"hierarchical": false(),
"path": "file:/C:/Program%20Files/test.jar!/foo/bar",
"path-segments": ("file:", "C:", "Program Files", "test.jar!", "foo", "bar"),
"scheme": "jar",
"uri": "jar:file:/C:/Program%20Files/test.jar!/foo/bar"
} |
This example demonstrates that parsing the URI treats non-URI characters in lexical IRIs as “unreserved characters”. The rationale for this is given in the description of | |
| Expression: | parse-uri("http://www.example.org/Dürst") |
| Result: | {
"authority": "www.example.org",
"hierarchical": true(),
"absolute": true(),
"host": "www.example.org",
"path": "/Dürst",
"path-segments": ("", "Dürst"),
"scheme": "http",
"uri": "http://www.example.org/Dürst"
} |
This example demonstrates the use of | |
| Expression: | parse-uri("c|/path/to/file") |
| Result: | {
"filepath": "c:/path/to/file",
"hierarchical": true(),
"path": "/c:/path/to/file",
"path-segments": ("", "c:", "path", "to", "file"),
"scheme": "file",
"uri": "c|/path/to/file"
} |
This example demonstrates the use of | |
| Expression: | parse-uri("file://c|/path/to/file") |
| Result: | {
"filepath": "c:/path/to/file",
"hierarchical": true(),
"absolute": true(),
"path": "/c:/path/to/file",
"path-segments": ("", "c:", "path", "to", "file"),
"scheme": "file",
"uri": "file://c|/path/to/file"
} |
Constructs a URI from the parts provided.
fn:build-uri( | ||
$parts | as uri-structure-record, | |
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent.
A URI is composed from a scheme, authority, path, query, and fragment.
The following options are available:
record( | |
allow-deprecated-features? | as xs:boolean, |
omit-default-ports? | as xs:boolean, |
unc-path? | as xs:boolean |
) | |
| Key | Meaning |
|---|---|
| allow-deprecated-features? | Indicates that deprecated URI features should be returned. |
| omit-default-ports? | Indicates that a port number that is the same as the default port for a given scheme should be omitted. |
| unc-path? | Indicates that the URI represents a Windows Universal Naming Convention Path. |
The components are derived from the contents of the $parts map. To simplify the description below, a value is considered to be present in the map if the relevant field exists and is non-empty.
If the scheme key is present in the map, the URI begins with the value of that key. A URI is considered to be non-hierarchical if either the hierarchical key is present in the $parts map with the value false or if the scheme is known to be non-hierarchical. (In other words, schemes are hierarchical by default.)
If the scheme is known to be non-hierarchical, it is delimited by a trailing :.
Otherwise, if the scheme is file and the unc-path option is true, the scheme is delimited by a trailing :////.
Otherwise, the scheme is delimited by a trailing ://.
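The choice of delimiter can be sketched in Python. This is a non-normative illustration: `scheme_prefix` and the `NON_HIERARCHICAL` set are hypothetical, the set of schemes is only an example of what an implementation might recognize, and the sketch assumes that a `hierarchical` value of false is treated the same way as a scheme that is known to be non-hierarchical.

```python
# Assumption: an illustrative set of schemes treated as non-hierarchical.
NON_HIERARCHICAL = {"mailto", "news", "tel", "urn", "tag", "jar"}


def scheme_prefix(scheme, hierarchical=None, unc_path=False):
    """Return the scheme followed by the delimiter that introduces the rest of the URI."""
    non_hierarchical = hierarchical is False or scheme in NON_HIERARCHICAL
    if non_hierarchical:
        return scheme + ":"
    if scheme == "file" and unc_path:
        return scheme + ":////"
    # Schemes are hierarchical by default.
    return scheme + "://"
```

For example, `scheme_prefix("https")` gives `https://`, while `scheme_prefix("mailto")` gives `mailto:`.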
For simplicity of exposition, we take the userinfo, host, and port values from the map and imagine they are stored in variables with the same name. If the key is not present in the map, the value of the variable is set to the empty sequence.
If $userinfo is non-empty and contains a non-empty password, then $userinfo is set to the empty sequence unless the allow-deprecated-features option is true.
If the omit-default-ports option is true then the $port is set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined.
If any of $userinfo, $host, or $port exist, the following authority is added to the URI under construction:
concat(
if (exists($userinfo)) { $userinfo || "@" },
$host,
if (exists($port)) { ":" || $port }
)
If none of userinfo, host, or port is present, and authority is present, the value of the authority key is added to the URI. (In this case, no attempt is made to determine whether a password or standard port is present; the authority value is simply added to the string.)
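The authority rules can be sketched in Python as follows. This is a non-normative illustration: `build_authority` and `DEFAULT_PORTS` are hypothetical names, a password is assumed to be whatever follows the first colon in the userinfo, and `None` stands in for the empty sequence.

```python
# The four defaults that the text says implementations should recognize.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ssh": 22}


def build_authority(scheme=None, userinfo=None, host=None, port=None,
                    authority=None, allow_deprecated_features=False,
                    omit_default_ports=False):
    """Return the authority string for the URI under construction."""
    keys_present = any(v is not None for v in (userinfo, host, port))

    # Drop userinfo that carries a non-empty password unless deprecated
    # features are explicitly allowed.
    if userinfo is not None and not allow_deprecated_features:
        _, colon, password = userinfo.partition(":")
        if colon and password:
            userinfo = None

    # Drop a port that matches the default for the scheme, if requested.
    if omit_default_ports and port is not None and DEFAULT_PORTS.get(scheme) == port:
        port = None

    if any(v is not None for v in (userinfo, host, port)):
        return ((userinfo + "@" if userinfo is not None else "")
                + (host or "")
                + (":" + str(port) if port is not None else ""))

    # Fall back to the authority value only when none of the three keys was supplied.
    if not keys_present and authority is not None:
        return authority
    return ""
```

With these assumptions, `build_authority("https", host="example.com", port=443, omit_default_ports=True)` yields `example.com`, and a userinfo of `user:secret` is silently dropped unless `allow_deprecated_features` is true.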
The fn:parse-uri function removes percent-escaping when it constructs the path-segments, query-parameters, and fragment properties. That’s often the most convenient behavior but, in order to reconstruct a URI from them, special escaping rules apply. These rules protect delimiters without encoding additional characters unnecessarily. The rules for path-segments, query-parameters, and fragment are slightly different because the URI encoding conventions are slightly different in each case.
An application with more stringent requirements can construct a path or query that satisfies the requirements and leave the path-segments and/or query-parameters keys out of the map.
If the path-segments key exists in the map, then the path is constructed from the segments. To construct the path, the possibly encoded segments are concatenated together, separated by U+002F (SOLIDUS, FORWARD SLASH, /) characters.
The rules for encoding the path segments are different for hierarchical and non-hierarchical URIs. If the URI is non-hierarchical, no encoding is performed on the segments. Otherwise, each segment is encoded by replacing any control characters (codepoints less than 0x20) and exclusively the following characters with their percent-escaped forms: U+0020 (SPACE), U+0025 (PERCENT SIGN, %), U+002F (SOLIDUS, FORWARD SLASH, /), U+003F (QUESTION MARK, ?), U+0023 (NUMBER SIGN, #), U+002B (PLUS, +), U+005B (LEFT SQUARE BRACKET, [), and U+005D (RIGHT SQUARE BRACKET, ]). That is “[#0-#20%/\?\#\+\[\]]”.
Note:
Encoding is performed unless the URI is known to be non-hierarchical; in other words, encoding is the default. This heuristic improves the reliability of using fn:build-uri() on the output of fn:parse-uri(). (For example, fn:parse-uri('a+b/c') => fn:build-uri() will return a+b/c.)
It is necessary to avoid encoding non-hierarchical schemes because there is more variation in them (for example, the tel: scheme uses a “+” that must not be encoded). Users working with non-hierarchical schemes may need to address the encoding issue directly bearing in mind the encoding requirements of the particular schemes in use.
Otherwise the value of the path key is used.
If neither is present, a zero-length string is used for the path.
The path is added to the URI.
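The segment encoding for hierarchical URIs can be sketched in Python. This is a non-normative illustration: `build_path` is a hypothetical name, and the regular expression is a Python transcription of the character set listed above (control characters, space, %, /, ?, #, +, [ and ]).

```python
import re

# Control characters and space (0x00-0x20) plus the listed delimiters.
_PATH_ESCAPED = re.compile(r"[\x00-\x20%/?#+\[\]]")


def _percent_escape(match):
    # Every character in the set is ASCII, so one escape per character suffices.
    return "%{:02X}".format(ord(match.group()))


def build_path(segments, hierarchical=True):
    """Join path segments with "/", escaping them only for hierarchical URIs."""
    if hierarchical:
        segments = [_PATH_ESCAPED.sub(_percent_escape, s) for s in segments]
    return "/".join(segments)


windows_path = build_path(["", "C:", "Program Files", "test.jar"])
```

Here `windows_path` is `/C:/Program%20Files/test.jar`: the space is escaped, while the colon is left alone because it is not in the listed set.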
If the query-parameters key exists in the map, its value must be a map. A sequence of strings is constructed from the values in the map.
To construct the string, each key and value is encoded. The encoding performed replaces any control characters (codepoints less than 0x20) and exclusively the following characters with their percent-escaped forms: U+0020 (SPACE), U+0025 (PERCENT SIGN, %), U+003D (EQUALS SIGN, =), U+0026 (AMPERSAND, &), U+0023 (NUMBER SIGN, #), U+002B (PLUS, +), U+005B (LEFT SQUARE BRACKET, [), and U+005D (RIGHT SQUARE BRACKET, ]). That is “[#0-#20%=&\#\+\[\]]”. (This differs from the path encoding in that it excludes U+002F (SOLIDUS, FORWARD SLASH, /) and U+003F (QUESTION MARK, ?) but includes U+003D (EQUALS SIGN, =) and U+0026 (AMPERSAND, &).) For each key and each value associated with that key in turn:
If the key is a zero-length string, the string constructed is the encoded value.
Otherwise, the string constructed is the value of the key, encoded, followed by an equal sign (U+003D (EQUALS SIGN, =)), followed by the value, encoded.
The query is constructed by joining the resulting strings into a single string, separated by & (ampersand) characters. If the query-parameters key does not exist in the map, but the query key does, then the query is the value of the query key.
If there is a query, it is added to the URI with a preceding U+003F (QUESTION MARK, ?).
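The query construction can be sketched in Python. This is a non-normative illustration: `build_query` is a hypothetical name, the regular expression transcribes the character set given for keys and values, a Python dict (which preserves insertion order) stands in for the query-parameters map, and a value may be either a single string or a list of strings.

```python
import re

# Control characters and space (0x00-0x20) plus % = & # + [ ]
_QUERY_ESCAPED = re.compile(r"[\x00-\x20%=&#+\[\]]")


def _encode(text):
    return _QUERY_ESCAPED.sub(lambda m: "%{:02X}".format(ord(m.group())), text)


def build_query(params):
    """Serialize a key -> value(s) mapping into a query string (without the "?")."""
    parts = []
    for key, values in params.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            if key == "":
                # A zero-length key contributes the encoded value alone.
                parts.append(_encode(value))
            else:
                parts.append(_encode(key) + "=" + _encode(value))
    return "&".join(parts)


query = build_query({"k": "x=y&z", "sort": "relevance"})
```

In this sketch `query` is `k=x%3Dy%26z&sort=relevance`: the equal sign and ampersand inside the value are escaped so that they cannot be mistaken for delimiters.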
If the fragment key exists in the map, then the value of that key is encoded and added to the URI with a preceding U+0023 (NUMBER SIGN, #). The encoding performed replaces any control characters (codepoints less than 0x20) and exclusively the following characters with their percent-escaped forms: U+0020 (SPACE), U+0025 (PERCENT SIGN, %), U+0023 (NUMBER SIGN, #), U+002B (PLUS, +), U+005B (LEFT SQUARE BRACKET, [), and U+005D (RIGHT SQUARE BRACKET, ]). That is “[#0-#20%\#\+\[\]]”. (This differs from the path encoding in that it excludes U+002F (SOLIDUS, FORWARD SLASH, /) and U+003F (QUESTION MARK, ?).)
The resulting URI is returned.
| Expression: | build-uri({
"scheme": "https",
"host": "qt4cg.org",
"port": (),
"path": "/specifications/index.html"
}) |
|---|---|
| Result: | "https://qt4cg.org/specifications/index.html" |
Operators are defined on the following type:
xs:duration
and on the two defined subtypes (see 8.1.1 Subtypes of duration):
xs:yearMonthDuration
xs:dayTimeDuration
Arithmetic on durations is defined only on these subtypes: this is because the results of some operations (for example one month minus one day) have no representation in the value space.
Two xs:duration values may however be compared.
| Function | Meaning |
|---|---|
op:add-yearMonthDurations | Returns the result of adding two xs:yearMonthDuration values. |
op:subtract-yearMonthDurations | Returns the result of subtracting one xs:yearMonthDuration value from another. |
op:multiply-yearMonthDuration | Returns the result of multiplying $arg1 by $arg2. The result is rounded to the nearest month. |
op:divide-yearMonthDuration | Returns the result of dividing $arg1 by $arg2. The result is rounded to the nearest month. |
op:divide-yearMonthDuration-by-yearMonthDuration | Returns the ratio of two xs:yearMonthDuration values. |
op:add-dayTimeDurations | Returns the sum of two xs:dayTimeDuration values. |
op:subtract-dayTimeDurations | Returns the result of subtracting one xs:dayTimeDuration from another. |
op:multiply-dayTimeDuration | Returns the result of multiplying an xs:dayTimeDuration by a number. |
op:divide-dayTimeDuration | Returns the result of dividing an xs:dayTimeDuration by a number. |
op:divide-dayTimeDuration-by-dayTimeDuration | Returns the ratio of two xs:dayTimeDuration values, as a decimal number. |
For operators that combine a duration and a date/time value, see 9.7 Arithmetic operators on durations, dates, and times.
Returns the ratio of two xs:dayTimeDuration values, as a decimal number.
Defines the semantics of the div operator when applied to two xs:dayTimeDuration values.
op:divide-dayTimeDuration-by-dayTimeDuration( | ||
$arg1 | as , | |
$arg2 | as | |
) as | ||
The function returns the result of dividing $arg1 by $arg2. The result is the xs:decimal value equal to the length in seconds of $arg1 divided by the length in seconds of $arg2. The calculation is performed by applying op:numeric-divide to the two xs:decimal operands.
For handling of overflow, underflow, and rounding, see 8.1.2 Limits and precision.
Either operand (and therefore the result) may be negative.
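The calculation can be sketched in Python. This is a non-normative illustration: `duration_ratio` is a hypothetical name, `datetime.timedelta` stands in for xs:dayTimeDuration, and `decimal.Decimal` stands in for xs:decimal (its default precision of 28 significant digits is a property of Python, not of this specification).

```python
from datetime import timedelta
from decimal import Decimal


def _seconds(duration):
    """Length of a duration in seconds, as an exact Decimal."""
    whole = Decimal(duration.days * 86400 + duration.seconds)
    return whole + Decimal(duration.microseconds) / Decimal(1000000)


def duration_ratio(arg1, arg2):
    """Ratio of two durations; either operand, and so the result, may be negative."""
    return _seconds(arg1) / _seconds(arg2)


ratio = duration_ratio(
    timedelta(days=2, minutes=53, seconds=11),  # P2DT53M11S
    timedelta(days=1, hours=10),                # P1DT10H
)
```

The first operand is 175991 seconds long and the second is 122400 seconds, so `ratio` is approximately 1.4378. Dividing by a one-second duration returns the length of the first operand in seconds.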
| Expression: | round-half-to-even(
op:divide-dayTimeDuration-by-dayTimeDuration(
xs:dayTimeDuration("P2DT53M11S"), xs:dayTimeDuration("P1DT10H")
),
4
) |
|---|---|
| Result: | 1.4378 |
This example shows how to determine the number of seconds in a duration. | |
| Expression: | op:divide-dayTimeDuration-by-dayTimeDuration(
xs:dayTimeDuration("P2DT53M11S"),
seconds(1)
) |
| Result: | 175991.0 |
This section defines operations on the [XML Schema Part 2: Datatypes Second Edition] date and time types.
See [Working With Timezones] for a disquisition on working with date and time values with and without timezones.
[Definition] The eight primitive types xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay are referred to collectively as the Gregorian types.
This section describes operations on atomic items of these types.
Values of these types are modeled as comprising one or more of the seven components year, month, day, hour, minute, second, and timezone.
The only operations defined on xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, and xs:gDay values are equality comparison and component extraction. For other types, further operations are provided, including order comparisons, arithmetic, formatted display, and timezone adjustment.
All conforming processors must support year values in the range 1 to 9999, and a minimum fractional second precision of 1 millisecond or three digits (that is, s.sss). However, processors may set larger implementation-defined limits on the maximum number of digits they support in these two situations. Processors may also choose to support the year 0 and years with negative values. The results of operations on dates that cross the year 0 are implementation-defined.
A processor that limits the number of digits in date and time datatype representations may encounter overflow and underflow conditions when it tries to execute the functions in 9.7 Arithmetic operators on durations, dates, and times. In these situations, the processor must return 00:00:00 in case of time underflow. It must raise a dynamic error [err:FODT0001] in case of overflow.
Similarly, a processor that limits the precision of the seconds component of date and time or duration values may need to deliver a rounded result for arithmetic operations. Such a processor must deliver a result that is as close as possible to the mathematically precise result, given these limits: if two values are equally close, the one that is chosen is implementation-defined.
| Function | Meaning |
|---|---|
fn:dateTime | Returns an xs:dateTime value created by combining an xs:date and an xs:time. |
fn:unix-dateTime | Returns a dateTime value for a Unix time. |
Returns a dateTime value for a Unix time.
fn:unix-dateTime( | ||
$value | as | := 0 |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns a dateTime value in the UTC timezone for the Unix time specified by $value in milliseconds. If the value is absent or the empty sequence, 0 is used. The Unix time is defined in [IEEE 1003.1-2024].
If the implementation supports data types from XSD 1.1 then the returned value will be an instance of xs:dateTimeStamp. Otherwise, the only guarantees are that it will be an instance of xs:dateTime and will have a timezone component.
The effect of the function is equivalent to the result of the following XPath expression.
xs:dateTime('1970-01-01T00:00:00Z') + ($value otherwise 0) * seconds(0.001)
Calling this convenience function ensures that the correct timezone is used when computing the Unix time.
Note that Unix time does not account for leap seconds. It assumes that every day has 86,400 seconds.
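The same arithmetic can be sketched in Python. This is a non-normative illustration: `unix_datetime` and `unix_time_ms` are hypothetical names, `None` stands in for the empty sequence, and the inverse uses integer division, so it truncates any sub-millisecond part.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch, with an explicit UTC timezone.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_datetime(value=0):
    """Timezone-aware UTC datetime for a Unix time given in milliseconds."""
    if value is None:
        value = 0
    return _EPOCH + timedelta(milliseconds=value)


def unix_time_ms(moment):
    """Inverse operation: whole milliseconds elapsed since the epoch."""
    return (moment - _EPOCH) // timedelta(milliseconds=1)


next_day = unix_datetime(86400000)
```

Because the epoch carries an explicit UTC timezone, subtracting it from a value that has no timezone raises an error in Python instead of silently assuming a local offset. Like Unix time itself, the sketch ignores leap seconds.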
| Expression: | unix-dateTime() |
|---|---|
| Result: | xs:dateTime('1970-01-01T00:00:00Z') |
| Expression: | unix-dateTime(1) |
| Result: | xs:dateTime('1970-01-01T00:00:00.001Z') |
| Expression: | unix-dateTime(86400000) |
| Result: | xs:dateTime('1970-01-02T00:00:00Z') |
Calculate the Unix time associated with a | |
let $value := current-dateTime() return ($value - unix-dateTime()) div seconds(0.001) | |
These functions support adding or subtracting a duration value to or from an xs:dateTime, an xs:date or an xs:time value. Appendix E of [XML Schema Part 2: Datatypes Second Edition] describes an algorithm for performing such operations.
| Function | Meaning |
|---|---|
op:subtract-dateTimes | Returns an xs:dayTimeDuration representing the amount of elapsed time between the instants $arg2 and $arg1. |
op:subtract-dates | Returns the xs:dayTimeDuration that corresponds to the elapsed time between the starting instant of $arg2 and the starting instant of $arg1. |
op:subtract-times | Returns the xs:dayTimeDuration that corresponds to the elapsed time between the values of $arg2 and $arg1 treated as times on the same date. |
op:add-yearMonthDuration-to-dateTime | Returns the xs:dateTime that is a given duration after a specified xs:dateTime (or before, if the duration is negative). |
op:add-dayTimeDuration-to-dateTime | Returns the xs:dateTime that is a given duration after a specified xs:dateTime (or before, if the duration is negative). |
op:subtract-yearMonthDuration-from-dateTime | Returns the xs:dateTime that is a given duration before a specified xs:dateTime (or after, if the duration is negative). |
op:subtract-dayTimeDuration-from-dateTime | Returns the xs:dateTime that is a given duration before a specified xs:dateTime (or after, if the duration is negative). |
op:add-yearMonthDuration-to-date | Returns the xs:date that is a given duration after a specified xs:date (or before, if the duration is negative). |
op:add-dayTimeDuration-to-date | Returns the xs:date that is a given duration after a specified xs:date (or before, if the duration is negative). |
op:subtract-yearMonthDuration-from-date | Returns the xs:date that is a given duration before a specified xs:date (or after, if the duration is negative). |
op:subtract-dayTimeDuration-from-date | Returns the xs:date that is a given duration before a specified xs:date (or after, if the duration is negative). |
op:add-dayTimeDuration-to-time | Returns the xs:time value that is a given duration after a specified xs:time (or before, if the duration is negative or causes wrap-around past midnight). |
op:subtract-dayTimeDuration-from-time | Returns the xs:time value that is a given duration before a specified xs:time (or after, if the duration is negative or causes wrap-around past midnight). |
| Function | Meaning |
|---|---|
fn:format-dateTime | Returns a string containing an xs:dateTime value formatted for display. |
fn:format-date | Returns a string containing an xs:date value formatted for display. |
fn:format-time | Returns a string containing an xs:time value formatted for display. |
Three functions are provided to represent dates and times as a string, using the conventions of a selected calendar, language, and country. The functions are presented in their customary fashion, except for the rules and examples, which are described en bloc at 9.8.4 The date/time formatting functions and 9.8.5 Examples of date and time formatting.
The fn:format-dateTime, fn:format-date, and fn:format-time functions format $value as a string using the picture string specified by the $picture argument, the calendar specified by the $calendar argument, the language specified by the $language argument, and the country or other place name specified by the $place argument. The result of the function is the formatted string representation of the supplied xs:dateTime, xs:date, or xs:time value.
[Definition] The three functions fn:format-dateTime, fn:format-date, and fn:format-time are referred to collectively as the date formatting functions.
If $value is the empty sequence, the function returns the empty sequence.
Calling the two-argument form of each of the three functions is equivalent to calling the five-argument form with each of the last three arguments set to the empty sequence.
For details of the $language, $calendar, and $place arguments, see 9.8.4.8 The language, calendar, and place arguments.
In general, the use of an invalid $picture, $language, $calendar, or $place argument results in a dynamic error [err:FOFD1340]. By contrast, use of an option in any of these arguments that is valid but not supported by the implementation is not an error, and in these cases the implementation is required to output the value in a fallback representation. More detailed rules are given below.
The set of languages, calendars, and places that are supported in the date formatting functions is implementation-defined. When any of these arguments is omitted or is the empty sequence, an implementation-defined default value is used.
If the fallback representation uses a different calendar from that requested, the output string must identify the calendar actually used, for example by prefixing the string with [Calendar: X] (where X is the calendar actually used), localized as appropriate to the requested language. If the fallback representation uses a different language from that requested, the output string must identify the language actually used, for example by prefixing the string with [Language: Y] (where Y is the language actually used) localized in an implementation-dependent way. If a particular component of the value cannot be output in the requested format, it should be output in the default format for that component.
The $language argument specifies the language to be used for the result string of the function. The value of the argument should be either the empty sequence or a value that would be valid for the xml:lang attribute (see [Extensible Markup Language (XML) 1.0 (Fifth Edition)]). Note that this permits the identification of sublanguages based on country codes (from [ISO 3166-1]) as well as identification of dialects and of regions within a country.
If the $language argument is omitted or is set to the empty sequence, or if it is set to an invalid value or a value that the implementation does not recognize, then the processor uses the default language defined in the dynamic context.
The language is used to select the appropriate language-dependent forms of:
names (for example, of months)
numbers expressed as words or as ordinals (twenty, 20th, twentieth)
hour convention (0-23 vs 1-24, 0-11 vs 1-12)
first day of week, first week of year
Where appropriate this choice may also take into account the value of the $place argument, though this should not be used to override the language or any sublanguage that is specified as part of the language argument.
The choice of the names and abbreviations used in any given language is implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German, one implementation might represent Saturday as Samstag while another uses Sonnabend. Implementations may provide mechanisms allowing users to control such choices.
Where ordinal numbers are used, the selection of the correct representation of the ordinal (for example, the grammatical gender) may depend on the component being formatted and on its textual context in the picture string.
The $calendar argument specifies that the dateTime, date, or time supplied in the $value argument must be converted to a value in the specified calendar and then converted to a string using the conventions of that calendar.
The calendar value if present must be a valid EQName (dynamic error: [err:FOFD1340]). If it is a lexical QName then it is expanded into an expanded QName using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace. If the expanded QName is in no namespace, then it must identify a calendar with a designator specified below (dynamic error: [err:FOFD1340]). If the expanded QName is in a namespace then it identifies the calendar in an implementation-defined way.
If the $calendar argument is omitted or is set to the empty sequence, then the default calendar defined in the dynamic context is used.
Note:
The calendars listed below were known to be in use during the last hundred years. Many other calendars have been used in the past.
This specification does not define any of these calendars, nor the way that they map to the value space of the xs:date datatype in [XML Schema Part 2: Datatypes Second Edition]. There may be ambiguities when dates are recorded using different calendars. For example, the start of a new day is not simultaneous in different calendars, and may also vary geographically (for example, based on the time of sunrise or sunset). Translation of dates is therefore more reliable when the time of day is also known, and when the geographic location is known. When translating dates between one calendar and another, the processor may take account of the values of the $place and/or $language arguments, with the $place argument taking precedence.
Information about some of these calendars, and algorithms for converting between them, may be found in [Calendrical Calculations].
| Designator | Calendar |
|---|---|
| AD | Anno Domini (Christian Era) |
| AH | Anno Hegirae (Islamic Era) |
| AME | Mauludi Era (solar years since Muhammad’s birth) |
| AM | Anno Mundi (Jewish Calendar) |
| AP | Anno Persici |
| AS | Aji Saka Era (Java) |
| BE | Buddhist Era |
| CB | Cooch Behar Era |
| CE | Common Era |
| CL | Chinese Lunar Era |
| CS | Chula Sakarat Era |
| EE | Ethiopian Era |
| FE | Fasli Era |
| ISO | ISO 8601 calendar |
| JE | Japanese Calendar |
| KE | Khalsa Era (Sikh calendar) |
| KY | Kali Yuga |
| ME | Malabar Era |
| MS | Monarchic Solar Era |
| NS | Nepal Samwat Era |
| OS | Old Style (Julian Calendar) |
| RS | Rattanakosin (Bangkok) Era |
| SE | Saka Era |
| SH | Solar Hijri (Islamic Era, used in Iran and Afghanistan) |
| SS | Saka Samvat |
| TE | Tripurabda Era |
| VE | Vikrama Era |
| VS | Vikrama Samvat Era |
At least one of the above calendars must be supported. It is implementation-defined which calendars are supported.
The ISO 8601 calendar ([ISO 8601]), which is included in the above list and designated ISO, is very similar to the Gregorian calendar designated AD, but it differs in several ways. The ISO calendar is intended to ensure that date and time formats can be read easily by other software, as well as being legible for human users. The ISO calendar prescribes the use of particular numbering conventions as defined in ISO 8601, rather than allowing these to be localized on a per-language basis. In particular it provides a numeric “week date” format which identifies dates by year, week of the year, and day in the week; in the ISO calendar the days of the week are numbered from 1 (Monday) to 7 (Sunday), and week 1 in any calendar year is the week (from Monday to Sunday) that includes the first Thursday of that year. The numeric values of the components year, month, day, hour, minute, and second are the same in the ISO calendar as the values used in the lexical representation of the date and time as defined in [XML Schema Part 2: Datatypes Second Edition]. The era (E component) with this calendar is either a minus sign (for negative years) or a zero-length string (for positive years). For dates before 1 January, AD 1, year numbers in the ISO and AD calendars are off by one from each other: ISO year 0000 is 1 BC, -0001 is 2 BC, etc.
ISO 8601 does not define a numbering for weeks within a month. When the w component is used, the convention to be adopted is that each Monday-to-Sunday week is considered to fall within a particular month if its Thursday occurs in that month; the weeks that fall in a particular month under this definition are numbered starting from 1. Thus, for example, 29 January 2013 falls in week 5 because the Thursday of the week (31 January 2013) is the fifth Thursday in January, and 1 February 2013 is also in week 5 for the same reason.
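As a non-normative illustration, the week-within-month convention described above can be sketched in Python. The function name week_in_month is hypothetical and is not part of this specification; the sketch assumes the proleptic Gregorian calendar provided by Python's datetime module.

```python
from datetime import date, timedelta


def week_in_month(d):
    """Return (year, month, week) for the Monday-to-Sunday week containing d.

    The week is assigned to the month in which its Thursday falls, and
    weeks within that month are numbered starting from 1.
    """
    # date.weekday() numbers Monday as 0 and Sunday as 6, so Thursday is 3.
    thursday = d + timedelta(days=3 - d.weekday())
    # The week number is the ordinal of that Thursday within its month.
    week = (thursday.day - 1) // 7 + 1
    return thursday.year, thursday.month, week


print(week_in_month(date(2013, 1, 29)))  # (2013, 1, 5)
print(week_in_month(date(2013, 2, 1)))   # (2013, 1, 5)
```

Both dates fall in the week whose Thursday is 31 January 2013, so both are reported as week 5 of January.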
Note:
The value space of the date and time datatypes, as defined in XML Schema, is based on absolute points in time. The lexical space of these datatypes defines a representation of these absolute points in time using the proleptic Gregorian calendar, that is, the modern Western calendar extrapolated into the past and the future; but the value space is calendar-neutral. The date formatting functions produce a representation of this absolute point in time, but denoted in a possibly different calendar. So, for example, the date whose lexical representation in XML Schema is 1502-01-11 (the day on which Pope Gregory XIII was born) might be formatted using the Old Style (Julian) calendar as 1 January 1502. This reflects the fact that there was at that time a ten-day difference between the two calendars. It would be incorrect, and would produce incorrect results, to represent this date in an element or attribute of type xs:date as 1502-01-01, even though this might reflect the way the date was recorded in contemporary documents.
When referring to years occurring in antiquity, modern historians generally use a numbering system in which there is no year zero (the year before 1 CE is thus 1 BCE). This is the convention that should be used when the requested calendar is OS (Julian) or AD (Gregorian). When the requested calendar is ISO, however, the conventions of ISO 8601 should be followed: here the year before +0001 is numbered zero. In [XML Schema Part 2: Datatypes Second Edition] (version 1.0), the value space for xs:date and xs:dateTime does not include a year zero: however, XSD 1.1 endorses the ISO 8601 convention. This means that the date on which Julius Caesar was assassinated has the ISO 8601 lexical representation -0043-03-13, but will be formatted as 15 March 44 BCE in the Julian calendar or 13 March 44 BCE in the Gregorian calendar (dependent on the chosen localization of the names of months and eras).
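The difference between the two year-numbering conventions can be sketched as follows. This is a non-normative Python illustration; the function name era_year and the era labels "CE" and "BCE" are assumptions chosen for the example, since the actual era names depend on the chosen localization.

```python
def era_year(iso_year):
    """Map an ISO 8601 year number to (year, era) in a system with no year zero."""
    if iso_year > 0:
        return iso_year, "CE"
    # ISO year 0000 is 1 BCE, -0001 is 2 BCE, and so on.
    return 1 - iso_year, "BCE"


print(era_year(-43))  # (44, 'BCE')
print(era_year(0))    # (1, 'BCE')
print(era_year(1))    # (1, 'CE')
```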
The intended use of the $place argument is to identify the place where an event represented by the dateTime, date, or time supplied in the $value argument took place or will take place. If the $place argument is omitted or is set to the empty sequence, then the default place defined in the dynamic context is used. If the value is supplied, and is not the empty sequence, then it should either be a country code or an IANA timezone name. If the value does not take this form, or if its value is not recognized by the implementation, then the default place defined in the dynamic context is used.
Country codes are defined in [ISO 3166-1]. Examples are "de" for Germany and "jp" for Japan. Implementations may also allow the use of codes representing subdivisions of a country from ISO 3166-2, or codes representing formerly used names of countries from ISO 3166-3.
IANA timezone names are defined in the IANA timezone database [IANA Timezone Database]. Examples are "America/New_York" and "Europe/Rome".
This argument is not intended to identify the location of the user for whom the date or time is being formatted; that should be done by means of the $language argument. This information may be used to provide additional information when converting dates between calendars or when deciding how individual components of the date and time are to be formatted. For example, different countries using the Old Style (Julian) calendar started the new year on different days, and some countries used variants of the calendar that were out of synchronization as a result of differences in calculating leap years.
The geographical area identified by a country code is defined by the boundaries as they existed at the time of the date to be formatted, or the present-day boundaries for dates in the future.
If the $place argument is supplied in the form of an IANA timezone name that is recognized by the implementation, then the date or time being formatted is adjusted to the timezone offset applicable in that timezone. For example, if the xs:dateTime value 2010-02-15T12:00:00Z is formatted with the $place argument set to America/New_York, then the output will be as if the value 2010-02-15T07:00:00-05:00 had been supplied. This adjustment takes daylight savings time into account where possible; if the date in question falls during daylight savings time in New York, then it is adjusted to timezone offset -PT4H rather than -PT5H. Adjustment using daylight savings time is only possible where the value includes a date, and where the date is within the range covered by the timezone database.
| Function | Meaning |
|---|---|
| fn:parse-ietf-date | Parses a string containing the date and time in IETF format, returning the corresponding xs:dateTime value. |
A function is provided to parse dates and times expressed using syntax that is commonly encountered in internet protocols.
Parses a string containing the date and time in IETF format, returning the corresponding xs:dateTime value.
fn:parse-ietf-date($value as xs:string?) as xs:dateTime?
This function is deterministic, context-independent, and focus-independent.
The function accepts a string matching the production input in the following grammar:
input | ::= | S? (dayname ","? S)? ((datespec S time) | asctime) S? |
dayname | ::= | "Mon" | "Tue" | "Wed" | "Thu" | "Fri" | "Sat" | "Sun" | "Monday" | "Tuesday" | "Wednesday" | "Thursday" | "Friday" | "Saturday" | "Sunday" |
datespec | ::= | daynum dsep monthname dsep year |
asctime | ::= | monthname dsep daynum S time S year |
dsep | ::= | S | (S? "-" S?) |
daynum | ::= | digit digit? |
year | ::= | digit digit (digit digit)? |
digit | ::= | [0-9] |
monthname | ::= | "Jan" | "Feb" | "Mar" | "Apr" | "May" | "Jun" | "Jul" | "Aug" | "Sep" | "Oct" | "Nov" | "Dec" |
time | ::= | hours ":" minutes (":" seconds)? (S? timezone)? |
hours | ::= | digit digit? |
minutes | ::= | digit digit |
seconds | ::= | digit digit ("." digit+)? |
timezone | ::= | tzname | tzoffset (S? "(" S? tzname S? ")")? |
tzname | ::= | "UT" | "UTC" | "GMT" | "EST" | "EDT" | "CST" | "CDT" | "MST" | "MDT" | "PST" | "PDT" |
tzoffset | ::= | ("+"|"-") hours ":"? minutes? |
S | ::= | (x09 | x0A | x0D | x20)+ |
The input is case-insensitive: upper-case and lower-case distinctions in the above grammar show the conventional usage, but otherwise have no significance.
If the input is the empty sequence, the result is the empty sequence.
The dayname, if present, is ignored.
The daynum, monthname, and year supply the day, month, and year of the resulting xs:dateTime value. A two-digit year must have 1900 added to it. A year such as 0070 is to be treated as given; negative years are not permitted.
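The year rule can be sketched in Python as follows. This is a non-normative illustration: the function name resolve_year is hypothetical, and ValueError stands in for the dynamic error [err:FORG0010].

```python
import re


def resolve_year(text):
    """Interpret the 'year' token of the grammar: two or four ASCII digits."""
    if re.fullmatch(r"[0-9]{2}([0-9]{2})?", text) is None:
        raise ValueError("FORG0010: year must be two or four digits")
    year = int(text)
    # A two-digit year has 1900 added; a four-digit year is taken as given.
    return year + 1900 if len(text) == 2 else year


print(resolve_year("94"))    # 1994
print(resolve_year("0070"))  # 70
print(resolve_year("2013"))  # 2013
```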
The hours, minutes, and seconds (including fractional seconds) values supply the corresponding components of the resulting xs:dateTime value; if the seconds value or the fractional seconds value is absent then zero is assumed.
If both a tzoffset and a tzname are supplied then the tzname is ignored.
If a tzoffset is supplied then this defines the hours and minutes parts of the timezone offset:
If it contains a colon, this separates the hours part from the minutes part.
Otherwise, the grammar allows a sequence of one to four digits. These are interpreted as H, HH, HMM, or HHMM respectively, where H or HH is the hours part, and MM (if present) is the minutes part.
If the minutes part is absent it defaults to 00.
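The interpretation of a tzoffset can be sketched in Python as follows. This is a non-normative illustration: the function name parse_tzoffset is hypothetical, the result is expressed as a signed number of minutes for convenience, ValueError stands in for [err:FORG0010], and no range check is applied to the hours or minutes.

```python
import re

# Either "hours ':' minutes?" or a run of one to four digits, after a sign.
_TZOFFSET = re.compile(r"([+-])(?:([0-9]{1,2}):([0-9]{2})?|([0-9]{1,4}))")


def parse_tzoffset(text):
    """Return the timezone offset in minutes for a tzoffset token."""
    m = _TZOFFSET.fullmatch(text)
    if m is None:
        raise ValueError("FORG0010: not a valid tzoffset")
    sign, colon_hours, colon_minutes, digits = m.groups()
    if colon_hours is not None:
        # The colon separates the hours part from the minutes part.
        hours, minutes = int(colon_hours), int(colon_minutes or 0)
    elif len(digits) <= 2:
        # H or HH: the minutes part is absent and defaults to 00.
        hours, minutes = int(digits), 0
    else:
        # HMM or HHMM: the last two digits are the minutes part.
        hours, minutes = int(digits[:-2]), int(digits[-2:])
    total = hours * 60 + minutes
    return -total if sign == "-" else total


print(parse_tzoffset("-5:00"))  # -300
print(parse_tzoffset("-0500"))  # -300
print(parse_tzoffset("+530"))   # 330
```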
If a tzname is supplied with no tzoffset then it is translated to a timezone offset as follows:
| tzname | Offset |
|---|---|
| UT, UTC, GMT | 00:00 |
| EST | -05:00 |
| EDT | -04:00 |
| CST | -06:00 |
| CDT | -05:00 |
| MST | -07:00 |
| MDT | -06:00 |
| PST | -08:00 |
| PDT | -07:00 |
If neither a tzoffset nor tzname is supplied, a timezone offset of 00:00 is assumed.
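The precedence between tzoffset and tzname can be sketched in Python as follows. This is a non-normative illustration: the names TZNAME_OFFSETS and resolve_timezone are hypothetical, and offsets are expressed in minutes.

```python
# Offsets in minutes for the timezone names recognized by the grammar.
TZNAME_OFFSETS = {
    "UT": 0, "UTC": 0, "GMT": 0,
    "EST": -300, "EDT": -240,
    "CST": -360, "CDT": -300,
    "MST": -420, "MDT": -360,
    "PST": -480, "PDT": -420,
}


def resolve_timezone(tzoffset_minutes=None, tzname=None):
    """Choose the effective offset from an optional tzoffset and tzname."""
    if tzoffset_minutes is not None:
        # If both are supplied, the tzname is ignored.
        return tzoffset_minutes
    if tzname is not None:
        # The input is case-insensitive, so normalize before lookup.
        return TZNAME_OFFSETS[tzname.upper()]
    # Neither supplied: an offset of 00:00 is assumed.
    return 0


print(resolve_timezone(tzname="EST"))                         # -300
print(resolve_timezone(tzoffset_minutes=-240, tzname="EST"))  # -240
print(resolve_timezone())                                     # 0
```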
A dynamic error is raised [err:FORG0010] if the input does not match the grammar, or if the resulting date/time value is invalid (for example, "31 February").
The parse-ietf-date function attempts to interpret its input as a date in any of the three formats specified by HTTP [RFC 2616].
These formats are used widely on the Internet to represent timestamps, and were specified in:
[RFC 822] (electronic mail), extended in [RFC 1123] to allow four-digit years;
[RFC 850] (Usenet Messages), obsoleted by [RFC 1036];
POSIX asctime() format
[RFC 2616] (HTTP) officially uses a subset of those three formats restricted to GMT.
The grammar for this function is slightly more liberal than the RFCs (reflecting the internet tradition of being liberal in what is accepted). For example the function:
Accepts a single-digit value where appropriate in place of a two-digit value with a leading zero (so "Wed 1 Jun" is acceptable in place of "Wed 01 Jun", and the timezone offset "-5:00" is equivalent to "-05:00")
Accepts one or more whitespace characters (x20, x09, x0A, x0D) wherever a single space is required, and allows whitespace to be omitted where it is not required for parsing
Accepts and ignores whitespace characters (x20, x09, x0A, x0D) at the start or end of the string.
In new protocols IETF recommends the format of [RFC 3339], which is based on a profile of ISO 8601 similar to that already used in XPath and XSD, but the “approximate” [RFC 822] format described here is very widely used.
An [RFC 1123] date can be generated approximately using fn:format-dateTime with a picture string of "[FNn3], [D01] [MNn3] [Y04] [H01]:[m01]:[s01] [Z0000]".
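For comparison only, the shape of such a date can be seen using the Python standard library, whose email.utils module produces and parses dates in this style. This non-normative sketch does not use the functions defined in this specification; it merely shows the kind of string that the picture above approximates.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# An instant with an explicit UTC offset.
stamp = datetime(2010, 2, 15, 12, 0, 0, tzinfo=timezone.utc)

# Day name, day, month name, four-digit year, time, numeric offset.
text = format_datetime(stamp)
print(text)  # Mon, 15 Feb 2010 12:00:00 +0000

# Parsing the string recovers the same instant.
assert parsedate_to_datetime(text) == stamp
```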
In XPath 4.0, statically-known QNames can be expressed using a QName literal such as #xml:space. Where the QName is not known statically, the xs:QName constructor function can be used.
In addition to the xs:QName constructor function, QName values can be constructed by combining a namespace URI, prefix, and local name, or by resolving a lexical QName against the in-scope namespaces of an element node. This section defines functions that perform these operations. Leading and trailing whitespace, if present, is stripped from string arguments before the result is constructed.
| Function | Meaning |
|---|---|
| fn:QName | Returns an xs:QName value formed using a supplied namespace URI and lexical QName. |
| fn:parse-QName | Returns an xs:QName value formed by parsing an EQName. |
| fn:resolve-QName | Returns an xs:QName value (that is, an expanded-QName) by taking an xs:string that has the lexical form of an xs:QName (a string in the form "prefix:local-name" or "local-name") and resolving it using the in-scope namespaces for a given element. |
Returns an xs:QName value formed by parsing an EQName.
fn:parse-QName($value as xs:string?) as xs:QName?
This function is deterministic, context-dependent, and focus-independent. It depends on namespaces.
If $value is the empty sequence, the result is the empty sequence.
Otherwise, leading and trailing whitespace in $value is stripped: call the result V.
If V is castable to xs:NCName, the result is fn:QName("", $value): that is, a QName in no namespace.
If V is in the lexical space of xs:QName (that is, if it is in the form prefix:local), the result is xs:QName($value). Note that this result depends on the in-scope prefixes in the static context, and may result in various error conditions.
If V takes the form of an XPath URIQualifiedNameXP (that is, Q{uri}local, where the uri part may be zero-length, or Q{uri}prefix:local), then the result is fn:QName(uri, local) or fn:QName(uri, prefix:local) respectively.
The rules used for parsing a BracedURILiteralXP within a URIQualifiedNameXP are the XPath rules, not the XQuery rules (the XQuery rules require special characters such as < and & to be escaped).
A dynamic error is raised [err:FOCA0002] if the supplied value of $value, after whitespace normalization, does not match the XPath production EQNameXP, or if the input is a URIQualifiedName in which the namespace prefix is present but the namespace URI is absent.
A dynamic error is raised [err:FONS0004] if the supplied value of $value, after whitespace normalization, is in the form prefix:local (with a non-absent prefix), and the prefix cannot be resolved to a namespace URI using the in-scope namespace bindings from the static context.
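The rules above can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the function name parse_qname is hypothetical, a QName is modelled as a (namespace URI, prefix, local name) tuple, the namespaces argument is a dictionary standing in for the in-scope namespace bindings of the static context, the NCName pattern is a simplified ASCII-only stand-in for the real production, and ValueError and LookupError stand in for [err:FOCA0002] and [err:FONS0004] respectively.

```python
import re

# Simplified, ASCII-only stand-in for the XML NCName production (an assumption).
_NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"
_PLAIN = re.compile(_NCNAME)
_PREFIXED = re.compile("(" + _NCNAME + "):(" + _NCNAME + ")")
_BRACED = re.compile(r"Q\{([^{}]*)\}(?:(" + _NCNAME + "):)?(" + _NCNAME + ")")


def parse_qname(value, namespaces):
    """Return (namespace_uri, prefix, local), or None for the empty sequence."""
    if value is None:
        return None
    v = value.strip(" \t\r\n")
    if _PLAIN.fullmatch(v):
        # An unprefixed name is a QName in no namespace.
        return ("", None, v)
    m = _PREFIXED.fullmatch(v)
    if m:
        prefix, local = m.groups()
        if prefix not in namespaces:
            raise LookupError("FONS0004: no binding for prefix " + prefix)
        return (namespaces[prefix], prefix, local)
    m = _BRACED.fullmatch(v)
    if m:
        uri, prefix, local = m.groups()
        if prefix is not None and uri == "":
            raise ValueError("FOCA0002: prefix present but namespace URI absent")
        return (uri, prefix, local)
    raise ValueError("FOCA0002: value is not an EQName")


bindings = {"xmp": "http://www.example.com/ns"}
print(parse_qname("xmp:person", bindings))
print(parse_qname("Q{http://example.com/}person", bindings))
```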
| Expression: | declare namespace xmp = "http://www.example.com/ns";
fn:parse-QName("xmp:person") |
|---|---|
| Result: | fn:QName("http://www.example.com/ns", "xmp:person") |
This section specifies functions and an operator on QNames as defined in [XML Schema Part 2: Datatypes Second Edition].
| Function | Meaning |
|---|---|
op:QName-equal | Returns true if two supplied QNames have the same namespace URI and the same local part. |
fn:prefix-from-QName | Returns the prefix component of the supplied QName. |
fn:local-name-from-QName | Returns the local part of the supplied QName. |
fn:namespace-uri-from-QName | Returns the namespace URI part of the supplied QName. |
fn:expanded-QName | Returns a string representation of an xs:QName in the format Q{uri}local. |
Returns true if two supplied QNames have the same namespace URI and the same local part.
Defines the semantics of the eq and ne operators when applied to two values of type xs:QName.
op:QName-equal($arg1 as xs:QName, $arg2 as xs:QName) as xs:boolean
This function is deterministic, context-independent, and focus-independent.
The function returns true if the namespace URIs of $arg1 and $arg2 are equal and the local names of $arg1 and $arg2 are equal.
Otherwise, the function returns false.
The namespace URI parts are considered equal if they are both absentDM, or if they are both present and equal under the rules of the fn:codepoint-equal function.
The local parts are also compared under the rules of the fn:codepoint-equal function.
The prefix parts of $arg1 and $arg2, if any, are ignored.
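The comparison can be sketched in Python as follows. This is a non-normative illustration: the QName class shown here is hypothetical, an absent namespace URI is modelled as None, and Python string equality (which compares code points) stands in for fn:codepoint-equal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QName:
    namespace_uri: Optional[str]  # None models an absent namespace URI
    local: str
    # The prefix is carried along but excluded from equality comparison.
    prefix: Optional[str] = field(default=None, compare=False)


a = QName("http://www.example.com/ns", "person", prefix="xmp")
b = QName("http://www.example.com/ns", "person", prefix="ex")
c = QName(None, "person")

print(a == b)  # True: same namespace URI and local part, prefixes ignored
print(a == c)  # False: one namespace URI is present, the other absent
```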
Returns the prefix component of the supplied QName.
fn:prefix-from-QName($value as xs:QName?) as xs:NCName?
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence the function returns the empty sequence.
If $value has no prefix component the function returns the empty sequence.
Otherwise, the function returns an xs:NCName representing the prefix component of $value.
Returns the local part of the supplied QName.
fn:local-name-from-QName($value as xs:QName?) as xs:NCName?
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence the function returns the empty sequence.
Otherwise, the function returns an xs:NCName representing the local part of $value.
| Expression: | local-name-from-QName(
QName("http://www.example.com/example", "person")
) |
|---|---|
| Result: | "person" |
Returns the namespace URI part of the supplied QName.
fn:namespace-uri-from-QName($value as xs:QName?) as xs:anyURI?
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence the function returns the empty sequence.
Otherwise, the function returns an xs:anyURI representing the namespace URI part of $value.
If $value is in no namespace, the function returns the zero-length xs:anyURI.
| Expression: | namespace-uri-from-QName(
QName("http://www.example.com/example", "person")
) |
|---|---|
| Result: | xs:anyURI("http://www.example.com/example") |
Returns a string representation of an xs:QName in the format Q{uri}local.
fn:expanded-QName($value as xs:QName?) as xs:string?
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, returns the empty sequence.
The result is a string in the format Q{uri}local, where:
uri is the result of fn:string(fn:namespace-uri-from-QName($value)) (which will be a zero-length string if the QName is in no namespace), and
local is the result of fn:local-name-from-QName($value).
There is no escaping of special characters in the namespace URI. If the namespace URI contains curly braces, the resulting string will not be a valid BracedURILiteralXP.
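The construction of the result can be sketched in Python as follows. This is a non-normative illustration: the function name expanded_qname is hypothetical, and a QName is represented simply by its namespace URI (None when in no namespace) and its local name.

```python
def expanded_qname(namespace_uri, local):
    """Build the Q{uri}local form; no escaping is applied to the URI."""
    # A QName in no namespace contributes a zero-length uri part.
    uri = namespace_uri or ""
    return "Q{" + uri + "}" + local


print(expanded_qname("http://example.com/", "person"))  # Q{http://example.com/}person
print(expanded_qname(None, "person"))                   # Q{}person
```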
| Expression: | QName("http://example.com/", "person")
=> expanded-QName() |
|---|---|
| Result: | "Q{http://example.com/}person" |
| Expression: | QName("", "person")
=> expanded-QName() |
| Result: | "Q{}person" |
This section defines operators that take xs:NOTATION values as arguments.
| Function | Meaning |
|---|---|
op:NOTATION-equal | Returns true if the two xs:NOTATION values have the same namespace URI and the same local part. |
Returns true if the two xs:NOTATION values have the same namespace URI and the same local part.
Defines the semantics of the eq and ne operators when applied to two values of type xs:NOTATION.
op:NOTATION-equal($arg1 as xs:NOTATION, $arg2 as xs:NOTATION) as xs:boolean
The function returns true if the namespace URIs of $arg1 and $arg2 are equal and the local names of $arg1 and $arg2 are equal.
Otherwise, the function returns false.
The namespace URI parts are considered equal if they are both absentDM, or if they are both present and equal under the rules of the fn:codepoint-equal function.
The local parts are also compared under the rules of the fn:codepoint-equal function.
The prefix parts of $arg1 and $arg2, if any, are ignored.
There are no functions designed explicitly to process xs:NOTATION items.
However, some generic functions such as fn:atomic-equal and fn:compare can be used on xs:NOTATION items.
Accessors and their semantics are described in [XQuery and XPath Data Model (XDM) 4.0]. Some of these accessors are exposed to the user through the functions described below.
Each of these functions has an arity-zero signature which is equivalent to the arity-one form, with the context value supplied as the implicit first argument. In addition, each of the arity-one functions accepts the empty sequence as the argument, in which case it generally delivers the empty sequence as the result: the exception is fn:string, which delivers a zero-length string.
| Function | Accessor | Accepts | Returns |
|---|---|---|---|
| fn:node-name | node-name | node (optional) | xs:QName (optional) |
| fn:nilled | nilled | node (optional) | xs:boolean (optional) |
| fn:string | string-value | item (optional) | xs:string |
| fn:data | typed-value | zero or more items | a sequence of atomic items |
| fn:base-uri | base-uri | node (optional) | xs:anyURI (optional) |
| fn:document-uri | document-uri | node (optional) | xs:anyURI (optional) |
| Function | Meaning |
|---|---|
| fn:base-uri | Returns the base URI of a node. |
| fn:document-uri | Returns the URI of a resource where a document can be found, if available. |
| fn:nilled | Returns true for an element that is nilled. |
| fn:node-name | Returns the name of a node, as an xs:QName. |
| fn:string | Returns the value of $value represented as an xs:string. |
| fn:data | Returns the result of atomizing a sequence. This process flattens arrays, and replaces nodes by their typed values. |
Returns the base URI of a node.
fn:base-uri($node as node()? := .) as xs:anyURI?
The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.
The one-argument form of this function is deterministic, context-independent, and focus-independent.
The zero-argument version of the function returns the base URI of the context node: it is equivalent to calling fn:base-uri(.).
The single-argument version of the function behaves as follows:
If $node is the empty sequence, the function returns the empty sequence.
Otherwise, the function returns the value of the dm:base-uri accessor applied to the node $node. This accessor is defined, for each kind of node, in the XDM specification (See [XQuery and XPath Data Model (XDM) 4.0] section 7.6.2 base-uri Accessor).
Note:
As explained in XDM, document, element and processing-instruction nodes have a base-uri property which may be empty. The base-uri property for all other node kinds is the empty sequence. The dm:base-uri accessor returns the base-uri property of a node if it exists and is non-empty; otherwise it returns the result of applying the dm:base-uri accessor to its parent, recursively. If the node does not have a parent, or if the recursive ascent up the ancestor chain encounters a parentless node whose base-uri property is empty, the empty sequence is returned. In the case of namespace nodes, however, the result is always the empty sequence: it does not depend on the base URI of the parent element.
See also fn:static-base-uri.
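The recursive behavior described in the note can be sketched in Python as follows. This is a non-normative illustration of the dm:base-uri logic: the Node class and the function name base_uri are hypothetical, and None models the empty sequence.

```python
class Node:
    """Minimal stand-in for an XDM node (hypothetical, for illustration)."""

    def __init__(self, kind, base_uri=None, parent=None):
        self.kind = kind
        self.base_uri_property = base_uri  # None or "" models an empty property
        self.parent = parent


def base_uri(node):
    """Return the base URI of node, or None for the empty sequence."""
    if node is None:
        return None
    if node.kind == "namespace":
        # Namespace nodes never inherit a base URI from the parent element.
        return None
    if node.base_uri_property:
        return node.base_uri_property
    if node.parent is None:
        return None
    # Otherwise ascend the ancestor chain.
    return base_uri(node.parent)


doc = Node("document", base_uri="http://example.com/doc.xml")
elem = Node("element", parent=doc)
attr = Node("attribute", parent=elem)
ns = Node("namespace", parent=elem)

print(base_uri(attr))  # http://example.com/doc.xml
print(base_uri(ns))    # None
```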
The following errors may be raised when $node is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type node()?, type error [err:XPTY0004]XP.
A dynamic error may be raised [err:FORG0002] if the base URI is not a valid Legacy Extended IRI reference (see [Legacy extended IRIs for XML resource identification]).
Returns the name of a node, as an xs:QName.
fn:node-name($node as node()? := .) as xs:QName?
The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.
The one-argument form of this function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
If $node is the empty sequence, the empty sequence is returned.
Otherwise, the function returns the result of the dm:node-name accessor as defined in [XQuery and XPath Data Model (XDM) 4.0] (see section 7.6.10 node-name Accessor).
The following errors may be raised when $node is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type node()?, type error [err:XPTY0004]XP.
For element and attribute nodes, the name of the node is returned as an xs:QName, retaining the prefix, namespace URI, and local part.
For processing instructions, the name of the node is returned as an xs:QName in which the prefix and namespace URI are absentDM.
For a namespace node, the function returns the empty sequence if the node represents the default namespace; otherwise it returns an xs:QName in which prefix and namespace URI are absentDM and the local part is the namespace prefix being bound.
For all other kinds of node, the function returns the empty sequence.
| Variables | |
|---|---|
let $e := <doc> <p id="alpha" xml:id="beta">One</p> <p id="gamma" xmlns="http://example.com/ns">Two</p> <ex:p id="delta" xmlns:ex="http://example.com/ns">Three</ex:p> <?pi 3.14159?> </doc> | |
Returns the result of atomizing a sequence. This process flattens arrays, and replaces nodes by their typed values.
fn:data($input as item()* := .) as xs:anyAtomicType*
The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.
The one-argument form of this function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
The result of fn:data is the sequence of atomic items produced by applying the following rules to each item in $input:
If the item is an atomic item, it is appended to the result sequence.
If the item is an XNodeDM, the typed value of the node is appended to the result sequence. The typed value is a sequence of zero or more atomic items: specifically, the result of the dm:typed-value accessor as defined in [XQuery and XPath Data Model (XDM) 4.0] (see section 7.6.14 typed-value Accessor).
If the item is a JNodeDM, the atomized value of its ·content· property is appended to the result sequence.
If the item is an array, the result of applying fn:data to each member of the array, in order, is appended to the result sequence.
A type error is raised [err:FOTY0012] if an item in the sequence $input is a node that does not have a typed value.
A type error is raised [err:FOTY0013] if an item in the sequence $input is a function item other than an array.
A type error is raised [err:XPDY0002]XP if $input is omitted and the context value is absentDM.
The process of applying the fn:data function to a sequence is referred to as atomization. In many cases an explicit call on fn:data is not required, because atomization is invoked implicitly when a node or sequence of nodes is supplied in a context where an atomic item or sequence of atomic items is required.
The result of atomizing the empty sequence is the empty sequence.
The result of atomizing the empty array is the empty sequence.
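The atomization rules can be sketched in Python as follows. This is a non-normative illustration under stated assumptions: the Array and Node classes and the function name data are hypothetical, a sequence is modelled as a Python list, each array member is itself a sequence, JNodes are omitted, and TypeError stands in for the type errors [err:FOTY0012] and [err:FOTY0013].

```python
class Array:
    """Hypothetical model of an XDM array; each member is a sequence (list)."""

    def __init__(self, members):
        self.members = list(members)


class Node:
    """Hypothetical model of a node; typed_value is a list, or None if absent."""

    def __init__(self, typed_value):
        self.typed_value = typed_value


def data(items):
    """Atomize a sequence, flattening arrays and replacing nodes by typed values."""
    result = []
    for item in items:
        if isinstance(item, Array):
            # Apply the rules to each member of the array, in order.
            for member in item.members:
                result.extend(data(member))
        elif isinstance(item, Node):
            if item.typed_value is None:
                raise TypeError("FOTY0012: node has no typed value")
            result.extend(item.typed_value)
        elif callable(item):
            raise TypeError("FOTY0013: function item other than an array")
        else:
            # Atomic items are appended unchanged.
            result.append(item)
    return result


nested = Array([[2, 3], [], [Array([[4]])]])
print(data([1, nested, Node(["hobbit"])]))  # [1, 2, 3, 4, 'hobbit']
print(data([Array([])]))                    # []
```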
| Variables | |
|---|---|
let $para := <para>There lived a <term author="Tolkien">hobbit</term>.</para> | |
| Expression | Result |
|---|---|
| | Raises error FOTY0013. |
This section specifies further functions that return properties of nodes. Nodes are formally defined in 6 Nodes DM31.
| Function | Meaning |
|---|---|
| fn:has-children | Returns true if the supplied GNode has one or more child nodes (of any kind). |
| fn:in-scope-namespaces | Returns the in-scope namespaces of an element node, as a map. |
| fn:in-scope-prefixes | Returns the prefixes of the in-scope namespaces for an element node. |
| fn:lang | This function tests whether the language of $node, or the context value if the second argument is omitted, as specified by xml:lang attributes is the same as, or is a sublanguage of, the language specified by $language. |
| fn:local-name | Returns the local part of the name of $node as an xs:string that is either the zero-length string, or has the lexical form of an xs:NCName. |
| fn:name | Returns the name of a node, as an xs:string that is either the zero-length string, or has the lexical form of an xs:QName. |
| fn:namespace-uri | Returns the namespace URI part of the name of $node, as an xs:anyURI value. |
| fn:namespace-uri-for-prefix | Returns the namespace URI of one of the in-scope namespaces for $element, identified by its namespace prefix. |
| fn:path | Returns a path expression that can be used to select the supplied node relative to the root of its containing document. |
| fn:root | Returns the root of the tree to which $node belongs. The function can be applied both to XNodesDM and to JNodesDM. |
| fn:siblings | Returns the supplied GNode together with its siblings, in document order. |
Returns true if the supplied GNode has one or more child nodes (of any kind).
fn:has-children($node as gnode()? := .) as xs:boolean
The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.
The one-argument form of this function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
Provided that the supplied argument $node matches the expected type gnode()?, the result of the function call fn:has-children($node) is defined to be the same as the result of the expression fn:exists($node/child::gnode()).
The following errors may be raised when $node is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type gnode()?, type error [err:XPTY0004]XP.
If $node is the empty sequence the result is false.
The motivation for this function is to support streamed evaluation. According to the streaming rules in [XSL Transformations (XSLT) Version 4.0], the following construct is not streamable:
<xsl:if test="exists(row)">
<ulist>
<xsl:for-each select="row">
<item><xsl:value-of select="."/></item>
</xsl:for-each>
</ulist>
</xsl:if>
This is because it makes two downward selections to read the child row elements. The use of fn:has-children in the xsl:if conditional is intended to circumvent this restriction.
Although the function was introduced to support streaming use cases, it has general utility as a convenience function.
If the supplied argument is a map or an array, it will automatically be coerced to a JNode.
| Variables | |
|---|---|
let $e := <doc> <p id="alpha">One</p> <p/> <p>Three</p> <?pi 3.14159?> </doc> | |
This function tests whether the language of $node, or the context value if the second argument is omitted, as specified by xml:lang attributes is the same as, or is a sublanguage of, the language specified by $language.
fn:lang($language as xs:string?, $node as node() := .) as xs:boolean
The one-argument form of this function is deterministic, context-dependent, and focus-dependent.
The two-argument form of this function is deterministic, context-independent, and focus-independent.
The behavior of the function if the second argument is omitted is exactly the same as if the context value (.) had been passed as the second argument.
The language of the argument $node, or the context value if the second argument is omitted, is determined by the value of the xml:lang attribute on the node, or, if the node has no such attribute, by the value of the xml:lang attribute on the nearest ancestor of the node that has an xml:lang attribute. If there is no such ancestor, then the function returns false.
If $language is the empty sequence it is interpreted as the zero-length string.
The relevant xml:lang attribute is determined by the value of the XPath expression:
(ancestor-or-self::*/@xml:lang)[last()]
If this expression returns the empty sequence, the function returns false.
Otherwise, the function returns true if and only if, based on a caseless default match as specified in section 3.13 of [The Unicode Standard], either:
$language is equal to the string-value of the relevant xml:lang attribute, or
$language is equal to some substring of the string-value of the relevant xml:lang attribute that starts at the start of the string-value and ends immediately before a hyphen, - (HYPHEN-MINUS, #x002D).
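The matching rule can be sketched in Python as follows. This is a non-normative illustration: the function name lang_matches is hypothetical, the xml_lang argument stands for the string-value of the relevant xml:lang attribute (None if there is none), and str.casefold is used as an approximation of the caseless default match defined by Unicode.

```python
def lang_matches(language, xml_lang):
    """Return True if xml_lang is the same as, or a sublanguage of, language."""
    if xml_lang is None:
        # No xml:lang attribute on the node or any ancestor.
        return False
    # The empty sequence is interpreted as the zero-length string.
    wanted = (language or "").casefold()
    actual = xml_lang.casefold()
    # Either an exact match, or a match on a leading part ending before a hyphen.
    return actual == wanted or actual.startswith(wanted + "-")


print(lang_matches("en", "en-GB"))  # True
print(lang_matches("EN", "en"))     # True
print(lang_matches("fr", "en-GB"))  # False
```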
The following errors may be raised when $node is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not a single node, type error [err:XPTY0004]XP.
The expression | |
| |
The expression |
Returns the supplied GNode together with its siblings, in document order.
fn:siblings( | ||
$node | as | := . |
) as | ||
The zero-argument form of this function is deterministic, context-dependent, and focus-dependent.
The one-argument form of this function is deterministic, context-independent, and focus-independent.
If the $node argument is omitted, it defaults to the context value (.).
If the value of $node is the empty sequence, the function returns the empty sequence.
If $node is a child of some parent GNode P, the function returns all the children of P (including $node), in document order, as determined by the value of $node/child::gnode().
Otherwise (specifically, if $node is parentless, or if it is an attribute or namespace node), the function returns $node.
The effect of the function is equivalent to the result of the following XPath expression.
if ($node intersect $node/parent::node()/child::node()) then $node/parent::node()/child::node() else $node
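The same rule can be illustrated with a small tree model. This Python sketch is not part of the specification: the `Node` class and the `siblings` function are hypothetical, and attributes are modelled as nodes that have a parent but do not appear among that parent's children.

```python
class Node:
    """Minimal tree node used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append(self, child):
        """Attach child as the last child of this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def siblings(node):
    """Return node together with its siblings, in document order."""
    if node is None:
        # Models the empty sequence.
        return []
    parent = node.parent
    # Compare by identity: the node must itself be one of its parent's children.
    if parent is not None and any(c is node for c in parent.children):
        return list(parent.children)
    # Parentless nodes, and attribute-like nodes, are returned on their own.
    return [node]
```

For a `doc` node with children `a`, `text` and `pi`, `siblings(a)` yields all three children, whereas `siblings(doc)` yields only `doc`.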
The following errors may be raised when $node is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type node()?, type error [err:XPTY0004]XP.
The result of siblings($n) (except in error cases) is the same as the result of $n/(preceding-sibling::node() | following-sibling-or-self::node()). It is also the same as $n/(preceding-sibling-or-self::node() | following-sibling::node()).
As with names such as parent and child, the word sibling used here as a technical term is not a precise match to its use in describing human family relationships, but is chosen for convenience.
| Variables | |
|---|---|
let $e := <doc x="X"><a>A</a>text<?pi 3.14159?></doc> | |
| Expression | Result |
|---|---|
|
|
|
|
|
|
The functions included in this section operate on function items, that is, values referring to a function.
[Definition] Functions that accept functions among their arguments, or that return functions in their result, are described in this specification as higher-order functions.
Note:
Some functions, such as fn:parse-json, allow the option of supplying a callback function, for example to define exception behavior. Where this is not essential to the use of the function, the function has not been classified as higher-order for this purpose; in applications where function items cannot be created, these particular options will not be available.
| Function | Meaning |
|---|---|
fn:function-lookup | Returns a function item having a given name and arity, if there is one. |
fn:function-name | Returns the name of the function identified by a function item. |
fn:function-arity | Returns the arity of the function identified by a function item. |
fn:function-identity | Returns a string representing the identity of a function item. |
fn:function-annotations | Returns the annotations of the function item. |
Returns a function item having a given name and arity, if there is one.
fn:function-lookup( | ||
$name | as , | |
$arity | as | |
) as | ||
This function is deterministic, context-dependent, and focus-dependent.
A call to fn:function-lookup starts by looking for a function definitionXP in the named functions component of the dynamic context (specifically, the dynamic context of the call to fn:function-lookup), using the expanded QName supplied as $name and the arity supplied as $arity. There can be at most one such function definition.
If no function definition can be identified (by name and arity), then the empty sequence is returned.
If a function definition is identified, then a function item is obtained from the function definition using the same rules as for evaluation of a named function reference (see [XML Path Language (XPath) 4.0] section 4.6.5 Named Function References). The captured context of the returned function item (if it is context dependent) is the static and dynamic context of the call on fn:function-lookup.
If the arguments to fn:function-lookup identify a function that is present in the static context of the function call, the function will always return the same function that a static reference to this function would bind to. If there is no such function in the static context, then the results depend on what is present in the dynamic context, which is implementation-defined.
An error is raised if the identified function depends on components of the static or dynamic context that are not present, or that have unsuitable values. For example [err:XPDY0002]XP is raised for the call function-lookup( #fn:name, 0 ) if the context value is absent, and [err:FODC0001] is raised for the call function-lookup( #fn:id, 1 ) if the context value is not a single node in a tree that is rooted at a document node. The error that is raised is the same as the error that would be raised by the corresponding function if called with the same static and dynamic context.
This function can be useful where there is a need to make a dynamic decision on which of several statically known functions to call. It can thus be used as a substitute for polymorphism, in the case where the application has been designed so several functions implement the same interface.
The function can also be useful in cases where a query or stylesheet module is written to work with alternative versions of a library module. In such cases the author of the main module might wish to test whether an imported library module contains or does not contain a particular function, and to call a function in that module only if it is available in the version that was imported. A static call would cause a static error if the function is not available, whereas getting the function using fn:function-lookup allows the caller to take fallback action in this situation.
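The "look up, then fall back" pattern described above can be sketched by analogy in Python. This is only an analogy, not a model of the dynamic context: the `REGISTRY` dictionary and both helper functions are hypothetical, and the registry simply maps a (name, arity) pair to a callable.

```python
import math

# Hypothetical registry standing in for the named functions that are available.
REGISTRY = {
    ("upper", 1): str.upper,
    ("sqrt", 1): math.sqrt,
}


def function_lookup(name, arity):
    """Return the function registered under (name, arity), or None if there is none."""
    return REGISTRY.get((name, arity))


def call_if_available(name, arity, *args, fallback=None):
    """Call the named function if it exists; otherwise return the fallback value."""
    f = function_lookup(name, arity)
    return f(*args) if f is not None else fallback
```

A caller can then write `call_if_available("sqrt", 1, 9.0)` and still obtain a defined result when the function is missing, instead of failing at compile time as a static reference would.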
If the function that is retrieved by fn:function-lookup is context-dependent, that is, if it has dependencies on the static or dynamic context of its caller, the context that applies is the static and/or dynamic context of the call to the fn:function-lookup function itself. The context thus effectively forms part of the closure of the returned function. This mainly applies when the target of fn:function-lookup is a built-in function, because user-defined functions typically have no dependency on the static or dynamic context of the function call (an exception arises when the expressions used to define default values for parameters are context-dependent). The rule applies recursively, since fn:function-lookup is itself a context-dependent built-in function.
However, the static and dynamic context of the call to fn:function-lookup may play a role even when the selected function definition is not itself context dependent, if the expressions used to establish default parameter values are context dependent.
User-defined XSLT or XQuery functions should be accessible to fn:function-lookup only if they are statically visible at the location where the call to fn:function-lookup appears. This means that private functions, if they are not statically visible in the containing module, should not be accessible using fn:function-lookup.
The function identity is determined in the same way as for a named function reference. Specifically, if there is no context dependency, two calls on fn:function-lookup with the same name and arity must return the same function.
These specifications do not define any circumstances in which the dynamic context will contain functions that are not present in the static context, but neither do they rule this out. For example an API may provide the ability to add functions to the dynamic context, and such functions may potentially be context-dependent.
The mere fact that a function exists and has a name does not of itself mean that the function is present in the dynamic context. For example, functions obtained through use of the fn:load-xquery-module function are not added to the dynamic context.
| Expression: |
|
|---|---|
| Result: |
|
The expression | |
The expression let $f := function-lookup( #zip:binary-entry, 2 ) return if (exists($f)) then $f($source, $entry) else () returns the result of calling | |
Returns the name of the function identified by a function item.
fn:function-name( | ||
$function | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $function refers to a named function, fn:function-name($function) returns the name of that function.
Otherwise ($function refers to an anonymous function), fn:function-name($function) returns the empty sequence.
The prefix part of the returned QName is implementation-dependent.
| Expression: |
|
|---|---|
| Result: | QName("http://www.w3.org/2005/xpath-functions", "fn:substring")(The namespace prefix of the returned QName is not predictable.) |
| Expression: |
|
| Result: |
|
Returns the annotations of the function item.
fn:function-annotations( | ||
$function | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The fn:function-annotations function returns the annotations of $function as a sequence of single-entry maps, each associating the name of a function annotation with the value of the annotation. Note that several annotations on a function can share the same name. The order of the annotations is retained.
The result is a sequence of single-entry maps, each being an instance of map(xs:QName, xs:anyAtomicType*). If a function (for example, a built-in function) has no annotations, the result of the function is the empty sequence.
For each annotation, a map is returned, with a single entry. The key of the map entry is the name of the annotation as an xs:QName. The value of the entry is the value of the annotation as a sequence of atomic items. If the annotation has no values, the associated value is the empty sequence.
In the common case where the annotation names are all unique, the result of the function can readily be converted into a single map by applying the function map:merge.
| Expression: | function-annotations(true#0) |
|---|---|
| Result: | () |
| Expression: | declare %private function local:inc($c) { $c + 1 };
function-annotations(local:inc#1) |
| Result: | { #Q{http://www.w3.org/2012/xquery}private : () } |
| Expression: | let $old := %local:deprecated('0.1', '0.2') fn() {}
let $ann := function-annotations($old)
return map:merge($ann) |
| Result: | {
#Q{http://www.w3.org/2005/xquery-local-functions}deprecated :
("0.1", "0.2")
} |
Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.
A map is a kind of item.
[Definition] A map consists of a sequence of entries, also known as key-value pairs. Each entry comprises a key which is an arbitrary atomic item, and an arbitrary sequence called the associated value.
[Definition] Within a map, no two entries have the same key. Two atomic items K1 and K2 are the same key for this purpose if the function call fn:atomic-equal($K1, $K2) returns true.
It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).
Maps are immutable, and have no identity separate from their content. For example, the map:remove function returns a map that differs from the supplied map by the omission (typically) of one entry, but the supplied map is not changed by the operation. Two calls on map:remove with the same arguments return maps that are indistinguishable from each other; there is no way of asking whether these are “the same map”.
A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicType) as item()*. Calling the function has the same effect as calling the map:get function: the expression $map($key) returns the same result as map:get($map, $key). For example, if $books-by-isbn is a map whose keys are ISBNs and whose associated values are book elements, then the expression $books-by-isbn("0470192747") returns the book element with the given ISBN. The fact that a map is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.
The XDM data model ([XQuery and XPath Data Model (XDM) 4.0]) defines three primitive operations on maps:
dm:empty-map constructs the empty map.
dm:map-put adds or replaces an entry in a map.
dm:iterate-map applies a supplied function to every entry in a map.
The functions in this section are all specified by means of equivalent expressions that either call these primitives directly, or invoke other functions that rely on these primitives. The specifications avoid relying on XPath language constructs that manipulate maps, such as map constructor syntax, lookup expressions, or FLWOR expressions. This is done to allow these language constructs to be specified by reference to this function library, without risk of circularity.
There is one exception to this rule: for convenience, the notation {} is used to represent the empty map, in preference to a call on dm:empty-map().
The formal equivalents are not intended to provide a realistic way of implementing the functions (in particular, any real implementation might be expected to implement map:get and map:put much more efficiently). They do, however, provide a framework that allows the correctness of a practical implementation to be verified.
| Editorial note | |
TODO: as yet there is no formal equivalent for map:find(). | |
The functions defined in this section use a conventional namespace prefix map, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/map.
The function call map:get($map, $key) can be used to retrieve the value associated with a given key.
There is no operation to atomize a map or convert it to a string. The function fn:serialize can in some cases be used to produce a JSON representation of a map.
Note that when the required type of an argument to a function such as map:build is a map type, then the coercion rules ensure that a JNode can be supplied in the function call: if the ·content· property of the JNode is a map, then the map is automatically extracted as if by the jnode-content function.
| Function | Meaning |
|---|---|
map:build | Returns a map that typically contains one entry for each item in a supplied input sequence. |
map:contains | Tests whether a supplied map contains an entry for a given key. |
map:empty | Returns true if the supplied map contains no entries. |
map:entries | Returns a sequence containing all the key-value pairs present in a map, each represented as a single-entry map. |
map:entry | Returns a single-entry map that represents a single key-value pair. |
map:filter | Selects entries from a map, returning a new map. |
map:find | Searches the supplied input sequence and any contained maps and arrays for a map entry with the supplied key, and returns the corresponding values. |
map:for-each | Applies a supplied function to every entry in a map, returning the sequence concatenationXP of the results. |
map:get | Returns the value associated with a supplied key in a given map. |
map:items | Returns a sequence containing all the values present in a map, in order. |
map:keys | Returns a sequence containing all the keys present in a map. |
map:keys-where | Returns a sequence containing selected keys present in a map. |
map:merge | Returns a map that combines the entries from a number of existing maps. |
map:put | Returns a map containing all the contents of the supplied map, but with an additional entry, which replaces any existing entry for the same key. |
map:remove | Returns a map containing all the entries from a supplied map, except those having a specified key. |
map:size | Returns the number of entries in the supplied map. |
Changes in 4.0
New in 4.0 [Issues 584 843 1074 1133 PRs 969 1134 11 July 2023]
The $predicate callback function may return the empty sequence (meaning false). [Issue 1171 PR 1182 7 May 2024]
Enhanced to allow for ordered maps. [Issue 1651 PR 1703 14 January 2025]
The $action callback function now accepts an optional position argument. [Issue 1718 PR 2224 2 October 2025]
Selects entries from a map, returning a new map.
map:filter( | ||
$map | as , | |
$predicate | as | |
) as | ||
This function is context-independent, and focus-independent.
The function map:filter takes any map as its $map argument and applies the supplied function to each entry in the map; the result is a new map containing those entries for which the function returns true. A return value of () from the predicate is treated as false.
The function supplied as $predicate takes three arguments. It is called supplying the key of the map entry as the first argument, the associated value as the second argument, and the 1-based integer position as the third argument.
The relative order of entriesDM in the returned map is the same as their relative order in $map.
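The rules above can be modelled in Python, using an insertion-ordered `dict` to stand in for an ordered map. This is an illustrative sketch rather than a definition: the name `map_filter` is hypothetical, and the predicate is assumed always to accept three arguments (key, value, 1-based position).

```python
def map_filter(m, predicate):
    """Model of map:filter over an insertion-ordered dict (illustration only)."""
    result = {}
    for position, (key, value) in enumerate(m.items(), start=1):
        # None and () are falsy in Python, which mirrors the rule that an
        # empty result from the predicate is treated as false.
        if predicate(key, value, position):
            result[key] = value
    # Entries were added in input order, so relative order is preserved.
    return result
```

For example, `map_filter({"one": 1, "two": 2, "three": 3}, lambda k, v, pos: pos % 2 == 1)` keeps the first and third entries in their original order.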
The effect of the function is equivalent to the result of the following XPath expression.
map:for-each($map, fn($key, $value, $position) {
if ($predicate($key, $value, $position)) {
map:entry($key, $value)
}
})
=> map:merge()
| Expression: | map:filter(
{ 1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
5: "Thursday", 6: "Friday", 7: "Saturday" },
fn($k, $v) { $k = (1, 7) }
) |
|---|---|
| Result: | { 1: "Sunday", 7: "Saturday" } |
| Expression: | map:filter(
{ 1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
5: "Thursday", 6: "Friday", 7: "Saturday" },
fn($k, $v) { $v = ("Saturday", "Sunday") }
) |
| Result: | { 1: "Sunday", 7: "Saturday" } |
| Expression: | let $en-ja := { 'one': '一', 'two': '二', 'three': '三' }
return map:filter($en-ja, fn($en, $ja, $pos) {
$pos mod 2 = 1
}) |
| Result: | { 'one': '一', 'three': '三' } |
Searches the supplied input sequence and any contained maps and arrays for a map entry with the supplied key, and returns the corresponding values.
map:find( | ||
$input | as , | |
$key | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function map:find searches the sequence supplied as $input looking for map entries whose key is the same key as $key. The associated value in any such map entry (each being in general a sequence) is returned as a member of the result array.
The search processes the $input sequence using the following recursively defined rules (any equivalent algorithm may be used provided it delivers the same result, respecting those rules that constrain the order of the result):
To process a sequence, process each of its items in order.
To process an item that is an array, process each of its members in order (each member is, in general, a sequence).
To process an item that is a map, then for each key-value entry (K, V) in the map (in entry orderDM) perform both of the following steps, in order:
If K is the same key as $key, then add V as a new member to the end of the result array.
Process V (which is, in general, a sequence).
To process an item that is neither a map nor an array, do nothing. (Such items are ignored).
If $input is the empty sequence, or an empty map or array, or if the requested $key is not found, the result will be a zero-length array.
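The recursive search can be sketched in Python as follows. The sketch makes several modelling assumptions that are not part of the specification: a `tuple` stands for a sequence, a `list` for an array, a `dict` for a map, a singleton sequence is represented by the item itself, and Python `==` is used as a simplified stand-in for the same-key comparison.

```python
def map_find(input_seq, key):
    """Model of map:find; returns the result array as a Python list."""
    result = []

    def process(value):
        if isinstance(value, tuple):
            # A sequence: process each item in order.
            for item in value:
                process(item)
        elif isinstance(value, list):
            # An array: process each member in order.
            for member in value:
                process(member)
        elif isinstance(value, dict):
            # A map: for each entry, record a match and then search the value.
            for k, v in value.items():
                if k == key:
                    result.append(v)
                process(v)
        # Any other item is ignored.

    process(input_seq)
    return result
```

Because a matching value is added before it is itself searched, an outer match always precedes any matches nested inside it.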
| Variables | |
|---|---|
let $responses := [
{ 0: 'no', 1: 'yes' },
{ 0: 'non', 1: 'oui' },
{ 0: 'nein', 1: ('ja', 'doch') }
] | |
let $inventory := {
"name": "car",
"id": "QZ123",
"parts": [ { "name": "engine", "id": "YW678", "parts": [] } ]
} | |
| Expression: |
|
|---|---|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: |
|
| Expression: |
|
| Result: | [
[ { "name": "engine", "id": "YW678", "parts": [] } ],
[]
] |
Returns the value associated with a supplied key in a given map.
map:get( | ||
$map | as , | |
$key | as , | |
$default | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function map:get attempts to find an entry within the map supplied as $map that has the same key as $key. If there is such an entry, it returns the associated value; if not, it returns the supplied $default value, which defaults to the empty sequence.
The function is defined as follows, making use of primitive constructors and accessors defined in [XQuery and XPath Data Model (XDM) 4.0].
let $entry := dm:iterate-map($map, fn($k, $v) {
if (atomic-equal($k, $key)) {
map:entry($k, $v)
}
})
return (
if (exists($entry))
then map:items($entry)
else $default
)
A return value of () from map:get#2 could indicate that the key is present in the map with an associated value of (), or it could indicate that the key is not present in the map. The two cases can be distinguished either by calling map:contains to test whether an entry is present, or by using a $default value to return a value known never to appear in the map.
Invoking the map as a function item has the same effect as calling map:get with no $default argument: that is, when $map is a map, the expression $map($K) is equivalent to map:get($map, $K). Similarly, the expression map:get(map:get(map:get($map, 'employee'), 'name'), 'first') can be written as $map('employee')('name')('first').
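The ambiguity between an absent key and a key whose value is the empty sequence can be illustrated with a Python model. The names `map_get` and `MISSING` are hypothetical; a `dict` stands for the map and the empty tuple `()` for the empty sequence.

```python
def map_get(m, key, default=()):
    """Model of map:get: the associated value if the key is present, else default."""
    return m[key] if key in m else default


# A sentinel that is known never to appear as a value in the map.
MISSING = object()

sample = {"present-but-empty": ()}

# Both calls return (), so the two cases cannot be told apart by the result alone.
ambiguous_a = map_get(sample, "present-but-empty")
ambiguous_b = map_get(sample, "absent")

# Supplying a default that never occurs in the map resolves the ambiguity.
is_absent = map_get(sample, "absent", MISSING) is MISSING
```

The membership test `key in m` plays the role of map:contains in this model.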
| Variables | |
|---|---|
let $week := {
0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch",
4: "Donnerstag", 5: "Freitag", 6: "Samstag"
} | |
| Expression | Result |
|---|---|
|
|
|
(When the key is not present, the function returns the empty sequence.) |
|
(An empty sequence as the result can also signify that the key is present and the associated value is the empty sequence.) |
|
(The third argument supplies a default value.) |
Returns a sequence containing selected keys present in a map.
map:keys-where( | ||
$map | as , | |
$predicate | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
Informally, the function map:keys-where takes any map as its $map argument. The $predicate function is called with the key and the associated value of each map entry as its arguments, and the result is a sequence containing the keys of those entries for which the predicate function returns true, in entry orderDM.
A return value of () from the predicate function is treated as false.
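As an informal illustration, the behavior can be modelled in Python with an insertion-ordered `dict`. The name `map_keys_where` is hypothetical, and the result sequence is represented as a Python list.

```python
def map_keys_where(m, predicate):
    """Model of map:keys-where: keys of entries accepted by predicate, in entry order."""
    # A falsy result such as None or () is treated as false.
    return [key for key, value in m.items() if predicate(key, value)]
```

For instance, with a map from the integers 1 to 5 to their squares, the predicate `lambda k, v: 5 < v < 20` selects the keys 3 and 4.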
The effect of the function is equivalent to the result of the following XPath expression.
map:for-each($map, fn($key, $value) {
if ($predicate($key, $value)) { $key }
})
| Expression: | let $numbers := {
0: "zero",
1: "one",
2: "two",
3: "three"
}
return map:keys-where(
$numbers,
fn($key, $value) { $value = ("two", "three") }
) |
|---|---|
| Result: | (2, 3) |
| Expression: | let $square := map:merge(
(1 to 5) ! map:entry(., . * .)
)
return map:keys-where(
$square,
fn($key, $value) { $value > 5 and $value < 20 }
) |
| Result: | (3, 4) |
| Expression: | let $birthdays := {
"Agnieszka": xs:date("1980-12-31"),
"Jabulile": xs:date("2001-05-05"),
"Joel": xs:date("1969-11-10"),
"Midori": xs:date("2012-01-08")
}
return map:keys-where($birthdays, fn($name, $date) {
starts-with($name, "J") and year-from-date($date) = 1969
}) |
| Result: | "Joel" |
Returns a map that combines the entries from a number of existing maps.
map:merge( | ||
$maps | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function map:merge returns a map that is formed by combining the contents of the maps supplied in the $maps argument.
Informally, the supplied maps are combined as follows:
There is one entry in the returned map for each distinct key present in the union of the input maps, where two keys are distinct if they are not the same key. The order of the input maps, and of the entries within these input maps, is retained in the entry orderDM of the result map.
If there are duplicate keys, that is, if two or more maps contain entries having the same key, then the relevant entries are combined in a way that is controlled by the supplied $options.
The $options argument takes the same values (with the same meanings) as the map:build function, except that the default is different: for map:merge, the default for duplicate keys is use-first.
Note:
The difference is for backwards compatibility reasons.
With the default options, when duplicate entries occur:
There will be a single entry in the result corresponding to a set of duplicate entries in the input.
The value of that entry will be taken from the first of the duplicates.
The position of that entry in the entry orderDM of the result map will correspond to the position of the first of the duplicates.
The key of the combined entry will correspond to the key of one of the duplicates: it is implementation-dependent which one is chosen. (Keys may be duplicates even though they differ: for example, they may have different type annotations, or they may be xs:dateTime values in different timezones.)
An error is raised [err:FOJS0003] if the value of $options indicates that duplicates are to be rejected, and a duplicate key is encountered.
An error is raised [err:FOJS0005] if the value of $options includes an entry whose key is defined in this specification, and whose value is not a permitted value for that key.
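The handling of duplicate keys can be sketched in Python. This is a simplified model under stated assumptions, not a definition: the names `map_merge` and `as_seq` are hypothetical, a `dict` stands for an ordered map, a `tuple` stands for a sequence of more than one item, `ValueError` stands in for the errors above, `use-any` is arbitrarily treated like `use-first`, and a combined entry is assumed to keep the position of the first duplicate for every policy.

```python
def as_seq(value):
    """Represent a value as a tuple so that sequences can be concatenated."""
    return value if isinstance(value, tuple) else (value,)


def map_merge(maps, duplicates="use-first"):
    """Model of map:merge with a duplicates policy (illustration only)."""
    allowed = {"use-first", "use-last", "use-any", "combine", "reject"}
    if not callable(duplicates) and duplicates not in allowed:
        # Stands in for err:FOJS0005 (option value not permitted).
        raise ValueError("invalid value for the duplicates option")
    result = {}
    for m in maps:
        for key, value in m.items():
            if key not in result:
                result[key] = value
            elif callable(duplicates):
                # A callback combines the existing and the new value.
                result[key] = duplicates(result[key], value)
            elif duplicates == "use-last":
                result[key] = value
            elif duplicates == "combine":
                result[key] = as_seq(result[key]) + as_seq(value)
            elif duplicates == "reject":
                # Stands in for err:FOJS0003 (duplicate key encountered).
                raise ValueError("duplicate key: %r" % (key,))
            # "use-first" and "use-any": keep the existing value.
    return result
```

Assigning to an existing `dict` key leaves its position unchanged, which is how the sketch keeps a merged entry at the position of the first duplicate.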
If the input is the empty sequence, the result is the empty map.
If the input is a sequence of length one, the result map is indistinguishable from the input map.
There is no requirement that the supplied input maps should have the same or compatible types. The type of a map (for example map(xs:integer, xs:string)) is descriptive of the entries it currently contains, but is not a constraint on how the map may be combined with other maps.
The XSLT 3.0 recommendation included a specification of this function that incorrectly used the option value { 'duplicates': 'unspecified' } in place of { 'duplicates': 'use-any' }. XSLT implementations wishing to preserve backwards compatibility may choose to retain support for this setting.
| Variables | |
|---|---|
let $week := {
0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch",
4: "Donnerstag", 5: "Freitag", 6: "Samstag"
} | |
| Expression: |
|
|---|---|
| Result: |
(Returns the empty map). |
| Expression: | map:merge(( map:entry(0, "no"), map:entry(1, "yes") )) |
| Result: | { 0: "no", 1: "yes" }(Returns a map with two entries). |
| Expression: | map:merge(({ "red": 0 }, { "green": 1}, { "blue": 2 }))
=> map:keys() |
| Result: | "red", "green", "blue" (Note the order of the result.) |
| Expression: | map:merge(
($week, { 7: "Unbekannt" })
) |
| Result: | { 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch",
4: "Donnerstag", 5: "Freitag", 6: "Samstag", 7: "Unbekannt" }(The value of the existing map is unchanged; the returned map contains all the entries from |
| Expression: | map:merge(
($week, { 6: "Sonnabend" }),
{ "duplicates": "use-last" }
) |
| Result: | { 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch",
4: "Donnerstag", 5: "Freitag", 6: "Sonnabend" }(The value of the existing map is unchanged; the returned map contains all the entries from |
| Expression: | map:merge(
($week, { 6: "Sonnabend" }),
{ "duplicates": "use-first" }
) |
| Result: | { 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch",
4: "Donnerstag", 5: "Freitag", 6: "Samstag" }(The value of the existing map is unchanged; the returned map contains all the entries from |
| Expression: | map:merge(
($week, { 6: "Sonnabend" }),
{ "duplicates": "combine" }
) |
| Result: | { 0: "Sonntag", 1: "Montag", 2: "Dienstag", 3: "Mittwoch",
4: "Donnerstag", 5: "Freitag", 6: ("Samstag", "Sonnabend") }(The value of the existing map is unchanged; the returned map contains all the entries from |
| Expression: | map:merge(
({ "oxygen": 0.22, "hydrogen": 0.68, "nitrogen": 0.1 },
{ "oxygen": 0.24, "hydrogen": 0.70, "nitrogen": 0.06 }),
{ "duplicates": fn($a, $b) { max(($a, $b)) } }) |
| Result: | { "oxygen": 0.24, "hydrogen": 0.70, "nitrogen": 0.1 }(The result map holds, for each distinct key, the maximum of the values for that key in the input.) |
The fn:element-to-map function converts a tree rooted at an XML element node to a corresponding tree of maps, in a form suitable for serialization as JSON. In effect it provides a mechanism for converting XML to JSON.
This section describes the mappings used by this function.
This mapping is designed with three objectives:
It should be possible to represent any XML element as a map suitable for JSON serialization.
The resulting JSON should be intuitive and easy to use.
The JSON should be consistent and stable: small variations in the input should not result in large variations in the output.
Achieving all three objectives requires design compromises. It also requires sacrificing some other desiderata. In consequence:
The conversion is not lossless (see 14.5.8 Lost XDM Information for details).
The conversion is not streamable.
The results are not necessarily compatible with those produced by other popular libraries.
The requirement for consistency and stability is particularly challenging. An element such as <name>John</name> maps naturally to the map { "name": "John" }; but adding an attribute (so it becomes <name role="first">John</name>) then requires an incompatible change in the JSON representation. The format could be made extensible by converting <name>John</name> to { "name": {"#content":"John"} } and <name role="first">John</name> to { "name": { "@role":"first", "#content":"John" } }, but this imposes unwanted complexity on the simplest cases. The solution adopted is threefold:
It is possible to analyze a corpus of XML documents to develop a conversion plan, which can then be applied consistently to individual input documents, whether or not these documents were present in the corpus. The conversion plan can be serialized and subsequently reused, so that it can be applied to input documents that might not have existed at the time the conversion plan was formulated.
Alternatively, the function can make use of schema information where available, so it considers not just the structure of an individual element instance, but the rules governing the element type.
It is possible to override the choices made by the system, and explicitly specify the format to be used for elements or attributes having a given name.
The key challenge in mapping XML to JSON is in deciding how element content is to be represented. To illustrate the variety of mappings that are possible, the following table lists some examples of typical XML elements and their JSON equivalents:
| XML element | JSON equivalent |
|---|---|
<hr/> | "hr": "" |
<date-of-birth>2023-05-18</date-of-birth> | "date-of-birth": "2023-05-18" |
<box width="5" height="10"/> | "box": { "@width": "5", "@height": "10" } |
<label id="t41">Warning!</label> | "label": { "@id": "t41", "#content": "Warning!" } |
<box>
<width>5</width>
<height>10</height>
</box> | "box": {
"width": 5,
"height": 10
} |
<polygon>
<point x="0" y="0"/>
<point x="1" y="0"/>
<point x="1" y="1"/>
<point x="0" y="1"/>
</polygon> | "polygon": [
{ "x": 0, "y": 0 },
{ "x": 1, "y": 0 },
{ "x": 1, "y": 1 },
{ "x": 0, "y": 1 }
] |
This specification defines a number of named mappings, called layouts, and allows the layout for a particular element to be selected in a number of different ways:
The layout to be used for specific elements can be explicitly selected by supplying a conversion plan as input to the fn:element-to-map function.
It is possible to construct a conversion plan by analyzing a corpus of documents using the fn:element-to-map-plan function.
It is also possible to construct a conversion plan manually, or to modify the conversion plan produced by the fn:element-to-map-plan function before use.
In the absence of an explicit conversion plan, if the data has been schema-validated, the layout is inferred from the content model for the element type as defined in the schema.
When the data is untyped and no specific layout has been selected, a default layout is chosen based on the properties of the individual element instance.
The advantage of using schema information is that it gives a consistent representation for all elements of a particular type, even if they vary in content: for example if an element type allows optional attributes, the JSON representation will be consistent between those elements that have attributes and those without. In the absence of a schema, consistency can be achieved by supplying a conversion plan that applies uniformly to multiple documents.
The different layouts available are defined in the following sections. For each layout there is a table showing:
Layout name: the name to be used to select this layout in a conversion plan supplied to the fn:element-to-map function.
Usage: the situations for which this layout is designed.
Example input: an example of a typical element for which this layout is appropriate, shown as serialized XML.
Example output: the result of converting this example, shown as serialized JSON. The result is always shown as a singleton map, which is how it will appear when the layout is used for the top-level elements supplied in the $elements argument; when used to convert a descendant element, the corresponding key-value pair may appear as part of a larger map, depending on the layout chosen for its parent element.
Note:
The fn:element-to-map function produces a map as its result, but it is convenient to illustrate the form of the map by showing the effect of serializing the map as JSON.
Mapping rules: The rules for mapping the XML element to an XDM map representation.
Mapping for nilled elements: special rules that apply to an element having the attribute xsi:nil="true". These rules only apply if the element has been schema-validated.
Errors: situations where the layout cannot be used, and where attempting to use it will fail. For example, the empty layout cannot be used for an element that is not empty. In such a situation the recovery action is as follows, in order:
Attributes are dropped, and if this is sufficient to enable the layout to be used, then the element is converted without its attributes.
If the type of an element or attribute in the conversion plan is given as boolean or numeric, but the actual value of the element or attribute is not castable to xs:boolean or xs:numeric respectively, then the node is output ignoring the type property, that is, as an instance of xs:untypedAtomic.
If the conversion plan supplies a fallback layout (an entry with key "*"), then the fallback layout is used.
The element-to-map function fails with a dynamic error.
The rules for selecting the layout for a particular element are given later, in 14.5.5 Selecting an element layout.
Note that it is possible to request any layout for any element. If an inappropriate layout is chosen for a particular element (for example, empty layout for an element that is not empty), then the rules for that layout specify what happens. It is possible to specify a fallback layout for use when the selected layout fails: this will typically be a layout such as xml or mixed that can handle any element.
Note:
Acknowledgements for this categorization: see [Goessner]. Although Goessner's categories have been used, the detailed mappings vary from his proposal.
| Layout name |
|
|---|---|
| Usage | Intended for XML elements that have no content but may have attributes. |
| Example input | <hr class="ccc" id="zzz"/> |
| Example output | { "hr": { "@class": "ccc", "@id": "zzz" } } |
| Mapping rules | The content is represented by a map containing one entry for each attribute in the XML element; if there are no attributes, the content is represented as the empty map. The rules for attribute names are defined in 14.5.6 Element and Attribute Names, and the rules for attribute content in 14.5.7 Element and Attribute Content. |
| Mapping for nilled elements | An additional key-value pair |
| Errors | Child comment nodes, processing instructions, and whitespace-only text nodes are discarded. If any other child nodes are present, this layout fails. |
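The mapping rules in the table above can be modelled outside XPath. The following Python sketch is illustrative only and is not part of the function library: the helper name `empty_layout` is hypothetical, the attribute marker is assumed to be "@", namespaces are ignored, and a `ValueError` stands in for the layout failing.

```python
import json
import xml.etree.ElementTree as ET


def empty_layout(element, attribute_marker="@"):
    """Model of the attribute-only layout: one map entry per attribute."""
    # ElementTree's default parser drops comments and processing
    # instructions, which coincides with the rule that they are discarded.
    has_text = (element.text or "").strip() != ""
    if has_text or len(element) > 0:
        # Child elements or non-whitespace text: the layout cannot be used.
        raise ValueError("layout fails: element is not empty")
    content = {attribute_marker + name: value
               for name, value in element.attrib.items()}
    return {element.tag: content}


hr = ET.fromstring('<hr class="ccc" id="zzz"/>')
print(json.dumps(empty_layout(hr)))
```

An element with no attributes maps to an entry whose value is the empty map, as the mapping rules require.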
| Layout name |
|
|---|---|
| Usage | Intended for XML elements that act as wrappers for a list of child elements, all having the same element name; neither the element itself nor any of its children should have any attributes. The expected child element name may be present in the conversion plan. The names of the child elements are not retained in the output. |
| Example input (1) | <dates> <date>2023-03-20</date> <date>2023-04-12</date> <date>2023-05-30</date> </dates> |
| Example output (1) | { "dates": [ "2023-03-20", "2023-04-12", "2023-05-30" ] } |
| Example input (2) | <dates> <date><year>2023</year><month>03</month><day>20</day></date> <date><year>2023</year><month>04</month><day>12</day></date> <date><year>2023</year><month>05</month><day>30</day></date> </dates> |
| Example output (2) | { "dates": [
{ "year": "2023", "month": "03", "day": "20" },
{ "year": "2023", "month": "04", "day": "12" },
{ "year": "2023", "month": "05", "day": "30" }
] } |
| Mapping rules | The content is represented by an array, whose members correspond one-to-one with the children of the element. Each child element is converted to a map as if it were a top-level element: the resulting map contains a single key-value pair. The key part is discarded, and the value part is used as a member in the resulting array. If there are no children then the content is represented by the empty array. |
| Mapping for nilled elements | The array is replaced by the value |
| Errors | Attributes are discarded for both the element itself, and its children. Comments, processing instructions, and whitespace text nodes in the content are discarded. This layout fails if any child element is present with a name that differs from the expected child element name, or if there are non-whitespace text node children. |
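As an informal illustration of the rules in the table above, the following Python sketch models this layout. It is a sketch under stated assumptions, not a definitive implementation: the names `list_layout` and `text_only` are hypothetical, the expected child name is taken from the first child (a conversion plan could supply it instead), and the conversion of each child is delegated to a caller-supplied function.

```python
import json
import xml.etree.ElementTree as ET


def list_layout(element, convert_child):
    """Model of the list layout: an array with one member per child element."""
    if (element.text or "").strip():
        raise ValueError("layout fails: non-whitespace text node child")
    members = []
    expected = None
    for child in element:
        if (child.tail or "").strip():
            raise ValueError("layout fails: non-whitespace text node child")
        expected = expected or child.tag
        if child.tag != expected:
            raise ValueError("layout fails: unexpected child element name")
        # Convert the child as if it were a top-level element, then discard
        # the key of the resulting single-entry map and keep only the value.
        (value,) = convert_child(child).values()
        members.append(value)
    # Attributes are never read, so they are effectively discarded.
    return {element.tag: members}


def text_only(child):
    """Assumed child conversion: element name mapped to its string value."""
    return {child.tag: child.text or ""}


dates = ET.fromstring(
    "<dates><date>2023-03-20</date><date>2023-04-12</date>"
    "<date>2023-05-30</date></dates>"
)
print(json.dumps(list_layout(dates, text_only)))
```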
| Layout name |
|
|---|---|
| Usage | Intended for XML elements that act as wrappers for a list of child elements, all having the same element name. The wrapper element may have attributes, but the children should not. The name of the child elements is retained in the output. |
| Example input (1) | <dates id="x"> <date>2023-03-20</date> <date>2023-04-12</date> <date>2023-05-30</date> </dates> |
| Example output (1) | { "dates": { "@id": "x", "date": ["2023-03-20", "2023-04-12", "2023-05-30"] } } |
| Example input (2) | <dates id="x"> <date><year>2023</year><month>03</month><day>20</day></date> <date><year>2023</year><month>04</month><day>12</day></date> <date><year>2023</year><month>05</month><day>30</day></date> </dates> |
| Example output (2) | { "dates": {
"@id": "x",
"date": [
{ "year": "2023", "month": "03", "day": "20" },
{ "year": "2023", "month": "04", "day": "12" },
{ "year": "2023", "month": "05", "day": "30" }
] } } |
| Mapping rules | The content is represented by a map containing one entry for each attribute in the XML element, plus a property named after the child elements (the content property), whose value is an array containing the results of formatting the content in the same way as the If there are no children and the element is untyped (which can occur when this layout is chosen explicitly via the options to |
| Mapping for nilled elements | The array-valued entry in the result is replaced by the entry |
| Errors | Any attributes on the element's children are discarded. Comments, processing instructions, and whitespace text nodes in the content are discarded. This layout fails if any child element is present with a name that differs from the expected child element name, or if there are non-whitespace text node children. |
Converts an element node into a map that is suitable for JSON serialization.
fn:element-to-map( | ||
$element | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-independent, and focus-independent.
This function returns a map derived from the element node supplied in $element. The map is in a form that is suitable for JSON serialization, thus providing a mechanism for conversion of arbitrary XML to JSON.
The map that is returned will always be a single-entry map; the key of this entry will be a string representing the element name, and the value of the entry will be a representation of the element's attributes and children.
The entries that may appear in the $options map are as follows. The option parameter conventions apply.
record(
plan? as map(xs:string, record(layout?, child?, type?, *)),
attribute-marker? as xs:string,
name-format? as xs:string
)
| Key | Value | Meaning |
|---|---|---|
| plan | | A conversion plan, supplied as a map whose keys represent element and attribute names. The plan might be generated using the function element-to-map-plan, or it might be constructed in some other way. The format of the plan is described in 14.5.2 Creating a conversion plan. |
| attribute-marker | | A string that is prepended to any key value in the output that represents an XDM attribute node in the input. The string may be empty. If, after applying the requested prefix (or no prefix) there is a conflict between the names of attributes and child elements, then the requested prefix (or lack thereof) is ignored and the default prefix "@" is used. |
| name-format | | Indicates how the names of element and attribute nodes are handled. |
| | lexical | Names are output in the form produced by the fn:name function. |
| | local | Names are output in the form produced by the fn:local-name function. |
| | eqname | Names in a namespace are output in the form "Q{uri}local". Names in no namespace are output using the local name alone. |
| | default | An element name is output as a local name alone if either (a) it is a top-level element and is in no namespace, or (b) it is in the same namespace as its parent element. An attribute name is output as a local name alone if it is in no namespace. All other names are output in the format "Q{uri}local" if in a namespace, or "Q{}local" if in no namespace. "Top-level" here means that the element is one that appears explicitly in the sequence of elements passed in the $elements argument, as distinct from a descendant of such an element. |
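The four name-format values can be compared side by side with a small model. The Python sketch below is illustrative only: the function `format_name` and all of its parameter names are hypothetical, a namespace URI of "" stands for "no namespace", and the caller is assumed to pass `parent_uri=None` when rule (b) should not be considered (for example for a top-level element).

```python
def format_name(uri, local, prefix="", name_format="default",
                is_attribute=False, parent_uri=None, top_level=False):
    """Model of the name-format option for one element or attribute name."""
    if name_format == "lexical":      # form produced by fn:name
        return f"{prefix}:{local}" if prefix else local
    if name_format == "local":        # form produced by fn:local-name
        return local
    if name_format == "eqname":       # Q{uri}local, or local if no namespace
        return f"Q{{{uri}}}{local}" if uri else local
    if name_format == "default":
        if is_attribute:
            local_only = uri == ""
        else:
            local_only = (top_level and uri == "") or (
                parent_uri is not None and uri == parent_uri)
        # In all other cases the Q{uri}local form is used, giving Q{}local
        # for a name in no namespace.
        return local if local_only else f"Q{{{uri}}}{local}"
    raise ValueError(f"unknown name-format: {name_format}")


ns = "http://example.ns/"
print(format_name(ns, "name", top_level=True))
print(format_name(ns, "first", parent_uri=ns))
print(format_name(ns, "name", name_format="local"))
```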
If $element is the empty sequence, the result is the empty sequence.
The principles for conversion from elements to maps are described in 14.5.1 Element Layouts, and the rules for selecting an element layout for each element are given in 14.5.5 Selecting an element layout.
In general, every descendant element within the tree rooted at the supplied $element maps to a key-value pair in which the key represents the element name, and the corresponding value represents the attributes and children of the element. This key-value pair will be added to the content representing its parent element, in a way that depends on the parent element's layout.
The representation of a node of any other kind depends on the layout chosen for its parent element.
A dynamic error [err:FOJS0008] occurs if any element cannot be processed using the selected layout for that element, unless fallback processing is defined; or if error action is explicitly requested for an element.
Any error in the conversion plan is treated as a type error [err:XPTY0004], whether or not it is technically a contravention of the defined type for the value. This relieves users and implementers of the burden of distinguishing different kinds of error in the plan.
| Expression: | element-to-map(
<list>
<item value='1'/>
<item value='2'/>
</list>, { 'attribute-marker': '' }
) |
|---|---|
| Result: | { "list": [
{ "value": "1" },
{ "value": "2" }
] } |
| Expression: | element-to-map(
<name>
<first>Jane</first>
<last>Smith</last>
</name>
) |
| Result: | { "name": {
"first": "Jane",
"last": "Smith"
} } |
| Expression: | element-to-map(
<name xmlns="http://example.ns/">
<first>Jane</first>
<middle>Elizabeth</middle>
<middle>Mary</middle>
<last>Smith</last>
</name>,
{ 'plan': {'Q{http://example.ns/}name': { 'layout': 'record' }},
'name-format' : 'local'
}
) |
| Result: | { "name": {
"first": "Jane",
"middle": ["Elizabeth", "Mary"],
"last": "Smith"
}
} |
| Expression: | element-to-map(
<name xmlns="http://example.ns/">
<first>Jane</first>
<middle>Elizabeth</middle>
<middle>Mary</middle>
<last>Smith</last>
</name>,
{ 'plan': {'Q{http://example.ns/}name': { 'layout': 'record' },
'Q{http://example.ns/}middle': { 'layout': 'deep-skip' }
},
'name-format' : 'local'
}
) |
| Result: | { "name": {
"first": "Jane",
"last": "Smith"
}
} |
This section is non-normative.
Because a map is a function item, functions that apply to functions also apply to maps. A map is an anonymous function, so fn:function-name returns the empty sequence; fn:function-arity always returns 1.
Maps may be compared using the fn:deep-equal function.
There is no function or operator to atomize a map or convert it to a string (other than fn:serialize, which can be used to serialize some maps as JSON texts).
XPath 4.0 defines a number of syntactic constructs that operate on maps. These all have equivalents in the function library:
The expression {} creates the empty map (see [XML Path Language (XPath) 4.0] section 4.14.1.1 Map Constructors). This is equivalent to the effect of the data model primitive dm:empty-map(). Using user-visible functions the same can be achieved by calling map:build or map:merge, supplying the empty sequence as the argument.
The map constructor { K1 : V1, K2 : V2, ... , Kn : Vn } is equivalent to map:merge((map:entry(K1, V1), map:entry(K2, V2), ..., map:entry(Kn, Vn)), { "duplicates": "reject" }).
The lookup expression $map?* (see [XML Path Language (XPath) 4.0] section 4.14.3 Lookup Expressions) is equivalent to map:items($map).
The lookup expression $map?K, where K is a key value, is equivalent to map:get($map, K).
The expression for key $k value $v in $map return EXPR (see [XQuery 4.0: An XML Query Language] section 4.13.2 For Clause and [XML Path Language (XPath) 4.0] section 4.13.1 For Expressions) is equivalent to the function call map:for-each($map, fn($k, $v) { EXPR }).
Maps can be filtered using the construct $map?[predicate] (see [XML Path Language (XPath) 4.0] section 4.14.5 Filter Expressions for Maps and Arrays).
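The equivalences listed above can be pictured with a small model. The Python sketch below is illustrative only: a `dict` stands in for an XDM map, the helper name `map_merge_reject` is hypothetical, and Python's key equality is assumed to be close enough to the map key rules for the keys used here (it differs in some cases, for example `True` and `1` are the same `dict` key).

```python
def map_merge_reject(entries):
    """Model of map:merge over single-entry maps with duplicates rejected."""
    result = {}
    for key, value in entries:
        if key in result:
            # Models the error raised when "duplicates" is "reject".
            raise KeyError(f"duplicate key: {key!r}")
        result[key] = value
    return result


# Models the map constructor { "a": 1, "b": 2 }.
m = map_merge_reject([("a", 1), ("b", 2)])

print(m.get("a"))        # models $map?a, that is map:get($map, "a")
print(list(m.values()))  # models $map?*, that is map:items($map)
# Models: for key $k value $v in $map return concat($k, "=", $v)
print([f"{k}={v}" for k, v in m.items()])
```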
Arrays were introduced as a new datatype in XDM 3.1. This section describes functions that operate on arrays.
An array is an additional kind of item. An array of size N is a mapping from the integers (1 to N) to a set of values, called the members of the array, each of which is an arbitrary sequence. Because an array is an item, and therefore a sequence, arrays can be nested.
An array acts as a function from integer positions to associated values, so the function call $array($index) can be used to retrieve the array member at a given position. The function corresponding to the array has the signature function($index as xs:integer) as item()*. The fact that an array is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.
The XDM data model ([XQuery and XPath Data Model (XDM) 4.0]) defines three primitive operations on arrays:
dm:empty-array constructs the empty array.
dm:array-append adds a member to an array.
dm:iterate-array applies a supplied function to every member of an array, in order.
The functions in this section are all specified by means of equivalent expressions that either call these primitives directly, or invoke other functions that rely on these primitives. The specifications avoid relying on XPath language constructs that manipulate arrays, such as array constructor syntax, lookup expressions, or FLWOR expressions. This is done to allow these language constructs to be specified by reference to this function library, without risk of circularity.
There is one exception to this rule: for convenience, the notation [] is used to represent the empty array, in preference to a call on dm:empty-array().
The formal equivalents are not intended to provide a realistic way of implementing the functions. They do, however, provide a framework that allows the correctness of a practical implementation to be verified.
The functions defined in this section use a conventional namespace prefix array, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/array.
As with all other values, arrays are treated as immutable. For example, the array:reverse function returns an array that differs from the supplied array in the order of its members, but the supplied array is not changed by the operation. Two calls on array:reverse with the same argument will return arrays that are indistinguishable from each other; there is no way of asking whether these are “the same array”. Like sequences, arrays have no identity.
All functionality on arrays is defined in terms of two primitives:
The function array:members decomposes an array to a sequence of value records.
The function array:of-members composes an array from a sequence of value records.
A value record here is an item that encapsulates an arbitrary value; the representation chosen for a value record is record(value as item()*), that is, a map containing a single entry whose key is the string "value" and whose value is the encapsulated sequence.
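To make the notion of a value record concrete, the following Python sketch models an array as a list whose members are tuples (each tuple standing for an arbitrary sequence) and a value record as a single-entry `dict`. The function names mirror array:members and array:of-members but are a model only; `array_reverse` is included as an assumed example of a function expressed through the two operations.

```python
def array_members(array):
    """Decompose an array into value records, one per member."""
    return [{"value": member} for member in array]


def array_of_members(records):
    """Compose an array from a sequence of value records."""
    return [record["value"] for record in records]


def array_reverse(array):
    """A function on arrays expressed via the two operations above."""
    return array_of_members(list(reversed(array_members(array))))


# Three members: a two-item sequence, the empty sequence, a single item.
array = [(1, 2), (), ("a",)]
records = array_members(array)
print(records)
print(array_of_members(records) == array)
print(array_reverse(array))
```

Because every member is wrapped in its own record, a member that is the empty sequence survives the round trip instead of disappearing into the surrounding sequence.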
Note that when the required type of an argument to a function such as array:build is an array type, then the coercion rules ensure that a JNode can be supplied in the function call: if the ·content· property of the JNode is an array, then the array is automatically extracted as if by the jnode-content function.
| Function | Meaning |
|---|---|
array:append | Returns an array containing all the members of a supplied array, plus one additional member at the end. |
array:build | Returns an array obtained by evaluating the supplied function once for each item in the input sequence. |
array:empty | Returns true if the supplied array contains no members. |
array:filter | Returns an array containing those members of the $array for which $predicate returns true. A return value of () is treated as false. |
array:flatten | Replaces any array appearing in a supplied sequence with the members of the array, recursively. |
array:fold-left | Evaluates the supplied function cumulatively on successive members of the supplied array. |
array:fold-right | Evaluates the supplied function cumulatively on successive values of the supplied array. |
array:foot | Returns the last member of an array. |
array:for-each | Returns an array whose size is the same as array:size($array), in which each member is computed by applying $action to the corresponding member of $array. |
array:for-each-pair | Returns an array obtained by evaluating the supplied function once for each pair of members at the same position in the two supplied arrays. |
array:get | Returns the value at the specified position in the supplied array (counting from 1). |
array:head | Returns the first member of an array, that is $array(1). |
array:index-of | Returns a sequence of positive integers giving the positions within the array $array of members that are equal to $target. |
array:index-where | Returns the positions in an input array of members that match a supplied predicate. |
array:insert-before | Returns an array containing all the members of the supplied array, with one additional member at a specified position. |
array:items | Returns the sequence concatenation of the members of an array. |
array:join | Concatenates the contents of several arrays into a single array. |
array:members | Delivers the contents of an array as a sequence of value records. |
array:of-members | Constructs an array from the contents of a sequence of value records. |
array:put | Returns an array containing all the members of a supplied array, except for one member which is replaced with a new value. |
array:remove | Returns an array containing all the members of the supplied array, except for the members at specified positions. |
array:reverse | Returns an array containing all the members of a supplied array, but in reverse order. |
array:size | Returns the number of members in the supplied array. |
array:slice | Returns an array containing selected members of a supplied input array based on their position. |
array:sort | Sorts a supplied array, based on the value of a sort key supplied as a function. |
array:sort-by | Sorts a supplied array, based on the value of a number of sort keys supplied as functions. |
array:sort-with | Sorts a supplied array, according to the order induced by the supplied comparator functions. |
array:split | Delivers the contents of an array as a sequence of single-member arrays. |
array:subarray | Returns an array containing all members from a supplied array starting at a supplied position, up to a specified length. |
array:tail | Returns an array containing all members except the first from a supplied array. |
array:trunk | Returns an array containing all members except the last from a supplied array. |
Returns true if the supplied array contains no members.
array:empty( | ||
$array | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns true if and only if $array contains no members.
The effect of the function is equivalent to the result of the following XPath expression.
array:size($array) eq 0
The test for emptiness is not the same as the test used by the xsl:on-empty instruction in XSLT. For example, an array is not considered empty by this function if it contains a single member that is itself the empty array.
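The distinction drawn above can be checked with a small model. In this Python sketch (illustrative only) an array is modelled as a list of members, an empty sequence as an empty tuple, and the function names are stand-ins for array:size and array:empty.

```python
def array_size(array):
    """Number of members in the modelled array."""
    return len(array)


def array_empty(array):
    """True if and only if the array has no members."""
    return array_size(array) == 0


print(array_empty([]))      # no members
print(array_empty([[]]))    # one member, which is itself the empty array
print(array_empty([()]))    # one member, which is the empty sequence
```

Only the first call reports an empty array; the other two arrays each have exactly one member.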
Returns an array containing those members of the $array for which $predicate returns true. A return value of () is treated as false.
array:filter( | ||
$array | as , | |
$predicate | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
Informally, the function returns an array containing those members of the input array that satisfy the supplied predicate.
The effect of the function is equivalent to the result of the following XQuery expression.
array:of-members(
filter(
array:members($array),
fn($item, $pos) { $predicate(map:get($item, 'value'), $pos) }
)
)
As a consequence of the function signature and the function calling rules, a type error occurs if the supplied function $predicate returns anything other than a single xs:boolean item or the empty sequence; there is no conversion to an effective boolean value.
| Expression: | array:filter(
[ "A", "B", 1, 2 ],
fn($x) { $x instance of xs:integer }
) |
|---|---|
| Result: | [ 1, 2 ] |
| Expression: | array:filter(
[ "the cat", "sat", "on the mat" ],
function { count(tokenize(.)) > 1 }
) |
| Result: | [ "the cat", "on the mat" ] |
| Expression: | let $array := [ 1, 1, 2, 3, 4, 4, 5 ]
return array:filter(
$array,
fn($item, $pos) { $pos > 1 and $item = $array($pos - 1) }
) |
| Result: | [ 1, 4 ] |
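The behaviour of array:filter, including the treatment of () as false and the absence of any conversion to an effective boolean value, can be modelled as follows. This Python sketch is illustrative only: arrays are lists, `None` stands for the empty sequence, a `TypeError` stands for the type error, and the predicate is always called with both the member and its 1-based position.

```python
def array_filter(array, predicate):
    """Model of array:filter with a two-argument predicate (member, position)."""
    kept = []
    for pos, member in enumerate(array, start=1):
        result = predicate(member, pos)
        if result is None:
            # A return value of () is treated as false.
            result = False
        if not isinstance(result, bool):
            # No conversion to an effective boolean value is attempted.
            raise TypeError("predicate must return a boolean or nothing")
        if result:
            kept.append(member)
    return kept


array = [1, 1, 2, 3, 4, 4, 5]
# Keep members equal to their predecessor, as in the last example above.
print(array_filter(array, lambda item, pos: pos > 1 and item == array[pos - 2]))
```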
Returns the positions in an input array of members that match a supplied predicate.
array:index-where( | ||
$array | as , | |
$predicate | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The result of the function is a sequence of integers, in monotonic ascending order, representing the 1-based positions in the input array of those members for which the supplied predicate function returns true. A return value of () is treated as false.
The effect of the function is equivalent to the result of the following XPath expression.
dm:iterate-array($array, fn($member, $pos) {
if ($predicate($member, $pos)) { $pos }
})
| Expression: | array:index-where(
array { 1 to 10 },
function { . mod 2 = 0 }
) |
|---|---|
| Result: | 2, 4, 6, 8, 10 |
| Expression: | array:index-where(
[ "January", "February", "March", "April",
"May", "June", "July", "August", "September",
"October", "November", "December" ],
contains(?, "r")
) |
| Result: | 1, 2, 3, 4, 9, 10, 11, 12 |
| Expression: | array:index-where(
[ (1, 2, 3), (4, 5, 6), (7, 8) ],
fn($m) { count($m) = 3 }
) |
| Result: | 1, 2 |
| Expression: | array:index-where(
[ 1, 8, 2, 7, 3 ],
fn($member, $pos) { $member < 5 and $pos > 2 }
) |
| Result: | 3, 5 |
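The examples above can be reproduced with a short model of the function. The Python sketch below is illustrative only: arrays are lists, members that are sequences are tuples, and the predicate always receives the member and its 1-based position.

```python
def array_index_where(array, predicate):
    """Model of array:index-where: 1-based positions where predicate is true."""
    # Anything other than True (including None, modelling ()) is not a match.
    return [pos for pos, member in enumerate(array, start=1)
            if predicate(member, pos) is True]


months = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

print(array_index_where(months, lambda m, pos: "r" in m))
print(array_index_where([(1, 2, 3), (4, 5, 6), (7, 8)],
                        lambda m, pos: len(m) == 3))
print(array_index_where([1, 8, 2, 7, 3],
                        lambda member, pos: member < 5 and pos > 2))
```

The positions come out in ascending order because the members are visited in array order.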
Sorts a supplied array, based on the value of a number of sort keys supplied as functions.
array:sort-by( | ||
$array | as , | |
$keys | as | |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on collations.
The result of the function is an array that contains the same members as $array, typically in a different order, the order being defined by the supplied sort key definitions.
A sort key definition is a record with three parts:
key: A sort key function, which is applied to each member of the input sequence to determine a sort key value. If no function is supplied, the default is fn:data#1, which atomizes the value of the array member.
collation: A collation, which is used when comparing sort key values that are of type xs:string or xs:untypedAtomic. If no collation is supplied, the default collation from the static context is used.
When comparing values of types other than xs:string or xs:untypedAtomic, the collation is ignored (but an error may be reported if it is invalid). For more information see 5.3.7 Choosing a collation.
order: An order direction, either "ascending" or "descending". The default is "ascending".
The number of sort key definitions is determined by the number of records supplied in the $keys argument. If the argument is absent or empty, the default is a single sort key definition using the function data#1, using the default collation from the static context, and with order ascending.
The result of the array:sort-by function is obtained as follows:
The result array contains the same members as $array, but generally in a different order.
The sort key definitions are established as described above. The sort key definitions are in major-to-minor order. That is, the position of two values $A and $B in the result sequence is determined first by the relative magnitude of their primary sort key values, which are computed by evaluating the sort key function in the first sort key definition. If those two sort key values are equal, then the position is determined by the relative magnitude of their secondary sort key values, computed by evaluating the sort key function in the second sort key definition, and so on.
When a pair of corresponding sort key values of $A and $B are found to be not equal, then $A precedes $B in the result sequence if both the following conditions are true, or if both conditions are false:
The sort key value for $A is less than the sort key value for $B, as defined below.
The order direction in the corresponding sort key definition is "ascending".
If all the sort key values for $A and $B are pairwise equal, then $A precedes $B in the result sequence if and only if $A precedes $B in the input sequence.
Note:
That is, the sort is stable.
Each sort key value for a given array member is obtained by applying the sort key function of the corresponding sort key definition to that member. The result of this function is in the general case a sequence of atomic items. Two sort key values $a and $b are compared as follows:
Let $C be the collation in the corresponding sort key definition.
Let $REL be the result of evaluating op:lexicographic-compare($key($A), $key($B), $C) where op:lexicographic-compare($a, $b, $C) is defined as follows:
if (empty($a) and empty($b)) then 0
else if (empty($a)) then -1
else if (empty($b)) then +1
else let $rel := op:simple-compare(head($a), head($b), $C)
return if ($rel eq 0)
then op:lexicographic-compare(tail($a), tail($b), $C)
else $rel
Here op:simple-compare($k1, $k2, $C) is defined as follows:
if ($k1 instance of (xs:string | xs:anyURI | xs:untypedAtomic)
and $k2 instance of (xs:string | xs:anyURI | xs:untypedAtomic))
then compare($k1, $k2, $C)
else if ($k1 instance of xs:numeric and $k2 instance of xs:numeric)
then compare($k1, $k2)
else if ($k1 eq $k2) then 0
else if ($k1 lt $k2) then -1
else +1
Note:
This raises an error if two keys are not comparable, for example if one is a string and the other is a number, or if both belong to a non-ordered type such as xs:QName.
If $REL is zero, then the two sort key values are deemed equal; if $REL is -1 then $a is deemed less than $b, and if $REL is +1 then $a is deemed greater than $b.
The effect of the function is equivalent to the result of the following XPath expression.
$array
=> array:members()
=> fn:sort-by(
for $key-spec in ($keys otherwise {})
return map:put($key-spec, 'key', fn($member as record(value)) as xs:anyAtomicType* {
map:get($key-spec, 'key', fn:data#1)(map:get($member, 'value'))
})
)
=> array:of-members()
If the set of computed sort keys contains values that are not comparable using the lt operator then the sort operation will fail with a type error ([err:XPTY0004]).
The function is a generalization of the array:sort function available in 3.1, which is retained for compatibility. The enhancements allow multiple sort keys to be defined, each potentially with a different collation, and allow sorting in descending order.
If the sort key for an item evaluates to the empty sequence, the effect of the rules is that this item precedes any value for which the key is non-empty. This is equivalent to the effect of the XQuery option empty least. The effect of the option empty greatest can be achieved by adding an extra sort key definition with {'key': fn { empty(K(.)) }}, where K is the sort key function: when comparing boolean sort keys, false precedes true.
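The ordering rules described in this section (major-to-minor keys, lexicographic comparison of key sequences, the empty sequence sorting first, and stability) can be modelled compactly. The Python sketch below is illustrative only: collations are not modelled, sort key values are tuples of atomic values, members are assumed to be single atomic items when no key function is given, and the dictionary keys "key" and "order" mirror two of the three parts of a sort key definition.

```python
from functools import cmp_to_key


def simple_compare(k1, k2):
    """Three-way comparison of two atomic keys (collations not modelled)."""
    if isinstance(k1, str) != isinstance(k2, str):
        raise TypeError("sort keys are not comparable")
    return (k1 > k2) - (k1 < k2)


def lexicographic_compare(a, b):
    """Compare two key sequences; the empty sequence sorts first."""
    if not a and not b:
        return 0
    if not a:
        return -1
    if not b:
        return 1
    rel = simple_compare(a[0], b[0])
    return rel if rel != 0 else lexicographic_compare(a[1:], b[1:])


def array_sort_by(array, keys=None):
    """Stable multi-key sort over the members of a modelled array."""
    keys = keys or ({},)  # absent or empty: one default sort key definition

    def compare(m1, m2):
        for spec in keys:
            key = spec.get("key", lambda member: (member,))
            rel = lexicographic_compare(key(m1), key(m2))
            if rel != 0:
                ascending = spec.get("order", "ascending") == "ascending"
                return rel if ascending else -rel
        return 0  # all keys equal: sorted() preserves the input order

    return sorted(array, key=cmp_to_key(compare))


people = [("Ann", 30), ("Bob", 25), ("Cy", 30), ("Di", None)]
by_age_desc = {"key": lambda p: () if p[1] is None else (p[1],),
               "order": "descending"}
by_name = {"key": lambda p: (p[0],)}
print(array_sort_by(people, [by_age_desc, by_name]))
```

In this model the member with no age has an empty sort key, so it sorts as the least value and therefore appears last under the descending order.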
Changes in 4.0
Supplying the empty sequence as the value of an optional argument is equivalent to omitting the argument.
Returns an array containing all members from a supplied array starting at a supplied position, up to a specified length.
array:subarray( | ||
$array | as , | |
$start | as , | |
$length | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
Except in error cases, the two-argument version of the function returns the same result as the three-argument version when called with $length equal to the value of array:size($array) - $start + 1.
Setting the third argument to the empty sequence has the same effect as omitting the argument.
The effect of the function is equivalent to the result of the following XPath expression, except in error cases.
$array => array:members() => subsequence($start, $length) => array:of-members()
A dynamic error is raised [err:FOAY0001] if $start is less than one or greater than array:size($array) + 1.
For the three-argument version of the function:
A dynamic error is raised [err:FOAY0002] if $length is less than zero.
A dynamic error is raised [err:FOAY0001] if $start + $length is greater than array:size($array) + 1.
The value of $start can be equal to array:size($array) + 1 provided that $length is either equal to zero or omitted. In this case the result will be the empty array.
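The bounds rules above can be exercised with a small model. In this Python sketch (illustrative only) arrays are lists, `None` stands for an omitted or empty $length, and Python exceptions carrying the error code in their message stand in for the dynamic errors.

```python
def array_subarray(array, start, length=None):
    """Model of array:subarray with 1-based start and optional length."""
    size = len(array)
    if start < 1 or start > size + 1:
        raise IndexError("FOAY0001: $start is out of bounds")
    if length is None:
        # Omitted length: everything from $start to the end of the array.
        length = size - start + 1
    if length < 0:
        raise ValueError("FOAY0002: $length is negative")
    if start + length > size + 1:
        raise IndexError("FOAY0001: $start + $length is out of bounds")
    return array[start - 1:start - 1 + length]


letters = ["a", "b", "c", "d"]
print(array_subarray(letters, 2))
print(array_subarray(letters, 2, 2))
print(array_subarray(letters, 5))
```

The last call shows the boundary case: $start equal to the array size plus one is allowed and yields the empty array.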
Returns an array containing all members except the first from a supplied array.
array:tail( | ||
$array | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns an array containing all members of the supplied array except the first.
The effect of the function is equivalent to the result of the following XPath expression.
array:remove($array, 1)
A dynamic error occurs [err:FOAY0001] if $array is empty.
If the supplied array contains exactly one member, the result will be the empty array.
Returns an array containing all members except the last from a supplied array.
array:trunk( | ||
$array | as | |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns an array containing all members of the supplied array except the last.
The effect of the function is equivalent to the result of the following XPath expression.
array:remove($array, array:size($array))
A dynamic error occurs [err:FOAY0001] if $array is empty.
If the supplied array contains exactly one member, the result will be the empty array.
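The definitions of array:tail and array:trunk in terms of array:remove can be sketched as follows. This is an illustrative Python model that represents an array as a list; `array_remove`, `array_tail`, and `array_trunk` are hypothetical names, and the sketch assumes that removing an out-of-range position is what produces the FOAY0001 error for an empty array.

```python
def array_remove(array, position):
    """Model of array:remove for a single 1-based position."""
    if position < 1 or position > len(array):
        raise IndexError("FOAY0001: position is out of bounds")
    return array[: position - 1] + array[position:]


def array_tail(array):
    # Equivalent to array:remove($array, 1).
    return array_remove(array, 1)


def array_trunk(array):
    # Equivalent to array:remove($array, array:size($array)).
    return array_remove(array, len(array))
```

For an empty list both helpers fail, because position 1 and position 0 are each outside the valid range.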
This section is non-normative.
Arrays may be compared using the fn:deep-equal function.
The XPath language provides explicit syntax for certain operations on arrays. These constructs can all be specified in terms of function primitives:
The empty array can be constructed using either of the expressions [] or array{}. The effect is the same as the data model primitive dm:empty-array(()) (see [XML Path Language (XPath) 4.0] section 4.14.2.1 Array Constructors). Using user-visible functions it can be achieved by calling array:build(()) or array:of-members(()).
The expression array { $sequence } constructs an array whose members are the items in $sequence. Every member of this array will be a singleton item. The effect is the same as array:build($sequence).
The expression [E1, E2, E3, ..., En] constructs an array in which E1 is the first member, E2 is the second member, and so on. The result is equivalent to the expression [] => array:append(E1) => array:append(E2) => ... => array:append(En).
The lookup expression $array?* returns the sequence concatenationXP of the members of the array. It is equivalent to calling array:fold-left($array, (), fn($result, $next){ $result, $next }).
The lookup expression $array?$N, where $N is an integer within the bounds of the array, is equivalent to array:get($array, $N).
Similarly, applying the array as a function, $array($N), is also equivalent to array:get($array, $N).
The expression for member $m in $array return EXPR is equivalent to array:for-each($array, fn($m){ EXPR }) (see [XQuery 4.0: An XML Query Language] section 4.13.2 For Clause and [XML Path Language (XPath) 4.0] section 4.13.1 For Expressions).
Arrays can be filtered using the construct $array?[predicate] (see [XML Path Language (XPath) 4.0] section 4.14.5 Filter Expressions for Maps and Arrays).
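The equivalences listed above can be illustrated with a small model. The following Python sketch is an assumption-laden illustration rather than a definition: an array is a list of members, each member is a tuple standing for an XDM sequence, and all function names are hypothetical.

```python
from functools import reduce


def array_append(array, member):
    """Model of array:append: returns a new array with one more member."""
    return array + [member]


def square_constructor(*members):
    # [E1, E2, ..., En] as [] => array:append(E1) => ... => array:append(En)
    return reduce(array_append, members, [])


def curly_constructor(sequence):
    # array { $sequence }: every member is a singleton item.
    return [(item,) for item in sequence]


def array_fold_left(array, init, action):
    """Model of array:fold-left."""
    return reduce(action, array, init)


def lookup_wildcard(array):
    # $array?* as a fold that concatenates the members.
    return array_fold_left(array, (), lambda result, nxt: result + nxt)


def array_get(array, n):
    # $array?$N and $array($N), with a 1-based index.
    if n < 1 or n > len(array):
        raise IndexError("FOAY0001: index is out of bounds")
    return array[n - 1]
```

Representing members as tuples keeps the distinction between a member that is a multi-item sequence and several singleton members, which is what separates the two constructor forms.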
A JNodeDM is a wrapper around a map or array, or around a value that appears within the content of a map or array. JNodes are described at [XQuery and XPath Data Model (XDM) 4.0] section 8.4 JNodes. Wrapping a map or array in a JNode enables the use of path expressions such as $jnode/descendant::title, as described at [XML Path Language (XPath) 4.0] section 4.7 Path Expressions.
In addition to the functions defined in this section, functions that operate on JNodes include:
fn:distinct-ordered-nodes, fn:generate-id, fn:has-children, fn:innermost, fn:outermost, fn:path, fn:root, fn:siblings, fn:transitive-closure
Returns the ·content· property of a JNode.
fn:jnode-content( | ||
$input | as | := . |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
If $input is the empty sequence, the function returns the empty sequence.
Otherwise, the function returns the ·content· property of $input.
The following errors may be raised when $input is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type jnode()?, type error [err:XPTY0004]XP.
In many cases it is unnecessary to make an explicit call on jnode-content, because the coercion rules will take care of this automatically. For example, in an expression such as $X / descendant::name [matches(., '^J')], the call on matches is supplied with a JNode as its first argument; atomization ensures that the actual value being passed to the first argument of matches is the atomized value of the ·content· property.
Other examples where the ·content· of a JNode is extracted automatically include:
Any context where the required type is an atomic value, for example arithmetic operations, value comparisons and general comparisons, and calls on functions that expect an atomic value.
Any context where the required type is a map or array, for example the first argument of functions such as map:size or array:size, a free-standing expression within a map constructor such as map{ $jnode }, the constructs for member and for key/value, the left-hand operand of the lookup operator ? (or the context value in the case of a unary lookup operator), and the operand of a map/array filter expression $jnode?[predicate].
Notable places where the ·content· is not automatically extracted include:
When computing the effective boolean value. As with XNodes, writing if ($array/child::*[1]) ... tests for the existence of a child; it does not test its value. To test its value, write if (jnode-content($array/child::*[1])) ..., or equivalently if (xs:boolean($array/child::*[1])) ....
When calling functions that accept arbitrary sequences, such as count or deep-equal.
It is possible (though probably unwise) to construct a JNode whose ·content· property itself contains another JNode. For example, the expression jtree([jtree([]), jtree([])]) creates a JNode whose ·content· is an array of JNodes, and applying the child axis to this JNode will return a sequence of two JNodes that themselves have further JNodes as their content. The jnode-content function returns these contained JNodes; it does not recursively extract their content.
| Expression: | let $array := [1, 3, 4.5, 7, "eight", 10] return $array / child::type(xs:integer) =!> jnode-content() |
|---|---|
| Result: | 1, 3, 7, 10 |
| Expression: | let $map := {'Mo': 'Monday', 'Tu': 'Tuesday', 'We': 'Wednesday'}
return $map / child::get(("Mo", "We", "Fr", "Su")) =!> jnode-content() |
| Result: | "Monday", "Wednesday" |
| Expression: | let $array := [[4, 18], [30, 4, 22]] return $array / descendant::*[. > 25][1] / ancestor-or-self::* =!> jnode-content() |
| Result: | [[4, 18], [30, 4, 22]], [30, 4, 22] |
Returns the ·selector· property of a JNode.
fn:jnode-selector( | ||
$input | as | := . |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
If $input is the empty sequence, the function returns the empty sequence.
If $input is a root JNode (one in which the ·selector· property is absent), the function returns the empty sequence.
Otherwise, the function returns the ·selector· property of $input. In the case where the parent JNode wraps a map, this will be the key of the relevant entry within that map; in the case where the parent JNode wraps an array, it will be the 1-based index of the relevant member of the array.
The following errors may be raised when $input is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type jnode()?, type error [err:XPTY0004]XP.
| Expression: | let $array := [1, 3, 4.5, 7, "eight", 10] return $array / child::type(xs:integer) =!> jnode-selector() |
|---|---|
| Result: | 1, 2, 4, 6 |
| Expression: | let $map := {'Mo': 'Monday', 'Tu': 'Tuesday', 'We': 'Wednesday'}
return $map / child::get(("Mo", "We", "Fr", "Su")) =!> jnode-selector() |
| Result: | "Mo", "We" |
| Expression: | let $array := [[4, 18], [30, 4, 22]] return $array / descendant::*[. > 25] / ancestor-or-self::* =!> jnode-selector() |
| Result: | 2, 1 |
Returns the ·position· property of a JNode.
fn:jnode-position( | ||
$input | as | := . |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If the argument is omitted, it defaults to the context value (.).
If $input is the empty sequence, the function returns the empty sequence.
If $input is a root JNode (one in which the ·position· property is absent), the function returns the empty sequence.
Otherwise, the function returns the ·position· property of $input. The value of this property will be 1 (one) except in cases where the value of an entry in a map, or a member in an array, is a sequence that contains multiple items including maps and/or arrays; in such cases the position will be the 1-based position of the relevant map or array.
The following errors may be raised when $input is omitted:
If the context value is absentDM, type error [err:XPDY0002]XP.
If the context value is not an instance of the sequence type jnode()?, type error [err:XPTY0004]XP.
This function is relevant only when there are maps whose entries are multi-item sequences that include maps and arrays, or arrays whose members include such multi-item sequences. Such structures are uncommon, and never arise from parsing of JSON source text. It is generally best to avoid such structures by using arrays rather than sequences within array and map content; apart from other considerations, this allows the data to be serialized in JSON format.
If an entry within a map, or a member of an array, contains a sequence of items that mixes arrays and maps with other content (for example the array [1, 2, ([1,2], [3,4], 5)]), then a lookup using the child axis will only construct JNodes in respect of those items that are non-empty maps or arrays. This may leave gaps in the position numbering sequence, as illustrated in the examples below.
| Expression: | let $input := {
"a": [10, 20, 30],
"b": ([40, 50, 60], [], 0, [70, 80, (90, 100)])
}
return $input / child::b / *
! { "position": jnode-position(),
"index": jnode-selector(),
"value": jnode-content()
} |
|---|---|
| Result: | { "position": 1, "index": 1, "value": 40 },
{ "position": 1, "index": 2, "value": 50 },
{ "position": 1, "index": 3, "value": 60 },
{ "position": 4, "index": 1, "value": 70 },
{ "position": 4, "index": 2, "value": 80 },
{ "position": 4, "index": 3, "value": (90, 100) } |
| Expression: | let $input := {
"a": {"x": 10, "y": 20, "z": 30},
"b": ( {"x": 40, "y": 50, "z": 60},
{},
{"x": 70, "y": 80, "z": (90, 100)})
}
return $input / child::b / *
! { "position": jnode-position(),
"key": jnode-selector(),
"value": jnode-content()
} |
| Result: | { "position": 1, "key": "x", "value": 40 },
{ "position": 1, "key": "y", "value": 50 },
{ "position": 1, "key": "z", "value": 60 },
{ "position": 3, "key": "x", "value": 70 },
{ "position": 3, "key": "y", "value": 80 },
{ "position": 3, "key": "z", "value": (90, 100) } |
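The numbering shown in the examples above can be reproduced with a small model of the child axis. The following Python sketch is an informal reading of those examples, not the data model definition: maps are Python dicts, arrays are lists, multi-item sequences are tuples, and `JNode`, `as_sequence`, and `children` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


def as_sequence(value):
    """Treat a non-tuple value as a singleton sequence."""
    return value if isinstance(value, tuple) else (value,)


@dataclass
class JNode:
    content: Any
    selector: Any = None            # absent for a root JNode
    position: Optional[int] = None  # absent for a root JNode
    parent: Optional["JNode"] = None


def children(node):
    """Model of the child axis: one JNode per entry or member of each
    map or array found in the content of `node`."""
    result = []
    for position, item in enumerate(as_sequence(node.content), start=1):
        if isinstance(item, dict):
            entries = item.items()
        elif isinstance(item, list):
            entries = enumerate(item, start=1)
        else:
            continue  # atomic items contribute no children
        for selector, value in entries:
            result.append(JNode(value, selector, position, node))
    return result
```

Under this reading, an empty array or an atomic item in the middle of a sequence still occupies a position, which is why the position values in the first example jump from 1 to 4.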
The functions in this section access resources external to a query or stylesheet, and convert between external file formats and their XPath and XQuery data model representation.
The functions in this section provide access to resources (such as files) in the external environment.
| Function | Meaning |
|---|---|
| fn:doc | Retrieves a document using a URI supplied as an xs:string, and returns the corresponding document node. |
| fn:doc-available | The function returns true if and only if the function call fn:doc($source, $options) would return a document node. |
| fn:collection | Returns a sequence of items identified by a collection URI; or a default collection if no URI is supplied. |
| fn:uri-collection | Returns a sequence of xs:anyURI values representing the URIs in a URI collection. |
| fn:unparsed-text | The fn:unparsed-text function reads an external resource (for example, a file) and returns a string representation of the resource. |
| fn:unparsed-text-lines | The fn:unparsed-text-lines function reads an external resource (for example, a file) and returns its contents as a sequence of strings, one for each line of text in the string representation of the resource. |
| fn:unparsed-text-available | Allows an application to determine whether a call on fn:unparsed-text with particular arguments would succeed. |
| fn:unparsed-binary | The fn:unparsed-binary function reads an external resource (for example, a file) and returns its contents in binary. |
| fn:environment-variable | Returns the value of a system environment variable, if it exists. |
| fn:available-environment-variables | Returns a list of environment variable names that are suitable for passing to fn:environment-variable, as a (possibly empty) sequence of strings. |
Changes in 4.0
The rule that multiple calls on fn:doc supplying the same absolute URI must return the same document node has been clarified; in particular the rule does not apply if the dynamic context for the two calls requires different processing of the documents (such as schema validation or whitespace stripping). [Issue 898 PR 905 9 January 2024]
An $options parameter is added. Note that the rules for the $options parameter control aspects of processing that were implementation-defined in earlier versions of this specification. An implementation may provide configuration options designed to retain backwards-compatible behavior when no explicit options are supplied. [Issue 1021 PR 1910 6 April 2025]
Retrieves a document using a URI supplied as an xs:string, and returns the corresponding document node.
fn:doc( | ||
$source | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on available documents, and executable base URI.
If $source is the empty sequence, the result is the empty sequence.
If $source is a relative URI reference, it is resolved relative to the value of the executable base URIXP property from the dynamic context of the caller. The resulting absolute URI is promoted to an xs:string.
If the available documents provides a mapping from this string to a document node, the function returns that document node.
The way in which an absolute URI is dereferenced to obtain an external resource is described in [XML Path Language (XPath) 4.0] section 2.3 External Resources and Security.
Note:
The ability to access external resources depends on whether the calling code is trustedXP.
The URI may include a fragment identifier.
The $options argument, if present and non-empty, defines the detailed behavior of the function. The option parameter conventions apply. The options available are as follows:
record(
    trusted? as xs:boolean,
    dtd-validation? as xs:boolean,
    stable? as xs:boolean,
    strip-space? as xs:boolean?,
    xinclude? as xs:boolean,
    xsd-validation? as xs:string,
    use-xsi-schema-location? as xs:boolean
)
| Key | Value | Meaning |
|---|---|---|
| trusted | | Indicates whether processing the document may cause other external resources to be fetched (including, for example, external entities, an external DTD, or documents referenced using xsi:schemaLocation or XInclude elements). |
| | true | The document may include references to other external resources. |
| | false | The document must not include references to other external resources unless access to these resources has been explicitly enabled. |
| dtd-validation | | Determines whether DTD validation takes place. |
| | true | The input is parsed using a validating XML parser. The input must contain a DOCTYPE declaration to identify the DTD to be used for validation. The DTD may be internal or external. |
| | false | DTD validation does not take place. However, if a DOCTYPE declaration is present, then it is read, for example to perform entity expansion. |
| stable | | Determines whether two calls on the doc function, with the same URI, the same options, and the same context, are guaranteed to return the same document node. The default value is true, but this may be overridden by implementation-defined configuration options. |
| | true | Given the same explicit and implicit arguments, multiple calls return the same document node: that is, the function is deterministic. |
| | false | Multiple calls with the same explicit and implicit arguments may return the same document node or different document nodes at the discretion of the implementation. |
| strip-space | | Determines whether whitespace-only text nodes are removed from the resulting document. The default is defined by the host language or by the implementation. (Note: in XSLT, the xsl:strip-space and xsl:preserve-space declarations provide detailed control based on the parent element name.) |
| | true | All whitespace-only text nodes are stripped, unless either (a) they are within the scope of the attribute xml:space="preserve", or (b) XSD validation identifies that the parent element has a simple type or a complex type with simple content. |
| | false | All whitespace-only text nodes are preserved, unless either (a) DTD validation marks them as ignorable, or (b) XSD validation recognizes the containing element as having element-only or empty content. |
| xinclude | | Determines whether any xi:include elements in the input are to be processed using an XInclude processor. |
| | true | Any xi:include elements are expanded. If there are xi:include elements and no XInclude processor is available then a dynamic error is raised. |
| | false | Any xi:include elements are handled as ordinary elements without expansion. |
| xsd-validation | | Determines whether XSD validation takes place, using the schema definitions present in the static context. The effect of requesting validation is the same as invoking the doc function without validation, and then applying an XQuery validate expression to the result, with corresponding options. |
| | strict | Strict XSD validation takes place. |
| | lax | Lax XSD validation takes place. |
| | skip | No XSD validation takes place. |
| | type Q{uri}local | XSD validation takes place against the schema-defined type, present in the static context, that has the given URI and local name. |
| use-xsi-schema-location | | When XSD validation takes place, determines whether schema components referenced using xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes within the source document are to be used. The option is ignored if XSD validation does not take place. |
| | true | XSD validation uses the schema components referenced using xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes in addition to the schema components present in the static context; these components must be compatible as described in [XQuery and XPath Data Model (XDM) 4.0] section 4.1.2 Schema Consistency. |
| | false | Any xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes in the document are ignored. |
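The shape of the options record can be checked mechanically. The following Python sketch is a hypothetical validator, not part of any implementation: it assumes the options arrive as a dict, that None stands for the empty sequence permitted for strip-space, and that Python's ValueError and TypeError stand in for whatever errors a processor would actually raise. The names `DOC_OPTION_TYPES` and `check_doc_options` are invented for illustration.

```python
import re

# Option names and value types, following the record definition above.
DOC_OPTION_TYPES = {
    "trusted": bool,
    "dtd-validation": bool,
    "stable": bool,
    "strip-space": bool,
    "xinclude": bool,
    "xsd-validation": str,
    "use-xsi-schema-location": bool,
}

# Permitted values of xsd-validation: strict, lax, skip, or "type Q{uri}local".
XSD_VALIDATION = re.compile(r"strict|lax|skip|type Q\{[^{}]*\}[^\s{}]+")


def check_doc_options(options):
    """Return the options unchanged if they fit the record, else raise."""
    for key, value in options.items():
        if key not in DOC_OPTION_TYPES:
            raise ValueError(f"unrecognized option: {key}")
        if key == "strip-space" and value is None:
            continue  # declared as xs:boolean?, so the empty sequence is allowed
        if not isinstance(value, DOC_OPTION_TYPES[key]):
            raise TypeError(f"wrong type for option: {key}")
        if key == "xsd-validation" and not XSD_VALIDATION.fullmatch(value):
            raise ValueError(f"invalid xsd-validation value: {value}")
    return options
```

An empty dict passes the check, which matches the default value {} of the $options parameter.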
By default, this function is deterministic. Two calls on this function return the same document node if the same URI Reference (after resolution to an absolute URI Reference) is supplied to both calls. Thus, the following expression (if it does not raise an error) will always return true:
doc("foo.xml") is doc("foo.xml")
Note:
This equivalence applies only because the two calls on the fn:doc function have the same options and the same static and dynamic context, to the extent this is relevant. If two calls on fn:doc have different dynamic contexts, then the mapping from URIs to document nodes in the two contexts may differ, which means that different document nodes may be returned for the same URI. This can happen, for example, if the two calls appear in different XSLT packages with different validation options or whitespace-stripping options; one call might produce a schema-validated document, the other an untyped document.
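The identity guarantee, and its dependence on the dynamic context, can be sketched as a lookup in a string-to-node mapping. The following Python model is illustrative only: `DocumentNode`, `DocError`, and `make_doc` are hypothetical names, the dict stands for the available documents, and relative references are resolved with the standard library's urljoin against an assumed base URI.

```python
from urllib.parse import urljoin


class DocumentNode:
    """Stand-in for a document node; identity is Python object identity."""

    def __init__(self, uri):
        self.uri = uri


class DocError(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def make_doc(available_documents, base_uri):
    """Build a doc function bound to one dynamic context."""

    def doc(source):
        if source is None:
            return None  # empty sequence in, empty sequence out
        absolute = urljoin(base_uri, source)
        try:
            return available_documents[absolute]
        except KeyError:
            raise DocError("FODC0002", f"no mapping for {absolute}") from None

    return doc
```

Two functions built from different mappings model calls made in different dynamic contexts: each is deterministic on its own, yet the two may return different nodes for the same URI.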
The requirement to deliver a deterministic result has performance implications, and for this reason implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a deterministic result or must raise a dynamic error [err:FODC0003].
Note:
If the $source URI is obtained from a source document, it is generally appropriate to resolve it relative to the base URI property of the relevant node in the source document. This can be achieved by calling the fn:resolve-uri function, and passing the resulting absolute URI as an argument to the fn:doc function.
If two calls to this function supply different absolute URI References as arguments, the same document node may be returned if the implementation can determine that the two arguments refer to the same resource.
By defining the semantics of this function in terms of a string-to-document-node mapping in the dynamic context, the specification is acknowledging that the results of this function are outside the purview of the language specification itself, and depend entirely on the run-time environment in which the expression is evaluated. This run-time environment includes not only an unpredictable collection of resources (“the web”), but configurable machinery for locating resources and turning their contents into document nodes within the XPath data model. Both the set of resources that are reachable, and the mechanisms by which those resources are parsed and validated, are implementation-dependent.
One possible processing model for this function is as follows. The resource identified by the URI Reference is retrieved. If the resource cannot be retrieved, a dynamic error is raised [err:FODC0002]. The data resulting from the retrieval action is then parsed as an XML document and a tree is constructed in accordance with the [XQuery and XPath Data Model (XDM) 3.0]. If the top-level media type is known and is "text", the content is parsed in the same way as if the media type were text/xml; otherwise, it is parsed in the same way as if the media type were application/xml. If the contents cannot be parsed successfully, a dynamic error is raised [err:FODC0002]. Otherwise, the result of the function is the document node at the root of the resulting tree. This tree is then optionally validated against a schema.
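The retrieval and parsing steps of this processing model can be sketched as follows. This Python illustration covers only those two steps and leaves out media types and validation; `load_document` and its `retrieve` callback are hypothetical, and the standard library's ElementTree parser stands in for construction of a data model instance.

```python
import xml.etree.ElementTree as ET


class DocError(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def load_document(uri, retrieve):
    """Retrieve the resource, then parse it as XML.

    `retrieve` is an assumed callback that maps a URI to XML text and
    raises LookupError or OSError when the resource is unavailable.
    """
    try:
        data = retrieve(uri)
    except (LookupError, OSError) as exc:
        raise DocError("FODC0002", f"cannot retrieve {uri}") from exc
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        raise DocError("FODC0002", f"cannot parse {uri}") from exc
    return ET.ElementTree(root)
```

Both failure paths report the same error code, mirroring the text above, so a caller cannot tell a retrieval failure from a parsing failure by the code alone.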
Various aspects of this processing are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:
The set of URI schemes that the implementation recognizes is implementation-defined. Implementations may allow the mapping of URIs to resources to be configured by the user, using mechanisms such as catalogs or user-written URI handlers.
The handling of non-XML media types is implementation-defined. Implementations may allow instances of the data model to be constructed from non-XML resources, under user control.
It is implementation-defined whether DTD validation and/or schema validation is applied to the source document.
Implementations may provide user-defined error handling options that allow processing to continue following an error in retrieving a resource, or in parsing and validating its content. When errors have been handled in this way, the function may return either the empty sequence, or a fallback document provided by the error handler.
Implementations may provide user options that relax the requirement for the function to return deterministic results.
The effect of a fragment identifier in the supplied URI is implementation-defined. One possible interpretation is to treat the fragment identifier as an ID attribute value, and to return a document node having the element with the selected ID value as its only child.
A dynamic error may be raised [err:FODC0005] if $source is not a valid URI reference.
A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.
A dynamic error is raised [err:FODC0002] if the available documents provides no mapping for the absolutized URI.
A dynamic error is raised [err:FODC0002] if the resource cannot be retrieved or cannot be parsed successfully as XML using the selected options.
A dynamic error is raised [err:FODC0003] if the implementation is not able to guarantee that the result of the function will be deterministic, and the user has not indicated that an unstable result is acceptable.
Changes in 4.0
An $options parameter is added. Note that the rules for the $options parameter control aspects of processing that were implementation-defined in earlier versions of this specification. An implementation may provide configuration options designed to retain backwards-compatible behavior when no explicit options are supplied. [Issue 1021 PR 1910 6 April 2025]
The function returns true if and only if the function call fn:doc($source, $options) would return a document node.
fn:doc-available( | ||
$source | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on available documents, and executable base URI.
If $source is the empty sequence, this function returns false.
If a call on fn:doc($source, $options) would return a document node, this function returns true.
In all other cases this function returns false. This includes the case where an invalid URI is supplied, and also the case where a valid relative URI reference is supplied, and cannot be resolved, for example because the static base URI is absent.
The recognized values for $options are the same as for the fn:doc function. The option parameter conventions apply. Note that if the stable option is set to true, then a result of true from this function guarantees that a call on fn:doc with the same explicit and implicit arguments will succeed, whereas a result of false from this function guarantees that the corresponding call on fn:doc will fail. Conversely, if the stable option is set to false, then the result of this function provides no guarantees regarding the outcome of a call on fn:doc with the same explicit and implicit arguments.
Like any other function, doc-available fails with an error if invalid arguments are supplied: for example if the first argument is not a string, or if unrecognized options are included in $options. However, it returns false rather than raising an error if the first argument is invalid as a URI.
The function also returns false (rather than raising an error) if the document is unavailable because of processor limitations, for example if schema validation is requested and the processor is not schema-aware.
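The relationship between the two functions can be sketched as a wrapper that converts failures of fn:doc into false while still rejecting a first argument of the wrong type. The following Python model is an illustration under stated assumptions: `doc_available`, `DocError`, and the in-memory `sample_doc` are hypothetical, and None stands for the empty sequence.

```python
class DocError(Exception):
    """Stands in for a dynamic error raised by a model of fn:doc."""


def doc_available(doc, source, options=None):
    """Return True exactly when doc(source, options) yields a document."""
    if source is None:
        return False  # empty sequence
    if not isinstance(source, str):
        # Invalid arguments are still an error, as for any other function.
        raise TypeError("the first argument must be a string")
    try:
        return doc(source, options or {}) is not None
    except DocError:
        return False


# Hypothetical in-memory stand-in for fn:doc, used only to exercise the wrapper.
_DOCUMENTS = {"http://example.org/a.xml": object()}


def sample_doc(source, options):
    try:
        return _DOCUMENTS[source]
    except KeyError:
        raise DocError(f"FODC0002: no document for {source}") from None
```

The wrapper only catches the error type raised by the document lookup, so a programming mistake elsewhere is not silently reported as an unavailable document.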
Returns a sequence of items identified by a collection URI; or a default collection if no URI is supplied.
fn:collection( | ||
$source | as | := () |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on available collections, and executable base URI.
This function takes an xs:string as argument and returns a sequence of items obtained by interpreting $source as an xs:anyURI and resolving it according to the mapping specified in available collections described in [XML Path Language (XPath) 4.0] section B.2 Dynamic Context Components.
If available collections provides a mapping from this string to a sequence of items, the function returns that sequence. If available collections maps the string to the empty sequence, then the function returns the empty sequence.
If $source is not specified, the function returns the sequence of items in the default collection in the dynamic context. See [XML Path Language (XPath) 4.0] section B.2 Dynamic Context Components.
If $source is a relative URI reference, it is resolved relative to the value of the executable base URIXP property from the dynamic context of the caller. The resulting absolute URI is promoted to an xs:string.
If $source is the empty sequence, the function behaves as if it had been called without an argument. See above.
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003].
There is no requirement that any nodes in the result should be in document order, nor is there a requirement that the result should contain no duplicates.
Note:
The ability to access external resources depends on whether the calling code is trustedXP.
A dynamic error is raised [err:FODC0002] if no URI is supplied and the value of the default collection is absentDM.
A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.
A dynamic error is raised [err:FODC0002] if available collections provides no mapping for the absolutized URI.
A dynamic error may be raised [err:FODC0004] if $source is not a valid xs:anyURI.
In earlier versions of this specification, the primary use for the fn:collection function was to retrieve a collection of XML documents, perhaps held as lexical XML in operating system filestore, or perhaps held in an XML database. In this release the concept has been generalised to allow other resources to be retrieved: for example JSON documents might be returned as arrays or maps, non-XML text files might be returned as strings, and binary files might be returned as instances of xs:base64Binary.
The abstract concept of a collection might be realized in different ways by different implementations, and the ways in which URIs map to collections can be equally variable. Specifying resources using URIs is useful because URIs are dynamic, can be parameterized, and do not rely on an external environment.
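The handling of the default collection and of an omitted or empty argument can be sketched as follows. This Python model is illustrative only: `make_collection` and `CollectionError` are hypothetical names, the dict stands for the available collections, None stands for the empty sequence or an absent default, and the sketch assumes that any relative URI has already been resolved.

```python
class CollectionError(Exception):
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def make_collection(available_collections, default_collection=None):
    """Build a collection function bound to one dynamic context."""

    def collection(source=None):
        if source is None:
            # Omitted argument and empty sequence behave identically.
            if default_collection is None:
                raise CollectionError("FODC0002", "default collection is absent")
            return list(default_collection)
        if source not in available_collections:
            raise CollectionError("FODC0002", f"no mapping for {source}")
        return list(available_collections[source])

    return collection
```

A collection mapped to an empty list is returned as an empty result rather than reported as an error, which differs from the case where the URI has no mapping at all.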
Returns a sequence of xs:anyURI values representing the URIs in a URI collection.
fn:uri-collection( | ||
$source | as | := () |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on available URI collections, and executable base URI.
The zero-argument form of the function returns the URIs in the default URI collection described in [XML Path Language (XPath) 4.0] section B.2 Dynamic Context Components.
If $source is a relative URI reference, it is resolved relative to the value of the executable base URIXP property from the dynamic context of the caller. The resulting absolute URI is promoted to an xs:string.
If $source is the empty sequence, the function behaves as if it had been called without an argument. See above.
The single-argument form of the function returns the sequence of URIs corresponding to the supplied URI in the available URI collections described in [XML Path Language (XPath) 4.0] section B.2 Dynamic Context Components.
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003].
There is no requirement that the URIs returned by this function should all be distinct, and no assumptions can be made about the order of URIs in the sequence, unless the implementation defines otherwise.
Note:
The ability to access external resources depends on whether the calling code is trustedXP.
A dynamic error is raised [err:FODC0002] if no URI is supplied (that is, if the function is called with no arguments, or with a single argument that evaluates to the empty sequence), and the value of the default URI collection is absentDM.
A dynamic error is raised [err:FODC0002] if a relative URI reference is supplied, and the base-URI property in the static context is absent.
A dynamic error is raised [err:FODC0002] if available URI collections provides no mapping for the absolutized URI.
A dynamic error may be raised [err:FODC0004] if $source is not a valid xs:anyURI.
In some implementations, there might be a close relationship between collections (as retrieved by the fn:collection function), and URI collections (as retrieved by this function). For example, a collection might return XML documents, and the corresponding URI collection might return the URIs of those documents. However, this specification does not impose such a close relationship. For example, there may be collection URIs accepted by one of the two functions and not by the other; a collection might contain items that do not have any URI; or a URI collection might contain URIs that cannot be dereferenced to return any resource.
In the case where fn:uri-collection returns the URIs of resources that could also be retrieved directly using fn:collection, there are several reasons why it might be appropriate to use this function in preference to the fn:collection function. For example:
It allows different URIs for different kinds of resource to be dereferenced in different ways: for example, the returned URIs might be referenced using the fn:unparsed-text function rather than the fn:doc function.
In XSLT 3.0 it allows the documents in a collection to be processed in streaming mode using the xsl:source-document instruction.
It allows recovery from failures to read, parse, or validate individual documents, by calling the fn:doc (or other dereferencing) function within the scope of try/catch.
It allows selection of which documents to read based on their URI: for example, they can be filtered to select those whose URIs end in .xml, or those that use the https scheme.
An application might choose to limit the number of URIs processed in a single run: for example, it might process only the first 50 URIs in the collection, or it might present the URIs to the user and allow the user to select which of them need to be further processed.
It allows the URIs to be modified before they are dereferenced, for example by adding or removing query parameters, or by redirecting the request to a local cache or to a mirror site.
For some of these use cases, this assumes that the cost of calling fn:collection might be significant (for example, it might involve retrieving all the documents in the collection over the network and parsing them). This will not necessarily be true of all implementations.
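The use cases listed above share one pattern: obtain the URIs first, then decide for each URI whether and how to dereference it. The following is a minimal, non-normative sketch of that pattern in Python rather than XPath. The names `select_uris` and `load_all`, and the `fetch` callback standing in for a dereferencing function such as fn:doc, are hypothetical and not part of this specification.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def select_uris(uris: Iterable[str], suffix: str = ".xml", limit: int = 50) -> List[str]:
    """Keep URIs ending in `suffix`, in the order supplied, up to `limit` entries."""
    selected = [uri for uri in uris if uri.endswith(suffix)]
    return selected[:limit]


def load_all(
    uris: Iterable[str], fetch: Callable[[str], str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Dereference each URI separately so that one failure does not abort the rest."""
    loaded: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for uri in uris:
        try:
            loaded[uri] = fetch(uri)
        except Exception as exc:  # plays the role of try/catch around fn:doc
            failed[uri] = str(exc)
    return loaded, failed
```

Because the caller holds plain URIs, it can also rewrite them (for example, to point at a local cache) before passing them to `fetch`.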
Changes in 4.0
The $options parameter has been added. [Issue 1116 PR 1117 21 May 2024]
It is no longer automatically an error if the resource (after decoding) contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. [Issue 414 PR 546 25 July 2023]
The specification now describes in more detail how to determine the effective encoding value. [Issue 2221 PR 2249 21 October 2025]
The fn:unparsed-text function reads an external resource (for example, a file) and returns a string representation of the resource.
fn:unparsed-text(
  $source as xs:string?,
  $options as (xs:string | map(*))? := ()
) as xs:string?
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
The $source argument must be a string in the form of a URI reference, which must contain no fragment identifier, and must identify a resource for which a string representation is available.
If $source is a relative URI reference, it is resolved relative to the value of the executable base URIXP property from the dynamic context of the caller. The resulting absolute URI is promoted to an xs:string.
The $options argument, for backwards compatibility reasons, may be supplied either as a map, or as a string. Supplying a value $S that is not a map is equivalent to supplying the map { "encoding": $S }. After that substitution, the option parameter conventions apply.
The entries that may appear in the $options map are as follows:
record(
  encoding? as xs:string?,
  normalize-newlines? as xs:boolean
)
| Key | Value | Meaning |
|---|---|---|
| encoding | | Defines the encoding of the resource, as described below. |
| normalize-newlines | | Determines whether CR and CRLF character sequences are treated as equivalent to NL characters. |
| | false | No normalization of line endings takes place. |
| | true | The character U+000D (CARRIAGE RETURN) and the character pair (U+000D (CARRIAGE RETURN), U+000A (NEWLINE)) are converted to the single character U+000A (NEWLINE). |
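The two option rules above (string-to-map substitution and newline normalization) can be illustrated with a short, non-normative Python sketch. The function names are hypothetical. Treating a missing normalize-newlines entry as false is an assumption made for this sketch, and the empty sequence is modelled as `None`.

```python
from typing import Optional, Union


def normalize_options(options: Union[None, str, dict]) -> dict:
    """Apply the compatibility rule: a non-map value S stands for {"encoding": S}."""
    if options is None:
        # An empty $options supplies no encoding, so an empty map has the same effect.
        return {}
    if isinstance(options, dict):
        return dict(options)
    return {"encoding": options}


def normalize_newlines(text: str) -> str:
    """Convert CRLF pairs first, then any remaining lone CR, to a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def apply_newline_option(text: str, options: Union[None, str, dict]) -> str:
    """Normalize line endings only when normalize-newlines is true (assumed default: false)."""
    opts = normalize_options(options)
    return normalize_newlines(text) if opts.get("normalize-newlines", False) else text
```

Replacing CRLF before lone CR matters: doing it in the other order would turn each CRLF pair into two newlines.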
The mapping of URIs to the string representation of a resource is the mapping defined in the available text resources component of the dynamic context.
The way in which an absolute URI is dereferenced to obtain an external resource is described in [XML Path Language (XPath) 4.0] section 2.3 External Resources and Security.
Note:
The ability to access external resources depends on whether the calling code is trustedXP.
If the $source argument is the empty sequence, the function returns the empty sequence.
The encoding option, if present and non-empty, is the name of an encoding. The values for this option follow the same rules as for the encoding attribute in an XML declaration. The values which an implementation is required to recognize are UTF-8, UTF-16, UTF-16LE, and UTF-16BE (in any upper/lower case combination).
The effective encoding candidate E is chosen as follows:
external encoding information if available; otherwise
the encoding recognized as specified in [Extensible Markup Language (XML) 1.0 (Fifth Edition)] if the media type of the resource is text/xml or application/xml (see [RFC 2376]), or if it matches the conventions text/*+xml or application/*+xml (see [RFC 7303] and/or its successors); otherwise
the value of the encoding option if present; otherwise
the encoding inferred from the initial octets of the resource, or from implementation-defined heuristics as defined by the rules of the bin:infer-encoding function.
The effective encoding is determined as follows:
UTF-8 if E is UTF-8 or absent, and if the initial octets xEF, xBB and xBF can be consumed; otherwise
UTF-16LE if E is UTF-16, UTF-16LE or absent, and if the initial octets xFF and xFE can be consumed; otherwise
UTF-16BE if E is UTF-16, UTF-16BE or absent, and if the initial octets xFE and xFF can be consumed; otherwise
UTF-16BE if E is UTF-16; otherwise
the value of E if present; otherwise
UTF-8, or a value that results from implementation-defined heuristics.
Note:
Encoding names are compared without regard to case.
If a UTF encoding is determined, and if the input starts with a byte order mark, it is ignored.
If the input (as decoded using the effective encoding) starts with a byte order mark, then the byte order mark is not included in the result.
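The ordered rules for deriving the effective encoding from the candidate E can be sketched as follows. This is a non-normative Python illustration, not a definitive implementation: the function names are hypothetical, E is taken as an input (the steps that choose E are not modelled), an absent E is represented by `None`, and the final fallback is plain UTF-8 with no implementation-defined heuristics.

```python
from typing import Optional, Tuple

UTF8_BOM = b"\xef\xbb\xbf"
UTF16LE_BOM = b"\xff\xfe"
UTF16BE_BOM = b"\xfe\xff"

# Python codec names for the encodings that every implementation must recognize
_CODECS = {"UTF-8": "utf-8", "UTF-16LE": "utf-16-le", "UTF-16BE": "utf-16-be"}


def effective_encoding(data: bytes, candidate: Optional[str]) -> Tuple[str, int]:
    """Return (effective encoding, number of initial octets consumed)."""
    e = candidate.upper() if candidate else None  # names compare case-insensitively
    if e in ("UTF-8", None) and data.startswith(UTF8_BOM):
        return "UTF-8", 3
    if e in ("UTF-16", "UTF-16LE", None) and data.startswith(UTF16LE_BOM):
        return "UTF-16LE", 2
    if e in ("UTF-16", "UTF-16BE", None) and data.startswith(UTF16BE_BOM):
        return "UTF-16BE", 2
    if e == "UTF-16":
        return "UTF-16BE", 0
    if e is not None:
        return e, 0
    return "UTF-8", 0  # heuristics are permitted here; this sketch uses none


def decode_resource(data: bytes, candidate: Optional[str] = None) -> str:
    """Decode the octets and drop a leading byte order mark from the result."""
    name, consumed = effective_encoding(data, candidate)
    text = data[consumed:].decode(_CODECS.get(name, name))
    return text[1:] if text.startswith("\ufeff") else text
```

In this sketch a decoding failure surfaces as a Python exception, which corresponds loosely to the error conditions described for the function.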
The result of the function is a string containing the string representation of the resource retrieved using the URI, decoded according to the effective encoding.
A dynamic error is raised [err:FOUT1170] if the $source argument contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the string representation of a resource.
A dynamic error is raised [err:FOUT1190] if the value of the encoding option is not a valid encoding name, if the processor does not support the specified encoding, if the string representation of the retrieved resource contains octets that cannot be decoded into Unicode characters using the specified encoding, or if any resulting character is not a permitted character.
A dynamic error is raised [err:FOUT1200] if the encoding option is absent and the processor cannot infer the encoding using external information and the actual encoding is not UTF-8.
If it is appropriate to use a base URI other than the executable base URIXP (for example, when resolving a relative URI reference read from a source document) then it is advisable to resolve the relative URI reference using the fn:resolve-uri function before passing it to the fn:unparsed-text function.
There is no essential relationship between the sets of URIs accepted by the two functions fn:unparsed-text and fn:doc (a URI accepted by one may or may not be accepted by the other), and if a URI is accepted by both there is no essential relationship between the results (different resource representations are permitted by the architecture of the web).
There are no constraints on the MIME type of the resource.
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:
The set of URI schemes that the implementation recognizes is implementation-defined. Implementations may allow the mapping of URIs to resources to be configured by the user, using mechanisms such as catalogs or user-written URI handlers.
The handling of media types is implementation-defined.
Implementations may provide user options that relax the requirement for the function to return deterministic results.
Implementations may provide user-defined error handling options that allow processing to continue following an error in retrieving a resource, or in reading its content. When errors have been handled in this way, the function may return a fallback document provided by the error handler.
The rules for determining the encoding are chosen for consistency with [XML Inclusions (XInclude) Version 1.0 (Second Edition)]. Files with an XML media type are treated specially because there are use cases for this function where the retrieved text is to be included as unparsed XML within a CDATA section of a containing document, and because processors are likely to be able to reuse the code that performs encoding detection for XML external entities.
If the text file contains characters such as < and &, these will typically be output as &lt; and &amp; if the string is serialized as XML or HTML. If these characters actually represent markup (for example, if the text file contains HTML), then an XSLT stylesheet can attempt to write them as markup to the output file using the disable-output-escaping attribute of the xsl:value-of instruction. Note, however, that XSLT implementations are not required to support this feature.
This XSLT example attempts to read a file containing “boilerplate” HTML and copy it directly to the serialized output file:
<xsl:output method="html"/>
<xsl:template match="/">
  <xsl:value-of select="unparsed-text('header.html', 'iso-8859-1')"
                disable-output-escaping="yes"/>
  <xsl:apply-templates/>
  <xsl:value-of select="unparsed-text('footer.html', 'iso-8859-1')"
                disable-output-escaping="yes"/>
</xsl:template>
Allows an application to determine whether a call on fn:unparsed-text with particular arguments would succeed.
fn:unparsed-text-available(
  $source as xs:string?,
  $options as (xs:string | map(*))? := ()
) as xs:boolean
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
The fn:unparsed-text-available function determines whether a call on the fn:unparsed-text function with identical arguments would return a string.
If the first argument is the empty sequence, the function returns false.
In other cases, the function returns true if a call on fn:unparsed-text or fn:unparsed-text-lines with the same arguments would succeed, and false if a call on fn:unparsed-text or fn:unparsed-text-lines with the same arguments would fail with a non-recoverable dynamic error.
The functions fn:unparsed-text and fn:unparsed-text-available have the same requirement for determinism as the functions fn:doc and fn:doc-available. This means that unless the user has explicitly stated a requirement for a reduced level of determinism, either of these functions if called twice with the same arguments during the course of a transformation must return the same results each time; moreover, the results of a call on fn:unparsed-text-available must be consistent with the results of a subsequent call on fn:unparsed-text with the same arguments.
Note:
The ability to access external resources depends on whether the calling code is trustedXP.
This function was introduced before XQuery and XSLT allowed errors to be caught; with current versions of these host languages, catching an error from fn:unparsed-text may provide a better alternative.
The specification requires that the fn:unparsed-text-available function should actually attempt to read the resource identified by the URI, and check that it is correctly encoded and contains no characters that are invalid in XML. Implementations may avoid the cost of repeating these checks for example by caching the validated contents of the resource, to anticipate a subsequent call on the fn:unparsed-text or fn:unparsed-text-lines function. Alternatively, implementations may be able to rewrite an expression such as if (unparsed-text-available(A)) then unparsed-text(A) else ... to generate a single call internally.
Since the function fn:unparsed-text-lines succeeds or fails under exactly the same circumstances as fn:unparsed-text, the fn:unparsed-text-available function may equally be used to test whether a call on fn:unparsed-text-lines would succeed.
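One way to keep the availability test consistent with a later read, and to avoid repeating the work, is to cache the outcome of the first attempt. The following non-normative Python sketch illustrates the idea. The class `TextResources` and its `fetch` callback are hypothetical, the empty sequence is modelled as `None`, and options are omitted for brevity.

```python
from typing import Callable, Dict, Optional, Tuple


class TextResources:
    """Caches the outcome of each read so that repeated calls agree with each other."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # hypothetical: returns the decoded text or raises
        self._cache: Dict[str, Tuple[bool, object]] = {}

    def _outcome(self, uri: str) -> Tuple[bool, object]:
        # The resource is read at most once; both successes and failures are remembered.
        if uri not in self._cache:
            try:
                self._cache[uri] = (True, self._fetch(uri))
            except Exception as exc:
                self._cache[uri] = (False, exc)
        return self._cache[uri]

    def unparsed_text(self, uri: Optional[str]) -> Optional[str]:
        if uri is None:
            return None
        ok, value = self._outcome(uri)
        if not ok:
            raise value  # re-raise the remembered failure
        return value

    def unparsed_text_available(self, uri: Optional[str]) -> bool:
        if uri is None:
            return False
        return self._outcome(uri)[0]
```

With this arrangement, a test followed by a read touches the underlying resource only once, and the two calls cannot disagree.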
The fn:unparsed-binary function reads an external resource (for example, a file) and returns its contents in binary.
fn:unparsed-binary(
  $source as xs:string?
) as xs:base64Binary?
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
The $source argument must be a string in the form of a URI reference, which must contain no fragment identifier, and must identify a resource for which a binary representation is available.
If $source is a relative URI reference, it is resolved relative to the value of the executable base URIXP property from the dynamic context of the caller. The resulting absolute URI is promoted to an xs:string.
The mapping of URIs to the binary representation of a resource is the mapping defined in the available binary resources component of the dynamic context.
If the $source argument is the empty sequence, the function returns the empty sequence.
The result of the function is an atomic item of type xs:base64Binary containing the binary representation of the resource retrieved using the URI.
Note:
The ability to access external resources depends on whether the calling code is trustedXP.
A dynamic error is raised [err:FOUT1170] if the $source argument contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the binary representation of a resource.
If it is appropriate to use a base URI other than the executable base URIXP (for example, when resolving a relative URI reference read from a source document) then it is advisable to resolve the relative URI reference using the fn:resolve-uri function before passing it to the fn:unparsed-binary function.
There is no essential relationship between the sets of URIs accepted by the function fn:unparsed-binary and other functions such as fn:doc and fn:unparsed-text (a URI accepted by one may or may not be accepted by the others), and if a URI is accepted by more than one of these functions then there is no essential relationship between the results (different resource representations are permitted by the architecture of the web).
There are no constraints on the MIME type of the resource.
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:
The set of URI schemes that the implementation recognizes is implementation-defined. Implementations may allow the mapping of URIs to resources to be configured by the user, using mechanisms such as catalogs or user-written URI handlers.
The handling of media types is implementation-defined.
Implementations may provide user options that relax the requirement for the function to return deterministic results.
Implementations may provide user-defined error handling options that allow processing to continue following an error in retrieving a resource, or in reading its content. When errors have been handled in this way, the function may return a fallback document provided by the error handler.
There is no function (analogous to fn:doc-available or fn:unparsed-text-available) to determine whether a suitable resource is available. In XQuery and XSLT, try/catch constructs are available to catch the error.
The choice of xs:base64Binary rather than xs:hexBinary for the result is arbitrary. The two types have the same value space and are interchangeable for nearly all purposes, the notable exception being conversion to xs:string.
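The point about string conversion can be seen with any pair of octets. The following non-normative Python sketch uses the standard `base64` module to show that the same octets yield different strings in the two notations while decoding back to the same value. The function names are hypothetical, and the upper-case hexadecimal output is chosen to match the canonical form of xs:hexBinary.

```python
import base64


def as_base64_string(data: bytes) -> str:
    """String form comparable to converting xs:base64Binary to xs:string."""
    return base64.b64encode(data).decode("ascii")


def as_hex_string(data: bytes) -> str:
    """String form comparable to converting xs:hexBinary to xs:string (upper case)."""
    return data.hex().upper()


def same_octets(base64_text: str, hex_text: str) -> bool:
    """True if both lexical forms denote the same sequence of octets."""
    return base64.b64decode(base64_text) == bytes.fromhex(hex_text)
```

For the two octets FF C0, the helpers return "/8A=" and "FFC0" respectively, and `same_octets("/8A=", "FFC0")` is true.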
A comprehensive set of functions for manipulating binary data is available in the EXPath binary module: see [EXPath]. In addition, the EXPath file module provides a function file:read-binary with similar functionality to fn:unparsed-binary, the notable differences being (a) that it takes a file name rather than a URI, and (b) that it is defined to be nondeterministic.
The following XQuery, adapted from an example in the EXPath binary module [EXPath], reads a JPEG image and determines its size in pixels. In a JPEG start-of-frame segment the two-octet height field precedes the two-octet width field:
declare namespace bin = "http://expath.org/ns/binary";
let $content := fn:unparsed-binary("image.jpeg")
let $int16-at := bin:unpack-unsigned-integer(
                   $content, ?, 2, 'most-significant-first')
let $loc := bin:find($content, 0, bin:hex('FFC0'))
return { "height": $int16-at($loc + 5),
         "width": $int16-at($loc + 7) }
The example assumes that the functions in the EXPath binary module are available.
These functions convert between the lexical representation of XML and the tree representation.
(The fn:serialize function also handles HTML and JSON output, but is included in this section for editorial convenience.)
| Function | Meaning |
|---|---|
| fn:parse-xml | This function takes as input an XML document, and returns the document node at the root of an XDM tree representing the parsed document. |
| fn:parse-xml-fragment | This function takes as input an XML external entity represented as a string, and returns the document node at the root of an XDM tree representing the parsed document fragment. |
| fn:serialize | This function serializes the supplied input sequence $input as described in [XSLT and XQuery Serialization 3.1], returning the serialized representation of the sequence as a string. |
| fn:xsd-validator | Given an XSD schema, delivers a function item that can be invoked to validate a document, element, or attribute node against this schema. |
This function serializes the supplied input sequence $input as described in [XSLT and XQuery Serialization 3.1], returning the serialized representation of the sequence as a string.
fn:serialize(
  $input as item()*,
  $options as (element(output:serialization-parameters) | map(*))? := ()
) as xs:string
This function is deterministic, context-independent, and focus-independent.
The value of the first argument $input acts as the input sequence to the serialization process, which starts with sequence normalization.
The second argument $options, if present, provides serialization parameters. These may be supplied in either of two forms:
As an output:serialization-parameters element, having the format described in 3.1 Setting Serialization Parameters by Means of a Data Model Instance SER31. In this case the type of the supplied argument must match the required type element(output:serialization-parameters).
As a map. In this case the type of the supplied argument must match the required type map(*)
The single-argument version of this function has the same effect as the two-argument version called with $options set to the empty sequence. This in turn is the same as the effect of passing an output:serialization-parameters element with no child elements.
The final stage of serialization, that is, encoding, is skipped. If the serializer does not allow this phase to be skipped, then the sequence of octets returned by the serializer is decoded into a string by reversing the character encoding performed in the final stage.
If the second argument is omitted, or is supplied in the form of an output:serialization-parameters element, then the values of any serialization parameters that are not explicitly specified are implementation-defined, and may depend on the context.
If the second argument is supplied as a map, then the option parameter conventions apply. In this case:
Each entry in the map defines one serialization parameter.
The key of the entry is an xs:string value in the cases of parameter names defined in these specifications, or an xs:QName (with non-absent namespace) in the case of implementation-defined serialization parameters.
The required type of each parameter, and its default value, are defined by the following table. The default value is used when the map contains no entry for the parameter in question, and also when an entry is present, with the empty sequence as its value. The table also indicates how the value of the map entry is to be interpreted in cases where further explanation is needed.
| Parameter | Required type | Interpretation | Default Value |
|---|---|---|---|
| allow-duplicate-names | xs:boolean? | true means "yes", false means "no" | no |
| byte-order-mark | xs:boolean? | true means "yes", false means "no" | no |
| canonical | xs:boolean? | true() means "yes", false() means "no" | no |
| cdata-section-elements | xs:QName* | | () |
| doctype-public | xs:string? | Zero-length string and () both represent "absent" | absent |
| doctype-system | xs:string? | Zero-length string and () both represent "absent" | absent |
| encoding | xs:string? | | UTF-8 |
| escape-solidus | xs:boolean? | true means "yes", false means "no" | yes |
| escape-uri-attributes | xs:boolean? | true means "yes", false means "no" | yes |
| html-version | xs:decimal? | | 5 |
| include-content-type | xs:boolean? | true means "yes", false means "no" | yes |
| indent | xs:boolean? | true means "yes", false means "no" | no |
| item-delimiter | xs:string? | | absent |
| json-lines | xs:boolean? | true means "yes", false means "no" | no |
| json-node-output-method | (xs:string \| xs:QName)? | See Notes 1, 2 | xml |
| media-type | xs:string? | | (a media type suitable for the chosen method) |
| method | (xs:string \| xs:QName)? | See Notes 1, 2 | xml |
| normalization-form | xs:string? | | none |
| omit-xml-declaration | xs:boolean? | true means "yes", false means "no" | yes |
| standalone | xs:boolean? | true means "yes", false means "no", () means "omit" | omit |
| suppress-indentation | xs:QName* | | () |
| undeclare-prefixes | xs:boolean? | true means "yes", false means "no" | no |
| use-character-maps | map(xs:string, xs:string)? | See Note 3 | {} |
| version | xs:string? | | 1.0 |
Notes to the table:
1. The notation (A | B) represents a union type whose member types are A and B.
2. If an xs:QName is supplied for the method or json-node-output-method options, then it must have a non-absent namespace URI. This means that system-defined serialization methods such as xml and json are defined as strings, not as xs:QName values.
3. For the use-character-maps option, the value is a map, whose keys are the characters to be mapped (as xs:string instances), and whose corresponding values are the strings to be substituted for these characters.
A type error [err:XPTY0004]XP occurs if the $options argument is present and does not match either of the types element(output:serialization-parameters)? or map(*).
Note:
This is defined as a type error so that it can be enforced via the function signature by implementations that generalize the type system in a suitable way.
If the host language makes serialization an optional feature and the implementation does not support serialization, then a dynamic error [err:FODC0010] is raised.
When the second argument is supplied as a map, and the supplied value is of the wrong type for the particular parameter, for example if the value of indent is a string rather than a boolean, then as defined by the option parameter conventions, a type error [err:XPTY0004]XP is raised. If the value is of the correct type, but does not satisfy the rules for that parameter defined in [XSLT and XQuery Serialization 3.1], then a dynamic error [err:SEPM0016]SER31 is raised. (For example, this occurs if the map supplied to use-character-maps includes a key that is a string whose length is not one (1)).
If any serialization error occurs, including the detection of an invalid value for a serialization parameter as described above, this results in the fn:serialize call failing with a dynamic error.
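The handling of a map-valued $options argument (defaults, empty values, type checks, and value checks) can be sketched in Python for a small subset of the parameters. This is a non-normative illustration under stated assumptions: only five parameters are modelled, the empty sequence is represented by `None`, booleans stand for "yes" and "no", and the names `resolve_parameters`, `error_code`, and `SerializationError` are hypothetical. Parameter names outside the modelled subset are rejected with a plain `KeyError`, which says nothing about how a real processor treats them.

```python
from typing import Optional

# Defaults and required Python types for the subset of parameters modelled here
DEFAULTS = {
    "encoding": "UTF-8",
    "indent": False,
    "method": "xml",
    "omit-xml-declaration": True,
    "use-character-maps": {},
}
TYPES = {
    "encoding": str,
    "indent": bool,
    "method": str,
    "omit-xml-declaration": bool,
    "use-character-maps": dict,
}


class SerializationError(Exception):
    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def resolve_parameters(options: dict) -> dict:
    """Fill in defaults, then check the type and value of each supplied entry."""
    resolved = dict(DEFAULTS)
    for name, value in options.items():
        if name not in TYPES:
            raise KeyError(f"{name!r} is not covered by this sketch")
        if value is None:
            continue  # an entry holding the empty sequence selects the default
        if not isinstance(value, TYPES[name]):
            raise SerializationError("XPTY0004", f"wrong type for {name}")
        if name == "use-character-maps" and any(len(key) != 1 for key in value):
            raise SerializationError("SEPM0016", "character map keys must be single characters")
        resolved[name] = value
    return resolved


def error_code(options: dict) -> Optional[str]:
    """Return the error code raised for `options`, or None if they are accepted."""
    try:
        resolve_parameters(options)
    except SerializationError as err:
        return err.code
    return None
```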
One use case for this function arises when there is a need to construct an XML document containing nested XML documents within a CDATA section (or on occasions within a comment). See fn:parse-xml for further details.
Another use case arises when there is a need to call an extension function that expects a lexical XML document as input.
Another use case for this function is serializing instances of the data model into a human-readable format for debugging. Selecting the adaptive output method (see 10 Adaptive Output Method SER31) through the serialization parameters in the second argument allows any valid XDM instance to be serialized without raising a serialization error.
There are also use cases where the application wants to post-process the output of a query or transformation, for example by adding an internal DTD subset, or by inserting proprietary markup delimiters such as the <% ... %> used by some templating languages.
The ability to specify the serialization parameters in an output:serialization-parameters element provides backwards compatibility with the 3.0 version of this specification; the ability to use a map takes advantage of new features in the 3.1 version. The default parameter values are implementation-defined when an output:serialization-parameters element is used (or when the argument is omitted), but are fixed by this specification in the case where a map (including the empty map) is supplied for the argument.
Given the variables:
let $params := <output:serialization-parameters
    xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
  <output:omit-xml-declaration value="yes"/>
</output:serialization-parameters>
let $data := <a b="3"/>
The following call might produce the output shown:
| Expression: | serialize($data, $params) |
|---|---|
| Result: | '<a b="3"/>' |
The following call would also produce the output shown (though the second argument could equally well be supplied as the empty map, {}, since both parameters are given the default values listed in the table above):
| Expression: | serialize($data, { "method": "xml", "omit-xml-declaration": true() }) |
| Result: | '<a b="3"/>' |
| Expression: |
|
| Result: |
|
| Expression: | serialize(array { "a", 3, attribute test { "true" } }, { "method": "adaptive" }) |
| Result: | '["a",3,test="true"]' |
Given an XSD schema, delivers a function item that can be invoked to validate a document, element, or attribute node against this schema.
fn:xsd-validator( | ||
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent.
The fn:xsd-validator function returns a function item that can be used to validate a document node, element node, or attribute node with respect to a supplied schema.
The details of how the schema is assembled, and the way it is used, are defined by the supplied $options. If the $options argument is absent or empty the effect is to use the schema components from the static context of the call on fn:xsd-validator. In the general case, however, the schema used for validation may include components from any or all of the following:
The static context of the function call
Explicitly supplied schema documents
Schema components referenced in xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes within the instance document being validated.
More details of schema assembly appear below. Taken together, the assembled components must constitute a valid schema.
The function is designed to separate the process of assembling a schema from the process of performing instance validation. However, if the schema is to include components identified in xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes, then the process of assembling the schema cannot be completed until the instance document is available.
The options recognized are as follows. The option parameter conventions apply.
record(
  trusted? as xs:boolean,
  use-imported-schema? as xs:boolean,
  schema? as element(xs:schema)*,
  target-namespace? as xs:anyURI*,
  schema-location? as xs:anyURI*,
  use-xsi-schema-location? as xs:boolean,
  xsd-version? as xs:decimal,
  validation-mode? as xs:string,
  type? as xs:QName?,
  return-typed-node? as xs:boolean,
  return-error-details? as xs:boolean
)
| Key | Value | Meaning |
|---|---|---|
| trusted | | Indicates whether the validation process may cause external resources to be fetched (including, for example, documents referenced using the schema-location property, or xsi:schemaLocation attributes within the document being validated). |
| | true | The validation process may retrieve external resources. |
| | false | The validation process must not retrieve any external resources unless access to these resources has been explicitly enabled. |
| use-imported-schema | | If true, the schema to be used for validation includes the schema components available in the static context of the function call. If false, these components are not used. |
| schema | | A list of XDM nodes containing XSD schema documents to be used for validation. |
| target-namespace | | A list of target namespaces identifying schema components to be used for validation. The way in which the processor locates schema components for the specified target namespaces is implementation-defined. A zero-length string denotes a no-namespace schema. |
| schema-location | | A list of locations of XSD schema documents to be used to assemble a schema. Any relative URIs are resolved relative to the base URI of the function call. Access to the schema documents at these locations is allowed regardless of the value of the trusted option; access to indirectly referenced schema documents (for example, using xs:include) is allowed only if the trusted option is set to true. |
| use-xsi-schema-location | | If true, the schema to be used for validation includes any schema documents referenced by xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes in the instance document being validated. If false, these attributes are ignored. |
| xsd-version | | Set to the decimal value 1.0 or 1.1 to indicate which version of XSD is to be used. The default is implementation-defined. A processor may use a later version of XSD than the version requested, but must not use an earlier version. |
| validation-mode | | The validation mode. |
| | strict | Validates the input using the element or attribute declaration for the operand node. This element or attribute declaration must exist. This is the default when the type option is absent. |
| | lax | Validates the input using the element or attribute declaration for the operand node, if it exists. |
| | by-type | Validates the input using the supplied governing type. This is the default when the type option is present. |
| type | | Establishes the governing type for validation. The type must be present in the assembled schema. |
| return-typed-node | | If true, the result of the generated validation function, when validation is successful, includes the property typed-node which contains a copy of the target node augmented with type annotations and expanded default values. If false, the typed node is not included in the result. If a node containing type annotations is to be returned, then the schema used for validation must be compatible with all other schemas used within the same query or stylesheet, as described in [XQuery and XPath Data Model (XDM) 4.0] section 4.1.2 Schema Consistency; this is to ensure that the type annotations in the validated document have a consistent interpretation. |
| return-error-details | | If true, the result of the generated validation function, when validation is unsuccessful, includes detailed information about the nature of the validity errors that were found. If false, the result only includes an indication that the document was invalid. Note that setting the value to false means that validation can complete as soon as the first error is found. |
The first task of the function is to assemble a schema (that is, a collection of schema components). Schema components can come from a number of sources, and a schema can be assembled from more than one source, provided that the total collection of components comprises a valid schema: the main thing that will prevent this is if two sources contain conflicting definitions of the same named component.
The default is to use the in-scope schema components from the static context of the function call.
Instead, or in addition, schema components may be loaded explicitly for this validator. Supplementary schema components may be requested in a number of ways:
The schema-location option can specify one or more URIs that are interpreted as locations for source XSD schema documents, which are then assembled into a schema as described in the XSD specifications.
The schema option can be used to identify one or more xs:schema element nodes holding source schema documents. This allows a schema to be constructed dynamically by the application, or to be held as a global variable in the source code of a query or stylesheet module.
The target-namespace option can be used to supply the target namespaces of additional schema components that are known to the system or that are made available using some external mechanism. For example, the system might have built-in schemas for common namespaces such as the xml, fn, or xlink namespaces, or it might have a mechanism allowing schemas for a particular namespace to be registered using an external API or configuration mechanism.
The use-xsi-schema-location option also allows the application to request that schema documents referenced from xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes should be included in the schema. By default these attributes are ignored.
It is acceptable to assemble a schema from more than one of these sources. In addition, any of these sources can bring in additional components by the use of the XSD directives xs:include and xs:import. The important constraint is that the result should be a valid schema. This will only be the case if the sources used to assemble the schema are compatible with each other: see [XQuery and XPath Data Model (XDM) 4.0] section 4.1.2 Schema Consistency.
The XSD specification allows a schema to be used for validation even when it contains unresolved references to absent schema components. It is implementation-defined whether this function allows the schema to be incomplete in this way. For example, some processors might allow validation using a schema in which an element declaration contains a reference to a type declaration that is not present in the schema, provided that the element declaration is never needed in the course of a particular validation episode.
Having assembled a schema, the next task is to validate a supplied node (and the subtree rooted at that node).
Note:
This description is a deliberate simplification. If the use-xsi-schema-location option is true, then assembly of the schema is not completed until the instance document is available, and in practice overlaps with the validation process.
The xsd-validator function returns a function item (call it V) with the following characteristics:
V has an arity of one. Call the value of the supplied argument $target. The required type of $target is (document-node(*) | element() | attribute())?: that is, it accepts either a well-formed document node, or an element node, or an attribute node, or the empty sequence.
If the argument is the empty sequence then the result of V is also the empty sequence.
In other cases, the result of a call on V is a record containing the following fields:
is-valid as xs:boolean. This field is always present, and indicates whether the supplied $target node was found to be valid against the schema. The value is true if either (a) the validation outcome was valid, or (b) lax validation was requested and the validation outcome was notKnown. In other cases it is false.
typed-node as (document-node(*) | element() | attribute()). This field is present only when (a) the option return-typed-node was set (explicitly or implicitly) to true, and (b) the value of the is-valid field is true. It represents the root of a tree that is a deep copy of the input tree, augmented with type annotations and default values.
error-details as map(*)*. This field is present only when (a) the option return-error-details was set to true, and (b) the supplied document was found to be invalid. The value is a sequence of maps, each containing details of one invalidity that was found. The precise details of the invalidities are implementation-defined, but they may include the following fields, if the information is available:
message. A string containing the text of an error message, intended for a human reader.
rule. A reference to the rule in the XSD specification that was violated. This is a string comprising four parts separated by the character U+007C (VERTICAL BAR, |) :
"1.0" or "1.1" indicating whether the reference is to the XSD 1.0 or 1.1 specification.
"1" or "2" indicating whether the reference is to part 1 or part 2 of the specification.
The name of the validation rule (for example "Datatype Valid").
The clause number within that validation rule (for example "2.3").
For example, if an attribute is declared to be of type xs:integer, but the actual value is not in the lexical space of xs:integer, the value of rule might be "1.1|2|Datatype Valid|2.1".
node. The node that was found to be invalid. Note that when a containing element C is invalid because a child element D is not allowed by its content model, the invalid node is C, not D.
error-node. The node whose presence led to detection of the invalidity. In the above example, this would be D.
error-uri. The URI of the XML entity in which the error was detected.
line-number. The line number where the error was detected, within its external entity.
column-number. The column number where the error was detected, within the error line number.
The validation is performed as described in 17.2.4 XSD validation, with the assembled schema as the effective schema and $target as the operand node.
If the use-xsi-schema-location option is true and a failure occurs processing an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute (for example, because a schema document cannot be retrieved, or because the referenced schema document is invalid, or because it is incompatible with other schema components) this is treated as an invalidity, not as a dynamic error: V returns successfully with is-valid set to false.
The function V may fail with a dynamic error if it is not possible to determine whether or not the instance document is valid. This may happen, for example, if processor-defined limits are exceeded.
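The four-part format of the rule field described above lends itself to simple splitting on U+007C. The following is a minimal illustrative sketch in Python, not part of this specification; the names RuleReference and parse_rule are hypothetical, and the sketch assumes that none of the four parts itself contains a vertical bar.

```python
from typing import NamedTuple


class RuleReference(NamedTuple):
    """Hypothetical container for the four parts of a rule string."""
    xsd_version: str  # "1.0" or "1.1"
    part: str         # "1" or "2"
    rule_name: str    # for example "Datatype Valid"
    clause: str       # for example "2.1"


def parse_rule(rule: str) -> RuleReference:
    """Split a rule string on U+007C (VERTICAL BAR) into its four parts."""
    parts = rule.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected four '|'-separated parts, got {len(parts)}")
    xsd_version, part, rule_name, clause = parts
    if xsd_version not in ("1.0", "1.1"):
        raise ValueError(f"unexpected XSD version: {xsd_version!r}")
    if part not in ("1", "2"):
        raise ValueError(f"unexpected part number: {part!r}")
    return RuleReference(xsd_version, part, rule_name, clause)


ref = parse_rule("1.1|2|Datatype Valid|2.1")
print(ref.rule_name, ref.clause)  # prints "Datatype Valid 2.1"
```

Because the precise content of error-details is implementation-defined, an application consuming this field would normally check that it is present before attempting to parse it.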
A dynamic error is raised [err:FODC0009] if the processor is not schema-aware, or if no schema processor with the required capabilities (such as XSD 1.1 support) is available.
A dynamic error is raised [err:FODC0015] if it is not possible to assemble a valid and consistent schema.
Both XQuery and XSLT provide capabilities for XSD-based schema validation in earlier versions of the specifications, and those are retained in 4.0. This function provides additional capability:
It is possible to control validation more precisely, through a wider range of options;
It is possible to validate different instance documents against different schemas;
Information about any invalidities is made available to the application, rather than simply causing a dynamic error;
The capability is provided by means of a function rather than custom syntax, making it easier to integrate into an application.
The capability is available through XPath alone, and therefore with host languages other than XQuery and XSLT.
Three possible ways of using the function include:
To test whether or not a document is valid against a schema, set the options return-typed-node and return-error-details to false, and check the value of the is-valid field returned when the validation function is called.
To obtain a typed XDM tree from an input document that is expected to be valid, set the option return-typed-node to true. On return from the validation function, test the value of the is-valid field; call fn:error if the value is false; otherwise use the typed-node property of the result. The main benefit of using a typed XDM tree is that it allows static type checking of path expressions: this benefit only applies when the schema used for validation is the imported schema used in the static context. However, there are cases where validation against a different schema is appropriate, for example when validating the result of one query or transformation that is to be used as input to another.
To validate an input document and provide feedback to the document author about any validity problems that were found, set return-error-details to true. If the result of the validation function has is-valid = false(), process the returned error-details. The information available for this part of the processing may not be 100% interoperable, though with care it should be possible to write the query in such a way that it works with different processors.
The function has no effect on the static context. Schemas loaded using this function, either directly or via the effect of xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes, are not added to the static context and have no effect on any other validation episodes. A processor may cache schema components to reduce the cost of processing the same schema repeatedly, but this has no observable effect other than on performance.
| Expression: | let $schema :=
<xs:schema>
<xs:element name="distance" type="xs:decimal"/>
</xs:schema>
let $validator := xsd-validator({'schema': $schema})
return ($validator(<distance>8.5</distance>)?is-valid,
$validator(<distance>8.5km</distance>)?is-valid) |
|---|---|
| Result: | true(), false() |
| Expression: | let $schema :=
<xs:schema>
<xs:element name="distance" type="xs:decimal"/>
</xs:schema>
let $validator := xsd-validator({'schema': $schema})
let $typed-result := $validator(<distance>8.5</distance>)?typed-node
return $typed-result instance of element(distance, xs:decimal) |
| Result: | true() |
This function converts between the lexical representation of HTML and the XDM tree representation.
| Function | Meaning |
|---|---|
fn:parse-html | This function takes as input an HTML document, and returns the document node at the root of an XDM tree representing the parsed document. |
fn:html-doc | Reads an external resource containing HTML, and returns the result of parsing the resource as HTML. |
The fn:parse-html function conceptually works in two phases:
The lexical HTML (supplied as a string) is parsed into an HTML DOM as defined by the HTML5 specification: see [HTML: Living Standard] and [DOM: Living Standard].
The resulting DOM is converted to an XDM tree as described in this section. This is described by defining the actions of the accessor functions defined in [XQuery and XPath Data Model (XDM) 4.0] section 7.6 Accessors.
Note:
Because the [DOM: Living Standard] and [HTML: Living Standard] are not fixed, it is implementation-defined which versions are used.
Note:
An implementation must match the semantics of the mapping described in this section, but the specific way it achieves that is implementation-dependent.
Some possible implementation strategies are:
Parse the HTML to an HTML DOM and then convert the HTML DOM to an XDM node tree.
Parse the HTML to an HTML DOM and then implement a wrapper or facade that presents an XDM interface to the HTML DOM.
Parse the lexical HTML directly to an XDM node tree, bypassing the HTML DOM.
The [DOM: Living Standard] defines parsing algorithms for two different formats, which it refers to as the HTML and XML serializations (or concrete syntaxes). The XML serialization is an XML document which typically uses the namespace http://www.w3.org/1999/xhtml and the content type application/xhtml+xml, and is popularly referred to as XHTML. The HTML parsing algorithm constructs an HTML DOM HTMLDocument document object for the HTML document. The XHTML parsing algorithm constructs an HTML DOM XMLDocument object for the HTML document, following XML parsing rules. This mapping supports both of these document types.
The [DOM: Living Standard] specification defines HTML DOM nodes that are mapped to XDM nodes as follows:
The HTML DOM Document interface maps to [XQuery and XPath Data Model (XDM) 4.0] section 7.5.1 Document nodes.
The HTML DOM Element interface maps to [XQuery and XPath Data Model (XDM) 4.0] section 7.5.2 Element nodes. But see below for the mapping of an HTML template element.
The HTML DOM Attr interface maps to [XQuery and XPath Data Model (XDM) 4.0] section 7.5.3 Attribute nodes.
Note:
Any HTML DOM Attr instances in an HTML DOM HTMLDocument that represent namespace declarations will have been filtered out: see 17.3.1.1 attributes Accessor.
The HTML DOM ProcessingInstruction interface maps to [XQuery and XPath Data Model (XDM) 4.0] section 7.5.5 Processing instruction nodes.
Note:
The HTML parsing algorithm does not generate processing instruction nodes. If encountered they are parsed as comment nodes. The HTML DOM ProcessingInstruction interface is relevant only when the XHTML parsing algorithm is used.
The HTML DOM Comment interface maps to [XQuery and XPath Data Model (XDM) 4.0] section 7.5.6 Comment nodes.
The HTML DOM Text interface maps to [XQuery and XPath Data Model (XDM) 4.0] section 7.5.7 Text nodes. Adjacent HTML DOM Text nodes are combined into a single XDM text node.
Note:
The HTML DOM CDATASection interface is an instance of HTML DOM Text, so CDATA sections also map to [XQuery and XPath Data Model (XDM) 4.0] section 7.5.7 Text nodes.
The use of CDATA sections can result in the HTML DOM containing adjacent text nodes, which the mapping to XDM will merge into a single node.
An HTML template element is mapped to an XDM template element with children corresponding to the children of the HTML DOM DocumentFragment that is the value of the template contents property of the HTML DOM template element.
Note:
Given source HTML such as <template><p>Lorem ipsum</p></template>, the HTML DOM represents the element <p>Lorem ipsum</p> not as a child of the template element, but as the child of a free-standing document fragment which is accessible (in the DOM API) as the value of the template.content property of the element node. The XDM representation produced by the fn:parse-html function does not follow this convention: instead, the element <p>Lorem ipsum</p> appears as an ordinary child node of the template element.
Note:
The HTML DOM DocumentFragment interface is not supported as an XML node. There are two places in the HTML DOM where this is used:
The HTML DOM ShadowRoot interface is not present in the main HTML DOM tree. It is only accessible via JavaScript.
The template element’s content property contains the child nodes of the template element. The behaviour of this is described above.
If an implementation allows these nodes to be passed in via an API or similar mechanism, their behaviour is implementation-defined.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.1 attributes Accessor (dm:attributes($node)) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Element then the result is the value of the Element.attributes property mapped to a sequence as described below;
Otherwise, the result is the empty sequence.
An HTML DOM NamedNodeMap is mapped to a sequence as follows:
NamedNodeMap.length is the length of the sequence, where a length of 0 results in the empty sequence;
NamedNodeMap.item(n) is the nth element of the sequence.
That sequence is then filtered as follows:
If the Attr.namespaceURI property is "http://www.w3.org/2000/xmlns/", the attribute is not included in this sequence;
If the Attr.localName property is "xmlns", the attribute is not included in this sequence;
If the Attr.localName property starts with "xmlns:", the attribute is not included in this sequence;
Otherwise, the attribute is included in this sequence using the XDM mapping rules described in this section.
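The filtering rules above can be sketched as a simple predicate. This is an illustrative Python sketch only, not part of this specification: the Attr class is a hypothetical stand-in for an HTML DOM Attr node, reduced to the properties the rules inspect.

```python
from dataclasses import dataclass
from typing import Optional

XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an HTML DOM Attr node."""
    local_name: str
    value: str
    namespace_uri: Optional[str] = None


def is_namespace_declaration(attr: Attr) -> bool:
    """Return True if any of the three exclusion rules applies."""
    return (
        attr.namespace_uri == XMLNS_NAMESPACE
        or attr.local_name == "xmlns"
        or attr.local_name.startswith("xmlns:")
    )


def xdm_attributes(attrs: list[Attr]) -> list[Attr]:
    """Keep only the attributes that survive the filter, in original order."""
    return [a for a in attrs if not is_namespace_declaration(a)]
```

Note that an attribute whose local name merely begins with the letters xmlns, without a following colon, is retained by this sketch, because none of the three rules matches it.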
Note:
The HTML DOM Element.attributes property includes namespace and non-namespace attributes in the list when the HTML or XML parser is used. As such, the namespace attributes have to be filtered from the resulting XDM attribute sequence.
Note:
When the resulting document is an HTML DOM HTMLDocument, the Attr.localName and Attr.name properties of HTML DOM Attr nodes are both set to the qualified name. This includes namespace declarations which are filtered out by the logic in this section.
Note:
The Attr.localName property will be ASCII lowercase. The [HTML: Living Standard] section 13.2.5.33, Attribute name state specifies that ASCII upper alpha characters are appended to the attribute’s name in lowercase.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.2 base-uri Accessor (dm:base-uri($node)) for an HTML DOM Node is the value of the Node.baseURI property mapped as follows:
If the value is null or the zero-length string, then the result is the empty sequence;
Otherwise, the string value is cast to an xs:anyURI.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.3 children Accessor (dm:children($node)) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Document then the result is the value of the Node.childNodes property mapped to a sequence;
If the node is an instance of HTML DOM HTMLTemplateElement then the result is the HTML DOM DocumentFragment’s Node.childNodes property, mapped to a sequence;
If the node is an instance of HTML DOM Element then the result is the value of the Node.childNodes property mapped to a sequence;
Otherwise, the result is the empty sequence.
An HTML DOM NodeList is mapped to a sequence as follows:
NodeList.length is the length of the sequence, where a length of 0 results in the empty sequence;
NodeList.item(n) is the nth element of the sequence.
That sequence is then filtered as follows:
If the child is an instance of HTML DOM DocumentType, that child is not included in this sequence;
A sequence of consecutive HTML DOM Text nodes is combined into a single XDM text node;
Otherwise, the HTML DOM Node nodes are mapped to XDM according to the rules in this section.
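The filtering of the child sequence described above (dropping DocumentType children and merging runs of consecutive text nodes) can be sketched as follows. This is an illustrative Python sketch, not part of this specification; the Node class and its kind labels are hypothetical simplifications of HTML DOM nodes.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Node:
    """Hypothetical simplified DOM node: a kind label plus character data."""
    kind: str       # "element", "text", "comment", "doctype", ...
    data: str = ""


def xdm_children(child_nodes: list[Node]) -> list[Node]:
    """Apply the two filtering rules to a list of DOM child nodes."""
    # Rule 1: DocumentType children are not included.
    kept = [n for n in child_nodes if n.kind != "doctype"]
    result: list[Node] = []
    # Rule 2: each run of consecutive text nodes becomes a single text node.
    for is_text, run in groupby(kept, key=lambda n: n.kind == "text"):
        if is_text:
            result.append(Node("text", "".join(n.data for n in run)))
        else:
            result.extend(run)
    return result
```

The sketch removes DocumentType nodes before merging text; this ordering is an assumption of the sketch, and it only matters in the unusual case where a DocumentType node sits between two text nodes.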
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.4 document-uri Accessor (dm:document-uri($node)) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Document then the result is the value of the Document.documentURI property mapped as follows:
If the value is null or the zero-length string, then the result is the empty sequence;
Otherwise, the string value is cast to an xs:anyURI.
Otherwise, the result is the empty sequence.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.6 is-idrefs Accessor (dm:is-idrefs($node)) for an HTML DOM Node is the empty sequence.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.10 node-name Accessor (dm:node-name($node)) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Element then the result is determined as follows:
The local name is the value of the Element.localName property. This is derived as follows:
The local name is initially set to the ASCII lowercase tag name. The [HTML: Living Standard] section 13.2.5.8, Tag name state specifies that ASCII upper alpha characters are appended to the element’s name in lowercase.
If the local name is an SVG element name, the case-sensitive name is used. [HTML: Living Standard] section 13.2.6.5, The rules for parsing tokens in foreign content has a table mapping the lowercase element names to their SVG names.
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName.
Note:
[HTML: Living Standard] section 13.2.9 Coercing an HTML DOM into an infoset uses a Unnnnnn escape sequence. That would map : to U00003A.
This local name escaping applies only to the HTML parsing algorithm. If the XHTML parsing algorithm is used, the localName and prefix will be correctly set for QName-based node names.
The namespace prefix is the value of the Element.prefix property, or empty if the value is null;
The namespace URI is the value of the Element.namespaceURI property, or empty if the value is null.
If the element is an HTML element, the namespace URI is "http://www.w3.org/1999/xhtml".
If the element is an SVG element, the namespace URI is "http://www.w3.org/2000/svg".
If the element is a MathML element, the namespace URI is "http://www.w3.org/1998/Math/MathML".
If the node is an instance of HTML DOM Attr then the result is determined as follows:
The attribute name is the tokenized attribute name. The [HTML: Living Standard] section 13.2.5.33, Attribute name state specifies that ASCII upper alpha characters are appended to the attribute’s name in lowercase.
The local name is the value of the Attr.localName property. This is derived as follows:
The local name is initially set to the attribute name.
If the local name is an SVG or MathML attribute name, the case-sensitive name is used. [HTML: Living Standard] section 13.2.6.1, Creating and inserting nodes has a table mapping the lowercase attribute names to their SVG/MathML names.
If the local name is an allowed xlink, xml, or xmlns attribute name the local name is the value of the local name column of the attribute name mapping table in [HTML: Living Standard] section 13.2.6.1, Creating and inserting nodes.
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName.
Note:
[HTML: Living Standard] section 13.2.9 Coercing an HTML DOM into an infoset uses a Unnnnnn escape sequence. That would map : to U00003A.
This local name escaping applies only to the HTML parsing algorithm. If the XHTML parsing algorithm is used, the localName and prefix will be correctly set for QName-based node names.
The namespace prefix is the value of the Attr.prefix property, or empty if the value is null.
If the attribute name is an allowed xlink, xml, or xmlns attribute name the namespace prefix is the value of the prefix column of the attribute name mapping table in [HTML: Living Standard] section 13.2.6.1, Creating and inserting nodes.
The namespace URI is the value of the Attr.namespaceURI property, or empty if the value is null;
If the attribute name is an allowed xlink, xml, or xmlns attribute name the namespace URI is the value of the namespace column of the attribute name mapping table in [HTML: Living Standard] section 13.2.6.1, Creating and inserting nodes.
If the node is an instance of HTML DOM ProcessingInstruction then the result is an xs:QName constructed as follows:
The local name is the value of the ProcessingInstruction.target property;
The namespace prefix is empty;
The namespace URI is empty;
Otherwise, the result is the empty sequence.
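The replacement of characters that are not valid in an NCName is implementation-defined. One possible strategy, following the Unnnnnn convention mentioned in the notes above, is sketched below in Python. This is an illustration only, not part of this specification; the function name escape_local_name is hypothetical, and the character ranges are those of the XML Name productions with the colon excluded.

```python
# XML NameStartChar ranges, minus ":" (which is not allowed in an NCName).
NAME_START_RANGES = [
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]
# Additional characters allowed by NameChar after the first position.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E), (0x30, 0x39), (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]


def _in_ranges(cp: int, ranges: list[tuple[int, int]]) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def escape_local_name(name: str) -> str:
    """Replace each invalid character with U followed by six hex digits."""
    out = []
    for i, ch in enumerate(name):
        cp = ord(ch)
        valid = _in_ranges(cp, NAME_START_RANGES) or (
            i > 0 and _in_ranges(cp, NAME_EXTRA_RANGES)
        )
        out.append(ch if valid else f"U{cp:06X}")
    return "".join(out)


print(escape_local_name("a:b"))  # prints "aU00003Ab"
```

Since every replacement begins with the letter U, the result is a valid NCName for any non-empty input, including names whose first character is a digit.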
Note:
When the resulting document is an HTML DOM HTMLDocument, the Element.localName and Element.name properties of HTML DOM Element nodes are both set to the qualified name.
Note:
When the resulting document is an HTML DOM HTMLDocument, the Attr.localName and Attr.name properties of HTML DOM Attr nodes are both set to the qualified name.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.11 parent Accessor (dm:parent($node)) for an HTML DOM Node is as follows:
Let $parent be the Node.parentNode property of the node;
If $parent is an instance of HTML DOM DocumentFragment, then for each HTML DOM HTMLTemplateElement $template in the parsed DOM tree:
Let $content be the value of the HTMLTemplateElement.content property of $template;
If $content is the same node as $parent, then the result is $template using the XDM mapping rules described in this section;
If there are no more $template nodes, then the result is an empty sequence;
If $parent is null, then the result is the empty sequence;
Otherwise, the result is $parent using the XDM mapping rules described in this section.
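The search for the owning template element described above can be sketched as a tree traversal. The following Python sketch is illustrative only and not part of this specification; the classes Node, DocumentFragment and TemplateElement are hypothetical stand-ins for the corresponding HTML DOM interfaces, and None models both the DOM null value and the empty sequence.

```python
class Node:
    """Hypothetical minimal DOM node with parent and child links."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent_node = parent
        self.child_nodes = []
        if parent is not None:
            parent.child_nodes.append(self)


class DocumentFragment(Node):
    def __init__(self):
        super().__init__("#document-fragment")


class TemplateElement(Node):
    def __init__(self, parent=None):
        super().__init__("template", parent)
        self.content = DocumentFragment()  # parentNode of the fragment is None


def iter_tree(node):
    """Yield node and its descendants, descending into template contents."""
    yield node
    for child in node.child_nodes:
        yield from iter_tree(child)
    if isinstance(node, TemplateElement):
        yield from iter_tree(node.content)


def xdm_parent(node, document):
    """Return the XDM parent of node, or None for the empty sequence."""
    parent = node.parent_node
    if isinstance(parent, DocumentFragment):
        # Look for the template element whose content is this fragment.
        for candidate in iter_tree(document):
            if isinstance(candidate, TemplateElement) and candidate.content is parent:
                return candidate
        return None
    return parent
```

The traversal descends into template contents so that nested templates are found; an implementation with direct access to a link from the fragment back to its template could avoid the search entirely.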
Note:
The current node can have an HTML DOM DocumentFragment parent node only if the include-template-content key of the html-parser-options is true().
Note:
The HTML DOM DocumentFragment’s Node.parentNode property is null, and a DocumentFragment attached to HTMLTemplateElement.content property does not have a host property connecting the fragment back to the template element.
If a future version of [DOM: Living Standard] adds a DocumentFragment.host property that references the node’s template element, or the implementation has access to that internal property, the implementation may choose to use that instead of traversing the parsed HTML tree.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.13 type-name Accessor (dm:type-name($node)) for an HTML DOM Node is as follows:
If the node is an instance of HTML DOM Element then the result is xs:untyped.
If the node is an instance of HTML DOM Attr then the result is xs:untypedAtomic.
If the node is an instance of HTML DOM Text then the result is xs:untypedAtomic.
Otherwise, the result is the empty sequence.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.15 unparsed-entity-public-id Accessor (dm:unparsed-entity-public-id($node)) for an HTML DOM Node is the empty sequence.
The result of the [XQuery and XPath Data Model (XDM) 4.0] section 7.6.16 unparsed-entity-system-id Accessor (dm:unparsed-entity-system-id($node)) for an HTML DOM Node is the empty sequence.
This function takes as input an HTML document, and returns the document node at the root of an XDM tree representing the parsed document.
fn:parse-html( | ||
$value | as , | |
$options | as | := {} |
) as | ||
This function is nondeterministic, context-independent, and focus-independent.
If $value is the empty sequence the function returns the empty sequence.
In other cases, $value is expected to contain an HTML document supplied either as a string or a binary value.
The entries that may appear in the $options map are as follows:
record( | |
encoding? | as xs:string, |
fail-on-error? | as xs:boolean |
) | |
| Key | Value | Meaning |
|---|---|---|
| encoding | | The character encoding to use to decode a sequence of octets that represents an HTML document. Note that encoding names are case-insensitive. |
| fail-on-error | | Indicates whether the function should fail with a dynamic error if the input is not syntactically valid. |
| | false | Parsing errors should be handled as described in [HTML: Living Standard] section 13.2.2, Parse Errors. |
| | true | A parsing error should result in the function failing with a dynamic error. |
The option parameter conventions apply.
If $value is not the empty sequence, an input byte stream is constructed as follows:
If $value is an xs:string, then in principle no decoding is needed. Conceptually, however, the HTML parsing algorithm always starts by decoding an octet stream. The string is therefore first encoded using UTF-8, and the resulting octet stream is then passed to the HTML parser with a known definite encoding of UTF-8, as described in [HTML: Living Standard] section 13.2.3.1, Parsing with a known character encoding.
If the first codepoint of the string is U+FEFF, this should be stripped, since it might otherwise lead to an incorrect encoding inference.
If the type of $value is a sequence of octets (xs:hexBinary or xs:base64Binary) the encoding of the input byte stream is determined in a way consistent with [HTML: Living Standard] section 13.2.3.2, Determining the character encoding:
The encoding key of $options is interpreted in step 2 of Determining the character encoding as the user instructing the user agent to override the document’s character encoding with the specified encoding.
If the encoding key of $options is not specified, step 2 of Determining the character encoding is skipped.
Tokenizing the byte stream according to the HTML parsing algorithm as described in [HTML: Living Standard] section 13.2.5, Tokenization.
Constructing an HTMLDocument object for HTML documents, or an XMLDocument for XML/XHTML documents as described in [HTML: Living Standard] section 13.2.6, Tree construction.
Building an XDM representation of the HTMLDocument or XMLDocument according to the rules in 17.3.1 XDM Mapping from HTML DOM Nodes.
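The construction of the input byte stream described above, before tokenization begins, can be sketched as follows. This Python sketch is illustrative only and not part of this specification; the function name input_byte_stream is hypothetical, and a returned encoding of None is used here to mean that the parser must determine the encoding itself.

```python
from typing import Optional, Union


def input_byte_stream(
    value: Union[str, bytes], encoding: Optional[str] = None
) -> tuple[bytes, Optional[str]]:
    """Return the octets to parse and the known encoding, if any."""
    if isinstance(value, str):
        # A leading U+FEFF is stripped so that it cannot influence
        # encoding inference.
        if value.startswith("\ufeff"):
            value = value[1:]
        # Strings are conceptually re-encoded as UTF-8 and parsed with a
        # known definite encoding of UTF-8.
        return value.encode("utf-8"), "utf-8"
    # Binary input: the encoding option, if supplied, acts as an override;
    # otherwise the parser determines the encoding itself.
    return bytes(value), encoding
```

In this sketch the encoding option is ignored for string input, reflecting the rule that a string is always parsed with a known encoding of UTF-8.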
The implementation should process any input HTML that adheres to the current practice of mainstream web browsers, as this evolves over time. Since this is defined by a “living standard” (see [HTML: Living Standard]), no specific version is prescribed. An implementation may define additional options to control aspects of the HTML parsing algorithm, including the selection of a specific HTML parsing library; it may also provide options to process alternative HTML versions or dialects.
The implementation should recognize and process XHTML (referred to in [HTML: Living Standard] as the XML concrete syntax of HTML).
The function is nondeterministic with respect to node identity: that is, if the function is called twice with the same arguments, it is implementation-dependent whether the same node is returned on both occasions.
A dynamic error is raised [err:FODC0011] if the content of $value is not a well-formed HTML document. This includes the case where $value cannot be decoded using the specified encoding.
If the HTML parser accepts a string as the input then that may be used directly when $value is an xs:string instead of converting the string to a sequence of octets in an implementation-dependent encoding. The HTML parser must not perform character encoding processing on that input, treating the HTML string as being in a known character encoding that matches the encoding of the string.
The WHATWG Encoding specification defines the ISO 8859-1 (latin1) and ASCII encodings as aliases of the windows-1252 encoding.
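The practical effect of this aliasing can be seen with octets in the range 0x80 to 0x9F, which windows-1252 maps to printable characters. The following Python sketch is illustrative only and not part of this specification: the function resolve_encoding is hypothetical, the label set shown is a small subset of the labels defined by the WHATWG Encoding specification, and Python's cp1252 codec is used as an approximation of windows-1252 (it leaves a few octets undefined that the WHATWG specification maps to control characters).

```python
# A small, non-exhaustive subset of the labels that the WHATWG Encoding
# specification maps to windows-1252.
WINDOWS_1252_LABELS = {"ascii", "us-ascii", "iso-8859-1", "latin1", "windows-1252"}


def resolve_encoding(label: str) -> str:
    """Map an encoding label to a Python codec name, case-insensitively."""
    normalized = label.strip().lower()
    if normalized in WINDOWS_1252_LABELS:
        return "cp1252"
    # Other labels are passed through unchanged in this sketch.
    return normalized


octets = b"\x80"
print(octets.decode(resolve_encoding("Latin1")))  # prints "€" (U+20AC)
print(repr(octets.decode("latin-1")))             # prints '\x80' (true ISO 8859-1)
```

An application that expects strict ISO 8859-1 decoding therefore needs to decode the octets itself and supply the result to the function as a string.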
Reads an external resource containing HTML, and returns the result of parsing the resource as HTML.
fn:html-doc( | ||
$source | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
If the second argument is omitted or the empty sequence, the result is the same as calling the two-argument form with the empty map as the value of the $options argument.
The effect of the two-argument function call fn:html-doc($H, $M) is equivalent to the function composition fn:unparsed-binary($H) => fn:parse-html($M).
If $source is the empty sequence, the function returns the empty sequence.
Note:
The ability to access external resources depends on whether the calling code is trusted.
The function may raise any error defined for the fn:unparsed-binary or fn:parse-html functions.
The functions listed in this section parse or serialize JSON data.
JSON is a popular format for exchange of structured data on the web: it is specified in [RFC 7159]. This section describes facilities allowing JSON data to be converted to and from XDM values.
This specification describes two ways of representing JSON data losslessly using XDM constructs. The first method uses XDM maps to represent JSON objects, and XDM arrays to represent JSON arrays. The second method represents all JSON constructs using XDM element and attribute nodes.
| Function | Meaning |
|---|---|
| fn:parse-json | Parses input supplied in the form of a JSON text, returning the results typically in the form of a map or array. |
| fn:json-doc | Reads an external resource containing JSON, and returns the result of parsing the resource as JSON. |
| fn:json-to-xml | Parses a string supplied in the form of a JSON text, returning the results in the form of an XML document node. |
| fn:xml-to-json | Converts an XML tree, whose format corresponds to the XML representation of JSON defined in this specification, into a string conforming to the JSON grammar. |
Note also:
The function fn:serialize has an option to generate JSON output from a structure of maps and arrays.
The function fn:element-to-map enables arbitrary XML node trees to be converted to trees of maps and arrays suitable for serializing as JSON.
Changes in 4.0
The rules regarding use of non-XML characters in JSON texts have been relaxed. [Issue 414 PR 546 25 July 2023]
An option is provided to control how the JSON null value should be handled. [Issue 960 PR 1028 20 February 2024]
An option is provided to control how JSON numbers should be formatted. [Issues 973 1037 PRs 975 1058 1246 12 March 2024]
It is now recommended that out-of-range xs:double values should translate to positive or negative infinity. [Issue 641 PR 2387 16 January 2026]
The default for the escape option has been changed to false. The 3.1 specification gave the default value as true, but this appears to have been an error, since it was inconsistent with examples given in the specification and with tests in the test suite. [Issue 1555 PR 1565 11 November 2024]
The order of entries in maps is retained. [Issue 1651 PR 1703 14 January 2025]
Parses input supplied in the form of a JSON text, returning the results typically in the form of a map or array.
fn:parse-json( | ||
$value | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If the second argument is omitted or the empty sequence, the result is the same as calling the two-argument form with the empty map as the value of the $options argument.
The first argument is a JSON text as defined in [RFC 7159], in the form of a string or binary value. The function parses this input to return an XDM value.
If $value is the empty sequence, the function returns the empty sequence.
Note:
If the input is "null", the result will also be an empty sequence.
The $options argument can be used to control the way in which the parsing takes place. The option parameter conventions apply.
The entries that may appear in the $options map are as follows:
record(
  liberal? as xs:boolean,
  duplicates? as xs:string,
  escape? as xs:boolean,
  fallback? as (fn(xs:string) as xs:anyAtomicType)?,
  null? as item()*,
  number-parser? as (fn(xs:untypedAtomic) as item()?)?
)
| Key | Value | Meaning |
|---|---|---|
| liberal | | Determines whether deviations from the syntax of RFC 7159 are permitted. |
| | false | The input must consist of an optional byte order mark (which is ignored) followed by a string that conforms to the grammar of JSON-text in [RFC 7159]. An error must be raised [err:FOJS0001] if the input does not conform to the grammar. |
| | true | The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised [err:FOJS0001] if the input does not conform to the grammar. |
| duplicates | | Determines the policy for handling duplicate keys in a JSON object. To determine whether keys are duplicates, they are compared using the Unicode codepoint collation, after expanding escape sequences, unless the escape option is set to true, in which case keys are compared in escaped form. |
| | reject | An error is raised [err:FOJS0003] if duplicate keys are encountered. |
| | use-first | If duplicate keys are present in a JSON object, all but the first of a set of duplicates are ignored. |
| | use-last | If duplicate keys are present in a JSON object, all but the last of a set of duplicates are ignored. |
| escape | | Determines whether special characters are represented in the XDM output in backslash-escaped form. |
| | false | Any permitted character in the input, whether or not it is represented in the input by means of an escape sequence, is represented as an unescaped character in the result. Any other character or codepoint (for example, an unpaired surrogate) is passed to the fallback function as described below; in the absence of a fallback function, it is replaced by U+FFFD (REPLACEMENT CHARACTER, �). |
| | true | JSON escape sequences are used in the result to represent special characters in the JSON input, as defined below, whether or not they were represented using JSON escape sequences in the input. The characters that are considered “special” for this purpose are: \t), or a six-character escape sequence otherwise (for example \uDEAD). Characters other than these are not escaped in the result, even if they were escaped in the input. |
| fallback | | Provides a function which is called when the input contains an escape sequence that represents a character that is not a permitted character. It is an error to supply the fallback option if the escape option is present with the value true. |
| | User-supplied function | The function is called when the JSON input contains a character that is not a permitted character. It is called once for any surrogate that is not properly paired with another surrogate. The untyped atomic item supplied as the argument will always be a two- or six-character escape sequence, starting with a backslash, that conforms to the rules in the JSON grammar (as extended by the implementation if liberal:true() is specified): for example \b or \uFFFF or \uDEAD. By default, the escape sequence is replaced with the Unicode REPLACEMENT CHARACTER (U+FFFD). |
| null | | Determines how the JSON null value should be represented. |
| | Value | The supplied XDM value is used to represent the JSON null value. The default representation of null is the empty sequence, which works well in cases where setting a property of an object to null has the same meaning as omitting the property. It works less well in cases where null is used with some other meaning, because expressions such as the lookup operator ? flatten the result to a single sequence of items, which means that any entries whose value is the empty sequence effectively disappear. The property can be set to any XDM value; a suggested value is the xs:QName value fn:QName("http://www.w3.org/2005/xpath-functions", "null"), which is recognized by the JSON serialization method as representing the JSON value null. |
| number-parser | | Determines how numeric values should be processed. |
| | User-supplied function | The supplied function is called to process the string value of any JSON number in the input. By default, numbers are processed by converting to xs:double using the XPath casting rules. Supplying the value xs:decimal#1 will instead convert to xs:decimal (which potentially retains more precision, but disallows exponential notation), while supplying a function that casts to (xs:decimal \| xs:double) will treat the value as xs:decimal if there is no exponent, or as xs:double otherwise. Supplying the value fn:identity#1 causes the value to be retained unchanged as an xs:untypedAtomic. If the liberal option is false (the default), then the supplied number-parser is called if and only if the value conforms to the JSON grammar for numbers (for example, a leading plus sign and redundant leading zeroes are not allowed). If the liberal option is true then it is also called if the value conforms to an implementation-defined extension of this grammar. |
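The trade-off between the number-parser choices described in the table can be illustrated with a rough Python analogy. In this sketch Python's float and Decimal merely stand in for xs:double and xs:decimal, and the helper names are hypothetical; it is not an XPath implementation.

```python
from decimal import Decimal

def as_double(lexical: str) -> float:
    """Analogue of the default behaviour: convert to a binary double."""
    return float(lexical)

def as_decimal_or_double(lexical: str):
    """Analogue of casting to (xs:decimal | xs:double):
    an exact decimal when there is no exponent, a double otherwise."""
    if "e" in lexical.lower():
        return float(lexical)
    return Decimal(lexical)

def as_written(lexical: str) -> str:
    """Analogue of fn:identity#1: keep the lexical form untouched."""
    return lexical
```

The double conversion silently rounds large integers such as 9007199254740993, whereas the decimal conversion keeps every digit, which is the precision concern the table alludes to.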
The various structures that can occur in JSON are transformed recursively to XDM values as follows:
A JSON object is converted to a map. The entries in the map correspond to the key/value pairs in the JSON object. The key is always of type xs:string; the associated value may be of any type, and is the result of converting the JSON value by recursive application of these rules. For example, the JSON text { "x": 2, "y": 5 } is transformed to the value { "x": 2, "y": 5 }.
If duplicate keys are encountered in a JSON object, they are handled as determined by the duplicates option defined above.
The order of entries is retained.
A JSON array is transformed to an array whose members are the result of converting the corresponding member of the array by recursive application of these rules. For example, the JSON text [ "a", "b", null ] is transformed (by default) to the value [ "a", "b", () ].
A JSON string is converted to an xs:string value. The handling of special characters depends on the escape and fallback options, as described in the table above.
A JSON number is processed using the function supplied in the number-parser option; by default it is converted to an xs:double value using the rules for casting from xs:string to xs:double.
Note:
The casting rules leave implementations some flexibility as to how values should be handled when they are too large to represent as an xs:double. It is recommended that when parsing JSON, out-of-range values should be converted to positive or negative infinity. This option enables round-tripping behavior, since the JSON serialization method represents positive and negative infinity as 1e9999 or -1e9999 respectively.
Note:
Round-tripping of NaN can be achieved by setting the option "null": number("NaN")
The JSON boolean values true and false are converted to the corresponding xs:boolean values.
The JSON value null is converted to the value given by the null option, which defaults to an empty sequence.
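The three duplicates policies can be sketched in Python using the standard json module, whose object_pairs_hook receives the key/value pairs of each object in document order after escape sequences in keys have been expanded. The function names below are hypothetical, ValueError merely stands in for the errors named in this specification, and the sketch does not model the escape option.

```python
import json

def make_pairs_hook(policy: str):
    """Build a hook that applies one duplicates policy to each JSON object."""
    if policy not in ("reject", "use-first", "use-last"):
        raise ValueError(f"invalid duplicates option: {policy!r}")

    def hook(pairs):
        result = {}
        for key, value in pairs:
            if key in result:
                if policy == "reject":
                    raise ValueError(f"duplicate key: {key!r}")
                if policy == "use-first":
                    continue  # ignore every later duplicate
                # use-last: fall through and overwrite the earlier value
            result[key] = value
        return result

    return hook

def parse_with_policy(text: str, duplicates: str):
    """Parse JSON text, handling duplicate keys as the policy dictates."""
    return json.loads(text, object_pairs_hook=make_pairs_hook(duplicates))
```

Because the hook sees decoded keys, "a" and "\u0061" count as duplicates, matching the comparison rule for the case where escape is not set to true.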
A dynamic error [err:FOJS0001] occurs if the value of $value does not conform to the JSON grammar, unless the option "liberal":true() is present and the processor chooses to accept the deviation.
A dynamic error [err:FOJS0003] occurs if the option "duplicates": "reject" is present and the value of $value contains a JSON object with duplicate keys.
A dynamic error [err:FOJS0005] occurs if the $options map contains an entry whose key is defined in this specification and whose value is not valid for that key, or if it contains an entry with the key fallback when the option "escape":true() is also present.
The result of the function will be an instance of one of the following types. An instance of test (or in XQuery, typeswitch) can be used to distinguish them:
map(xs:string, item()?) for a JSON object
array(item()?) for a JSON array
xs:string for a JSON string
xs:double for a JSON number
xs:boolean for a JSON boolean
empty-sequence() for a JSON null (or for empty input)
If the source of the JSON input is a resource accessible by URI, then it may be preferable to use the fn:json-doc function. If the source is a binary value (xs:hexBinary or xs:base64Binary) then this can first be decoded as a string using the functions bin:infer-encoding and bin:decode-string.
If the input starts with a byte order mark, this function ignores it. The byte order mark may have been added to the data stream in order to facilitate decoding of an octet stream to a character string, but since this function takes a character string as input, the byte order mark serves no useful purpose.
The possibility of the input containing characters that are not valid in XML (for example, unpaired surrogates) arises only when such characters are expressed using JSON escape sequences. This is because the input to the function is an instance of xs:string, which by definition (see [XQuery and XPath Data Model (XDM) 4.0] section 4.1.5 XML and XSD Versions) cannot contain unpaired surrogates.
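As an informal illustration of the fallback mechanism, the following Python sketch post-processes a decoded string: any surrogate code point that survives decoding is unpaired, so it is rewritten as a six-character escape sequence and handed to a fallback callable, which defaults to returning U+FFFD. The function name is hypothetical, and the sketch regenerates the escape in upper-case hexadecimal rather than preserving the spelling used in the input.

```python
import json
import re

REPLACEMENT = "\ufffd"
# After decoding, properly paired surrogates have been merged into one
# code point, so anything left in this range is an unpaired surrogate.
_UNPAIRED = re.compile(r"[\ud800-\udfff]")

def apply_fallback(text: str, fallback=lambda escape: REPLACEMENT) -> str:
    """Replace each unpaired surrogate by the result of fallback(escape)."""
    return _UNPAIRED.sub(
        lambda m: fallback("\\u%04X" % ord(m.group())), text
    )

# Python's decoder accepts an unpaired surrogate written as an escape.
decoded = json.loads('"a\\uDEADb"')
cleaned = apply_fallback(decoded)
bracketed = apply_fallback(decoded, lambda esc: "[" + esc + "]")
```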
The serializer provides an option to output data in json-lines format. This is a format for structured data containing one JSON value (usually but not necessarily a JSON object) on each line. There is no corresponding option to parse json-lines input, but this can be achieved using the expression unparsed-text-lines($uri) =!> parse-json().
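The line-by-line approach can be sketched in Python as follows. The helper name is hypothetical; it splits only on CR, LF, or CRLF and drops an empty final line, which is the assumption made here about how the lines are delimited, and then parses each line independently.

```python
import json
import re

def parse_json_lines(text: str) -> list:
    """Parse json-lines input: one JSON value per line."""
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start another line
    return [json.loads(line) for line in lines]
```

Splitting on these three delimiters only, rather than on every Unicode line separator, matters because characters such as U+2028 may legitimately appear unescaped inside a JSON string.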
| Expression: | parse-json(
'{ "x": "\\", "y": "\u0025" }',
{ 'escape': true() }
) |
|---|---|
| Result: | { "x": "\\", "y": "%" } |
| Expression: | parse-json(
'{ "x": "\\", "y": "\u0000" }'
) |
| Result: | { "x": "\", "y": char(0xFFFD) } |
| Expression: | parse-json(
'{ "x": "\\", "y": "\u0000" }',
{ 'escape': true() }
) |
| Result: | { "x": "\\", "y": "\u0000" } |
| Expression: | parse-json(
'{ "x": "\\", "y": "\u0000" }',
{ 'fallback': fn($s) { '[' || $s || ']' } }
) |
| Result: | { "x": "\", "y": "[\u0000]" } |
| Expression: | parse-json(
"1984.2",
{ 'number-parser': fn { xs:integer(round(.)) } }
) |
| Result: | 1984 |
| Expression: | parse-json(
'[ 1, -1, 2 ]',
{ 'number-parser': fn { boolean(. >= 0) } }
) |
| Result: | [ true(), false(), true() ] |
| Expression: | parse-json('[ "a", null, "b" ]',
{ 'null': #fn:null }
) |
| Result: | [ "a", #fn:null, "b" ] |
Changes in 4.0
Additional options are available, as defined by fn:parse-json.
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. [Issue 414 PR 546 25 July 2023]
Reads an external resource containing JSON, and returns the result of parsing the resource as JSON.
fn:json-doc( | ||
$source | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
If the second argument is omitted or the empty sequence, the result is the same as calling the two-argument form with the empty map as the value of the $options argument.
The effect of the two-argument function call fn:json-doc($H, $M) is equivalent to the function composition fn:unparsed-text($H) => fn:parse-json($M), except that:
The function may accept a resource in any encoding. [RFC 7159] requires UTF-8, UTF-16, or UTF-32 to be accepted, but it is not an error if a different encoding is used. Unless external encoding information is available, the function must assume that the encoding is one of UTF-8, UTF-16, or UTF-32, and must distinguish these cases by examination of the initial octets of the resource.
Having established the encoding, the function must accept any codepoint that can validly occur in a JSON text, with the exception of unpaired surrogates.
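One common way to distinguish the three encodings from the initial octets is sketched below in Python. It first looks for a byte order mark and otherwise relies on the fact that a JSON text begins with an ASCII character, so the position of zero octets reveals the encoding (the pattern-based heuristic described in RFC 4627). The function name is hypothetical and the sketch is not a statement of what any processor actually does.

```python
import codecs

def sniff_json_encoding(data: bytes) -> str:
    """Guess a Python codec name for JSON octets with no external encoding info."""
    # Byte order marks: UTF-32 is tested first because the UTF-32LE mark
    # begins with the same two octets as the UTF-16LE mark.
    if data.startswith((codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)):
        return "utf-32"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # No mark: the first character is ASCII, so look at where zeros fall.
    if len(data) >= 4:
        if data[:3] == b"\x00\x00\x00":
            return "utf-32-be"
        if data[1:4] == b"\x00\x00\x00":
            return "utf-32-le"
    if len(data) >= 2:
        if data[0] == 0:
            return "utf-16-be"
        if data[1] == 0:
            return "utf-16-le"
    return "utf-8"
```

The codec names returned for the marked cases ("utf-32", "utf-16", "utf-8-sig") are the Python codecs that consume the byte order mark during decoding.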
If $source is the empty sequence, the function returns the empty sequence.
Note:
The ability to access external resources depends on whether the calling code is trusted.
The function may raise any error defined for the fn:unparsed-text or fn:parse-json functions.
An initial byte order mark is dropped, as with the fn:unparsed-text function.
If the input cannot be decoded (that is, converted into a sequence of Unicode codepoints, which may or may not represent characters), then a dynamic error occurs as with the fn:unparsed-text function.
If the input can be decoded, then the possibility still arises that the resulting sequence of codepoints includes codepoints that are not permitted characters. Such codepoints are translated into JSON escape sequences (for example, \uFFFF), and the JSON escape sequence is then passed to the fallback function specified in the $options argument, which in turn defaults to a function that returns U+FFFD (REPLACEMENT CHARACTER).
The function may accept a resource in any encoding. [RFC 7159] requires UTF-8, UTF-16, or UTF-32 to be accepted, but it is not an error if a different encoding is used. The function detects the encoding using the same rules as the unparsed-text function, except that the special handling of media types such as text/xml and application/xml may be skipped.
Parses a string supplied in the form of a JSON text, returning the results in the form of an XML document node.
fn:json-to-xml( | ||
$value | as , | |
$options | as | := {} |
) as | ||
This function is nondeterministic, context-dependent, and focus-independent. It depends on executable base URI.
If the second argument is omitted or the empty sequence, the result is the same as calling the two-argument form with the empty map as the value of the $options argument.
The first argument is a JSON text as defined in [RFC 7159], in the form of a string. The function parses this string to return an XDM node.
If $value is the empty sequence, the function returns the empty sequence.
The $options argument can be used to control the way in which the parsing takes place. The option parameter conventions apply.
The entries that may appear in the $options map are as follows:
record(
  liberal? as xs:boolean,
  duplicates? as xs:string,
  validate? as xs:boolean,
  escape? as xs:boolean,
  fallback? as (fn(xs:string) as xs:anyAtomicType)?,
  number-parser? as (fn(xs:untypedAtomic) as item()?)?
)
| Key | Value | Meaning |
|---|---|---|
| liberal | | Determines whether deviations from the syntax of RFC 7159 are permitted. |
| | false | The input must consist of an optional byte order mark (which is ignored) followed by a string that conforms to the grammar of JSON-text in [RFC 7159]. An error must be raised (see below) if the input does not conform to the grammar. |
| | true | The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised (see below) if the input does not conform to the grammar. |
| duplicates | | Determines the policy for handling duplicate keys in a JSON object. To determine whether keys are duplicates, they are compared using the Unicode codepoint collation, after expanding escape sequences, unless the escape option is set to true, in which case keys are compared in escaped form. |
| | reject | An error is raised [err:FOJS0003] if duplicate keys are encountered. |
| | use-first | If duplicate keys are present in a JSON object, all but the first of a set of duplicates are ignored. |
| | retain | If duplicate keys are present in a JSON object, the XML result of the function will also contain duplicates (making it invalid against the schema). This value is therefore incompatible with the option validate=true [err:FOJS0005]. |
| validate | | Determines whether the generated XML tree is schema-validated. |
| | true | Indicates that the resulting XDM instance must be typed; that is, the element and attribute nodes must carry the type annotations that result from validation against the schema given at D.2 Schema for the result of fn:json-to-xml, or against an implementation-defined schema if the liberal option has the value true. |
| | false | Indicates that the resulting XDM instance must be untyped. |
| escape | | Determines whether special characters are represented in the XDM output in backslash-escaped form. |
| | false | All characters in the input that are valid in the version of XML supported by the implementation, whether or not they are represented in the input by means of an escape sequence, are represented as unescaped characters in the result. Any characters or codepoints that are not valid XML characters (for example, unpaired surrogates) are passed to the fallback function as described below; in the absence of a fallback function, they are replaced by the character U+FFFD (REPLACEMENT CHARACTER, �). The attributes escaped and escaped-key will not be present in the XDM output. |
| | true | JSON escape sequences are used in the result to represent special characters in the JSON input, as defined below, whether or not they were represented using JSON escape sequences in the input. The characters that are considered “special” for this purpose are: \t), or a six-character escape sequence otherwise (for example \uDEAD). Characters other than these will not be escaped in the result, even if they were escaped in the input. In the result: |
| fallback | | Provides a function which is called when the input contains an escape sequence that represents a character that is not valid in the version of XML supported by the implementation. It is an error to supply the fallback option if the escape option is present with the value true. |
| | User-supplied function | The function is called when the JSON input contains an escape sequence that is valid according to the JSON grammar, but which does not represent a character that is valid in the version of XML supported by the processor. In the case of surrogates, it is called once for any six-character escape sequence that is not properly paired with another surrogate. The untyped atomic item supplied as the argument will always be a two- or six-character escape sequence, starting with a backslash, that conforms to the rules in the JSON grammar (as extended by the implementation if liberal:true() is specified): for example \b or \uFFFF or \uDEAD. By default, the escape sequence is replaced with the Unicode REPLACEMENT CHARACTER (U+FFFD). |
| number-parser | | Determines how numeric values should be processed. |
| | User-supplied function | The supplied function is called to process the string value of any JSON number in the input. The string value of the number element generated in the result will be the value obtained by calling the supplied function, and then converting its result to a string by calling fn:string#1. By default, numbers are represented in the XML output exactly as they were written in the input. Supplying the value |
The various structures that can occur in JSON are transformed recursively to XDM values according to the rules given in 17.4.2 XML Representation of JSON.
The function returns a document node, whose only child is the element node representing the outermost construct in the JSON text.
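The overall shape of the result can be approximated in Python with the standard json and xml.etree.ElementTree modules. The sketch below is an assumption-laden illustration, not the normative mapping: it returns the outermost element rather than a document node, keeps numbers in their original lexical form, and ignores the escape, fallback, duplicates and validate options. All names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/xpath-functions"

class RawNumber(str):
    """A JSON number kept exactly as it was written in the input."""

def _reject_constant(name):
    # NaN and Infinity are Python extensions, not part of the JSON grammar.
    raise ValueError(f"not valid JSON: {name}")

def _element(value, key=None):
    """Convert one decoded JSON value into an element in the functions namespace."""
    if isinstance(value, dict):
        elem = ET.Element(f"{{{NS}}}map")
        for child_key, child in value.items():
            elem.append(_element(child, child_key))
    elif isinstance(value, list):
        elem = ET.Element(f"{{{NS}}}array")
        for member in value:
            elem.append(_element(member))
    elif isinstance(value, RawNumber):  # must be tested before plain str
        elem = ET.Element(f"{{{NS}}}number")
        elem.text = str(value)
    elif isinstance(value, str):
        elem = ET.Element(f"{{{NS}}}string")
        elem.text = value
    elif isinstance(value, bool):
        elem = ET.Element(f"{{{NS}}}boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem = ET.Element(f"{{{NS}}}null")
    else:
        raise TypeError(f"unexpected value: {value!r}")
    if key is not None:
        elem.set("key", key)  # entries of a map carry their key as an attribute
    return elem

def json_to_xml_sketch(text: str) -> ET.Element:
    value = json.loads(text, parse_int=RawNumber, parse_float=RawNumber,
                       parse_constant=_reject_constant)
    return _element(value)
```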
The function is nondeterministic with respect to node identity: that is, if the function is called twice with the same arguments, it is implementation-dependent whether the same node is returned on both occasions.
The base URI of the returned document node is taken from the executable base URI of the function call.
The choice of namespace prefix (or absence of a prefix) in the names of constructed nodes is implementation-dependent.
The XDM tree returned by the function does not contain any unnecessary (albeit valid) nodes such as whitespace text nodes, comments, or processing instructions. It does not include any whitespace in the value of number or boolean element nodes, nor in the value of escaped or escaped-key attribute nodes.
If the result is typed, every element named string will have an attribute named escaped whose value is either true or false, and every element having an attribute named key will also have an attribute named escaped-key whose value is either true or false.
If the result is untyped, the attributes escaped and escaped-key will either be present with the value true, or will be absent. They will never be present with the value false.
An error is raised [err:FOJS0001] if the value of $value does not conform to the JSON grammar as defined by [RFC 7159], unless the option "liberal":true() is present and the processor chooses to accept the deviation.
An error is raised [err:FOJS0004] if the value of the validate option is true and the processor does not support schema validation or typed data.
An error is raised [err:FOJS0005] if the value of $options includes an entry whose key is defined in this specification, and whose value is not a permitted value for that key.
To read a JSON file, this function can be used in conjunction with the fn:unparsed-text function.
Many JSON implementations allow commas to be used after the last item in an object or array, although the specification does not permit it. The option "liberal": true() is provided to allow such deviations from the specification to be accepted. Some JSON implementations also allow constructors such as new Date("2000-12-13") to appear as values: specifying "liberal": true() allows such extensions to be accepted, but does not guarantee it. If such extensions are accepted, the resulting value is implementation-defined, and will not necessarily conform to the schema at D.2 Schema for the result of fn:json-to-xml.
If the input starts with a byte order mark, this function ignores it. The byte order mark may have been added to the data stream in order to facilitate decoding of an octet stream to a character string, but since this function takes a character string as input, the byte order mark serves no useful purpose.
The possibility of the input containing characters that are not valid in XML (for example, unpaired surrogates) arises only when such characters are expressed using JSON escape sequences. This is the only possibility because the input to the function is an instance of xs:string, which by definition can contain only those characters that are valid in XML.
| Expression: | json-to-xml(
'{ "x": 1, "y": [ 3, 4, 5 ] }',
{ "validate": false() }
) |
|---|---|
| Result: | <map xmlns="http://www.w3.org/2005/xpath-functions"> <number key="x">1</number> <array key="y"> <number>3</number> <number>4</number> <number>5</number> </array> </map> (with whitespace added for legibility) |
| Expression: | json-to-xml(
'"abcd"',
{ 'liberal': false() }
) |
| Result: | <string xmlns="http://www.w3.org/2005/xpath-functions">abcd</string> |
| Expression: | json-to-xml(
'{ "x": "\\", "y": "\u0025" }',
{ "validate": false() }
) |
| Result: | <map xmlns="http://www.w3.org/2005/xpath-functions"> <string key="x">\</string> <string key="y">%</string> </map> (with whitespace added for legibility) |
| Expression: | json-to-xml(
'{ "x": "\\", "y": "\u0025" }',
{ 'escape': true(), "validate": false() }
) |
| Result: | <map xmlns="http://www.w3.org/2005/xpath-functions"> <string escaped="true" key="x">\\</string> <string key="y">%</string> </map> (with whitespace added for legibility) (But see the detailed rules for alternative values of the |
The following example illustrates use of the fallback option:
let $json := unparsed-text('http://example.com/endpoint')
let $options := {
  'liberal': true(),
  'fallback': fn($char as xs:string) as xs:string {
    let $c0chars := {
      '\u0000': '[NUL]',
      '\u0001': '[SOH]',
      '\u0002': '[STX]',
      ...
      '\u001E': '[RS]',
      '\u001F': '[US]'
    }
    let $replacement := $c0chars($char)
    return if (exists($replacement)) then (
      $replacement
    ) else (
      error( #err:invalid-char,
        'Error: ' || $char || ' is not a C0 control character.'
      )
    )
  }
}
return json-to-xml($json, $options)
Changes in 4.0
An option has been added to suppress the escaping of the solidus (forwards slash) character. [Issue 1347 PR 1353 3 September 2024]
Numbers now retain their original lexical form, except for any changes needed to satisfy JSON syntax rules (for example, stripping leading zero digits). [Issue 1445 PR 1455 1 October 2024]
Converts an XML tree, whose format corresponds to the XML representation of JSON defined in this specification, into a string conforming to the JSON grammar.
fn:xml-to-json( | ||
$node | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If the second argument is omitted or the empty sequence, the result is the same as calling the two-argument form with the empty map as the value of the $options argument.
The first argument $node is a node; the subtree rooted at this node will typically be the XML representation of a JSON document as defined in 17.4.2 XML Representation of JSON.
If $node is the empty sequence, the function returns the empty sequence.
The $options argument can be used to control the way in which the conversion takes place. The option parameter conventions apply.
The entries that may appear in the $options map are as follows:
record(
  escape-solidus? as xs:boolean,
  indent? as xs:boolean
)
| Key | Value | Meaning |
|---|---|---|
| escape-solidus | | Determines whether the character U+002F (SOLIDUS, FORWARD SLASH, /) should be escaped as \/. By default the character is escaped, but this is only necessary when the resulting JSON is embedded in HTML. |
| | false | The character U+002F (SOLIDUS, FORWARD SLASH, /) is output as is, without escaping. |
| | true | The character U+002F (SOLIDUS, FORWARD SLASH, /) is escaped by preceding it with U+005C (REVERSE SOLIDUS, BACKSLASH, \). |
| indent | | Determines whether additional whitespace should be added to the output to improve readability. |
| | false | The processor must not insert any insignificant whitespace between JSON tokens. |
| | true | The processor may insert whitespace between JSON tokens in order to improve readability. The specification imposes no constraints on how this is done. |
The node supplied as $node must be one of the following: [err:FOJS0006]
An element node whose name matches the name of a global element declaration in the schema given in D.2 Schema for the result of fn:json-to-xml (“the schema”) and that is valid as defined below:
If the type annotation of the element matches the type of the relevant element declaration in the schema (indicating that the element has been validated against the schema), then the element is considered valid.
Otherwise, the processor may attempt to validate the element against the schema, in which case it is treated as valid if and only if the outcome of validation is valid.
Otherwise (if the processor does not attempt validation using the schema), the processor must ensure that the content of the element, after stripping all attributes (at any depth) in namespaces other than http://www.w3.org/2005/xpath-functions, is such that validation against the schema would have an outcome of valid.
Note:
The process described here is not precisely equivalent to schema validation. For example, schema validation will fail if there is an invalid xsi:type or xsi:nil attribute, whereas this process will ignore such attributes.
An element node E having a key attribute and/or an escaped-key attribute provided that E would satisfy one of the above conditions if the key and/or escaped-key attributes were removed.
A document node having exactly one element child and no text node children, where the element child satisfies one of the conditions above.
Furthermore, $node must satisfy the following constraint (which cannot be conveniently expressed in the schema). Every element M that is a descendant-or-self of $node and has local name map and namespace URI http://www.w3.org/2005/xpath-functions must satisfy the following rule: there must not be two distinct children of M (say C1 and C2) such that the normalized key of C1 is equal to the normalized key of C2. The normalized key of an element C is as follows:
If C has the attribute value escaped-key="true", then the value of the key attribute of C, with all JSON escape sequences replaced by the corresponding Unicode characters according to the JSON escaping rules.
Otherwise (the escaped-key attribute of C is absent or set to false), the value of the key attribute of C.
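The normalized-key rule and the resulting uniqueness check can be sketched in Python as follows. The function names are hypothetical, each child of a map element is modelled simply as a (key, escaped-key) pair, and the unescaping covers only the standard JSON escape sequences.

```python
import re

_SIMPLE = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
           "n": "\n", "r": "\r", "t": "\t"}
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|(.))", re.DOTALL)

def unescape(text: str) -> str:
    """Replace JSON escape sequences by the characters they denote."""
    def replace(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return _SIMPLE[match.group(2)]  # KeyError signals an invalid escape
    raw = _ESCAPE.sub(replace, text)
    # Round-trip through UTF-16 so that an escaped surrogate pair
    # becomes the single character it represents.
    return raw.encode("utf-16-le", "surrogatepass").decode(
        "utf-16-le", "surrogatepass")

def normalized_key(key: str, escaped_key: bool = False) -> str:
    return unescape(key) if escaped_key else key

def has_duplicate_keys(children) -> bool:
    """children: (key, escaped_key) pairs for the children of one map element."""
    seen = set()
    for key, escaped_key in children:
        normalized = normalized_key(key, escaped_key)
        if normalized in seen:
            return True
        seen.add(normalized)
    return False
```

Two children with the key values ab and a\u0062 therefore clash only when the second one is marked escaped-key="true".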
Nodes in the input tree are handled by applying the following rules, recursively. In these rules the phrase “an element named N” means “an element node whose local name is N and whose namespace URI is http://www.w3.org/2005/xpath-functions”.
A document node having a single element node child is processed by processing that child.
An element named null results in the output null.
An element $E named boolean results in the output true or false depending on the result of xs:boolean(fn:string($E)).
An element $E named number is processed as follows.
The input is required to conform to the XSD rules defining a valid instance of xs:double (excluding infinity and NaN), while the output is required to conform to the JSON rules defining a valid JSON number. These rules are slightly different.
Specifically, the XSD rules require the value (after removing leading and trailing whitespace) to match the regular expression:
(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?
while the JSON rules require:
-?(0|[1-9][0-9]*)(\.[0-9]+)?([Ee](\+|-)?[0-9]+)?
If the input value does not match the required JSON format, it must therefore be adjusted by applying the following steps:
Remove leading and trailing whitespace.
Remove any leading plus sign.
Remove any leading zero digits in the integer part, while ensuring that at least one digit remains.
If there is a decimal point that is not preceded by a digit, add a zero digit before the decimal point.
If there is a decimal point that is not followed by a digit, add a zero digit after the decimal point.
Note:
The output uses exponential notation if and only if the input uses exponential notation.
The rules have changed since version 3.1 of this specification. In previous versions, the supplied number was cast to an xs:double, and then serialized using the rules of the fn:string function. This resulted in JSON numbers using exponential notation for values outside the range 1e-6 to 1e6, and led to a loss of precision for 64-bit integer values.
An element named string results in the output of the string value of the element, enclosed in quotation marks, with any special characters in the string escaped as described below.
An element named array results in the output of the children of the array element, each processed by applying these rules recursively: the items in the resulting list are enclosed between square brackets, and separated by commas.
An element named map results in the output of a sequence of map entries corresponding to the children of the map element, enclosed between curly braces and separated by commas. Each entry comprises the value of the key attribute of the child element, enclosed in quotation marks and escaped as described below, followed by a colon, followed by the result of processing the child element by applying these rules recursively. The order of properties in the output JSON representation retains the order of the children of the map element.
Comments, processing instructions, and whitespace text node children of map and array are ignored.
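The adjustment steps given above for an element named number can be sketched in Python as follows. This is a non-normative illustration: the function name to_json_number is hypothetical, and the two regular expressions are those quoted in the rules above.

```python
import re

XSD_DOUBLE = re.compile(r"(\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?")
JSON_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([Ee](\+|-)?[0-9]+)?")
# Splits a valid xs:double lexical form into sign, integer part,
# fraction (including the decimal point) and exponent.
_PARTS = re.compile(r"([+-]?)([0-9]*)(\.[0-9]*)?([Ee].*)?")


def to_json_number(lexical):
    """Rewrite an xs:double lexical form (no INF/NaN) as a JSON number."""
    value = lexical.strip(" \t\r\n")             # remove surrounding whitespace
    if XSD_DOUBLE.fullmatch(value) is None:
        raise ValueError("not a finite xs:double: %r" % lexical)
    if JSON_NUMBER.fullmatch(value):
        return value                              # already valid JSON
    sign, integer, fraction, exponent = _PARTS.fullmatch(value).groups(default="")
    if sign == "+":                               # remove a leading plus sign
        sign = ""
    # Remove leading zeros but keep one digit; this also supplies the
    # zero digit when the decimal point is not preceded by a digit.
    integer = integer.lstrip("0") or "0"
    if fraction == ".":                           # decimal point with no digit after it
        fraction = ".0"
    return sign + integer + fraction + exponent
```

For example, to_json_number(" +007 ") returns "7" and to_json_number("-.5e3") returns "-0.5e3"; the exponent is carried over unchanged, so exponential notation appears in the output only when it appears in the input.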
Strings are escaped as follows:
If the attribute escaped="true" is present for a string value, or escaped-key="true" for a key value, then:
any valid JSON escape sequence present in the string is copied unchanged to the output;
any invalid JSON escape sequence results in a dynamic error [err:FOJS0007];
any unescaped occurrence of U+0022 (QUOTATION MARK, "), U+0008 (BACKSPACE), U+000C (FORM FEED), U+000A (NEWLINE), U+000D (CARRIAGE RETURN), U+0009 (TAB), or (subject to the escape-solidus option) U+002F (SOLIDUS, FORWARD SLASH, /) is replaced by \", \b, \f, \n, \r, \t, or \/ respectively;
any other codepoint in the range 1-31 or 127-159 is replaced by an escape in the form \uHHHH where HHHH is the upper-case hexadecimal representation of the codepoint value.
Otherwise (that is, in the absence of the attribute escaped="true" for a string value, or escaped-key="true" for a key value):
any occurrence of backslash is replaced by \\
any occurrence of U+0022 (QUOTATION MARK, "), U+0008 (BACKSPACE), U+000C (FORM FEED), U+000A (NEWLINE), U+000D (CARRIAGE RETURN), or U+0009 (TAB) is replaced by \", \b, \f, \n, \r, or \t respectively;
any other codepoint in the range 1-31 or 127-159 is replaced by an escape in the form \uHHHH where HHHH is the upper-case hexadecimal representation of the codepoint value.
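The two escaping regimes above can be compared side by side in a small, non-normative Python sketch. The function name escape_json_string and its parameters are hypothetical: escaped stands for the presence of escaped="true" (or escaped-key="true"), and escape_solidus stands for the escape-solidus option, whose default of false here is an assumption of the sketch.

```python
import re

_SHORT = {'"': '\\"', '\b': '\\b', '\f': '\\f',
          '\n': '\\n', '\r': '\\r', '\t': '\\t'}
_VALID_ESCAPE = re.compile(r'\\(["\\/bfnrt]|u[0-9A-Fa-f]{4})')


def _escape_char(ch, escape_solidus):
    """Escape one character that is not a backslash."""
    if ch in _SHORT:
        return _SHORT[ch]
    if ch == "/" and escape_solidus:
        return "\\/"
    code = ord(ch)
    if 1 <= code <= 31 or 127 <= code <= 159:
        return "\\u%04X" % code                  # upper-case hexadecimal
    return ch


def escape_json_string(value, escaped=False, escape_solidus=False):
    """Return the content of a JSON string literal, without the enclosing quotes."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and escaped:
            # Pre-escaped input: copy a valid escape sequence unchanged.
            match = _VALID_ESCAPE.match(value, i)
            if match is None:
                raise ValueError("FOJS0007: invalid JSON escape sequence")
            out.append(match.group(0))
            i = match.end()
        elif ch == "\\":
            out.append("\\\\")                    # unescaped input: double the backslash
            i += 1
        else:
            # The solidus is only escaped in the pre-escaped regime, as in the rules above.
            out.append(_escape_char(ch, escaped and escape_solidus))
            i += 1
    return "".join(out)
```

A backslash followed by n is therefore copied unchanged when escaped is true, but becomes two backslashes followed by n when it is false; a literal newline character is written as \n in both cases.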
A dynamic error is raised [err:FOJS0005] if the value of $options includes an entry whose key is defined in this specification, and whose value is not a permitted value for that key.
A dynamic error is raised [err:FOJS0006] if the value of $node is not a document or element node or is not valid according to the schema for the XML representation of JSON, or if a map element has two children whose normalized key values are the same.
A dynamic error is raised [err:FOJS0007] if the value of $node includes a string labeled with escaped="true", or a key labeled with escaped-key="true", where the content of the string or key contains an invalid JSON escape sequence: specifically, where it contains a backslash (\) that is not followed by one of the characters ", \, /, b, f, n, r, t, or u, or where it contains the characters \u not followed by four hexadecimal digits (that is [0-9A-Fa-f]{4}).
The rule requiring schema validity has a number of consequences, including the following:
The input cannot contain no-namespace attributes, or attributes in the namespace http://www.w3.org/2005/xpath-functions, except where explicitly allowed by the schema. Attributes in other namespaces, however, are ignored.
Nodes that do not affect schema validity, such as comments, processing instructions, namespace nodes, and whitespace text node children of map and array, are ignored.
Numeric values are restricted to those that are valid in JSON: the schema disallows positive and negative infinity and NaN.
Duplicate key values are not permitted. Most cases of duplicate keys are prevented by the rules in the schema; additional cases (where the keys are equal only after expanding JSON escape sequences) are prevented by the prose rules of this function. For example, the key values \n and \u000A are treated as duplicates even though the rules in the schema do not treat them as such.
The rule allowing the top-level element to have a key attribute (which is ignored) allows any element in the output of the fn:json-to-xml function to be processed: for example, it is possible to take a JSON document, convert it to XML, select a subtree based on the value of a key attribute, and then convert this subtree back to JSON, perhaps after a transformation. The rule means that an element with the appropriate name will be accepted if it has been validated against one of the types mapWithinMapType, arrayWithinMapType, stringWithinMapType, numberWithinMapType, booleanWithinMapType, or nullWithinMapType.
This section describes functions that parse CSV data.
[Definition] The term comma separated values or CSV refers to a wide variety of plain-text tabular data formats with fields and records separated by standard character delimiters (often, but not invariably, commas).
A CSV is a 2-dimensional tabular data structure consisting of multiple rows (also known as records). Each row contains multiple fields. Fields occupying the same position in successive rows constitute a column. Columns are identified by position and optionally by name. Column names can be assigned within a CSV using an optional header row.
CSV has developed informally for decades, and many variations are found. This specification refers to [RFC 4180], which provides a standardized grammar. This specification extends the grammar defined in [RFC 4180] as follows:
This specification uses the term row where RFC 4180 uses record.
Line endings are normalized: specifically, a U+000D (CARRIAGE RETURN) character, or a U+000D (CARRIAGE RETURN) followed by U+000A (NEWLINE), is converted to a single U+000A (NEWLINE) character. This applies whether or not the line ending appears within a quoted string, and whether or not U+000A (NEWLINE) is the chosen row delimiter.
Row delimiters other than newline are recognized.
Field delimiters other than U+002C (COMMA, ,) are recognized.
Quote characters other than U+0022 (QUOTATION MARK, ") are recognized.
Non-ASCII characters are recognized.
This specification defines a mapping from this extended grammar to constructs in the XDM model, and provides illustrative examples of how these constructs can be combined with other language features to process CSV data.
| Function | Meaning |
|---|---|
fn:csv-to-arrays | Parses CSV data supplied as a string, returning the results in the form of a sequence of arrays of strings. |
fn:parse-csv | Parses CSV data, returning the results in the form of a record containing information about the names in the header, as well as the data itself. |
fn:csv-doc | Reads an external resource containing CSV, and returns the results as a record containing information about the names in the header, as well as the data itself. |
fn:csv-to-xml | Parses CSV data supplied as a string, returning the results as an XML document, as described by 17.5.9 Representing CSV data as XML. |
The most basic function for parsing CSV is fn:csv-to-arrays which recognizes the delimiters for rows and fields and returns a sequence of arrays each corresponding to one row. The fields within each array are represented as instances of xs:string.
The other two functions recognize column names, and make it easier to address individual fields using these names. The parse-csv function delivers this capability using XDM maps and functions, while the csv-to-xml function represents the information using XDM element nodes.
The delimiters used for rows, columns, and quoting are configurable. An error is raised if the same delimiter string is used in multiple roles [err:FOCV0003].
Rows in CSV files are typically delimited with CRLF (U+000D (CARRIAGE RETURN), U+000A (NEWLINE)), LF (U+000A (NEWLINE)), or CR (U+000D (CARRIAGE RETURN)) line endings, although RFC 4180 specifies CRLF. The CSV parsing functions normalize these line endings to LF (U+000A (NEWLINE)). They therefore use LF as the default row delimiter.
The last row in the file may or may not be followed by a row delimiter. An empty file is treated as containing zero rows, while a file consisting solely of a row delimiter is treated as containing one empty row. In all other cases, a file that does not end with a row delimiter is treated as if a row delimiter were added at the end.
Fields in CSV are frequently delimited with a comma. Other field delimiters are useful, for example when numeric data uses comma as a decimal separator. The chosen field delimiter is then often U+003B (SEMICOLON, ;) or U+0009 (TAB).
The field delimiter thus defaults to U+002C (COMMA, ,). The value may be any single Unicode character. An error is raised if the field-delimiter option is set to a multi-character string [err:FOCV0002].
Parses CSV data supplied as a string, returning the results in the form of a sequence of arrays of strings.
fn:csv-to-arrays(
  $value as xs:string?,
  $options as map(*)? := {}
) as array(xs:string)*
This function is deterministic, context-independent, and focus-independent.
The $value argument is CSV data, as defined in [RFC 4180], in the form of an xs:string value. The function parses this string, after normalizing newlines so that U+000D (CARRIAGE RETURN) characters and (U+000D (CARRIAGE RETURN), U+000A (NEWLINE)) sequences are converted to U+000A (NEWLINE). The result of the function is a sequence of arrays of strings, that is array(xs:string)*; each array represents one row of the CSV input.
If $value is the empty sequence or a zero-length string, the function returns the empty sequence.
The $options argument can be used to control the way in which the parsing takes place. The option parameter conventions apply.
If the $options argument is omitted or is the empty sequence, the result is the same as calling the two-argument form with an empty map as the value of the $options argument.
The entries that may appear in the $options map are as follows:
record(
  field-delimiter? as xs:string,
  row-delimiter? as xs:string,
  quote-character? as xs:string,
  trim-whitespace? as xs:boolean
)
| Key | Value | Meaning |
|---|---|---|
| field-delimiter | xs:string | The character used to delimit fields within a record. An instance of xs:string whose length is exactly one. |
| row-delimiter | xs:string | The character used to delimit rows within the CSV string. An instance of xs:string whose length is exactly one. Defaults to a single newline character (U+000A (NEWLINE)). |
| quote-character | xs:string | The character used to quote fields within the CSV string. An instance of xs:string whose length is exactly one. |
| trim-whitespace | xs:boolean | Determines whether leading and trailing whitespace is removed from the content of unquoted fields. |
| | false | Unquoted fields will be returned with any leading or trailing whitespace intact. |
| | true | Unquoted fields will be returned with leading or trailing whitespace removed, and all other whitespace preserved. |
An empty field is represented by a zero-length string. An empty field is deemed to exist when a field delimiter immediately follows either another field delimiter, or a row delimiter, or the start of $value; or when a row delimiter or the end of $value immediately follows a field delimiter.
A blank row is represented as an empty array (not as an array containing a single empty field). A blank row is deemed to exist when a row delimiter immediately follows either another row delimiter or the start of $value, after trimming of whitespace if the trim-whitespace option is true. No blank row occurs after the final row delimiter.
If $value is a zero-length string, the CSV is considered to contain no rows; while if $value consists of a single row delimiter, it is considered to contain a single blank row. The presence or absence of a final row delimiter generally has no effect on the result, except when it appears at the start of the input, in which case it causes a single blank row to exist.
A dynamic error [err:FOCV0001] occurs if the value of $value does not conform to the required grammar.
A dynamic error [err:FOCV0002] occurs if the value of the field-delimiter, row-delimiter, or quote-character option is not a single character.
A dynamic error [err:FOCV0003] occurs if the same character is used for more than one of the field-delimiter, row-delimiter, and quote-character.
The default row delimiter is a single newline character, U+000A (NEWLINE). Alternative line endings such as CR and CRLF will already have been normalized to a single newline.
All fields are returned as xs:string values.
Quoted fields in the input are returned without the quotes.
The first row is not treated specially.
For more discussion of the returned data, see 17.5.3 Basic parsing of CSV to arrays.
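The parsing rules above (line-ending normalization, empty fields, blank rows, and the treatment of a final row delimiter) can be sketched as a small state machine. The following Python is a non-normative illustration, not a conformant implementation: the function name csv_to_arrays and its keyword parameters merely mirror the option names, a Python list of lists stands in for a sequence of arrays, and the sketch assumes that a quote character is only permitted at the very start of a field.

```python
def csv_to_arrays(value, field_delimiter=",", row_delimiter="\n",
                  quote_character='"', trim_whitespace=False):
    """Parse CSV text into a list of rows, each row a list of strings."""
    if value is None:                       # stands in for the empty sequence
        return []
    delimiters = (field_delimiter, row_delimiter, quote_character)
    if any(len(d) != 1 for d in delimiters):
        raise ValueError("FOCV0002: each delimiter must be a single character")
    if len(set(delimiters)) != 3:
        raise ValueError("FOCV0003: a character is used in more than one role")

    # Normalize CRLF and CR to LF, inside quoted fields as well.
    text = value.replace("\r\n", "\n").replace("\r", "\n")
    rows, fields, buf = [], [], []
    in_quotes = False     # currently inside a quoted field
    was_quoted = False    # the current field began with a quote character

    def end_field():
        nonlocal buf, was_quoted
        field = "".join(buf)
        if trim_whitespace and not was_quoted:
            field = field.strip(" \t\n\r")
        fields.append((field, was_quoted))
        buf, was_quoted = [], False

    def end_row(final):
        nonlocal fields
        end_field()
        blank = fields == [("", False)]     # one empty, unquoted field
        if not (blank and final):           # nothing follows the last delimiter
            rows.append([] if blank else [f for f, _ in fields])
        fields = []

    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch != quote_character:
                buf.append(ch)
            elif text[i + 1:i + 2] == quote_character:
                buf.append(ch)              # doubled quote stands for one quote
                i += 1
            else:
                in_quotes = False           # closing quote
        elif ch == quote_character:
            if buf or was_quoted:
                raise ValueError("FOCV0001: unexpected quote character")
            in_quotes = was_quoted = True
        elif ch == field_delimiter:
            end_field()
        elif ch == row_delimiter:
            end_row(final=False)
        elif was_quoted:
            raise ValueError("FOCV0001: text after closing quote")
        else:
            buf.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("FOCV0001: unterminated quoted field")
    end_row(final=True)
    return rows
```

The sketch treats the text after the last row delimiter as a row only if it is non-empty (after trimming, when trim_whitespace is set), which is how it reproduces the behaviour described for trivial inputs such as a single space or a single newline.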
Handling trivial input: | |
| Expression: | csv-to-arrays(()) |
|---|---|
| Result: | () |
| Expression: | csv-to-arrays("") |
| Result: | () |
| Expression: | csv-to-arrays(char('\n')) |
| Result: | [] |
| Expression: | csv-to-arrays(" ", { 'trim-whitespace': true() }) |
| Result: | () |
| Expression: | csv-to-arrays(" ", { 'trim-whitespace': false() }) |
| Result: | [ " " ] |
| Expression: | csv-to-arrays(` { char('\n') }`, { 'trim-whitespace': true() }) |
| Result: | [] |
| Expression: | csv-to-arrays(` { char('\n') }`, { 'trim-whitespace': false() }) |
| Result: | [ " " ] |
| Expression: | csv-to-arrays(`{ char('\n') } `, { 'trim-whitespace': true() }) |
| Result: | [] |
| Expression: | csv-to-arrays(`{ char('\n') } `, { 'trim-whitespace': false() }) |
| Result: | [], [ " " ] |
Using newline separators: | |
| Expression: | csv-to-arrays(
`name,city{ char('\n') }` ||
`Bob,Berlin{ char('\n') }` ||
`Alice,Aachen{ char('\n') }`
) |
| Result: | [ "name", "city" ], [ "Bob", "Berlin" ], [ "Alice", "Aachen" ] |
| Expression: | let $CRLF := `{ char('\r') }{ char('\n') }`
return csv-to-arrays(
`name,city{ $CRLF }` ||
`Bob,Berlin{ $CRLF }` ||
`Alice,Aachen{ $CRLF }`
) |
| Result: | [ "name", "city" ], [ "Bob", "Berlin" ], [ "Alice", "Aachen" ] |
Quote handling: | |
| Expression: | csv-to-arrays(
string-join(
(`"name","city"`, `"Bob","Berlin"`, `"Alice","Aachen"`),
char('\n')
)
) |
| Result: | [ "name", "city" ], [ "Bob", "Berlin" ], [ "Alice", "Aachen" ] |
| Expression: | csv-to-arrays(
`"name","city"{ char('\n') }` ||
`"Bob ""The Exemplar"" Mustermann","Berlin"{ char('\n') }`
) |
| Result: | ( [ "name", "city" ], [ 'Bob "The Exemplar" Mustermann', "Berlin" ] ) |
Non-default record- and field-delimiters: | |
| Expression: | csv-to-arrays(
"name;city§Bob;Berlin§Alice;Aachen",
{ "row-delimiter": "§", "field-delimiter": ";" }
) |
| Result: | [ "name", "city" ], [ "Bob", "Berlin" ], [ "Alice", "Aachen" ] |
Non-default quote character: | |
| Expression: | csv-to-arrays(
string-join(
("|name|,|city|", "|Bob|,|Berlin|"),
char('\n')
),
{ "quote-character": "|" }
) |
| Result: | [ "name", "city" ], [ "Bob", "Berlin" ] |
Trimming whitespace in fields: | |
| Expression: | csv-to-arrays(
string-join(
("name ,city ", "Bob ,Berlin ", "Alice ,Aachen "),
char('\n')
),
{ "trim-whitespace": true() }
) |
| Result: | [ "name", "city" ], [ "Bob", "Berlin" ], [ "Alice", "Aachen" ] |
This record type is used to hold the result of the fn:parse-csv function.
| Name | Meaning |
|---|---|
| columns | This entry holds a sequence of strings containing column names. The content depends on the setting of the |
| column-index | This entry holds a map from column names (as strings) to column positions (as 1-based positive integers). The content depends on the setting of the |
| rows | This entry is a sequence of arrays of strings, holding the parsed rows of the CSV data. The format is the same as the result of the |
| get | A function providing ready access to a given field in a given row. The function($row as xs:positiveInteger, $column as (xs:positiveInteger | xs:string)) as xs:string The function takes two arguments: the first is an integer giving the row number (1-based), the second identifies a column either by its name or by its 1-based position. Except in error cases (described below), the function call The properties of the function are as follows: |
Parses CSV data, returning the results in the form of a record containing information about the names in the header, as well as the data itself.
fn:parse-csv(
  $value as xs:string?,
  $options as map(*)? := {}
) as parsed-csv-structure-record?
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, the function returns the empty sequence.
The input supplied in $value is CSV data, as defined in [RFC 4180]. The function first parses the input using fn:csv-to-arrays, and then further processes the result. The initial parsing is exactly as defined for fn:csv-to-arrays, and can be controlled using the same options. Additional options are available to control the way in which header information and column names are handled.
If the input is a zero-length string, the function returns a parsed-csv-structure-record whose rows entry is the empty sequence.
The $options argument can be used to control the way in which the parsing takes place. The option parameter conventions apply.
If the $options argument is omitted or is the empty sequence, the result is the same as calling the two-argument form with an empty map as the value of the $options argument.
The entries that may appear in the $options map are as follows:
record(
  field-delimiter? as xs:string,
  row-delimiter? as xs:string,
  quote-character? as xs:string,
  trim-whitespace? as xs:boolean,
  header? as item()*,
  select-columns? as xs:positiveInteger*,
  trim-rows? as xs:boolean
)
| Key | Value | Meaning |
|---|---|---|
| field-delimiter | xs:string | The character used to delimit fields within a record. An instance of xs:string whose length is exactly one. |
| row-delimiter | xs:string | The character used to delimit rows within the CSV string. An instance of xs:string whose length is exactly one. Defaults to a single newline character (U+000A (NEWLINE)). Note that this is tested after line endings are normalized. |
| quote-character | xs:string | The character used to quote fields within the CSV string. An instance of xs:string whose length is exactly one. |
| trim-whitespace | xs:boolean | Determines whether leading and trailing whitespace is removed from the content of unquoted fields. |
| | false | Unquoted fields will be returned with any leading or trailing whitespace intact. |
| | true | Unquoted fields will be returned with leading or trailing whitespace removed, and all other whitespace preserved. |
| header | item()* | Determines whether the first row of the CSV should be treated as a list of column names, or whether column names are being supplied by the caller. The value must either be a single boolean, or a sequence of zero or more strings. |
| | true | Column names are taken from the first row of the CSV data. |
| | false | Column names are not available; all references to columns are by ordinal position. |
| | xs:string* | Supplies explicit names for the columns. The Nth name in the list applies to the Nth column after any filtering or rearrangement. A zero-length string can be used when there is a column that requires no name. |
| select-columns | xs:positiveInteger* | A sequence of integers indicating which columns to include and in which order. If this option is absent or empty, all columns are returned in their original order. For example, the value 1 to 4 indicates that the output contains the first, second, third, and fourth columns from the input, in order, while (1, 5, 4) indicates that the output contains three columns, taken from the first, fifth, and fourth columns of the input, in that order. An integer in the sequence is treated as the 1-based index of the column to include. Any other columns are dropped. If a particular row includes no field at the specified index, an empty field is included at the relevant position in the result. If an integer appears more than once then the result will include duplicated columns. |
| trim-rows | xs:boolean | Determines whether all rows should be adjusted to contain the same number of fields. This option is ignored if select-columns is specified. |
| | false | No padding or trimming of rows takes place, unless requested using the select-columns option. |
| | true | The number of fields in the first row (whether this be a header or a data row) determines the number of fields in every subsequent row; to achieve this, excess fields are removed, or additional zero-length fields are added. |
The result of the function is a parsed-csv-structure-record, as defined in 17.5.6 Record fn:parsed-csv-structure-record.
A dynamic error [err:FOCV0001] occurs if the value of $value does not conform to the required grammar.
A dynamic error [err:FOCV0002] occurs if any of the options field-delimiter, row-delimiter, or quote-character is not a single character.
A dynamic error [err:FOCV0003] occurs if the same character is used for more than one of the options field-delimiter, row-delimiter, and quote-character.
The default row delimiter is a single newline character, U+000A (NEWLINE). Alternative line endings such as CR and CRLF will already have been normalized to a single newline.
All fields are returned as xs:string values.
Quoted fields in the input are returned without the quotes.
For more discussion of the returned data, see 17.5.5 Enhanced parsing of CSV data to maps and arrays.
If the source of the CSV input is a resource accessible by URI, then it may be preferable to use the fn:csv-doc function. If the source is a binary value (xs:hexBinary or xs:base64Binary) then this can first be decoded as a string using the functions bin:infer-encoding and bin:decode-string.
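The additional processing that fn:parse-csv applies to the rows delivered by fn:csv-to-arrays (column selection, row trimming, and header handling) can be sketched as follows. This Python is a non-normative illustration under stated assumptions: build_parsed_csv is a hypothetical name, rows are supplied as already-parsed lists of strings, a dict stands in for the record, a duplicated column name is assumed to map to its first position, and the error behaviour of the get function is a placeholder rather than the behaviour defined by this specification.

```python
def build_parsed_csv(rows, header=False, select_columns=(), trim_rows=False):
    """Post-process parsed rows into a dict shaped like the result record."""
    rows = [list(row) for row in rows]

    if select_columns:
        # Pick columns by 1-based index; a missing field becomes an empty field.
        rows = [[row[p - 1] if p <= len(row) else "" for p in select_columns]
                for row in rows]
    elif trim_rows and rows:
        # The first row (header or data) fixes the width of every row.
        width = len(rows[0])
        rows = [(row + [""] * width)[:width] for row in rows]

    if header is True:
        # Column names come from the first row, after any column selection.
        columns, rows = (rows[0], rows[1:]) if rows else ([], [])
    elif header is False:
        columns = []
    else:
        columns = list(header)              # names supplied by the caller

    # Map each non-empty name to its 1-based position (first occurrence wins:
    # an assumption of this sketch).
    column_index = {}
    for position, name in enumerate(columns, start=1):
        if name != "" and name not in column_index:
            column_index[name] = position

    def get(row, column):
        """Return the field at a 1-based row and a column name or 1-based position."""
        position = column_index[column] if isinstance(column, str) else column
        return rows[row - 1][position - 1]

    return {"columns": columns, "column-index": column_index,
            "rows": rows, "get": get}
```

Because selection is applied before the header row is split off, the names in the columns entry are automatically those of the selected columns, in the selected order.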
| Variables | |
|---|---|
let $display := fn($result) {
(: tidy up the result for display (function items cannot be properly displayed) :)
map:put($result, "get", "(: function :)")
} | |
Default delimiters, no column headers: | |
| Expression: | let $input := string-join(
("name,city", "Bob,Berlin", "Alice,Aachen"),
char('\n')
)
let $result := parse-csv($input)
return (
$result => $display(),
$result?get(1, 2),
$result?get(2, 2)
) |
|---|---|
| Result: | {
"columns": (),
"column-index": {},
"rows": ([ "name", "city" ], [ "Bob", "Berlin" ], [ "Alice", "Aachen" ]),
"get": "(: function :)"
},
"city",
"Berlin" |
Default delimiters, column headers: | |
| Expression: | let $input := string-join(
("name,city", "Bob,Berlin", "Alice,Aachen"),
char('\n')
)
let $result := parse-csv($input, { "header": true() })
return (
$result => $display(),
$result?get(1, "name"),
$result?get(2, "city")
) |
| Result: | {
"columns": ("name", "city"),
"column-index": { "name": 1, "city": 2 },
"rows": ([ "Bob", "Berlin" ], [ "Alice", "Aachen" ]),
"get": "(: function :)"
},
"Bob",
"Aachen" |
Custom delimiters, no column headers: | |
| Expression: | let $options := {
"row-delimiter": "§",
"field-delimiter": ";",
"quote-character": "|"
}
let $input := "|name|;|city|§|Bob|;|Berlin|§|Alice|;|Aachen|"
let $result := parse-csv($input, $options)
return (
$result => $display(),
$result?get(3, 1)
) |
| Result: | {
"columns": (),
"column-index": {},
"rows": ([ "name", "city" ], [ "Bob", "Berlin" ], [ "Alice", "Aachen" ]),
"get": "(: function :)"
},
"Alice" |
Supplied column names: | |
| Expression: | let $headers := ("Person", "Location")
let $options := { "header": $headers, "row-delimiter": ";" }
let $input := "Alice,Aachen;Bob,Berlin;"
let $parsed-csv := parse-csv($input, $options)
return (
$parsed-csv => $display(),
$parsed-csv?get(2, "Location")
) |
| Result: | {
"columns": ("Person", "Location"),
"column-index": { "Person": 1, "Location": 2 },
"rows": ([ "Alice", "Aachen" ], [ "Bob", "Berlin" ]),
"get": "(: function :)"
},
"Berlin" |
Filtering columns, with ragged input and | |
| Expression: | let $input := string-join((
"date,name,city,amount,currency,original amount,note",
"2023-07-19,Bob,Berlin,10.00,USD,13.99",
"2023-07-20,Alice,Aachen,15.00",
"2023-07-20,Charlie,Celle,15.00,GBP,11.99,cake,not a lie"
), char('\n'))
let $options := {
"header": true(),
"select-columns": (2, 1, 4)
}
let $result := parse-csv($input, $options)
return (
$result => $display(),
$result?get(2, "amount")
) |
| Result: | {
"columns": ("name", "date", "amount"),
"column-index": { "name": 1, "date": 2, "amount": 3 },
"rows": (
[ "Bob", "2023-07-19", "10.00" ],
[ "Alice", "2023-07-20", "15.00" ],
[ "Charlie", "2023-07-20", "15.00" ]
),
"get": "(: function :)"
},
"15.00" |
Filtering columns, with supplied column map | |
| Expression: | let $input := string-join((
"2023-07-20,Alice,Aachen,15.00",
"2023-07-19,Bob,Berlin,10.00,USD,13.99",
"2023-07-20,Charlie,Celle,15.00,GBP,11.99,cake,not a lie"
), char('\n'))
let $options := {
"header": ( "Person", "", "Amount" ),
"select-columns": (2, 1, 4)
}
let $result := parse-csv($input, $options)
return (
$result => $display(),
$result?get(2, "Person"),
$result?get(2, "Amount")
) |
| Result: | {
"columns": ("Person", "", "Amount"),
"column-index": { "Person": 1, "Amount": 3 },
"rows": ([ "Alice", "2023-07-20", "15.00" ],
[ "Bob", "2023-07-19", "10.00" ],
[ "Charlie", "2023-07-20", "15.00" ]),
"get": "(: function :)"
},
"Bob",
"10.00" |
Specifying the number of columns explicitly, with | |
| Expression: | let $input := string-join((
"date, name, amount, currency, original amount",
"2023-07-19,Bob, 10.00, USD, 13.99",
"2023-07-20,Alice, 15.00",
"2023-07-20,Charlie, 15.00, GBP, 11.99, extra data"
), char('\n'))
let $options := {
"header": false(),
"select-columns": 1 to 5,
"trim-whitespace": true()
}
let $result := parse-csv($input, $options)
return (
$result => $display(),
$result?get(4, 3)
) |
| Result: | {
"columns": (),
"column-index": {},
"rows": (
[ "date", "name", "amount", "currency", "original amount" ],
[ "2023-07-19", "Bob", "10.00", "USD", "13.99" ],
[ "2023-07-20", "Alice", "15.00", "", "" ],
[ "2023-07-20", "Charlie", "15.00", "GBP", "11.99" ]
),
"get": "(: function :)"
},
"15.00" |
Specifying the number of columns with a number and | |
| Expression: | let $input := string-join((
"date,name,city,amount,currency,original amount,note",
"2023-07-19,Bob,Berlin,10.00,USD,13.99",
"2023-07-20,Alice,Aachen,15.00",
"2023-07-20,Charlie,Celle,15.00,GBP,11.99,cake,not a lie"
), char('\n'))
let $options := { "header": true(), "select-columns": 1 to 6 }
let $result := parse-csv($input, $options)
return (
$result => $display(),
$result?get(3, "original amount")
) |
| Result: | {
"columns": ("date", "name", "city",
"amount", "currency", "original amount"),
"column-index": {
"date": 1, "name": 2, "city": 3, "amount": 4,
"currency": 5, "original amount": 6
},
"rows": (
[ "2023-07-19", "Bob", "Berlin", "10.00", "USD", "13.99"],
[ "2023-07-20", "Alice", "Aachen", "15.00", "", ""],
[ "2023-07-20", "Charlie", "Celle", "15.00", "GBP", "11.99"]
),
"get": "(: function :)"
},
"11.99" |
Reads an external resource containing CSV, and returns the results as a record containing information about the names in the header, as well as the data itself.
fn:csv-doc( | ||
$source | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
If the second argument is omitted or is the empty sequence, the result is the same as calling the two-argument form with an empty map as the value of the $options argument.
The effect of the two-argument function call fn:csv-doc($H, $M) is equivalent to the function composition fn:unparsed-binary($H) => fn:parse-csv($M).
If $source is the empty sequence, the function returns the empty sequence.
Note:
The ability to access external resources depends on whether the calling code is trusted.
The function may raise any error defined for the fn:unparsed-text or fn:parse-csv functions.
The fn:csv-to-xml function returns an XDM node tree representing the CSV data. Following is a CSV text and the XML serialization of the corresponding node tree.
Name,Date,Amount
Alice,2023-07-14,1.23
Bob,2023-07-14,2.34
<csv xmlns="http://www.w3.org/2005/xpath-functions">
<columns>
<column>Name</column>
<column>Date</column>
<column>Amount</column>
</columns>
<rows>
<row>
<field column="Name">Alice</field>
<field column="Date">2023-07-14</field>
<field column="Amount">1.23</field>
</row>
<row>
<field column="Name">Bob</field>
<field column="Date">2023-07-14</field>
<field column="Amount">2.34</field>
</row>
</rows>
</csv>
If no non-empty column names are available, then the columns element and all column attributes are absent. If non-empty column names are available for some columns but not for others, then (a) an empty column element is included within the columns element if and only if there is a subsequent column with a non-empty name, and (b) the column attribute for the corresponding field elements is absent.
For example (when no column names are available):
<csv xmlns="http://www.w3.org/2005/xpath-functions">
<rows>
<row>
<field>Name</field>
<field>Date</field>
<field>Amount</field>
</row>
<row>
<field>Alice</field>
<field>2023-07-14</field>
<field>1.23</field>
</row>
<row>
<field>Bob</field>
<field>2023-07-14</field>
<field>2.34</field>
</row>
</rows>
</csv>
An XSD 1.0 schema for the XML representation is provided in D.3 Schema for the result of fn:csv-to-xml.
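The rules above for the columns element and the column attribute can be illustrated with a non-normative Python sketch that builds the tree with the standard xml.etree.ElementTree module. The function name csv_rows_to_xml is hypothetical, and the column names and rows are assumed to have been obtained already, for example from a parsed CSV structure.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2005/xpath-functions"


def csv_rows_to_xml(columns, rows):
    """Build the csv element for the given column names and rows of strings."""
    # Drop trailing empty names: an empty column element is only written
    # when a later column has a non-empty name.
    names = list(columns)
    while names and names[-1] == "":
        names.pop()

    root = ET.Element("{%s}csv" % NS)
    if names:                                # at least one non-empty name exists
        columns_el = ET.SubElement(root, "{%s}columns" % NS)
        for name in names:
            ET.SubElement(columns_el, "{%s}column" % NS).text = name or None

    rows_el = ET.SubElement(root, "{%s}rows" % NS)
    for row in rows:
        row_el = ET.SubElement(rows_el, "{%s}row" % NS)
        for position, value in enumerate(row):
            field = ET.SubElement(row_el, "{%s}field" % NS)
            # The column attribute is present only for a non-empty name.
            if position < len(names) and names[position]:
                field.set("column", names[position])
            field.text = value
    return root
```

Calling csv_rows_to_xml with no column names produces a csv element containing only rows, matching the second serialization shown above.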
Parses CSV data supplied as a string, returning the results as an XML document, as described by 17.5.9 Representing CSV data as XML.
fn:csv-to-xml( | ||
$value | as , | |
$options | as | := {} |
) as | ||
This function is deterministic, context-dependent, and focus-independent. It depends on executable base URI.
The arguments have the same meaning, and are subject to the same constraints, as the arguments of fn:parse-csv.
If $value is the empty sequence, the function returns the empty sequence.
In other cases, the effect of the function is equivalent to the result of the following XQuery expression (where $options is an empty map if the argument is not supplied):
let $parsedCSV := parse-csv($value, $options)
let $colNames := $parsedCSV?columns
return document {
  <csv xmlns="http://www.w3.org/2005/xpath-functions"> {
    if (exists($colNames)) {
      <columns>{ $colNames ! <column>{ . }</column> }</columns>
    },
    <rows>{
      for $row in $parsedCSV?rows
      return <row>{
        for member $field at $col in $row
        return <field>{
          if ($colNames[$col]) {
            attribute column { $colNames[$col] }
          },
          $field
        }</field>
      }</row>
    }</rows>
  }</csv>
}
The elements in the returned XML are in the namespace http://www.w3.org/2005/xpath-functions; the namespace prefix that is used (or its absence) is implementation-dependent.
If the function is called twice with the same arguments, it is implementation-dependent whether the two calls return the same element node or distinct (but deep equal) element nodes. In this respect it is nondeterministic with respect to node identity.
The base URI of the element nodes in the result is implementation-dependent.
A schema is defined for the structure of the returned document: see D.3 Schema for the result of fn:csv-to-xml.
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema.
See fn:parse-csv.
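As an informal illustration of the equivalence stated above (not part of this specification), the following Python sketch builds the same element structure from data that has already been parsed. The function name rows_to_csv_xml and its two parameters are hypothetical: they stand in for the columns and rows entries of the fn:parse-csv result, and the CSV parsing itself is assumed to have happened elsewhere.

```python
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"

# Serialize the fn namespace as the default namespace (no prefix).
ET.register_namespace("", FN_NS)


def _q(local_name):
    """Return the ElementTree 'Clark notation' name in the fn namespace."""
    return "{" + FN_NS + "}" + local_name


def rows_to_csv_xml(columns, rows):
    """Build the csv element from column names and rows of field strings.

    columns: list of str, possibly empty (hypothetical stand-in for ?columns)
    rows:    list of lists of str (hypothetical stand-in for ?rows)
    """
    csv = ET.Element(_q("csv"))

    # The columns element is emitted only when column names exist.
    if columns:
        columns_el = ET.SubElement(csv, _q("columns"))
        for name in columns:
            ET.SubElement(columns_el, _q("column")).text = name

    rows_el = ET.SubElement(csv, _q("rows"))
    for row in rows:
        row_el = ET.SubElement(rows_el, _q("row"))
        for index, value in enumerate(row):
            field = ET.SubElement(row_el, _q("field"))
            # Mirrors the XQuery test on the column name: no attribute when
            # the name is absent (ragged row) or is a zero-length string.
            if index < len(columns) and columns[index]:
                field.set("column", columns[index])
            field.text = value
    return csv


if __name__ == "__main__":
    tree = rows_to_csv_xml(["name", "city"], [["Bob", "Berlin"]])
    print(ET.tostring(tree, encoding="unicode"))
```

The sketch returns the csv element rather than a document node, and it says nothing about type annotations or base URI, both of which the text above leaves to the implementation.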
| Variables | |
|---|---|
let $crlf := char('\r') || char('\n') | |
let $csv-string := `name,city{ $crlf }Bob,Berlin{ $crlf }Alice,Aachen{ $crlf }` | |
let $csv-uneven-cols := concat(
`date,name,city,amount,currency,original amount,note{ $crlf }`,
`2023-07-19,Bob,Berlin,10.00,USD,13.99{ $crlf }`,
`2023-07-20,Alice,Aachen,15.00{ $crlf }`,
`2023-07-20,Charlie,Celle,15.00,GBP,11.99,cake,not a lie{ $crlf }`
) | |
An empty CSV with default column extraction (false): | |
| Expression: | csv-to-xml(()) |
|---|---|
| Result: | () |
| Expression: | csv-to-xml("") |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions"> <rows/> </csv> (with whitespace added for legibility) |
| Expression: | csv-to-xml(char('\n')) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions">
<rows>
<row/>
</rows>
</csv>(with whitespace added for legibility) |
An empty CSV with header extraction: | |
| Expression: | csv-to-xml("", { "header": true() }) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions"> <rows/> </csv> (with whitespace added for legibility) |
An empty CSV with explicit column names: | |
| Expression: | csv-to-xml("", { "header": ("name", "", "city") }) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions">
<columns>
<column>name</column>
<column/>
<column>city</column>
</columns>
<rows/>
</csv>(with whitespace added for legibility) |
With defaults for delimiters and quotes, recognizing headers: | |
| Expression: | csv-to-xml($csv-string, { "header": true() }) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions">
<columns>
<column>name</column>
<column>city</column>
</columns>
<rows>
<row>
<field column="name">Bob</field>
<field column="city">Berlin</field>
</row>
<row>
<field column="name">Alice</field>
<field column="city">Aachen</field>
</row>
</rows>
</csv>(with whitespace added for legibility) |
Filtering columns | |
| Expression: | csv-to-xml(
$csv-uneven-cols,
{ "header": true(), "select-columns": (2, 1, 4) }
) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions">
<columns>
<column>name</column>
<column>date</column>
<column>amount</column>
</columns>
<rows>
<row>
<field column="name">Bob</field>
<field column="date">2023-07-19</field>
<field column="amount">10.00</field>
</row>
<row>
<field column="name">Alice</field>
<field column="date">2023-07-20</field>
<field column="amount">15.00</field>
</row>
<row>
<field column="name">Charlie</field>
<field column="date">2023-07-20</field>
<field column="amount">15.00</field>
</row>
</rows>
</csv>(with whitespace added for legibility) |
Ragged rows | |
| Expression: | csv-to-xml(
$csv-uneven-cols,
{ "header": true() }
) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions">
<columns>
<column>date</column>
<column>name</column>
<column>city</column>
<column>amount</column>
<column>currency</column>
<column>original amount</column>
<column>note</column>
</columns>
<rows>
<row>
<field column="date">2023-07-19</field>
<field column="name">Bob</field>
<field column="city">Berlin</field>
<field column="amount">10.00</field>
<field column="currency">USD</field>
<field column="original amount">13.99</field>
</row>
<row>
<field column="date">2023-07-20</field>
<field column="name">Alice</field>
<field column="city">Aachen</field>
<field column="amount">15.00</field>
</row>
<row>
<field column="date">2023-07-20</field>
<field column="name">Charlie</field>
<field column="city">Celle</field>
<field column="amount">15.00</field>
<field column="currency">GBP</field>
<field column="original amount">11.99</field>
<field column="note">cake</field>
<field>not a lie</field>
</row>
</rows>
</csv>(with whitespace added for legibility) |
Trimming rows to constant width | |
| Expression: | csv-to-xml(
$csv-uneven-cols,
{ "header": true(), "trim-rows": true() }
) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions">
<columns>
<column>date</column>
<column>name</column>
<column>city</column>
<column>amount</column>
<column>currency</column>
<column>original amount</column>
<column>note</column>
</columns>
<rows>
<row>
<field column="date">2023-07-19</field>
<field column="name">Bob</field>
<field column="city">Berlin</field>
<field column="amount">10.00</field>
<field column="currency">USD</field>
<field column="original amount">13.99</field>
<field column="note"/>
</row>
<row>
<field column="date">2023-07-20</field>
<field column="name">Alice</field>
<field column="city">Aachen</field>
<field column="amount">15.00</field>
<field column="currency"/>
<field column="original amount"/>
<field column="note"/>
</row>
<row>
<field column="date">2023-07-20</field>
<field column="name">Charlie</field>
<field column="city">Celle</field>
<field column="amount">15.00</field>
<field column="currency">GBP</field>
<field column="original amount">11.99</field>
<field column="note">cake</field>
</row>
</rows>
</csv>(with whitespace added for legibility) |
Specifying a fixed number of columns | |
| Expression: | csv-to-xml(
$csv-uneven-cols,
{ "header": true(), "select-columns": 1 to 6 }
) |
| Result: | <csv xmlns="http://www.w3.org/2005/xpath-functions">
<columns>
<column>date</column>
<column>name</column>
<column>city</column>
<column>amount</column>
<column>currency</column>
<column>original amount</column>
</columns>
<rows>
<row>
<field column="date">2023-07-19</field>
<field column="name">Bob</field>
<field column="city">Berlin</field>
<field column="amount">10.00</field>
<field column="currency">USD</field>
<field column="original amount">13.99</field>
</row>
<row>
<field column="date">2023-07-20</field>
<field column="name">Alice</field>
<field column="city">Aachen</field>
<field column="amount">15.00</field>
<field column="currency"/>
<field column="original amount"/>
</row>
<row>
<field column="date">2023-07-20</field>
<field column="name">Charlie</field>
<field column="city">Celle</field>
<field column="amount">15.00</field>
<field column="currency">GBP</field>
<field column="original amount">11.99</field>
</row>
</rows>
</csv>(with whitespace added for legibility) |
The following functions allow dynamic loading and evaluation of XQuery queries, XSLT stylesheets, and XPath binary operators.
| Function | Meaning |
|---|---|
fn:load-xquery-module | Provides access to the public functions and global variables of a dynamically loaded XQuery library module. |
fn:transform | Invokes a transformation using a dynamically loaded XSLT stylesheet. |
fn:op | Returns a function whose effect is to apply a supplied binary operator to two arguments. |
Changes in 4.0
It has been clarified that loading a module has no effect on the static or dynamic context of the caller. [Issue 725 PR 727 10 October 2023]
The return type is now specified more precisely. [Issue 883 PR 1072 19 March 2024]
A new option is provided to allow the content of the loaded module to be supplied as a string. [Issue 1329 PR 1333 22 July 2024]
Provides access to the public functions and global variables of a dynamically loaded XQuery library module.
fn:load-xquery-module( | ||
$module-uri | as , | |
$options | as | := {} |
) as load-xquery-module-record | ||
This function is deterministic, context-dependent, and focus-dependent.
The function loads an implementation-defined set of modules having the target namespace $module-uri.
If the second argument is omitted or is the empty sequence, the result is the same as calling the two-argument form with the empty map as the value of the $options argument.
The $options argument can be used to control the way in which the function operates. The option parameter conventions apply.
If the query module is retrieved as an external resource, this is subject to the trust level of the calling code. In addition, the ability of the loaded query module to access additional resources is subject to the value of the supplied trusted option.
Note:
Versions of XQuery prior to 4.0 do not define any constraints on access to external resources. Many XQuery implementations, however, provide such mechanisms. An implementation of fn:load-xquery-module that allows execution of an untrusted query must ensure that the ability of that query to access resources is appropriately restricted, regardless of the version of XQuery in use.
record( | |
xquery-version? | as xs:decimal, |
trusted? | as xs:boolean, |
location-hints? | as xs:string*, |
content? | as xs:string?, |
context-item? | as item()?, |
variables? | as map(xs:QName, item()*), |
vendor-options? | as map(xs:QName, item()*) |
) | |
| Key | Value | Meaning |
|---|---|---|
| xquery-version | xs:decimal | The minimum level of the XQuery language that the processor must support. |
| trusted | xs:boolean | Indicates whether the returned query module is trusted to access external resources. This applies both to resources statically referenced in the query module (such as imported schemas and imported library modules), and to resources accessed dynamically by invoking functions in the retrieved module, for example by use of the fn:doc function. |
| | true | The loaded query has the same level of trust as the caller, and may therefore access all external resources available to the caller. |
| | false | The functions and variables in the returned XQuery module are untrusted, and are therefore unable to access external resources unless these have been made explicitly available by a trusted caller. |
| location-hints | xs:string* | A sequence of URIs (in the form of xs:string values) which may be used or ignored in an implementation-defined way. |
| content | xs:string? | The content of the query module as a string. When this option is used, the location-hints option is ignored. The static base URI of the dynamically loaded module is the same as the executable base URI of the caller. |
| context-item | item()? | The item to be used as the initial context item when evaluating global variables in the library module. Supplying the empty sequence is equivalent to omitting the entry from the map, and indicates the absence of a context item. If the library module specifies a required type for the context item, then the supplied value must conform to this type, without conversion. |
| variables | map(xs:QName, item()*) | Values for external variables defined in the library module. Values must be supplied for external variables that have no default value, and may be supplied for external variables that do have a default value. The supplied value must conform to the required type of the variable, without conversion. The map contains one entry for each external variable: the key is the variable’s name, and the associated value is the variable’s value. The option parameter conventions do not apply to this contained map. |
| vendor-options | map(xs:QName, item()*) | Values for vendor-defined configuration options for the XQuery processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XQuery processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. The option parameter conventions do not apply to this contained map. |
The result of the function is a map R with two entries, as defined in 18.1 Record fn:load-xquery-module-record.
The static and dynamic context of the library module are established according to the rules in C Context Components XQ31.
It is implementation-defined whether constructs in the library module are evaluated in the same execution scope as the calling module.
The library module that is loaded may import other modules using an import module declaration. The result of fn:load-xquery-module does not include global variables or functions declared in such a transitively imported module. However, the options map supplied in the function call may (and if no default is defined, must) supply values for external variables declared in transitively loaded library modules.
The library module that is loaded may import schema declarations using an import schema declaration. It is implementation-defined whether schema components in the in-scope schema definitions of the calling module are automatically added to the in-scope schema definitions of the dynamically loaded module. The in-scope schema definitions of the calling and called modules must be consistent, according to the rules defined in 2.2.5 Consistency Constraints XQ31.
Where nodes are passed to or from the dynamically loaded module, for example as an argument or result of a function, they should if possible retain their node identity, their base URI, their type annotations, and their relationships to all other nodes in the containing tree (including ancestors and siblings). If this is not possible, for example because the only way of passing nodes to the chosen XQuery implementation is by serializing and re-parsing, then a node may be passed in the form of a deep copy, which may lose information about the identity of the node, about its ancestors and siblings, about its base URI, about its type annotations, and about its relationships to other nodes passed across the interface.
If $module-uri is a zero length string, a dynamic error is raised [err:FOQM0001].
If the implementation is not able to find a library module with the specified target namespace, an error is raised [err:FOQM0002].
If a static error (including a statically detected type error) is encountered when processing the library module, a dynamic error is raised [err:FOQM0003].
If a value is supplied for the initial context item or for an external variable and the value does not conform to the required type declared in the dynamically loaded module, a dynamic error is raised [err:FOQM0005].
If no suitable XQuery processor is available, a dynamic error is raised [err:FOQM0006]. This includes (but is not limited to) the following cases:
No XQuery processor is available;
Use of the function has been disabled;
No XQuery processor supporting the requested version of XQuery is available;
No XQuery processor supporting the optional Module Feature is available.
If a dynamic error (including a dynamically detected type error) is encountered when processing the module (for example, when evaluating its global variables), the dynamic error is returned as is.
If a function declaration F in the loaded module declares (say) four parameters of which one is optional, its arity range will be from 3 to 4, so the result will include two function items corresponding to F#3 and F#4. In the lower-arity function item, F#3, the fourth parameter will take its default value. If the expression that initializes the default value is context sensitive, the static and dynamic context for its evaluation are the static and dynamic contexts of the fn:load-xquery-module function call itself.
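The arity-range rule in the preceding note can be illustrated informally in Python (this sketch is not part of the specification). The function arity_range and its representation of parameters as (name, has_default) pairs are hypothetical; the sketch assumes that optional parameters always follow the required ones.

```python
def arity_range(params):
    """Return the list of arities under which a function is exposed.

    params: list of (name, has_default) tuples, in declaration order.
    Assumption: parameters with defaults come after all required ones.
    """
    required = sum(1 for _, has_default in params if not has_default)
    # One function item per arity from the required count up to the total.
    return list(range(required, len(params) + 1))


if __name__ == "__main__":
    # Four parameters, of which the last is optional: F#3 and F#4.
    declared = [("a", False), ("b", False), ("c", False), ("d", True)]
    print(["F#%d" % n for n in arity_range(declared)])
```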
As with all other functions in this specification, conformance requirements depend on the host language. For example, a host language might specify that provision of this function is optional, or that it is excluded entirely, or that implementations are required to support XQuery modules using a specified version of XQuery.
Even where support for this function is mandatory, it is recommended for security reasons that implementations should provide a user option to disable its use, or to disable aspects of its functionality.
The load-xquery-module function does not modify the static or dynamic context. Functions and variables from the loaded module become available within the result returned by the function, but they are not added to the static or dynamic context of the caller. This means, for example, that function-lookup will not locate functions from the loaded module.
| Expression: | let $expr := "2 + 2"
let $module := `
xquery version "4.0";
module namespace dyn="http://example.com/dyn";
declare %public variable $dyn:value := { $expr };
`
let $exec := load-xquery-module(
"http://example.com/dyn",
{ 'content':$module }
)
let $variables := $exec?variables
return $variables( #Q{http://example.com/dyn}value ) |
|---|---|
| Result: | 4 |
Invokes a transformation using a dynamically loaded XSLT stylesheet.
fn:transform( | ||
$options | as | |
) as | ||
This function is nondeterministic, context-dependent, and focus-independent.
This function loads an XSLT stylesheet and invokes it to perform a transformation.
The inputs to the transformation are supplied in the form of a map. The option parameter conventions apply to this map; they do not apply to any nested map unless otherwise specified.
The function first identifies the requested XSLT version, as follows:
If the xslt-version option is present, the requested XSLT version is the value of that option.
Otherwise, the requested XSLT version is the value of the [xsl:]version attribute of the outermost element in the supplied stylesheet or package.
The function then attempts to locate an XSLT processor that implements the requested XSLT version.
If a processor that implements the requested XSLT version is available, then it is used.
Otherwise, if a processor that implements a version later than the requested version is available, then it is used.
Otherwise, the function fails indicating that no suitable XSLT processor is available.
Note:
The phrase locate an XSLT processor includes the possibility of locating a software product and configuring it to act as an XSLT processor that implements the requested XSLT version.
If more than one XSLT processor is available under the above rules, then the one that is chosen may be selected according to the availability of requested features: see below.
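The version-selection rules above can be sketched informally in Python (not part of this specification). The names requested_version, select_processor and NoSuitableProcessor are hypothetical. The rules do not say which later version to prefer when several are available, so choosing the lowest later version is an assumption of this sketch, and feature-based selection is not modeled.

```python
from decimal import Decimal


class NoSuitableProcessor(Exception):
    """Hypothetical exception standing in for the 'no suitable processor' failure."""


def requested_version(options, stylesheet_version):
    """Use the xslt-version option if present, else the stylesheet's version."""
    if "xslt-version" in options:
        return Decimal(str(options["xslt-version"]))
    return Decimal(str(stylesheet_version))


def select_processor(requested, available):
    """Pick a version from 'available' (a collection of Decimal values)."""
    available = list(available)
    # Rule 1: an exact match is used if there is one.
    if requested in available:
        return requested
    # Rule 2: otherwise a later version is used. Assumption: take the lowest.
    later = sorted(v for v in available if v > requested)
    if later:
        return later[0]
    # Rule 3: otherwise the call fails.
    raise NoSuitableProcessor("no XSLT processor for version %s" % requested)


if __name__ == "__main__":
    wanted = requested_version({}, "2.0")
    print(select_processor(wanted, [Decimal("1.0"), Decimal("3.0")]))
```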
Once an XSLT processor has been selected that implements a given version of XSLT, the processor follows the rules of that version of the XSLT specification. This includes any decision to operate in backwards or forwards compatibility mode. For example, if an XSLT 2.0 processor is selected, and the stylesheet specifies version="1.0", then the processor will operate in backwards compatibility mode; if the same processor is selected and the stylesheet specifies version="3.0", the processor will operate in forwards compatibility mode.
If the stylesheet to be executed is retrieved as an external resource, this is subject to the trust level of the calling code. In addition, the ability of the loaded stylesheet to access additional resources is subject to the value of the supplied trusted option.
Note:
Versions of XSLT prior to 4.0 do not define any constraints on access to external resources. Many XSLT implementations, however, provide such mechanisms. An implementation of fn:transform that allows an XSLT processor to execute an untrusted stylesheet must ensure that the ability of that stylesheet to access resources is appropriately restricted, regardless of the version of XSLT.
The combinations of options that are relevant to each version of XSLT, other than xslt-version itself, are listed below. This is followed by a table giving the meaning of each option.
For invocation of an XSLT 1.0 processor (see [XSL Transformations (XSLT) Version 1.0]), the supplied options must include all of the following (if anything else is present, it is ignored):
The stylesheet, provided by supplying exactly one of the following:
stylesheet-location
stylesheet-node
stylesheet-text
The source tree, provided as the value of the source-node option.
Zero or more of the following additional options:
stylesheet-base-uri
stylesheet-params (defaults to the empty map)
initial-mode (defaults to the stylesheet’s default mode)
delivery-format (defaults to document)
serialization-params (defaults to the empty map)
enable-messages (default is implementation-defined)
requested-properties (default is the empty map)
trusted (default is false)
vendor-options (defaults to the empty map)
cache (default is implementation-defined)
For invocation of an XSLT 2.0 processor (see [XSL Transformations (XSLT) Version 2.0]), the supplied options must include all of the following (if anything else is present, it is ignored):
The stylesheet, provided by supplying exactly one of the following:
stylesheet-location
stylesheet-node
stylesheet-text
Invocation details, as exactly one of the following:
For apply-templates invocation, all of the following:
source-node
Optionally, initial-mode (defaults to the stylesheet’s default mode)
For call-template invocation, all of the following:
initial-template
Optionally, source-node
Zero or more of the following additional options:
stylesheet-base-uri
stylesheet-params (defaults to the empty map)
base-output-uri (defaults to absent)
delivery-format (defaults to document)
serialization-params (defaults to the empty map)
enable-messages (default is implementation-defined)
enable-trace (default is implementation-defined)
requested-properties (default is the empty map)
trusted (default is false)
vendor-options (defaults to the empty map)
cache (default is implementation-defined)
For invocation of an XSLT 3.0 or XSLT 4.0 processor (see [XSL Transformations (XSLT) Version 4.0]), the supplied options must include all of the following (if anything else is present, it is ignored):
The stylesheet, provided either by supplying exactly one of the following:
stylesheet-location
stylesheet-node
stylesheet-text
Or by supplying exactly one of the following:
package-location
package-node
package-text
package-name plus optionally package-version
Invocation details, as exactly one of the following combinations:
For apply-templates invocation, all of the following:
Exactly one of source-node, source-location, or initial-match-selection
Optionally, initial-mode
Optionally, template-params
Optionally, tunnel-params
For call-template invocation using an explicit template name, all of the following:
initial-template
Optionally, template-params
Optionally, tunnel-params
Optionally, source-node
For call-template invocation using the defaulted template name xsl:initial-template, all of the following:
Optionally, template-params
Optionally, tunnel-params
Note:
If the source-node or source-location option is present and initial-template is absent, then apply-templates invocation will be used. To use call-template invocation on the template named xsl:initial-template while also supplying a context item for use when evaluating global variables, either (a) supply the context item using the global-context-item option, or (b) supply source-node, and set the initial-template option explicitly to the QName xsl:initial-template.
For call-function invocation, all of the following:
initial-function
function-params
Note:
The invocation method can be determined as the first of the following which applies:
If initial-function is present, then call-function invocation.
If initial-template is present, then call-template invocation.
If source-node or source-location or initial-match-selection is present, then apply-templates invocation.
Otherwise, call-template invocation using the default entry point xsl:initial-template.
Zero or more of the following additional options:
stylesheet-base-uri
static-params (defaults to the empty map)
stylesheet-params (defaults to the empty map)
global-context-item (defaults to absent)
base-output-uri (defaults to absent)
delivery-format
serialization-params (defaults to the empty map)
enable-assertions (default is false)
enable-messages (default is implementation-defined)
enable-trace (default is implementation-defined)
requested-properties (default is the empty map)
trusted (default is false)
vendor-options (defaults to the empty map)
cache (default is implementation-defined)
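The note on determining the invocation method for an XSLT 3.0 or 4.0 processor can be sketched informally in Python (not part of this specification). The function name invocation_method and the returned labels are hypothetical; the options map is modeled as a plain dictionary keyed by option name, and only the presence of keys is examined.

```python
def invocation_method(options):
    """Return a label for the invocation method implied by the options.

    The tests are applied in order, and the first one that matches wins.
    """
    if "initial-function" in options:
        return "call-function"
    if "initial-template" in options:
        return "call-template"
    source_keys = ("source-node", "source-location", "initial-match-selection")
    if any(key in options for key in source_keys):
        return "apply-templates"
    # Nothing else applies: default entry point xsl:initial-template.
    return "call-template (xsl:initial-template)"


if __name__ == "__main__":
    # source-node together with initial-template gives call-template.
    print(invocation_method({"source-node": object(), "initial-template": "main"}))
```

Because the tests are ordered, supplying source-node without initial-template selects apply-templates invocation, which is the situation the earlier note warns about.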
The meanings of each option are defined in the table below.
record( | |
base-output-uri? | as xs:string, |
cache? | as xs:boolean, |
delivery-format? | as xs:string, |
enable-assertions? | as xs:boolean, |
enable-messages? | as xs:boolean, |
enable-trace? | as xs:boolean, |
function-params? | as array(item()*), |
global-context-item? | as item(), |
initial-function? | as xs:QName, |
initial-match-selection? | as item()*, |
initial-mode? | as xs:QName, |
initial-template? | as xs:QName, |
package-name? | as xs:string, |
package-location? | as xs:string, |
package-node? | as node(), |
package-text? | as xs:string, |
package-version? | as xs:string, |
post-process? | as fn(xs:string, item()*) as item()*, |
requested-properties? | as map(xs:QName, xs:anyAtomicType), |
serialization-params? | as map(xs:anyAtomicType, item()*), |
source-location? | as xs:string, |
source-node? | as node(), |
static-params? | as map(xs:QName, item()*), |
stylesheet-base-uri? | as xs:string, |
stylesheet-location? | as xs:string, |
stylesheet-node? | as node(), |
stylesheet-params? | as map(xs:QName, item()*), |
stylesheet-text? | as xs:string, |
template-params? | as map(xs:QName, item()*), |
tunnel-params? | as map(xs:QName, item()*), |
trusted? | as xs:boolean, |
vendor-options? | as map(xs:QName, item()*), |
xslt-version? | as xs:decimal |
) | |
| Key | Applies to | Value | Meaning |
|---|---|---|---|
| 1.0, 2.0, 3.0, 4.0 | The URI of the principal result document; also used as the base URI for resolving relative URIs of secondary result documents. If the value is a relative reference, it is resolved against the executable base URI of the fn:transform function call.
| |
| 1.0, 2.0, 3.0, 4.0 | This option has no effect on the result of the transformation but may affect efficiency. The value true indicates an expectation that the same stylesheet is likely to be used for more than one transformation; the value false indicates an expectation that the stylesheet will be used once only.
| |
| 1.0, 2.0, 3.0, 4.0 | The manner in which the transformation results should be delivered. Applies both to the principal result document and to secondary result documents created using xsl:result-document.
| |
document | The result is delivered as a document node. | ||
serialized | The result is delivered as a string, representing the results of serialization. Note that (as with the fn:serialize function) the final encoding stage of serialization (which turns a sequence of characters into a sequence of octets) is either skipped, or reversed by decoding the octet stream back into a character stream. | ||
raw | The result of the initial template or function is returned as an arbitrary XDM value (after conversion to the declared type, but without wrapping in a document node, and without serialization): when this option is chosen, the returned map contains the raw result. | ||
file | The serialized result is written to persistent storage. This means that the fn:transform function has side-effects and becomes nondeterministic, so the option should be used with care, and the precise behavior may be implementation-defined. When this option is used, the URIs used for the base-output-uri and the URIs of any secondary result documents must be writable locations. | ||
| 3.0, 4.0 | Indicates whether any xsl:assert instructions in the stylesheet are to be evaluated.
| |
| 1.0, 2.0, 3.0, 4.0 | Indicates whether any xsl:message instructions in the stylesheet are to be evaluated. The destination and formatting of any such messages is implementation-defined.
| |
| 2.0, 3.0, 4.0 | Indicates whether any fn:trace functions in the stylesheet are to generate diagnostic messages. The destination and formatting of any such messages is implementation-defined.
| |
| 3.0, 4.0 | An array of values to be used as the arguments to the initial function call. The value is converted to the required type of the declared parameter using the function conversion rules.
| |
| 3.0, 4.0 | The value of the global context item, as defined in XSLT 3.0
| |
| 3.0, 4.0 | The name of the initial function to be called for call-function invocation. The arity of the function is inferred from the length of function-params.
| |
| 3.0, 4.0 | The value of the initial match selection, as defined in XSLT 3.0
| |
| 1.0, 2.0, 3.0, 4.0 | The name of the initial processing mode.
| |
| 2.0, 3.0, 4.0 | The name of a named template in the stylesheet to act as the initial entry point.
| |
| 3.0, 4.0 | The name of the top-level stylesheet package to be invoked (an absolute URI)
| |
| 3.0, 4.0 | The location of the top-level stylesheet package, as a relative or absolute URI
| |
| 3.0, 4.0 | A document or element node containing the top-level stylesheet package
| |
| 3.0, 4.0 | The top-level stylesheet package in the form of unparsed lexical XML.
| |
| 3.0, 4.0 | The version of the top-level stylesheet package to be invoked.
| |
| 1.0, 2.0, 3.0, 4.0 | A function that is used to post-process each result document of the transformation (both the principal result and secondary results), in whatever form it would otherwise be delivered (document, serialized, or raw). The first argument of the function is the key used to identify the result in the map returned by the fn:transform function (for example, this will be the supplied base output URI in the case of the principal result, or the string “output” if no base output URI was supplied). The second argument is the actual value. The value that is returned in the result of the fn:transform function is the result of applying this post-processing. Note: If the implementation provides a way of writing or invoking functions with side-effects, this post-processing function might be used to save a copy of the result document to persistent storage. For example, if the implementation provides access to the EXPath File library [EXPath], then a serialized document might be written to filestore by calling the If the primary purpose of the post-processing function is achieved by means of such side-effects, and if the actual results are not needed by the caller of the Calls to
| |
| 1.0, 2.0, 3.0, 4.0 | The keys in the map are QNames that could legitimately be supplied in a call to the XSLT system-property function; the values in the map are the requested settings of the corresponding property. The boolean values true and false are equivalent to the string values yes and no. As a special case, setting a value for xsl:version has no effect, because of the potential for conflict with other options. For example:
xsl:supports-dynamic-evaluation to false is interpreted as an explicit request for a processor in which the value of the property is false. The effect if the requests cannot be precisely met is implementation-defined. In some cases it may be appropriate to ignore the request or to provide an alternative (for example, a later version of the product than the one requested); in other cases it may be more appropriate to raise an error [err:FOXT0001] indicating that no suitable XSLT processor is available.
| |
| 1.0, 2.0, 3.0, 4.0 | Serialization parameters for the principal result document. The supplied map follows the same rules that apply to a map supplied as the second argument of fn:serialize.
| |
| 1.0, 2.0, 3.0, 4.0 | When source-location is supplied then it is expected to be an absolute or relative URI identifying an unparsed XML document. If relative, it is resolved against the static base URI of the fn:transform function call. The document at this location is parsed, and the document node acts as the initial-match-selection, that is, stylesheet execution starts by applying templates to this node. If the initial mode is streamable and a streaming XSLT 3.0 or XSLT 4.0 processor is used, then the supplied document is processed in streaming mode.
| |
| 1.0, 2.0, 3.0, 4.0 | When source-node is supplied then the global-context-item (the context item for evaluating global variables) is the root of the tree containing the supplied node. In addition, for apply-templates invocation, the source-node acts as the initial-match-selection, that is, stylesheet execution starts by applying templates to this node.
| |
| 3.0, 4.0 | The values of static parameters defined in the stylesheet; the keys are the names of the parameters, and the associated values are their values. The value is converted to the required type of the declared parameter using the coercion rules.
| |
| 1.0, 2.0, 3.0, 4.0 | A string intended to be used as the static base URI of the principal stylesheet module. This value must be used if no other static base URI is available. If the supplied stylesheet already has a base URI (which will generally be the case if the stylesheet is supplied using stylesheet-node or stylesheet-location) then it is implementation-defined whether this parameter has any effect. If the value is a relative reference, it is resolved against the executable base URIXP of the fn:transform function call.
| |
| 1.0, 2.0, 3.0, 4.0 | URI that can be used to locate the principal stylesheet module. If relative, it is resolved against the executable base URIXP of the fn:transform function call. The value also acts as the default for stylesheet-base-uri.
| |
| 1.0, 2.0, 3.0, 4.0 | Root of the tree containing the principal stylesheet module, as a document or element node. The base URI of the node acts as the default for stylesheet-base-uri.
| |
| 1.0, 2.0, 3.0, 4.0 | A map holding values to be supplied for stylesheet parameters. The keys are the parameter names; the values are the corresponding parameter values. The values are converted if necessary to the required type using the coercion rules. The default is the empty map.
| |
| 1.0, 2.0, 3.0, 4.0 | The principal stylesheet module in the form of unparsed lexical XML.
| |
| 3.0, 4.0 | The values of non-tunnel parameters to be supplied to the initial template, used with both apply-templates and call-template invocation. Each value is converted to the required type of the declared parameter using the coercion rules.
| |
| 3.0, 4.0 | The values of tunnel parameters to be supplied to the initial template, used with both apply-templates and call-template invocation. Each value is converted to the required type of the declared parameter using the coercion rules.
| |
| Indicates whether the target stylesheet is trusted to access external resources. This applies both to resources statically referenced by the stylesheet (for example using xsl:include, xsl:import, xsl:use-package, or xsl:import-schema), and to resources accessed dynamically by executing the retrieved stylesheet, for example by use of the fn:doc or fn:unparsed-text functions.
| ||
true | The loaded stylesheet has the same level of trust as the caller, and may therefore access all external resources available to the caller. | ||
false | The loaded stylesheet is untrustedXP, and is therefore unable to access external resources unless these have been made explicitly available by a trusted caller. | ||
| 1.0, 2.0, 3.0, 4.0 | Values for vendor-defined configuration options for the XSLT processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XSLT processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. The default is the empty map.
| |
| 1.0, 2.0, 3.0, 4.0 | The minimum level of the XSLT language that the processor must support.
| |
The result of the transformation is returned as a map. There is one entry in the map for the principal result document, and one for each secondary result document. The key is a URI in the form of an xs:string value. The key for the principal result document is the base output URI if specified, or the string "output" otherwise. The key for secondary result documents is the URI of the document, as an absolute URI. The associated value in each entry depends on the requested delivery format. If the delivery format is document, the value is a document node. If the delivery format is serialized, the value is a string containing the serialized result.
Where nodes are passed to or from the transformation, for example as the value of a stylesheet parameter or the result of a function, they should if possible retain their node identity, their base URI, their type annotations, and their relationships to all other nodes in the containing tree (including ancestors and siblings). If this is not possible, for example because the only way of passing nodes to the chosen XSLT implementation is by serializing and re-parsing, then a node may be passed in the form of a deep copy, which may lose information about the identity of the node, about its ancestors and siblings, about its base URI, about its type annotation, and about its relationships to other nodes passed across the interface.
It is implementation-defined whether the XSLT transformation is executed within the same execution scope as the calling code.
The function is nondeterministic in that it is implementation-dependent whether running the function twice against the same inputs produces identical results. The results of two invocations may differ in the identity of any returned nodes; they may also differ in other respects, for example because the value of fn:current-dateTime is different for the two invocations, or because the contents of external documents accessed using fn:doc or xsl:source-document change between one invocation and the next.
A dynamic error is raised [err:FOXT0001] if the transformation cannot be invoked because no suitable XSLT processor is available. This includes (but is not limited to) the following cases:
No XSLT processor is available;
No XSLT processor supporting the requested version of XSLT is available;
The XSLT processor API does not support some requested feature (for example, the ability to supply tunnel parameters externally);
A dynamic error is raised [err:FOXT0002] if an error is detected in the supplied parameters (for example if two mutually exclusive parameters are supplied).
If a static or dynamic error is reported by the XSLT processor, this function fails with a dynamic error, retaining the XSLT error code.
A dynamic error is raised [err:FOXT0003] if the XSLT transformation invoked by a call on fn:transform fails with a static or dynamic error, and no more specific error code is available.
Note:
XSLT 1.0 does not define any error codes, so this is the likely outcome with an XSLT 1.0 processor. XSLT 2.0 and 3.0 do define error codes, but some APIs do not expose them. If multiple errors are signaled by the transformation (which is most likely to happen with static errors) then the error code should where possible be that of one of these errors, chosen arbitrarily; the processor may make details of additional errors available to the application in an implementation-defined way.
A dynamic error is raised [err:FOXT0004] if the use of this function (or of selected options) has been externally disabled, for example for security reasons.
A dynamic error is raised [err:FOXT0006] if the transformation produces output containing characters available only in XML 1.1, and the calling processor cannot handle such characters.
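The error conditions above can be handled by the caller with a try/catch expression. The following is a minimal sketch, assuming an XQuery 3.0 or later host language; the file names render.xsl and test.xml and the element names unavailable and failed are hypothetical, and the err prefix is declared explicitly in case the processor does not predeclare it.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

try {
  (: invoke the transformation and select the principal result :)
  transform(map {
    "stylesheet-location": "render.xsl",
    "source-node": doc('test.xml')
  })?output
} catch err:FOXT0001 | err:FOXT0004 {
  (: no suitable XSLT processor, or the function has been disabled :)
  <unavailable code="{ $err:code }">{ $err:description }</unavailable>
} catch * {
  (: any other failure, including errors reported by the XSLT processor :)
  <failed code="{ $err:code }">{ $err:description }</failed>
}
```

Catching the two availability errors separately lets an application fall back to another rendering path, while the final clause reports stylesheet errors together with whatever error code the XSLT processor supplied.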
Recursive use of the fn:transform function may lead to catastrophic failures such as non-termination or stack overflow. No error code is assigned to such conditions, since they cannot necessarily be detected by the processor.
As with all other functions in this specification, conformance requirements depend on the host language. For example, a host language might specify that provision of this function is optional, or that it is excluded entirely, or that implementations are required to support a particular set of values for the xslt-version parameter.
Even where support for this function is mandatory, it is recommended for security reasons that implementations should provide a user option to disable its use, or to disable aspects of its functionality such as the ability to write to persistent resources.
The following example loads a stylesheet from the location render.xsl, applies it to the document test.xml, and selects the body elements of the principal result:
let $result := transform({
  "stylesheet-location": "render.xsl",
  "source-node": doc('test.xml')
})
return $result?output//body
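As a further illustration of how the result map depends on the delivery format, the sketch below supplies the stylesheet as lexical XML and requests a serialized result. It assumes an XSLT 3.0 processor is available; the stylesheet content and the doc and item element names are hypothetical. The map constructor is written with the map keyword so that the query is also accepted by processors that do not yet support the shorter syntax.

```xquery
let $xslt := '
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:template match="/">
      <out><xsl:value-of select="count(//item)"/></out>
    </xsl:template>
  </xsl:stylesheet>'
let $result := transform(map {
  "stylesheet-text": $xslt,
  (: a document node, so that the match="/" template applies :)
  "source-node": document { <doc><item/><item/></doc> },
  "delivery-format": "serialized"
})
(: no base output URI was supplied, so the principal result has the key "output" :)
return $result?output
```

Because the delivery format is serialized, the value under the output key is a string rather than a document node, so path expressions such as //body cannot be applied to it directly.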
The functions in this section deliver information about schema types (including simple types and complex types). These may represent built-in types (such as xs:dateTime), user-defined types found in the static context (typically because they appear in an imported schema), or types used as type annotations on schema-validated nodes.
For more information on schema types, see 1.8.2 Schema Type Hierarchy. The properties of a schema type are described in terms of the properties of a Simple Type Definition or Complex Type Definition component as described in 3.16.1 The Simple Type Definition Schema Component XS11-1 and 3.4.1 The Complex Type Definition Schema Component XS11-1 respectively. Not all properties are exposed.
The structured representation of a schema type is described in 19.1.1 Record fn:schema-type-record.
Note:
Simple properties of a schema type that can be expressed as strings or booleans are represented in this record structure directly as atomic field values, while complex properties whose values are themselves types (for example, base-type and primitive-type) are represented as functions. This is done partly to make it easier for implementations to compute complex properties on demand rather than in advance, and partly to ensure that the overall structure is always acyclic. For example, the primitive type of xs:decimal is itself xs:decimal, and if this were represented as a field value without a guarding function, serialization of the map using the JSON output method would not terminate.
| Function | Meaning |
|---|---|
| fn:schema-type | Returns a record containing information about a named schema type in the static context. |
| fn:type-of | Returns information about the type of a value, as a string. |
| fn:atomic-type-annotation | Returns a record containing information about the type annotation of an atomic value. |
| fn:node-type-annotation | Returns a record containing information about the type annotation of an element or attribute node. |
This record type represents the properties of a simple or complex type in a schema.
| Name | Meaning |
|---|---|
| The name of the type. Empty in the case of an anonymous type. Corresponds to {name}XS11-1 and {target namespace}XS11-1 in the XSD component model for simple and complex type components.
|
| True for a simple type, false for a complex type.
|
| Function item returning the base type (the type from which this type is derived by restriction or extension). The function is always present, and returns the empty sequence in the case of the type
|
| For an atomic type, a function item returning the primitive type from which this type is ultimately derived. Corresponds to the {primitive type definition}XS11-1 in the XSD component model for simple types. Absent if the type is non-atomic, or if it is the simple type
|
| For a simple type, one of
|
| For a simple type with variety
|
| For a complex type with variety
|
| For a generalized atomic typeXP, a function item that can be called to establish whether the supplied atomic item is an instance of this type. In all other cases, absent.
|
| For a simple type, a function item that can be used to construct instances of this type. In the case of a named type that is present in the dynamic context, the result is the same function as returned by
|
| The record type is extensible (it may contain additional fields beyond those listed). |
Returns a record containing information about a named schema type in the static context.
fn:schema-type( | ||
$name | as | |
) as schema-type-record? | ||
This function is deterministic, context-dependent, and focus-independent.
If the static context (specifically, the in-scope schema typesXP) includes a schema type whose name matches $name, the function returns a schema-type-record containing information about that schema type. If not, it returns the empty sequence.
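The following sketch shows how the returned record might be navigated. It assumes a processor that implements this draft, and it assumes that the field holding the type name is called name; the fields base-type and primitive-type are function items, so they are called with an empty argument list to obtain the related schema-type-record.

```xquery
let $long := schema-type(xs:QName('xs:long'))
return (
  (: the name of the type itself; the field name is an assumption :)
  $long?name,
  (: call the function-valued fields, then read the name of the result :)
  $long?base-type()?name,
  $long?primitive-type()?name
)
```

Under these assumptions the three items would be expected to identify xs:long, its base type xs:integer, and its primitive type xs:decimal.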
In this document, as well as in [XQuery 4.0: An XML Query Language] and [XML Path Language (XPath) 4.0], the phrase “an error is raised” is used. Raising an error is equivalent to calling the fn:error function defined in this section with the provided error code. Except where otherwise specified, errors defined in this specification are dynamic errors. Some errors, however, are classified as type errors. Type errors are typically used where the presence of the error can be inferred from knowledge of the type of the actual arguments to a function, for example with a call such as fn:string(fn:abs#1). Host languages may allow type errors to be reported statically if they are discovered during static analysis.
When function specifications indicate that an error is to be raised, the notation “[error code ]” is used to specify an error code. Each error defined in this document is identified by an xs:QName that is in the http://www.w3.org/2005/xqt-errors namespace, represented in this document by the err prefix. It is this xs:QName that is actually passed as an argument to the fn:error function. Calling this function raises an error. For a more detailed treatment of error handling, see 2.3.3 Handling Dynamic Errors XP31.
The fn:error function is a general function that may be called as above but may also be called from [XQuery 4.0: An XML Query Language] or [XML Path Language (XPath) 4.0] applications with, for example, an xs:QName argument.
Calling the fn:error function raises an application-defined error.
fn:error( | ||
$code | as | := (), |
$description | as | := (), |
$value | as | := . |
) as | ||
This function is nondeterministic, context-independent, and focus-independent.
This function never returns a value. Instead it always raises an error. The effect of the error is identical to the effect of dynamic errors raised implicitly, for example when an incorrect argument is supplied to a function.
The parameters to the fn:error function supply information that is associated with the error condition and that is made available to a caller that asks for information about the error. The error may be caught either by the host language (using a try/catch construct in XSLT or XQuery, for example), or by the calling application or external processing environment. The way in which error information is returned to the external processing environment is implementation-dependent.
There are three pieces of information that may be associated with an error.
The $code is an error code that distinguishes this error from others. It is an xs:QName; the namespace URI conventionally identifies the component, subsystem, or authority responsible for defining the meaning of the error code, while the local part identifies the specific error condition. The namespace URI http://www.w3.org/2005/xqt-errors is used for errors defined in this specification; other namespace URIs may be used for errors defined by the application.
If the external processing environment expects the error code to be returned as a URI or a string rather than as an xs:QName, then an error code with namespace URI NS and local part LP will be returned in the form NS#LP. The namespace URI part of the error code should therefore not include a fragment identifier.
If no value is supplied for the $code argument, or if the value supplied is the empty sequence, the effective value of the error code is fn:QName('http://www.w3.org/2005/xqt-errors', 'err:FOER0000').
The $description is a natural-language description of the error condition.
If no value is supplied for the $description argument, or if the value supplied is the empty sequence, then the effective value of the description is implementation-dependent.
The $value is an arbitrary value used to convey additional information about the error, and may be used in any way the application chooses.
If no value is supplied for the $value argument, or if the value supplied is the empty sequence, then the effective value of the error object is implementation-dependent.
This function always raises a dynamic error. By default, it raises [err:FOER0000].
The value of the $description parameter may need to be localized.
Since the function never returns a value, the declared return type of item()* is a convenient fiction. It is relevant insofar as a function item such as error#1 may (as a consequence of function coercion) be supplied in contexts where a function with a more specific return type is required.
Any QName may be used as an error code; there are no reserved names or namespaces. The error is always classified as a dynamic error, even if the error code used is one that is normally used for static errors or type errors.
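The three pieces of information associated with an error can be observed by catching it. The sketch below assumes an XQuery 3.0 or later host language; the hr prefix, its namespace, and the value 150000 are hypothetical. It also builds the NS#LP form of the error code described above.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";
declare namespace hr = "http://www.example.com/HR";

try {
  error(
    QName('http://www.example.com/HR', 'hr:toohighsal'),
    'Salary is too high',
    150000
  )
} catch hr:toohighsal {
  (: $err:code, $err:description, and $err:value carry the three arguments :)
  concat(
    namespace-uri-from-QName($err:code), '#', local-name-from-QName($err:code),
    ': ', $err:description, ' (', $err:value, ')'
  )
}
```

With these inputs the catch clause should yield the string http://www.example.com/HR#toohighsal: Salary is too high (150000).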
| Expression: |
|
|---|---|
| Result: | Raises error FOER0000. (This returns the URI |
| Expression: | error(
QName('http://www.example.com/HR', 'myerr:toohighsal'),
'Salary is too high'
) |
| Result: | Raises error myerr:toohighsal. (This returns |
Provides an execution trace intended to be used in debugging queries.
fn:trace( | ||
$input | as , | |
$label | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
The function returns $input, unchanged.
In addition, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination.
Any serialization of the implementation’s trace output must not raise an error. This can be achieved (for example) by using a serialization method that can handle arbitrary input, such as the adaptive output method (see 10 Adaptive Output Method SER31).
The format of the trace output and its order are implementation-dependent. Therefore, the order in which the output appears is not predictable. This also means that if dynamic errors occur (whether or not they are caught using try/catch), it may be unpredictable whether any output is reported before the error occurs.
If the trace information is unrelated to a specific value, fn:message can be used instead.
Consider a situation in which a user wants to investigate the actual value passed to a function. Assume that in a particular execution, | |
The following two XPath expressions are identical, but only the second provides trace feedback to the user: | |
|
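As an illustrative sketch (the values and the label are hypothetical), the expression below filters a sequence of prices and traces each value as it is tested. Because fn:trace returns its input unchanged, the filter selects the same items as it would without the call.

```xquery
let $prices := (80.00, 124.84, 250.00)
(: trace each candidate value, then compare it as usual :)
return $prices[trace(., 'price: ') gt 100]
```

The result is the two prices greater than 100; whether, where, and in what order the traced values appear depends on the implementation.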
Outputs trace information and discards the result.
fn:message( | ||
$input | as , | |
$label | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
As with fn:trace, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination.
In contrast to fn:trace, the function returns the empty sequence.
Any serialization of the implementation’s log output must not raise an error. This can be achieved (for example) by using a serialization method that can handle arbitrary input, such as the adaptive output method (see 10 Adaptive Output Method SER31).
The format of the log output and its order are implementation-dependent. Therefore, the order in which the output appears is not predictable. This also means that if dynamic errors occur (whether or not they are caught using try/catch), it may be unpredictable whether any output is logged before the error occurs.
The function can be used for debugging. It can also be helpful in production environments, for example to record dynamic input and evaluation results in log files.
The following two XPath expressions are identical, but only the second logs any feedback: | |
|
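A minimal sketch, assuming a processor that implements fn:message as described here (the values and the label are hypothetical): because the function returns the empty sequence, a call can be placed in a sequence expression without changing the result.

```xquery
for $n in (1, 2, 3)
return (
  (: contributes nothing to the result; may write to the log :)
  message($n, 'processing item: '),
  $n * $n
)
```

The query returns the squares of the three integers, regardless of whether any log output is produced.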
Constructor functions are used to convert a supplied value to a given type, and the name of the function is the same as the name of the target type. This section describes constructor functions corresponding to the following types:
Simple types (atomic types, union types, and list types as defined in [XML Schema Part 2: Datatypes Second Edition]), which are present in the static context either because they appear in the in-scope schema typesXP or because they appear as named item typesXP.
These constructor functions always take a single argument.
Record types defined as named item typesXP.
These take one argument for each named field of the record type. Constructor functions for record types are defined in 22.6 Constructor functions for named record types.
Constructor functions are defined for all user-defined named simple types, and for most built-in atomic, list, and union types. The only named simple types that have no constructor function are those that have no instances other than instances of their derived types: specifically, xs:anySimpleType, xs:anyAtomicType, and xs:NOTATION.
Each of the three built-in list types defined in [XML Schema Part 2: Datatypes Second Edition], namely xs:NMTOKENS, xs:ENTITIES, and xs:IDREFS, has an associated constructor function.
The function signatures are as follows:
xs:NMTOKENS( | ||
$value | as | := . |
) as | ||
xs:ENTITIES( | ||
$value | as | := . |
) as | ||
xs:IDREFS( | ||
$value | as | := . |
) as | ||
The semantics are equivalent to casting to the corresponding types from xs:string.
All three of these types have the facet minLength = 1, meaning that there must always be at least one item in the list. The return type, however, allows for the fact that when the argument to the function is the empty sequence, the result is the empty sequence.
Note:
In the case of atomic types, it is possible to use an expression such as xs:date(@date-of-birth) to convert an attribute value to an instance of xs:date, knowing that this will work both in the case where the attribute is already annotated as xs:date, and also in the case where it is xs:untypedAtomic. This approach does not work with list types, because it is not permitted to use a value of type xs:NMTOKEN* as input to the constructor function xs:NMTOKENS. Instead, it is necessary to use conditional logic that performs the conversion only in the case where the input is untyped: if (@x instance of attribute(*, xs:untypedAtomic)) then xs:NMTOKENS(@x) else data(@x)
There is a constructor function for the union type xs:numeric defined in [XQuery and XPath Data Model (XDM) 4.0]. The function signature is:
xs:numeric( | ||
$value | as | := . |
) as | ||
The semantics are determined by the rules in 23.3.7 Casting to union types. These rules have the effect that:
If the argument is an instance of xs:double, xs:float, or xs:decimal, then the result is an instance of the same primitive type, with the same value;
If the argument is an instance of xs:boolean, the result is the xs:double value 0.0e0 or 1.0e0;
If the argument is an instance of xs:string or xs:untypedAtomic, then:
If the value is in the lexical space of xs:double, the result will be the corresponding xs:double value;
Otherwise, a dynamic error [err:FORG0001] occurs;
Note:
The result will never be an instance of xs:float, xs:decimal, or xs:integer. This is because xs:double appears first in the list of member types of xs:numeric, and its lexical space subsumes the lexical space of the other numeric types. Thus, unlike XPath numeric literals, the result does not depend on the lexical form of the supplied value. The reason for this design choice is to retain compatibility with the function conversion rules: functions such as fn:abs and fn:round are declared to expect an instance of xs:numeric as their first or only argument, and compatibility with the function conversion rules defined in earlier versions of these specifications demands that when an untyped atomic item (or untyped node) is supplied as the argument, it is converted to an xs:double value even if its lexical form is that (say) of an integer.
In all other cases, a dynamic error [err:FORG0001] occurs.
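The effect of these rules can be sketched as follows, assuming a processor that provides the xs:numeric constructor function; the literal values are illustrative only.

```xquery
(
  (: a decimal argument keeps its primitive type :)
  xs:numeric(1.5) instance of xs:decimal,
  (: a string is converted to xs:double, whatever its lexical form :)
  xs:numeric("42") instance of xs:double,
  xs:numeric("42") instance of xs:integer,
  (: a boolean becomes the xs:double value 0.0e0 or 1.0e0 :)
  xs:numeric(true()) eq 1.0e0
)
```

Following the rules above, the four items should be true, true, false, and true. A string outside the lexical space of xs:double, such as "forty-two", would instead raise [err:FORG0001].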
In the case of an implementation that supports XSD 1.1, there is a constructor function associated with the built-in union type xs:error.
The function signature is as follows:
xs:error( | ||
$value | as | := . |
) as | ||
The semantics are equivalent to casting to the corresponding union type (see 23.3.7 Casting to union types).
Note:
Because xs:error has no member types, and therefore has an empty value space, casting will always fail with a dynamic error except in the case where the supplied argument is the empty sequence, in which case the result is also the empty sequence.
The error text provided with these errors is non-normative.
Raised when fn:apply is called and the arity of the supplied function is not the same as the number of members in the supplied array.
This error is raised whenever an attempt is made to divide by zero.
This error is raised whenever numeric operations result in an overflow or underflow.
This error is raised when an integer used to select a member of an array is outside the range of values for that array.
This error is raised when the $length argument to array:subarray is negative.
Raised when casting to xs:decimal if the supplied value exceeds the implementation-defined limits for the datatype.
Raised by fn:resolve-QName and fn:QName when a supplied value does not have the lexical form of a QName or URI respectively; and when casting to decimal, if the supplied value is NaN or Infinity.
Raised when casting to xs:integer if the supplied value exceeds the implementation-defined limits for the datatype.
Raised when multiplying or dividing a duration by a number, if the number supplied is NaN.
Raised when casting a string to xs:decimal if the string has more digits of precision than the implementation can represent (the implementation also has the option of rounding).
Raised by fn:codepoints-to-string if the input contains an integer that is not the codepoint of a permitted character.
Raised by any function that uses a collation if the requested collation is not recognized.
Raised by fn:normalize-unicode if the requested normalization form is not supported by the implementation.
Raised by functions such as fn:contains if the requested collation does not operate on a character-by-character basis.
Raised by fn:char if the supplied character name is not recognized, or if it represents a codepoint that is not a permitted character.
Raised when parsing CSV input if a syntax error in the input CSV is found.
Raised when parsing CSV input if the field-separator, record-separator, or quote-character option is set to an invalid value.
Raised when parsing CSV input if the same delimiter character is assigned to more than one role.
Raised by the function from the get entry of csv-columns-record, if its $key argument is an xs:string and is not one of the known column names.
Raised by fn:id, fn:idref, and fn:element-with-id if the node that identifies the tree to be searched is a node in a tree whose root is not a document node.
Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that either the supplied URI cannot be dereferenced to obtain a resource, or the resource that is returned is not parseable as XML.
Raised by fn:doc, fn:collection, and fn:uri-collection to indicate that it is not possible to return a result that is guaranteed deterministic.
Raised by fn:collection and fn:uri-collection if the argument is not a valid xs:anyURI.
Raised (optionally) by fn:doc if the argument is not a valid xs:anyURI.
Raised by fn:parse-xml if the supplied string is not a well-formed and namespace-well-formed XML document; or if DTD validation is requested and the document is not valid against its DTD.
Raised by fn:parse-xml if DTD validation is requested and the supplied string has no DTD or is not valid against the DTD.
Raised when the xsd-validation option to fn:parse-xml is supplied, and the value is not one of the permitted values; for example if the option type Q{U}NNN is used, and Q{U}NNN does not identify a type in the static context.
Raised when the xsd-validation option to fn:parse-xml is set to a value other than skip, if the processor is not schema-aware.
Raised when fn:serialize is called and the processor does not support serialization, in cases where the host language makes serialization an optional feature.
Raised by fn:parse-html if the supplied string is not a well-formed HTML document.
Raised when the dtd-validation option to fn:parse-xml is set, if no validating XML parser is available. Note: it is recommended that all processors should support the dtd-validation option, but there may be environments (such as web browsers) where this is not practically feasible.
Raised by fn:parse-xml if XSD validation is requested and the XML document represented by the supplied string is not valid against the relevant XSD schema.
Raised by fn:xsd-validator if it is not possible to assemble a valid and consistent schema.
This error is raised if the decimal format name supplied to fn:format-number is not a valid QName, or if the prefix in the QName is undeclared, or if there is no decimal format in the static context with a matching name.
This error is raised if a decimal format value supplied to fn:format-number is not valid for the associated property, or if the properties of the decimal format resulting from a supplied map do not have distinct values.
This error is raised if the picture string supplied to fn:format-number or fn:format-integer has invalid syntax.
Raised when casting to date/time datatypes, or performing arithmetic with date/time values, if arithmetic overflow or underflow occurs.
Raised when casting to duration datatypes, or performing arithmetic with duration values, if arithmetic overflow or underflow occurs.
Raised by fn:adjust-date-to-timezone and related functions if the supplied timezone is invalid.
Raised by fn:civil-timezone if no timezone data is available for the given date/time and place.
Error code used by fn:error when no other error code is provided.
This error is raised if the picture string or calendar supplied to fn:format-date, fn:format-time, or fn:format-dateTime has invalid syntax.
This error is raised if the picture string supplied to fn:format-date selects a component that is not present in a date, or if the picture string supplied to fn:format-time selects a component that is not present in a time.
Raised by fn:hash if the effective value of the supplied algorithm is not one of the values supported by the implementation.
Raised by functions such as fn:json-doc, fn:parse-json or fn:json-to-xml if the string supplied as input does not conform to the JSON grammar (optionally with implementation-defined extensions).
Raised by functions such as map:merge, fn:json-doc, fn:parse-json or fn:json-to-xml if the input contains duplicate keys, when the chosen policy is to reject duplicates.
Raised by fn:json-to-xml if validation is requested when the processor does not support schema validation or typed nodes.
Raised by functions such as map:merge, fn:parse-json, and fn:xml-to-json if the $options map contains an invalid entry.
Raised by fn:xml-to-json if the XML input does not conform to the rules for the XML representation of JSON.
Raised by fn:xml-to-json if the XML input uses the attribute escaped="true" or escaped-key="true", and the corresponding string or key contains an invalid JSON escape sequence.
Raised by fn:element-to-map if the layout selected for converting elements of a given name is unsuitable for an element node with that name, or if the conversion plan explicitly defines the processing of a particular element as an error.
Raised by fn:resolve-QName and analogous functions if a supplied QName has a prefix that has no binding to a namespace.
Raised by fn:resolve-uri if no base URI is available for resolving a relative URI.
Raised by fn:path if the node supplied in the origin option is not an ancestor of the $node whose relative path is required.
Raised by fn:load-xquery-module if the supplied module URI is zero-length.
Raised by fn:load-xquery-module if no module can be found with the supplied module URI.
Raised by fn:load-xquery-module if a static error (including a statically detected type error) is encountered when processing the library module.
Raised by fn:load-xquery-module if a value is supplied for the initial context item or for an external variable, and the value does not conform to the required type declared in the dynamically loaded module.
Raised by fn:load-xquery-module if no XQuery processor is available supporting the requested XQuery version (or if none is available at all).
A general-purpose error raised when casting, if a cast between two datatypes is allowed in principle, but the supplied value cannot be converted: for example when attempting to cast the string "nine" to an integer.
Raised when either argument to fn:resolve-uri is not a valid URI/IRI.
Raised by fn:zero-or-one if the supplied value contains more than one item.
Raised by fn:one-or-more if the supplied value is the empty sequence.
Raised by fn:exactly-one if the supplied value is not a singleton sequence.
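The three cardinality checks above can be sketched, non-normatively, in Python by modelling a sequence as a list. CardinalityError is a hypothetical exception standing in for the three distinct error codes; it is not a name defined by this specification.

```python
class CardinalityError(ValueError):
    """Hypothetical stand-in for the cardinality errors described above."""


def zero_or_one(seq):
    items = list(seq)
    if len(items) > 1:
        raise CardinalityError("value contains more than one item")
    return items


def one_or_more(seq):
    items = list(seq)
    if not items:
        raise CardinalityError("value is the empty sequence")
    return items


def exactly_one(seq):
    items = list(seq)
    if len(items) != 1:
        raise CardinalityError("value is not a singleton sequence")
    return items
```

In each case the input is returned unchanged when the check succeeds, so the functions act purely as assertions on the length of the sequence.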
Raised by functions such as fn:max, fn:min, fn:avg, fn:sum if the supplied sequence contains values inappropriate to this function.
Raised by fn:dateTime if the two arguments both have timezones and the timezones are different.
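A non-normative Python sketch of the timezone check follows. Python's date type carries no timezone, so the sketch passes the date's timezone as a separate, hypothetical d_tz parameter. The rule that the result takes whichever timezone is present is an assumption taken from the definition of fn:dateTime, not from this error summary.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional


def date_time(d: date, d_tz: Optional[timezone], t: time) -> datetime:
    t_tz = t.tzinfo
    # The error condition: both arguments have timezones and they differ.
    if d_tz is not None and t_tz is not None and d_tz != t_tz:
        raise ValueError("both arguments have timezones and they differ")
    # Assumption: the result uses whichever timezone is present, if any.
    tz = d_tz if d_tz is not None else t_tz
    return datetime.combine(d, t.replace(tzinfo=tz))
```

Two timezone objects compare equal when their offsets are equal, which matches the intent of comparing the timezones of the two arguments.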
A catch-all error for fn:resolve-uri, recognizing that the implementation can choose between a variety of algorithms and that some of these may fail for a variety of reasons.
Raised when the input to fn:parse-ietf-date does not match the prescribed grammar, or when it represents an invalid date/time such as 31 February.
Raised when the radix supplied to fn:parse-integer is not in the range 2 to 36.
Raised when the digits in the string supplied to fn:parse-integer are not in the range appropriate to the chosen radix.
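The two fn:parse-integer conditions above (radix out of range, and digits inappropriate to the radix) can be illustrated with the following non-normative Python sketch. It is deliberately simplified: it accepts only an optional sign followed by digits and letters, and it uses ValueError as a hypothetical stand-in for both error codes.

```python
import string

# Digit symbols in value order: 0-9 then a-z, giving values 0 to 35.
_DIGITS = string.digits + string.ascii_lowercase


def parse_integer(text, radix=10):
    if not 2 <= radix <= 36:
        raise ValueError(f"radix {radix} is not in the range 2 to 36")
    s = text.strip().lower()
    sign = 1
    if s[:1] in ("+", "-"):
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if not s:
        raise ValueError("no digits supplied")
    value = 0
    for ch in s:
        digit = _DIGITS.find(ch)
        # A digit is valid only if its value is below the radix.
        if digit < 0 or digit >= radix:
            raise ValueError(f"digit {ch!r} is not valid in radix {radix}")
        value = value * radix + digit
    return sign * value
```

For example, the digit 9 is rejected in radix 8, and the letter g is rejected in radix 16.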
Raised by regular expression functions such as fn:matches and fn:replace if the regular expression flags contain a character other than i, m, q, s, or x.
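A minimal, non-normative Python sketch of the flag check follows. The function name check_regex_flags is hypothetical, and ValueError stands in for the error code.

```python
# The permitted flag characters, as listed above.
_ALLOWED_FLAGS = frozenset("imqsx")


def check_regex_flags(flags):
    unknown = sorted(set(flags) - _ALLOWED_FLAGS)
    if unknown:
        raise ValueError(f"invalid regular expression flag(s): {''.join(unknown)}")
    return flags
```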
Raised by regular expression functions such as fn:matches and fn:replace if the regular expression is syntactically invalid.
Raised by fn:replace to report errors in the replacement string.
Raised by fn:replace if both the $replacement and $action arguments are supplied.
Raised by fn:data, or by implicit atomization, if applied to a node with no typed value, the main example being an element validated against a complex type that defines it to have element-only content.
Raised by fn:data, or by implicit atomization, if the sequence to be atomized contains a function item other than an array.
Raised by fn:string, or by implicit string conversion, if the input sequence contains a function item.
A dynamic error is raised if the authority component of a URI contains an open square bracket but no corresponding close square bracket.
Raised by fn:unparsed-text or fn:unparsed-text-lines if the $source argument contains a fragment identifier, or if it cannot be resolved to an absolute URI (for example, because the base-URI property in the static context is absent), or if it cannot be used to retrieve the string representation of a resource.
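The first two conditions on $source can be sketched, non-normatively, with Python's urllib.parse. The function name resolve_source and the use of ValueError are hypothetical; retrieval of the resource itself is not modelled.

```python
from urllib.parse import urljoin, urlsplit


def resolve_source(source, base_uri=None):
    # Condition 1: the supplied URI reference contains a fragment identifier.
    if "#" in source:
        raise ValueError("source contains a fragment identifier")
    # Condition 2: the reference cannot be resolved to an absolute URI,
    # for example because no base URI is available.
    absolute = urljoin(base_uri, source) if base_uri else source
    if not urlsplit(absolute).scheme:
        raise ValueError("source cannot be resolved to an absolute URI")
    return absolute
```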
Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is not a valid encoding name, if the processor does not support the specified encoding, if the string representation of the retrieved resource contains octets that cannot be decoded into Unicode characters using the specified encoding, or if the resulting characters are not permitted characters.
Raised by fn:unparsed-text or fn:unparsed-text-lines if the $encoding argument is absent and the processor cannot infer the encoding using external information and the encoding is not UTF-8.
A dynamic error is raised if no XSLT processor suitable for evaluating a call on fn:transform is available.
A dynamic error is raised if the parameters supplied to fn:transform are invalid, for example if two mutually exclusive parameters are supplied. If a suitable XSLT error code is available (for example in the case where the requested initial-template does not exist in the stylesheet), that error code should be used in preference.
A dynamic error is raised if an XSLT transformation invoked using fn:transform fails with a static or dynamic error. The XSLT error code is used if available; this error code provides a fallback when no XSLT error code is returned, for example because the processor is an XSLT 1.0 processor.
A dynamic error is raised if the fn:transform function is invoked when XSLT transformation (or a specific transformation option) has been disabled for security or other reasons.
A dynamic error is raised if the result of the fn:transform function contains characters available only in XML 1.1 and the calling processor cannot handle such characters.
It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used. (See Conformance.)
It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1. (See Conformance.)
It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1. (See Conformance.)
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace. (See Options.)
It is implementation-defined which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used. (See Strings, characters, and codepoints.)
[Definition] Some functions (such as fn:in-scope-prefixes, fn:load-xquery-module, and fn:unordered) produce result sequences or result maps in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering. (See Properties of functions.)
Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results. (See Properties of functions.)
They may provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value. See [ISO 10967]. (See Arithmetic operators on numeric values.)
For xs:decimal values, let N be the number of digits of precision supported by the implementation, and let M (M <= N) be the minimum limit on the number of digits required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication operations, the returned result should be accurate to N digits of precision, and for division and modulus operations, the returned result should be accurate to at least M digits of precision. The actual precision is implementation-defined. If the number of digits in the mathematical result exceeds the number of digits that the implementation retains for that operation, the result is truncated or rounded in an implementation-defined manner. (See Arithmetic operators on numeric values.)
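The effect of a fixed number of digits on xs:decimal division can be illustrated with Python's decimal module. This is a non-normative sketch that assumes an implementation retaining 18 significant digits; the function name divide_decimal is hypothetical, and the choice between rounding and truncation is passed in as a parameter because the specification leaves it implementation-defined.

```python
from decimal import ROUND_DOWN, ROUND_HALF_EVEN, Context, Decimal


def divide_decimal(a, b, digits=18, rounding=ROUND_HALF_EVEN):
    # A context with a fixed precision models an implementation that
    # retains only `digits` significant digits for the result.
    ctx = Context(prec=digits, rounding=rounding)
    return ctx.divide(Decimal(a), Decimal(b))
```

With these assumptions, dividing 2 by 3 yields a result ending in 7 when rounded and a result ending in 6 when truncated (ROUND_DOWN); both are permitted by the rule above.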
The [IEEE 754-2019] specification also describes handling of two exception conditions called divideByZero and invalidOperation. The IEEE divideByZero exception is raised not only by a direct attempt to divide by zero, but also by operations such as log(0). The IEEE invalidOperation exception is raised by attempts to call a function with an argument that is outside the function’s domain (for example, sqrt(-1) or log(-1)). Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in which the operation returns a defined result, typically positive or negative infinity, or NaN. With this function library, these IEEE exceptions do not cause a dynamic error at the application level; rather they result in the relevant function or operator returning the defined non-error result. The underlying IEEE exception may be notified to the application or to the user by some implementation-defined warning condition, but the observable effect on an application using the functions and operators defined in this specification is simply to return the defined result (typically -INF, +INF, or NaN) with no error. (See Arithmetic operators on numeric values.)
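The "default non-stop exception handling" described above can be contrasted with a host language that does raise errors. Python's math.sqrt and math.log raise ValueError for out-of-domain arguments, so the following non-normative sketch wraps them to return the defined IEEE results instead. The names ieee_sqrt and ieee_log are hypothetical.

```python
import math


def ieee_sqrt(x):
    # invalidOperation: a negative argument yields NaN rather than an error.
    if math.isnan(x) or x < 0:
        return math.nan
    return math.sqrt(x)


def ieee_log(x):
    # invalidOperation: a negative argument yields NaN.
    if math.isnan(x) or x < 0:
        return math.nan
    # divideByZero: log(0) yields negative infinity.
    if x == 0:
        return -math.inf
    return math.log(x)
```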
The [IEEE 754-2019] specification distinguishes two NaN values: a quiet NaN and a signaling NaN. These two values are not distinguishable in the XDM model: the value spaces of xs:float and xs:double each include only a single NaN value. This does not prevent the implementation distinguishing them internally, and triggering different implementation-defined warning conditions, but such distinctions do not affect the observable behavior of an application using the functions and operators defined in this specification. (See Arithmetic operators on numeric values.)
The implementation may adopt a different algorithm provided that it is equivalent to this formulation in all cases where implementation-dependent or implementation-defined behavior does not affect the outcome, for example, the implementation-defined precision of the result of xs:decimal division. (See op:numeric-integer-divide.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:divide-decimals.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round-half-to-even.)
XSD 1.1 allows the string +INF as a representation of positive infinity; XSD 1.0 does not. It is implementation-defined whether XSD 1.1 is supported. (See fn:number.)
Any other format token, which indicates a numbering sequence in which that token represents the number 1 (one) (but see the note below). It is implementation-defined which numbering sequences, additional to those listed above, are supported. If an implementation does not support a numbering sequence represented by the given token, it must use a format token of 1. (See fn:format-integer.)
For all format tokens other than a digit-pattern, there may be implementation-defined lower and upper bounds on the range of numbers that can be formatted using this format token; indeed, for some numbering sequences there may be intrinsic limits. For example, the format token U+2460 (CIRCLED DIGIT ONE, ①) has a range imposed by the Unicode character repertoire — zero to 20 in Unicode versions prior to 3.2, or zero to 50 in subsequent versions. For the numbering sequences described above any upper bound imposed by the implementation must not be less than 1000 (one thousand) and any lower bound must not be greater than 1. Numbers that fall outside this range must be formatted using the format token 1. (See fn:format-integer.)
The set of languages for which numbering is supported is implementation-defined. If the $language argument is absent, or is set to the empty sequence, or is invalid, or is not a language supported by the implementation, then the number is formatted using the default language from the dynamic context. (See fn:format-integer.)
...either a or t, to indicate alphabetic or traditional numbering respectively, the default being implementation-defined. (See fn:format-integer.)
The string of characters between the parentheses, if present, is used to select between other possible variations of cardinal or ordinal numbering sequences. The interpretation of this string is implementation-defined. No error occurs if the implementation does not define any interpretation for the defined string. (See fn:format-integer.)
It is implementation-defined what combinations of values of the format token, the language, and the cardinal/ordinal modifier are supported. If ordinal numbering is not supported for the combination of the format token, the language, and the string appearing in parentheses, the request is ignored and cardinal numbers are generated instead. (See fn:format-integer.)
The use of the a or t modifier disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. In the absence of the a or t modifier, the default is implementation-defined. (See fn:format-integer.)
The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined. (See Defining a decimal format.)
IEEE states that the preferred quantum is language-defined. In this specification, it is implementation-defined. (See Trigonometric and exponential functions.)
IEEE defines various rounding algorithms for inexact results, and states that the choice of rounding direction, and the mechanisms for influencing this choice, are language-defined. In this specification, the rounding direction and any mechanisms for influencing it are implementation-defined. (See Trigonometric and exponential functions.)
The map returned by the fn:random-number-generator function may contain additional entries beyond those specified here, but it must match the record type defined above. The meaning of any additional entries is implementation-defined. To avoid conflict with any future version of this specification, the keys of any such entries should start with an underscore character. (See fn:random-number-generator.)
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:codepoints-to-string.)
If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not defined in this specification then the meaning is implementation-defined. If the implementation recognizes the meaning of the keyword and value then it should interpret it accordingly; if it does not recognize the keyword or value then if the fallback parameter is present with the value no it should reject the collation as unsupported, otherwise it should ignore the unrecognized parameter. (See The Unicode Collation Algorithm.)
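The handling of repeated and unrecognized query parameters can be sketched, non-normatively, as follows. The sketch assumes that parameters in the query part are separated by semicolons and written as keyword=value; the function name and the recognized set passed by the caller are hypothetical, and ValueError stands in for rejecting the collation as unsupported.

```python
def parse_collation_params(query, recognized):
    params = {}
    for part in query.split(";"):
        if not part:
            continue
        key, _, value = part.partition("=")
        # Later occurrences overwrite earlier ones: the last one wins.
        params[key] = value
    unrecognized = [key for key in params if key not in recognized]
    if unrecognized and params.get("fallback") == "no":
        # fallback=no: reject the collation rather than ignore the parameter.
        raise ValueError(f"unsupported collation parameter(s): {unrecognized}")
    for key in unrecognized:
        # Otherwise unrecognized parameters are ignored.
        del params[key]
    return params
```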
The following query parameters are defined. If any parameter is absent, the default is implementation-defined except where otherwise stated. The meaning given for each parameter is non-normative; the normative specification is found in [UTS #35]. (See The Unicode Collation Algorithm.)
Because the set of collations that are supported is implementation-defined, an implementation has the option to support all collation URIs, in which case it will never raise this error. (See Choosing a collation.)
The properties available are as defined for the Unicode Collation Algorithm (see 5.3.4 The Unicode Collation Algorithm). Additional implementation-defined properties may be specified as described in the rules for UCA collation URIs. (See fn:collation.)
It is possible to define collations that do not have the ability to generate collation keys. Supplying such a collation will cause the function to fail. The ability to generate collation keys is an implementation-defined property of the collation. (See fn:collation-key.)
Conforming implementations must support normalization form NFC and may support normalization forms NFD, NFKC, NFKD, and FULLY-NORMALIZED. They may also support other normalization forms with implementation-defined semantics. (See fn:normalize-unicode.)
It is implementation-defined which version of Unicode (and therefore, of the normalization algorithms and their underlying data) is supported by the implementation. See [UAX #15] for details of the stability policy regarding changes to the normalization rules in future versions of Unicode. If the input string contains codepoints that are unassigned in the relevant version of Unicode, or for which no normalization rules are defined, the fn:normalize-unicode function leaves such codepoints unchanged. If the implementation supports the requested normalization form then it must be able to handle every input string without raising an error. (See fn:normalize-unicode.)
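As a non-normative illustration, Python's unicodedata module provides the four named normalization forms, using whichever Unicode version the Python runtime ships with. The sketch below models an implementation that supports NFC, NFD, NFKC, and NFKD but not FULLY-NORMALIZED; the function name and the use of ValueError are hypothetical.

```python
import unicodedata

# Assumption: this hypothetical implementation supports exactly these forms.
_SUPPORTED_FORMS = frozenset({"NFC", "NFD", "NFKC", "NFKD"})


def normalize_unicode(value, form="NFC"):
    if form not in _SUPPORTED_FORMS:
        raise ValueError(f"unsupported normalization form: {form}")
    # For a supported form, every input string is handled without error.
    return unicodedata.normalize(form, value)
```

For example, the two-codepoint sequence U+0065 U+0301 normalizes under NFC to the single codepoint U+00E9, and NFD reverses that composition.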
It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property. (See Functions based on substring matching.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:analyze-string.)
Some URI schemes are hierarchical and some are non-hierarchical. Implementations must treat the following schemes as non-hierarchical: jar, mailto, news, tag, tel, and urn. Whether additional schemes are known to be non-hierarchical is implementation-defined. If a scheme is not known to be non-hierarchical, it must be treated as hierarchical. (See Parsing and building URIs.)
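The classification rule can be sketched, non-normatively, in Python. The function name is_hierarchical and the extra parameter (modelling any additional schemes an implementation chooses to recognize) are hypothetical.

```python
# Schemes that must be treated as non-hierarchical.
_NON_HIERARCHICAL = frozenset({"jar", "mailto", "news", "tag", "tel", "urn"})


def is_hierarchical(scheme, extra=frozenset()):
    # Any scheme not known to be non-hierarchical is treated as hierarchical.
    return scheme.lower() not in (_NON_HIERARCHICAL | frozenset(extra))
```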
If the omit-default-ports option is true, the port is discarded and set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:parse-uri.)
If the omit-default-ports option is true then the $port is set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:build-uri.)
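A non-normative Python sketch of the omit-default-ports rule follows. The table contains only the four default ports that implementations are advised to recognize; the function name effective_port is hypothetical, and None models the empty sequence.

```python
# Default ports that implementations should recognize.
_DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ssh": 22}


def effective_port(scheme, port, omit_default_ports=False):
    if (
        omit_default_ports
        and port is not None
        and _DEFAULT_PORTS.get(scheme.lower()) == port
    ):
        # The port equals the scheme's default, so it is discarded.
        return None
    return port
```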
Processors may support a greater range and/or precision. The limits are implementation-defined. (See Limits and precision.)
Similarly, a processor may be unable to represent accurately the result of dividing a duration by 2, or multiplying a duration by 0.5. A processor that limits the precision of the seconds component of duration values must deliver a result that is as close as possible to the mathematically precise result, given these limits; if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
All conforming processors must support year values in the range 1 to 9999, and a minimum fractional second precision of 1 millisecond or three digits (that is, s.sss). However, processors may set larger implementation-defined limits on the maximum number of digits they support in these two situations. Processors may also choose to support the year 0 and years with negative values. The results of operations on dates that cross the year 0 are implementation-defined. (See Limits and precision.)
Similarly, a processor that limits the precision of the seconds component of date and time or duration values may need to deliver a rounded result for arithmetic operations. Such a processor must deliver a result that is as close as possible to the mathematically precise result, given these limits: if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
...the format token n, N, or Nn, indicating that the value of the component is to be output by name, in lower-case, upper-case, or title-case respectively. Components that can be output by name include (but are not limited to) months, days of the week, timezones, and eras. If the processor cannot output these components by name for the chosen calendar and language then it must use an implementation-defined fallback representation. (See The picture string.)
...indicates alphabetic or traditional numbering respectively, the default being implementation-defined. This has the same meaning as in the second argument of fn:format-integer. (See The picture string.)
The sequence of characters in the (adjusted) first presentation modifier is reversed (for example, 999'### becomes ###'999). If the result is not a valid decimal digit pattern, then the output is implementation-defined. (See Formatting Fractional Seconds.)
The output for these components is entirely implementation-defined. The default presentation modifier for these components is n, indicating that they are output as names (or conventional abbreviations), and the chosen names will in many cases depend on the chosen language: see 9.8.4.8 The language, calendar, and place arguments. (See Formatting Other Components.)
The set of languages, calendars, and places that are supported in the date formatting functions is implementation-defined. When any of these arguments is omitted or is the empty sequence, an implementation-defined default value is used. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language is implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German, one implementation might represent Saturday as Samstag while another uses Sonnabend. Implementations may provide mechanisms allowing users to control such choices. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language for calendar units such as days of the week and months of the year is implementation-defined. (See The language, calendar, and place arguments.)
The calendar value if present must be a valid EQName (dynamic error: [err:FOFD1340]). If it is a lexical QName then it is expanded into an expanded QName using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace. If the expanded QName is in no namespace, then it must identify a calendar with a designator specified below (dynamic error: [err:FOFD1340]). If the expanded QName is in a namespace then it identifies the calendar in an implementation-defined way. (See The language, calendar, and place arguments.)
At least one of the above calendars must be supported. It is implementation-defined which calendars are supported. (See The language, calendar, and place arguments.)
If the arguments to fn:function-lookup identify a function that is present in the static context of the function call, the function will always return the same function that a static reference to this function would bind to. If there is no such function in the static context, then the results depend on what is present in the dynamic context, which is implementation-defined. (See fn:function-lookup.)
It is to some extent implementation-defined whether two maps or arrays have the same function identity. Processors should ensure as a minimum that when a variable $m is bound to a map or array, calling jtree($m) more than once (with the same variable reference) will deliver the same JNode each time. (See fn:jtree.)
The requirement to deliver a deterministic result has performance implications, and for this reason implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:doc.)
Various aspects of this processing are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:doc.)
It is implementation-defined whether DTD validation and/or schema validation is applied to the source document. (See fn:doc.)
The effect of a fragment identifier in the supplied URI is implementation-defined. One possible interpretation is to treat the fragment identifier as an ID attribute value, and to return a document node having the element with the selected ID value as its only child. (See fn:doc.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:collection.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:uri-collection.)
It is no longer automatically an error if the resource (after decoding) contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:unparsed-text.)
...UTF-8, or the encoding inferred from the initial octets of the resource, or from implementation-defined heuristics as defined by the rules of the bin:infer-encoding function. (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-binary.)
The collation used for matching names is implementation-defined, but must be the same as the collation used to ensure that the names of all environment variables are unique. (See fn:environment-variable.)
Except to the extent defined by these options, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml.)
Options set in $options may be supplemented or modified based on configuration options defined externally using implementation-defined mechanisms. (See fn:parse-xml.)
Except as explicitly defined, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml-fragment.)
If the second argument is omitted, or is supplied in the form of an output:serialization-parameters element, then the values of any serialization parameters that are not explicitly specified are implementation-defined, and may depend on the context. (See fn:serialize.)
A list of target namespaces identifying schema components to be used for validation. The way in which the processor locates schema components for the specified target namespaces is implementation-defined. A zero-length string denotes a no-namespace schema.... (See fn:xsd-validator.)
Set to the decimal value 1.0 or 1.1 to indicate which version of XSD is to be used. The default is implementation-defined. A processor may use a later version of XSD than the version requested, but must not use an earlier version.... (See fn:xsd-validator.)
The XSD specification allows a schema to be used for validation even when it contains unresolved references to absent schema components. It is implementation-defined whether this function allows the schema to be incomplete in this way. For example, some processors might allow validation using a schema in which an element declaration contains a reference to a type declaration that is not present in the schema, provided that the element declaration is never needed in the course of a particular validation episode. (See fn:xsd-validator.)
...error-details as map(*)*. This field is present only when (a) the option return-error-details was set to true, and (b) the supplied document was found to be invalid. The value is a sequence of maps, each containing details of one invalidity that was found. The precise details of the invalidities are implementation-defined, but they may include the following fields, if the information is available:... (See fn:xsd-validator.)
Because the [DOM: Living Standard] and [HTML: Living Standard] are not fixed, it is implementation-defined which versions are used. (See XDM Mapping from HTML DOM Nodes.)
If an implementation allows these nodes to be passed in via an API or similar mechanism, their behavior is implementation-defined. (See XDM Mapping from HTML DOM Nodes.)
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName. (See node-name Accessor.)
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName. (See node-name Accessor.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised [err:FOJS0001] if the input does not conform to the grammar. (See fn:parse-json.)
The supplied function is called to process the string value of any JSON number in the input. By default, numbers are processed by converting to xs:double using the XPath casting rules. Supplying the value xs:decimal#1 will instead convert to xs:decimal (which potentially retains more precision, but disallows exponential notation), while supplying a function that casts to (xs:decimal | xs:double) will treat the value as xs:decimal if there is no exponent, or as xs:double otherwise. Supplying the value fn:identity#1 causes the value to be retained unchanged as an xs:untypedAtomic. If the liberal option is false (the default), then the supplied number-parser is called if and only if the value conforms to the JSON grammar for numbers (for example, a leading plus sign and redundant leading zeroes are not allowed). If the liberal option is true then it is also called if the value conforms to an implementation-defined extension of this grammar. (See fn:parse-json.)
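The "decimal if there is no exponent, double otherwise" choice described above can be sketched, non-normatively, in Python, with Decimal standing in for xs:decimal and float for xs:double. The function name parse_json_number is hypothetical, and the sketch assumes its input already conforms to the JSON grammar for numbers.

```python
from decimal import Decimal


def parse_json_number(lexical):
    # JSON numbers mark an exponent with "e" or "E".
    if "e" in lexical.lower():
        return float(lexical)
    # Without an exponent, the decimal form retains every supplied digit.
    return Decimal(lexical)
```

A value such as 0.1000000000000000000001 keeps all of its digits in the decimal case, whereas converting it to a double would lose precision.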
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:json-doc.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised (see below) if the input does not conform to the grammar. (See fn:json-to-xml.)
Default: Implementation-defined. (See fn:json-to-xml.)
Indicates that the resulting XDM instance must be typed; that is, the element and attribute nodes must carry the type annotations that result from validation against the schema given at D.2 Schema for the result of fn:json-to-xml, or against an implementation-defined schema if the liberal option has the value true. (See fn:json-to-xml.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:csv-to-xml.)
Additional, implementation-defined options may be available, for example, to control aspects of the XML serialization, to specify the grammar start symbol, or to produce output formats other than XML. (See fn:invisible-xml.)
Default: The version given in the prolog of the library module; or implementation-defined if this is absent. (See fn:load-xquery-module.)
A sequence of URIs (in the form of xs:string values) which may be used or ignored in an implementation-defined way.... (See fn:load-xquery-module.)
Values for vendor-defined configuration options for the XQuery processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XQuery processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. The option parameter conventions do not apply to this contained map.... (See fn:load-xquery-module.)
It is implementation-defined whether constructs in the library module are evaluated in the same execution scope as the calling module. (See fn:load-xquery-module.)
The library module that is loaded may import schema declarations using an import schema declaration. It is implementation-defined whether schema components in the in-scope schema definitions of the calling module are automatically added to the in-scope schema definitions of the dynamically loaded module. The in-scope schema definitions of the calling and called modules must be consistent, according to the rules defined in 2.2.5 Consistency Constraints XQ31. (See fn:load-xquery-module.)
The serialized result is written to persistent storage. This means that the fn:transform function has side-effects and becomes nondeterministic, so the option should be used with care, and the precise behavior may be implementation-defined. When this option is used, the URIs used for the base-output-uri and the URIs of any secondary result documents must be writable locations. (See fn:transform.)
Indicates whether any xsl:message instructions in the stylesheet are to be evaluated. The destination and formatting of any such messages is implementation-defined. (See fn:transform.)
Default: Implementation-defined. (See fn:transform.)
Default: Implementation-defined. (See fn:transform.)
If the implementation provides a way of writing or invoking functions with side-effects, this post-processing function might be used to save a copy of the result document to persistent storage. For example, if the implementation provides access to the EXPath File library [EXPath], then a serialized document might be written to filestore by calling the file:write function. Similar mechanisms might be used to issue an HTTP POST request that posts the result to an HTTP server, or to send the document to an email recipient. The semantics of calling functions with side-effects are entirely implementation-defined. (See fn:transform.)
Calls to fn:transform can potentially have side-effects even in the absence of the post-processing option, because the XSLT specification allows a stylesheet to invoke extension functions that have side-effects. The semantics in this case are implementation-defined. (See fn:transform.)
A string intended to be used as the static base URI of the principal stylesheet module. This value must be used if no other static base URI is available. If the supplied stylesheet already has a base URI (which will generally be the case if the stylesheet is supplied using stylesheet-node or stylesheet-location) then it is implementation-defined whether this parameter has any effect. If the value is a relative reference, it is resolved against the executable base URIXP of the fn:transform function call.... (See fn:transform.)
Values for vendor-defined configuration options for the XSLT processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XSLT processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. Default is the empty map.... (See fn:transform.)
It is implementation-defined whether the XSLT transformation is executed within the same execution scope as the calling code. (See fn:transform.)
XSLT 1.0 does not define any error codes, so this is the likely outcome with an XSLT 1.0 processor. XSLT 2.0 and 3.0 do define error codes, but some APIs do not expose them. If multiple errors are signaled by the transformation (which is most likely to happen with static errors) then the error code should where possible be that of one of these errors, chosen arbitrarily; the processor may make details of additional errors available to the application in an implementation-defined way. (See fn:transform.)
In addition, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:trace.)
Consider a situation in which a user wants to investigate the actual value passed to a function. Assume that in a particular execution, $v is an xs:decimal with value 124.84. Writing fn:trace($v, 'the value of $v is:') will return $v. The processor may output "124.84" and "the value of $v is:" to an implementation-defined destination. (See fn:trace.)
Similar to fn:trace, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:message.)
If ST is xs:float or xs:double, then TV is the xs:decimal value, within the set of xs:decimal values that the implementation is capable of representing, that is numerically closest to SV. If two values are equally close, then the one that is closest to zero is chosen. If SV is too large to be accommodated as an xs:decimal (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values), a dynamic error is raised [err:FOCA0001]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:decimal.)
In casting to xs:decimal or to a type derived from xs:decimal, if the value is not too large or too small but nevertheless cannot be represented accurately with the number of decimal digits available to the implementation, the implementation may round to the nearest representable value or may raise a dynamic error [err:FOCA0006]. The choice of rounding algorithm and the choice between rounding and error behavior is implementation-defined. (See Casting from xs:string and xs:untypedAtomic.)
If ST is xs:decimal, xs:float or xs:double, then TV is SV with the fractional part discarded and the value converted to xs:integer. Thus, casting 3.1456 returns 3 while -17.89 returns -17. Casting 3.124E1 returns 31. If SV is too large to be accommodated as an integer (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values), a dynamic error is raised [err:FOCA0003]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:integer.)
The tz timezone database, available at http://www.iana.org/time-zones. It is implementation-defined which version of the database is used. (See IANA Timezone Database.)
Unicode Standard Annex #15: Unicode Normalization Forms. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-14. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr15/. (See UAX #15.)
Unicode Standard Annex #29: Unicode Text Segmentation. Ed. Josh Hadley, Unicode Consortium. The current version is 16.0.0, dated 2024-08-28. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr29/. (See UAX #29.)
The Unicode Consortium, Reading, MA, Addison-Wesley, 2016. The Unicode Standard as updated from time to time by the publication of new versions. See http://www.unicode.org/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database. The version of Unicode to be used is implementation-defined, but implementations are recommended to use the latest Unicode version; currently, Version 9.0.0. (See The Unicode Standard.)
Unicode Technical Standard #10: Unicode Collation Algorithm. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-22. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr10/. (See UTS #10.)
Unicode Technical Standard #35: Unicode Locale Data Markup Language. Ed. Mark Davis et al., Unicode Consortium. The current version is 47, dated 2025-03-11. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr35/. (See UTS #35.)
If a section of this specification has been updated since version 3.1, an overview of the changes is provided, along with links to navigate to the next or previous change.
See 1 Introduction
Sections with significant changes are marked with a ✭ symbol in the table of contents. New functions are indicated by ✚.
See 1 Introduction
PR 1504 2329
New in 4.0
New in 4.0
New in 4.0
See 2.1.12 fn:slice
PR 1120 1150
A callback function can be supplied for comparing individual items.
Changed in 4.0 to use transitive equality comparisons for numeric values.
PR 614 987
New in 4.0
New in 4.0. Originally proposed under the name fn:uniform
New in 4.0. Originally proposed under the name fn:unique
New in 4.0
See 2.5.3 fn:every
New in 4.0
See 2.5.9 fn:highest
New in 4.0
New in 4.0
See 2.5.11 fn:lowest
New in 4.0
New in 4.0
See 2.5.16 fn:some
PR 795 2228
New in 4.0
PR 521 761
New in 4.0
New in 4.0
See 4.4.5 fn:is-NaN
PR 1260 1275
A third argument has been added, providing control over the rounding mode.
See 4.4.6 fn:round
PR 1049 1151
Decimal format parameters can now be supplied directly as a map in the third argument, rather than referencing a format defined in the static context.
PR 1205 1230
New in 4.0
See 4.8.2 math:e
See 4.8.8 math:cosh
See 4.8.15 math:sinh
See 4.8.18 math:tanh
The 3.1 specification suggested that every value in the result range should have the same chance of being chosen. This has been corrected to say that the distribution should be arithmetically uniform (because there are as many xs:double values between 0.01 and 0.1 as there are between 0.1 and 1.0).
PR 261 306 993
New in 4.0
See 5.4.1 fn:char
New in 4.0
PR 937 995 1190
New in 4.0
See 5.4.13 fn:hash
PR 215 415
New in 4.0
PR 1423 1413
New in 4.0
New in 4.0
PR 1620 1886
Options are added to customize the form of the output.
See 12.2.9 fn:path
PR 1547 1551
New in 4.0
PR 969 1134
New in 4.0
PR 478 515
New in 4.0
PR 1575 1906
A new function fn:element-to-map is provided for converting XDM trees to maps suitable for serialization as JSON. Unlike the fn:xml-to-json function retained from 3.1, this can handle arbitrary XML as input.
New in 4.0
PR 968 1295
New in 4.0
PR 476 1087
New in 4.0
PR 360 476
New in 4.0
Supplying the empty sequence as the value of an optional argument is equivalent to omitting the argument.
PR 1117 1279
The $options parameter has been added.
PR 259 956
A new function is available for processing input data in HTML format.
See 17.3 Functions on HTML Data
New in 4.0
An option is provided to control how JSON numbers should be formatted.
Additional options are available, as defined by fn:parse-json.
New in 4.0
New in 4.0
New in 4.0
PR 629 803
New in 4.0
PR 533 719 834
New functions are available for processing input data in CSV (comma separated values) format.
Comparison of mixed numeric types (for example xs:double and xs:decimal) now generally converts both values to xs:decimal.
PR 289 1901
A third argument is added, allowing user control of how absent keys should be handled.
See 14.4.9 map:get
A third argument is added, allowing user control of how index-out-of-bounds conditions should be handled.
A new collation URI is defined for Unicode case-insensitive comparison and ordering.
PR 1727 1740
It is no longer guaranteed that the new key replaces the existing key.
See 14.4.14 map:put
The group may remove this function; it is considered at risk.
PR 173
New in 4.0
See 18.4 fn:op
PR 203
New in 4.0
See 14.4.1 map:build
PR 207
New in 4.0
PR 222
New in 4.0
See 2.2.3 fn:contains-subsequence
PR 250
New in 4.0
See 2.1.3 fn:foot
See 2.1.15 fn:trunk
PR 258
New in 4.0
PR 313
The second argument can now be a sequence of integers.
See 2.1.9 fn:remove
PR 319
New in 4.0. The function replaces the internal op:same-key function in 3.1
PR 326
Higher-order functions are no longer an optional feature.
See 1.2 Conformance
PR 360
New in 4.0
PR 419
New in 4.0
PR 434
New in 4.0
The function has been extended to allow output in a radix other than 10, for example in hexadecimal.
PR 477
New in 4.0
PR 482
Deleted an inaccurate statement concerning the behavior of NaN.
PR 507
New in 4.0
PR 546
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted.
See 5.2.1 fn:codepoints-to-string
It is no longer automatically an error if the resource (after decoding) contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted.
The rules regarding use of non-XML characters in JSON texts have been relaxed.
See 17.4.3 JSON character repertoire
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted.
PR 609
New in 4.0
PR 631
New in 4.0
PR 662
Constructor functions now have a zero-arity form; the first argument defaults to the context item.
PR 680
The case-insensitive collation is now defined normatively within this specification, rather than by reference to the HTML "living specification", which is subject to change. The collation can now be used for ordering comparisons as well as equality comparisons.
PR 702
The function can now take any number of arguments (previously it had to be two or more), and the arguments can be sequences of strings rather than single strings.
See 5.4.4 fn:concat
PR 710
Changes the function to return a sequence of key-value pairs rather than a map.
PR 727
It has been clarified that loading a module has no effect on the static or dynamic context of the caller.
PR 828
The $predicate callback function accepts an optional position argument.
See 2.5.4 fn:filter
The $action callback function accepts an optional position argument.
The $predicate callback function now accepts an optional position argument.
The $action callback function now accepts an optional position argument.
PR 881
The way that fn:min and fn:max compare numeric values of different types has changed. The most noticeable effect is that when these functions are applied to a sequence of xs:integer or xs:decimal values, the result is an xs:integer or xs:decimal, rather than the result of converting this to an xs:float or xs:double.
See 2.4.5 fn:max
See 2.4.6 fn:min
PR 901
The optional third argument can now be supplied as the empty sequence.
The third argument can now be supplied as the empty sequence.
The second argument can now be the empty sequence.
The optional second argument can now be supplied as the empty sequence.
The 3rd, 4th, and 5th arguments are now optional; previously the function required either 2 or 5 arguments.
All three arguments are now optional, and each argument can be set to the empty sequence. Previously if $description was supplied, it could not be empty.
See 21.1.1 fn:error
The $label argument can now be set to the empty sequence. Previously if $label was supplied, it could not be empty.
See 21.2.1 fn:trace
PR 905
The rule that multiple calls on fn:doc supplying the same absolute URI must return the same document node has been clarified; in particular the rule does not apply if the dynamic context for the two calls requires different processing of the documents (such as schema validation or whitespace stripping).
See 17.1.1 fn:doc
PR 909
The function has been expanded in scope to handle comparison of values other than strings.
See 2.2.2 fn:compare
PR 924
Rules have been added clarifying that users should not be allowed to change the schema for the fn namespace.
See D Schemas
PR 925
The decimal format name can now be supplied as a value of type xs:QName, as an alternative to supplying a lexical QName as an instance of xs:string.
PR 932
The specification now prescribes a minimum precision and range for durations.
PR 933
When comments and processing instructions are ignored, any text nodes either side of the comment or processing instruction are now merged prior to comparison.
PR 940
New in 4.0
PR 953
Constructor functions for named record types have been introduced.
PR 962
New in 4.0
PR 969
New in 4.0
See 14.4.3 map:empty
PR 984
New in 4.0
See 8.4.1 fn:seconds
PR 987
The order of results is now prescribed; it was previously implementation-dependent.
PR 1022
Regular expressions can include comments (starting and ending with #) if the c flag is set.
See 6.1 Regular expression syntax
See 6.2 Flags
PR 1028
An option is provided to control how the JSON null value should be handled.
PR 1032
New in 4.0
See 2.1.17 fn:void
PR 1046
New in 4.0
PR 1059
Use of an option keyword that is not defined in the specification and is not known to the implementation now results in a dynamic error; previously it was ignored.
See 1.7 Options
PR 1068
New in 4.0
PR 1072
The return type is now specified more precisely.
PR 1090
When casting from a string to a duration or time or dateTime, it is now specified that when there are more digits in the fractional seconds than the implementation is able to retain, excess digits are truncated. Rounding upwards (which could affect the number of minutes or hours in the value) is not permitted.
PR 1093
New in 4.0
PR 1117
The $options parameter has been added.
PR 1182
The $predicate callback function may return the empty sequence (meaning false).
See 2.5.3 fn:every
See 2.5.4 fn:filter
See 2.5.16 fn:some
PR 1191
The $options parameter has been added, absorbing the $collation parameter.
New in 4.0
PR 1250
For selected properties including percent and exponent-separator, it is now possible to specify a single-character marker to be used in the picture string, together with a multi-character rendition to be used in the formatted output.
PR 1257
The $options parameter has been added.
PR 1262
New in 4.0
PR 1265
The constraints on the result of the function have been relaxed.
PR 1280
As a result of changes to the coercion rules, the number of supplied arguments can be greater than the number required: extra arguments are ignored.
See 2.5.1 fn:apply
PR 1288
Additional error conditions have been defined.
PR 1296
New in 4.0
PR 1333
A new option is provided to allow the content of the loaded module to be supplied as a string.
PR 1353
An option has been added to suppress the escaping of the solidus (forwards slash) character.
PR 1358
New in 4.0
PR 1361
The term atomic value has been replaced by atomic item.
See 1.9 Terminology
PR 1393
Changes the function to return a sequence of key-value pairs rather than a map.
PR 1409
This section now uses the term primitive type strictly to refer to the 20 atomic types that are not derived by restriction from another atomic type: that is, the 19 primitive atomic types defined in XSD, plus xs:untypedAtomic. The three types xs:integer, xs:dayTimeDuration, and xs:yearMonthDuration, which have custom casting rules but are not strictly-speaking primitive, are now handled in other subsections.
See 23.1 Casting from primitive types to primitive types
The rules for conversion of dates and times to strings are now defined entirely in terms of XSD 1.1 canonical mappings, since these deliver exactly the same result as the XPath 3.1 rules.
See 23.1.2.2 Casting date/time values to xs:string
The rules for conversion of durations to strings are now defined entirely in terms of XSD 1.1 canonical mappings, since the XSD 1.1 rules deliver exactly the same result as the XPath 3.1 rules.
PR 1455
Numbers now retain their original lexical form, except for any changes needed to satisfy JSON syntax rules (for example, stripping leading zero digits).
PR 1473
New in 4.0
PR 1481
The function has been extended to handle other Gregorian types such as xs:gYearMonth.
See 9.5.1 fn:year-from-dateTime
See 9.5.2 fn:month-from-dateTime
The function has been extended to handle other Gregorian types such as xs:gMonthDay.
See 9.5.3 fn:day-from-dateTime
The function has been extended to handle other types including xs:time.
See 9.5.4 fn:hours-from-dateTime
See 9.5.5 fn:minutes-from-dateTime
The function has been extended to handle other types such as xs:gYearMonth.
PR 1523
New functions are provided to obtain information about built-in types and types defined in an imported schema.
New in 4.0
PR 1545
New in 4.0
PR 1565
The default for the escape option has been changed to false. The 3.1 specification gave the default value as true, but this appears to have been an error, since it was inconsistent with examples given in the specification and with tests in the test suite.
PR 1570
New in 4.0
PR 1587
New in 4.0
PR 1611
The spec has been corrected to note that the function depends on the implicit timezone.
See 2.2.2 fn:compare
PR 1671
New in 4.0.
PR 1687
New in 4.0
PR 1703
Ordered maps are introduced.
Enhanced to allow for ordered maps.
See 14.4.7 map:find
See 14.4.14 map:put
The order of entries in maps is retained.
PR 1711
It is explicitly stated that the limits for $precision are implementation-defined.
See 4.4.6 fn:round
PR 1727
For consistency with the new function map:build, the handling of duplicates may now be controlled by supplying a user-defined callback function as an alternative to the fixed values for the earlier duplicates option.
PR 1734
In 3.1, given a mixed input sequence such as (1, 3, 4.2e0), the specification was unclear whether it was permitted to add the first two integer items using integer arithmetic, rather than converting all items to doubles before performing any arithmetic. The 4.0 specification is clear that this is permitted; but since the items can be reordered before being added, this is not required.
See 2.4.4 fn:avg
See 2.4.7 fn:sum
PR 1825
New in 4.0
PR 1856
Word boundaries can be matched. Lookahead and lookbehind assertions are supported. Assertions (including ^ and $) can no longer be followed by a quantifier.
See 6.1 Regular expression syntax
The output of the function is extended to allow the representation of captured groups found within lookahead assertions.
PR 1879
Additional options to control DTD and XInclude processing have been added.
PR 1897
The $replacement argument can now be a function that computes the replacement strings.
See 6.3.2 fn:replace
PR 1906
New in 4.0
See 14.5.10 fn:element-to-map-plan
New in 4.0.
PR 1910
An $options parameter is added. Note that the rules for the $options parameter control aspects of processing that were implementation-defined in earlier versions of this specification. An implementation may provide configuration options designed to retain backwards-compatible behavior when no explicit options are supplied.
See 17.1.1 fn:doc
PR 1913
It is now permitted for the regular expression to match a zero-length string.
See 6.3.2 fn:replace
PR 1933
New in 4.0
PR 1991
Named record types used in the signatures of built-in functions are now available as standard in the static context.
PR 2001
New in 4.0.
PR 2013
Support for binary input has been added.
See 17.2.2 fn:parse-xml-fragment
New in 4.0
PR 2030
This description of the XSD validation process was previously found (with some duplication) in the XQuery and XSLT specifications; those specifications now reference this description. As a side effect, the descriptions of the process in XQuery and XSLT are better aligned.
PR 2031
Introduced the concept of JNodes.
New in 4.0
See 16.1.1 fn:jtree
PR 2149
Generalized to work with JNodes as well as XNodes.
The function is extended to handle JNodes.
See 12.2.9 fn:path
Generalized to work with JNodes as well as XNodes.
PR 2168
Atomic items of types xs:hexBinary and xs:base64Binary are now mutually comparable. In rare cases, where an application uses both types and assumes they are distinct, this can represent a backwards incompatibility.
PR 2223
An error may now be raised if the base URI is not a valid LEIRI reference.
PR 2224
The $action callback function now accepts an optional position argument.
PR 2228
New in 4.0
PR 2249
The specification now describes in more detail how to determine the effective encoding value.
PR 2256
In the interests of consistency, the index-of function now defines equality to mean contextually equal. This has the implication that NaN is now considered equal to NaN.
PR 2259
A new parameter canonical is available to give control over serialization of XML, XHTML, and JSON.
PR 2286
The type of $value has been generalized to xs:anyAtomicType?.
PR 2387
It is now recommended that out-of-range xs:double values should translate to positive or negative infinity.
This section summarizes the extent to which this specification is compatible with previous versions.
Version 4.0 of this function library is fully backwards compatible with version 3.1, except as noted below:
In fn:deep-equal, and in other functions such as fn:distinct-values that refer to fn:deep-equal, the rules for comparing values of different numeric types (for example, xs:double and xs:decimal) have changed. In previous versions of the specification, xs:decimal values were converted to xs:double, leading to a possible loss of precision. This could make comparisons non-transitive, leading to problems when grouping, and potentially (depending on the sort algorithm) with sorting. The problem has been fixed by requiring comparisons to be performed based on the exact mathematical value without any loss of precision.
This means, for example, that deep-equal(0.2, 0.2e0) is now false, whereas in previous versions it was true. The two values are not mathematically equal, because the exact decimal equivalent of the xs:double value written as 0.2e0 is 0.200000000000000011102230246251565404236316680908203125.
The corresponding change has not been made to the = and eq operators, because it was found to be too disruptive. For example, if the context node is the element <e price="10.0" discount="0.2"/>, there is an expectation that the expression @price - @discount = 9.8 should return true. But (assuming untyped data), the result of the subtraction is an xs:double whose precise value is 9.800000000000000710542735760100185871124267578125, so comparing the two values as decimals would return false.
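The figures quoted above can be checked outside XPath. The following is a minimal Python sketch, not part of this specification: it assumes that Python's float is an IEEE 754 double (as xs:double is) and uses decimal.Decimal as a stand-in for exact xs:decimal comparison. All variable names are illustrative.

```python
from decimal import Decimal

# Exact decimal expansion of the double written as 0.2e0.
double_as_decimal = Decimal(0.2)
print(double_as_decimal)
# 0.200000000000000011102230246251565404236316680908203125

# Exact comparison (the 4.0 rule for fn:deep-equal): the values differ.
exact_equal = Decimal("0.2") == double_as_decimal

# Comparison after converting the decimal to a double (the 3.1 rule): equal.
double_equal = float(Decimal("0.2")) == 0.2

# The "=" operator example: 10.0 - 0.2 evaluated in double arithmetic.
difference = 10.0 - 0.2
eq_as_doubles = difference == 9.8                       # comparison as doubles
eq_as_decimals = Decimal(difference) == Decimal("9.8")  # exact comparison

print(exact_equal, double_equal, eq_as_doubles, eq_as_decimals)
```

Under these assumptions the exact comparison distinguishes 0.2 from 0.2e0, while the subtraction example compares equal only when both operands are treated as doubles, which is the behavior the = and eq operators retain.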
In previous versions, unrecognized options supplied to the $options parameter of functions such as fn:parse-json were silently ignored. In 4.0, they are rejected as a type error, unless they are QNames with a non-absent namespace, or are extensions recognized by the implementation.
In version 4.0, omitting the $value of fn:error has the same effect as setting it to the empty sequence. In 3.1, the effects could be different (the effect of omitting the argument was implementation-defined).
In version 3.1, the fn:deep-equal function did not merge adjacent text nodes after stripping comments and processing instructions, so the elements <a>abc<!--note1-->def</a> and <a>abcde<!--note2-->f</a> were considered non-equal. In version 4.0, the text nodes are now merged prior to comparison, so these two elements compare equal.
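The effect of merging text nodes can be illustrated with a small Python sketch using the standard xml.dom.minidom module. This is an analogy rather than an implementation of fn:deep-equal: the helper text_nodes is hypothetical, and it only compares the text children of the root element.

```python
from xml.dom import minidom

def text_nodes(xml, merge):
    """Return the text-node values of the root element after removing
    comments, optionally merging adjacent text nodes first."""
    root = minidom.parseString(xml).documentElement
    for child in list(root.childNodes):
        if child.nodeType == minidom.Node.COMMENT_NODE:
            root.removeChild(child)
    if merge:
        root.normalize()  # merges adjacent text nodes
    return [n.data for n in root.childNodes
            if n.nodeType == minidom.Node.TEXT_NODE]

first = "<a>abc<!--note1-->def</a>"
second = "<a>abcde<!--note2-->f</a>"

# Without merging (analogous to 3.1): the text-node sequences differ.
unmerged_equal = text_nodes(first, merge=False) == text_nodes(second, merge=False)

# With merging (analogous to 4.0): both reduce to a single text node.
merged_equal = text_nodes(first, merge=True) == text_nodes(second, merge=True)

print(unmerged_equal, merged_equal)
```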
In version 3.1, the atomic types xs:hexBinary and xs:base64Binary were not mutually comparable under the eq operator, and always compared not equal as map keys or under operations such as fn:distinct-values and fn:deep-equal. In version 4.0, instances of xs:hexBinary and xs:base64Binary are equal if they represent the same octet sequence. This means, for example, that the zero-length values xs:hexBinary("") and xs:base64Binary("") can no longer co-exist as keys in the same map.
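As a rough analogy, the following Python sketch decodes a hexadecimal and a Base64 lexical form to their octet sequences and compares them. The sample values are illustrative only, and a Python dict keyed on bytes stands in for an XDM map; neither is taken from this specification.

```python
import base64

# Two lexical forms of the same five octets ("Hello" in ASCII).
hex_value = bytes.fromhex("48656C6C6F")    # cf. xs:hexBinary("48656C6C6F")
b64_value = base64.b64decode("SGVsbG8=")   # cf. xs:base64Binary("SGVsbG8=")
same_octets = hex_value == b64_value

# The zero-length values collapse to a single key when the key is the
# octet sequence itself, so the second assignment replaces the first.
keys = {bytes.fromhex(""): "from hexBinary"}
keys[base64.b64decode("")] = "from base64Binary"

print(same_octets, len(keys))
```

Comparing by octet sequence is what makes the two zero-length values indistinguishable as map keys.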
The format of numeric values in the output of fn:xml-to-json may be different. In version 3.1, the supplied value was parsed as an xs:double and then serialized using the casting rules, resulting in an input value of 10000000 being output as 1e7. In version 4.0, the value is output as is, except for any changes (such as stripping of leading zeroes or a leading plus sign) that might be needed to ensure the result is valid JSON.
In version 4.0, the function signature of fn:namespace-uri-for-prefix constrains the first argument to be either an xs:NCName or a zero-length string (the new coercion rules mean that any string in the form of an xs:NCName is acceptable). If a string is supplied that does not meet these requirements, a type error will be raised. In version 3.1, this was not an error: it came under the rule that when no namespace binding existed for the supplied prefix, the function would return the empty sequence.
Furthermore, because the expected type of this parameter is no longer xs:string, the special coercion rules for xs:string parameters in XPath 1.0 compatibility mode no longer apply. For example, supplying xs:duration('PT1H') as the first argument will now raise a type error, rather than looking for a namespace binding for the prefix PT1H.
Version 4.0 makes it clear that the casting of a value other than xs:string or xs:untypedAtomic to a list type (whether using a cast expression or a constructor function) is a type error [err:XPTY0004]XP. Previously this was defined as an error, but the kind of error and the error code were left unspecified. Accordingly, the function signatures of the constructor functions for built-in list types have been changed to use an argument type of xs:string?.
The way that fn:min and fn:max compare numeric values of different types has changed. The most noticeable effect is that when these functions are applied to a sequence of xs:integer or xs:decimal values, the result is an xs:integer or xs:decimal, rather than the result of converting this to an xs:double.
The type of the third argument of fn:format-number has changed from xs:string to (xs:string | xs:QName). Because the expected type of this parameter is no longer xs:string, the special coercion rules for xs:string parameters no longer apply. For example, it is no longer possible to supply an instance of xs:anyURI or (when XPath 1.0 compatibility mode is in force) an instance of xs:boolean or xs:duration.
When map:put replaces an entry in a map with a new value for an existing key, in the case where the existing key and the new key differ (for example, if they have different type annotations), it is no longer guaranteed that the new entry includes the new key rather than the existing key.
In regular expressions, the assertions ^ and $ can no longer be followed by a quantifier. This is because (a) a quantifier that allows zero occurrences means that the assertion will always match, and (b) a quantifier that allows multiple occurrences has no effect. Processors may provide an option that allows such regular expressions to be accepted for compatibility reasons.
The fn:index-of function now treats NaN as equal to NaN.
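The NaN rule can be sketched in Python as follows. The helpers contextually_equal and index_of are hypothetical and cover only floating-point values; the comparison defined by this specification applies to many more types. Positions are 1-based, as in XPath.

```python
import math

def contextually_equal(a, b):
    """Numeric equality, except that NaN is treated as equal to NaN."""
    return a == b or (math.isnan(a) and math.isnan(b))

def index_of(values, target):
    """Return the 1-based positions of items equal to target."""
    return [i for i, v in enumerate(values, start=1)
            if contextually_equal(v, target)]

nan = float("nan")
values = [1.0, nan, 2.0, nan]

# Ordinary IEEE 754 comparison never finds NaN.
ieee_positions = [i for i, v in enumerate(values, start=1) if v == nan]

# With NaN treated as equal to NaN, both occurrences are reported.
positions = index_of(values, nan)

print(ieee_positions, positions)
```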
For compatibility issues regarding earlier versions, see the 3.1 version of this specification.