Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: Specification in XML format and XML function catalog.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
This document defines constructor functions, operators, and functions on the datatypes defined in [XML Schema Part 2: Datatypes Second Edition] and the datatypes defined in [XQuery and XPath Data Model (XDM) 3.1]. It also defines functions and operators on nodes and node sequences as defined in the [XQuery and XPath Data Model (XDM) 3.1]. These functions and operators are defined for use in [XML Path Language (XPath) 4.0], [XQuery 4.0: An XML Query Language], [XSL Transformations (XSLT) Version 4.0], and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions/.
A summary of changes since version 3.1 is provided at G Changes since 3.1.
This version of the specification is work in progress. It is produced by the QT4 Working Group, officially the W3C XSLT 4.0 Extensions Community Group. Individual functions specified in the document may be at different stages of review, reflected in their History notes. Comments are invited, in the form of GitHub issues at https://github.com/qt4cg/qtspecs.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
The functions described in this section make use of a regular expression syntax for pattern matching. The syntax and semantics of regular expressions are defined in this section.
| Function | Meaning |
|---|---|
| fn:matches | Returns true if the supplied string matches a given regular expression. |
| fn:replace | Returns a string produced from the input string by replacing any segments that match a given regular expression with a supplied replacement string, provided either literally, or by invoking a supplied function. |
| fn:tokenize | Returns a sequence of strings constructed by splitting the input wherever a separator is found; the separator is any substring that matches a given regular expression. |
| fn:analyze-string | Analyzes a string using a regular expression, returning an XML structure that identifies which parts of the input string matched or failed to match the regular expression, and in the case of matched substrings, which substrings matched each capturing group in the regular expression. |
Returns a string produced from the input string by replacing any segments that match a given regular expression with a supplied replacement string, provided either literally, or by invoking a supplied function.
fn:replace( | ||
$value | as , | |
$pattern | as , | |
$replacement | as | := (), |
$flags | as | := '', |
$action | as | := () |
) as | ||
This function is deterministic, context-independent, and focus-independent.
If $value is the empty sequence, it is interpreted as the zero-length string.
If the $flags argument is omitted or if it is an empty sequence, the effect is the same as setting $flags to a zero-length string. Flags are defined in 6.2 Flags.
The string $value is matched against the regular expression $pattern, using the supplied $flags, to obtain a set of disjoint matching segments. A replacement string R for each of these segments (say M) is determined by the values of the $replacement and $action arguments, by applying the first of the following rules that applies:
If the $action argument is present and is not an empty sequence, R is obtained by calling the $action function.
If $replacement is a function item F, then R is obtained by calling F, and then applying the function fn:string to the result.
The first argument to the function is the string to be replaced, provided as xs:untypedAtomic.
The second argument to the function provides the captured groups as an xs:untypedAtomic sequence. The Nth item in this sequence is the string value of the segment captured by the Nth capturing subexpression. If the Nth capturing subexpression was not matched, the Nth item will be the zero-length string.
Note that the rules for function coercion mean that the function actually supplied may be an arity-1 function: the second argument does not need to be declared if it is not used.
The replacement string R is obtained by applying the fn:string function to the result of the function call.
If $replacement is absent or empty, R is a zero-length string.
If $replacement is a string and the q flag is present, R is the value of $replacement.
Otherwise, the value of $replacement is processed as follows.
Within the supplied $replacement string, a variable marker $N (where N is an unsigned integer) may be used to refer to the Nth captured group associated with M. The replacement string R is obtained by replacing each of these variable markers with the string value of the relevant captured group. The variable marker $0 refers to the substring captured by the regular expression as a whole.
A literal $ character within the replacement string must be written as \$, and a literal \ character must be written as \\.
More specifically, the rules are as follows, where S is the number of capturing subexpressions in the regular expression, and N is the decimal number formed by taking all the digits that consecutively follow the $ character in $replacement:
If N=0, then the variable is replaced by the string value of M.
If 1<=N<=S, then the variable marker is replaced by the string value of the Nth captured group associated with M. If the Nth parenthesized sub-expression was not matched, then the variable marker is replaced by the zero-length string.
If S<N<=9, then the variable marker is replaced by the zero-length string.
Otherwise (if N>S and N>9), the last digit of N is taken to be a literal character to be included “as is” in the replacement string, and the rules are reapplied using the number N formed by stripping off this last digit.
For example, if the replacement string is "$23" and the regular expression contains 5 capturing subexpressions, the result contains the value of the substring that matches the second capturing subexpression, followed by the digit 3.
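The marker rules above can be modeled outside XPath. The following is a minimal sketch in Python, not normative text: the function name expand_replacement and its parameters are hypothetical, and it assumes the replacement string has already been checked for stray $ and \ characters.

```python
def expand_replacement(replacement: str, match: str, groups: list[str]) -> str:
    """Expand $N markers and backslash escapes in a replacement string.

    `match` models the matched segment M; `groups` models captured groups
    1..S, with unmatched groups supplied as zero-length strings.
    Assumption: `replacement` is valid (no stray $ or backslash).
    """
    s = len(groups)  # S, the number of capturing subexpressions
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        if ch == "\\":
            # \$ and \\ stand for a literal $ and a literal backslash.
            out.append(replacement[i + 1])
            i += 2
        elif ch == "$":
            # N is formed from all digits that consecutively follow the $.
            j = i + 1
            while j < len(replacement) and replacement[j] in "0123456789":
                j += 1
            digits = replacement[i + 1:j]
            literal_tail = ""
            # While N > S and N > 9, the last digit is taken as a literal.
            while int(digits) > s and int(digits) > 9:
                literal_tail = digits[-1] + literal_tail
                digits = digits[:-1]
            n = int(digits)
            if n == 0:
                out.append(match)          # $0 is the whole match
            elif n <= s:
                out.append(groups[n - 1])  # Nth captured group
            # Otherwise S < N <= 9: the marker yields a zero-length string.
            out.append(literal_tail)
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# "$23" with 5 groups: group 2 followed by the literal digit 3.
print(expand_replacement("$23", "M", ["a", "b", "c", "d", "e"]))
```

The digit-stripping loop mirrors the "Otherwise" rule: digits are peeled off the right-hand end until N either refers to an existing group or is a single digit.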
The function returns the xs:string that is obtained by replacing each of the disjoint matching segments of $value with the corresponding value of R.
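The rule for a function-valued replacement can be illustrated with an analogous callback in Python. This is a sketch under stated assumptions: replace_with_function is a hypothetical helper, Python's re syntax is not the XPath regular expression syntax, and str() stands in for fn:string. Because Python strings are not xs:untypedAtomic, the callback converts to int explicitly where XPath would coerce.

```python
import re


def replace_with_function(value, pattern, func):
    """Replace each match in `value` with str(func(matched, groups))."""

    def on_match(m):
        # Groups that did not participate in the match become "".
        groups = [g if g is not None else "" for g in m.groups()]
        # First argument: the matched string; second: the captured groups.
        return str(func(m.group(0), groups))

    return re.sub(pattern, on_match, value)


# Increment a number found in the input.
print(replace_with_function("Chapter 9", r"[0-9]+", lambda s, g: int(s) + 1))

# Look up each three-letter code in a dictionary (hypothetical data).
airports = {"LAX": "Los Angeles", "LHR": "London"}
print(replace_with_function("LHR to LAX", r"\b[A-Z]{3}\b",
                            lambda s, g: airports[s]))
```

A callback that ignores the groups still has to accept the second parameter in this Python model; the arity-1 shorthand described in the text relies on XPath function coercion, which Python does not have.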
A dynamic error is raised [err:FORX0002] if the value of $pattern is invalid according to the rules described in section 6.1 Regular expression syntax.
A dynamic error is raised [err:FORX0001] if the value of $flags is invalid according to the rules described in section 6.2 Flags.
In the absence of the q flag, a dynamic error is raised [err:FORX0004] if the value of $replacement contains a dollar sign ($) character that is not immediately followed by a digit 0-9 and not immediately preceded by a backslash (\).
In the absence of the q flag, a dynamic error is raised [err:FORX0004] if the value of $replacement contains a backslash (\) character that is not part of a \\ pair, unless it is immediately followed by a dollar sign ($) character.
A dynamic error is raised [err:FORX0005] if both the $replacement and $action arguments are supplied, and neither is an empty sequence.
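The two err:FORX0004 conditions can be pictured as a left-to-right scan of the replacement string. The sketch below is one possible reading of those rules in Python, not the specification's algorithm: ReplacementError and check_replacement are hypothetical names, and the scan treats each \\ or \$ as an indivisible pair.

```python
class ReplacementError(ValueError):
    """Hypothetical stand-in for the dynamic error err:FORX0004."""


def check_replacement(replacement: str) -> None:
    """Raise ReplacementError if `replacement` has a stray $ or backslash.

    Models the case where the q flag is absent.
    """
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        nxt = replacement[i + 1] if i + 1 < len(replacement) else ""
        if ch == "\\":
            # A backslash must start a \\ pair or a \$ pair.
            if nxt not in ("\\", "$"):
                raise ReplacementError(f"stray backslash at offset {i}")
            i += 2
        elif ch == "$":
            # An unescaped $ must be immediately followed by a digit 0-9.
            if not nxt or nxt not in "0123456789":
                raise ReplacementError(f"stray $ at offset {i}")
            i += 1
        else:
            i += 1


check_replacement("$1 costs \\$5")  # valid: a marker and an escaped dollar
```

With the q flag present, no such check would be needed, since the replacement string is then used literally.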
If the input string contains no substring that matches the regular expression, the result of the function is a single string identical to the input string.
If two overlapping substrings of $value both match the $pattern, then only the first one (that is, the one whose first character comes first in the $value string) is replaced.
If two alternatives within the pattern both match at the same position in $value, then the match that is chosen is the one matched by the first alternative. For example:
replace("abcd", "(ab)|(a)", "[1=$1][2=$2]") returns "[1=ab][2=]cd"
| Expression: | |
|---|---|
| Result: | "a*cada*" |
| Expression: | |
| Result: | "*" |
| Expression: | |
| Result: | "*c*bra" |
| Expression: | |
| Result: | "brcdbr" |
| Expression: | |
| Result: | "abbraccaddabbra" |
| Expression: | |
| Result: | "b" |
| Expression: | |
| Result: | "bbbb" |
| Expression: | |
| Result: | "|In| |the| |beginning| |was| |the| |Word|" |
| Expression: | |
| Result: | "a!b!c!d!" |
| Expression: | |
| Result: | "carted" (Only the first d is replaced.) |
| Expression: | replace("abracadabra", "bra", action := fn { "*" }) |
| Result: | "a*cada*" |
| Expression: | replace("abracadabra", "bra", upper-case#1) |
| Result: | "aBRAcadaBRA" |
| Expression: | replace(
"abracadabra",
"bra",
action := upper-case#1
) |
| Result: | "aBRAcadaBRA" |
| Expression: | replace("Chapter 9", "[0-9]+", action := fn { . + 1 }) |
| Result: | "Chapter 10" |
| Expression: | replace("Chapter 9", "[0-9]+", fn { . + 1 }) |
| Result: | "Chapter 10" |
| Expression: | replace(
"LHR to LAX",
"\b[A-Z]{3}\b",
action := { 'LAX': 'Los Angeles', 'LHR': 'London' }
) |
| Result: | "London to Los Angeles" |
| Expression: | replace(
"LHR to LAX",
"\b[A-Z]{3}\b",
{ 'LAX': 'Los Angeles', 'LHR': 'London' }
) |
| Result: | "London to Los Angeles" |
| Expression: | replace(
"57°43′30″",
"([0-9]+)°([0-9]+)′([0-9]+)″",
action := fn($s, $groups) {
string($groups[1] + $groups[2] ÷ 60 + $groups[3] ÷ 3600) || '°'
}
) |
| Result: | "57.725°" |
| Expression: | replace(
"57°43′30″",
"([0-9]+)°([0-9]+)′([0-9]+)″",
fn($s, $groups) {
string($groups[1] + $groups[2] ÷ 60 + $groups[3] ÷ 3600) || '°'
}
) |
| Result: | "57.725°" |
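The notes on unmatched input, overlapping matches, and competing alternatives can be checked informally with Python's re module. This is only an analogy: it assumes that, for these simple patterns, Python's leftmost, first-alternative matching agrees with the behavior described for fn:replace. The helper show_groups is hypothetical.

```python
import re


def show_groups(m):
    """Render groups 1 and 2, using "" for a group that did not match."""
    g1 = m.group(1) or ""
    g2 = m.group(2) or ""
    return f"[1={g1}][2={g2}]"


# No match: the input string comes back unchanged.
no_match = re.sub(r"xyz", "*", "abracadabra")

# Overlapping candidates: "aba" occurs at offsets 0 and 2 of "ababa",
# but only the occurrence that starts first is replaced.
overlap = re.sub(r"aba", "*", "ababa")

# Two alternatives match at offset 0; the first alternative wins,
# so group 1 is "ab" and group 2 is empty.
first_alt = re.sub(r"(ab)|(a)", show_groups, "abcd")

print(no_match)   # abracadabra
print(overlap)    # *ba
print(first_alt)  # [1=ab][2=]cd
```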
Sections with significant changes are marked Δ in the table of contents. New functions introduced in this version are marked ➕ in the table of contents.
See 1 Introduction
PR 1547 1551
New in 4.0
PR 629 803
New in 4.0
See 3.2.2 fn:message
PR 1260 1275
A third argument has been added, providing control over the rounding mode.
See 4.4.4 fn:round
New in 4.0
See 4.4.7 fn:is-NaN
PR 1049 1151
Decimal format parameters can now be supplied directly as a map in the third argument, rather than referencing a format defined in the static context.
PR 1205 1230
New in 4.0
See 4.8.2 math:e
See 4.8.16 math:sinh
See 4.8.17 math:cosh
See 4.8.18 math:tanh
The 3.1 specification suggested that every value in the result range should have the same chance of being chosen. This has been corrected to say that the distribution should be arithmetically uniform (because there are as many xs:double values between 0.01 and 0.1 as there are between 0.1 and 1.0).
PR 261 306 993
New in 4.0
See 5.4.1 fn:char
New in 4.0
PR 937 995 1190
New in 4.0
See 5.4.13 fn:hash
The $action argument is new in 4.0.
See 6.3.2 fn:replace
New in 4.0
PR 1423 1413
New in 4.0
New in 4.0
Reformulated in 4.0 in terms of the new fn:in-scope-namespaces function; the semantics are unchanged.
Reformulated in 4.0 in terms of the new fn:in-scope-namespaces function; the semantics are unchanged.
New in 4.0
New in 4.0
See 14.1.12 fn:slice
New in 4.0. The function is identical to the internal op:same-key function in 3.1.
PR 1120 1150
A callback function can be supplied for comparing individual items.
Changed in 4.0 to use transitive equality comparisons for numeric values.
PR 614 987
New in 4.0
New in 4.0. Originally proposed under the name fn:uniform
New in 4.0. Originally proposed under the name fn:unique
PR 1117 1279
The $options parameter has been added.
Additional options to control DTD and XInclude processing have been added.
A new function is available for processing input data in HTML format.
PR 259 956
New in 4.0
An option is provided to control how JSON numbers should be formatted.
Additional options are available, as defined by fn:parse-json.
New in 4.0
New in 4.0
New in 4.0
New in 4.0
See 17.2.4 fn:every
New in 4.0
New in 4.0
New in 4.0
New in 4.0
New in 4.0
See 17.2.17 fn:some
PR 521 761
New in 4.0
New in 4.0
A third argument is added, allowing user control of how absent keys should be handled.
See 18.4.9 map:get
New in 4.0
PR 478 515
New in 4.0
New in 4.0
New in 4.0
See 18.4.15 map:pair
New in 4.0
New in 4.0.
New in 4.0
A third argument is added, allowing user control of how index-out-of-bounds conditions should be handled.
PR 968 1295
New in 4.0
PR 476 1087
New in 4.0
PR 360 476
New in 4.0
New in 4.0
New in 4.0
New in 4.0
Supplying an empty sequence as the value of an optional argument is equivalent to omitting the argument.
New functions are provided to obtain information about built-in types and types defined in an imported schema.
Options are added to customize the form of the output.
See 2.2.6 fn:path
PR 533 719 834
New functions are available for processing input data in CSV (comma separated values) format.
PR 734 1233
New in 4.0
See 17.2.2 fn:chain
A new function fn:elements-to-maps is provided for converting XDM trees to maps suitable for serialization as JSON. Unlike the fn:xml-to-json function retained from 3.1, this can handle arbitrary XML as input.
New in 4.0
New in 4.0.
The default for the escape option has been changed to false. The 3.1 specification gave the default value as true, but this appears to have been an error, since it was inconsistent with examples given in the specification and with tests in the test suite.
The spec has been corrected to note that the function depends on the implicit timezone.
In 3.1, given a mixed input sequence such as (1, 3, 4.2e0), the specification was unclear whether it was permitted to add the first two integer items using integer arithmetic, rather than converting all items to doubles before performing any arithmetic. The 4.0 specification is clear that this is permitted; but since the items can be reordered before being added, this is not required.
See 14.4.2 fn:avg
See 14.4.5 fn:sum
It is explicitly stated that the limits for $precision are implementation-defined.
See 4.4.4 fn:round
It is no longer guaranteed that the new key replaces the existing key.
See 18.4.17 map:put
New in 4.0
The $replacement argument can now be a function that computes the replacement strings.
See 6.3.2 fn:replace
PR 173
New in 4.0
See 17.3.4 fn:op
PR 203
New in 4.0
See 18.4.1 map:build
PR 207
New in 4.0
PR 222
New in 4.0
See 14.2.7 fn:starts-with-subsequence
PR 250
New in 4.0
See 14.1.3 fn:foot
See 14.1.15 fn:trunk
PR 258
New in 4.0
PR 313
The second argument can now be a sequence of integers.
See 14.1.8 fn:remove
PR 314
New in 4.0
PR 326
Higher-order functions are no longer an optional feature.
See 1.2 Conformance
PR 419
New in 4.0
PR 434
New in 4.0
The function has been extended to allow output in a radix other than 10, for example in hexadecimal.
PR 482
Deleted an inaccurate statement concerning the behavior of NaN.
PR 507
New in 4.0
PR 546
The rules regarding use of non-XML characters in JSON texts have been relaxed.
PR 623
Substantially revised to allow multiple sort key definitions.
See 17.2.18 fn:sort
PR 631
New in 4.0
PR 662
Constructor functions now have a zero-arity form; the first argument defaults to the context item.
PR 680
The case-insensitive collation is now defined normatively within this specification, rather than by reference to the HTML "living specification", which is subject to change. The collation can now be used for ordering comparisons as well as equality comparisons.
PR 702
The function can now take any number of arguments (previously it had to be two or more), and the arguments can be sequences of strings rather than single strings.
See 5.4.4 fn:concat
PR 710
Changes the function to return a sequence of key-value pairs rather than a map.
PR 727
It has been clarified that loading a module has no effect on the static or dynamic context of the caller.
PR 795
New in 4.0
PR 828
The $predicate callback function accepts an optional position argument.
See 17.2.5 fn:filter
The $action callback function accepts an optional position argument.
The $predicate callback function now accepts an optional position argument.
The $action callback function now accepts an optional position argument.
PR 881
The way that fn:min and fn:max compare numeric values of different types has changed. The most noticeable effect is that when these functions are applied to a sequence of xs:integer or xs:decimal values, the result is an xs:integer or xs:decimal, rather than the result of converting this to an xs:double.
See 14.4.3 fn:max
See 14.4.4 fn:min
PR 901
All three arguments are now optional, and each argument can be set to an empty sequence. Previously if $description was supplied, it could not be empty.
See 3.1.1 fn:error
The $label argument can now be set to an empty sequence. Previously if $label was supplied, it could not be empty.
See 3.2.1 fn:trace
The third argument can now be supplied as an empty sequence.
The second argument can now be an empty sequence.
The optional second argument can now be supplied as an empty sequence.
The 3rd, 4th, and 5th arguments are now optional; previously the function required either 2 or 5 arguments.
The optional third argument can now be supplied as an empty sequence.
PR 905
The rule that multiple calls on fn:doc supplying the same absolute URI must return the same document node has been clarified; in particular the rule does not apply if the dynamic context for the two calls requires different processing of the documents (such as schema validation or whitespace stripping).
See 14.6.1 fn:doc
PR 909
The function has been expanded in scope to handle comparison of values other than strings.
PR 924
Rules have been added clarifying that users should not be allowed to change the schema for the fn namespace.
See C Schemas
PR 925
The decimal format name can now be supplied as a value of type xs:QName, as an alternative to supplying a lexical QName as an instance of xs:string.
PR 932
The specification now prescribes a minimum precision and range for durations.
PR 933
When comments and processing instructions are ignored, any text nodes either side of the comment or processing instruction are now merged prior to comparison.
PR 940
New in 4.0
PR 953
Constructor functions for named record types have been introduced.
PR 962
New in 4.0
PR 969
New in 4.0
See 18.4.3 map:empty
PR 984
New in 4.0
See 9.4.1 fn:seconds
PR 987
The order of results is now prescribed; it was previously implementation-dependent.
PR 988
New in 4.0
See 15.3.8 fn:pin
See 15.3.9 fn:label
PR 1022
Regular expressions can include comments (starting and ending with #) if the c flag is set.
See 6.1 Regular expression syntax
See 6.2 Flags
PR 1028
An option is provided to control how the JSON null value should be handled.
PR 1032
New in 4.0
See 14.1.17 fn:void
PR 1046
New in 4.0
PR 1059
Use of an option keyword that is not defined in the specification and is not known to the implementation now results in a dynamic error; previously it was ignored.
See 1.7 Options
PR 1068
New in 4.0
PR 1072
The return type is now specified more precisely.
PR 1090
When casting from a string to a duration or time or dateTime, it is now specified that when there are more digits in the fractional seconds than the implementation is able to retain, excess digits are truncated. Rounding upwards (which could affect the number of minutes or hours in the value) is not permitted.
PR 1093
New in 4.0
PR 1117
The $options parameter has been added.
PR 1182
The $predicate callback function may return an empty sequence (meaning false).
See 17.2.4 fn:every
See 17.2.5 fn:filter
See 17.2.17 fn:some
PR 1191
New in 4.0
See 2.3.1 fn:distinct-ordered-nodes
The $options parameter has been added, absorbing the $collation parameter.
PR 1250
For selected properties including percent and exponent-separator, it is now possible to specify a single-character marker to be used in the picture string, together with a multi-character rendition to be used in the formatted output.
PR 1257
The $options parameter has been added.
PR 1262
New in 4.0
PR 1265
The constraints on the result of the function have been relaxed.
PR 1280
As a result of changes to the coercion rules, the number of supplied arguments can be greater than the number required: extra arguments are ignored.
See 17.2.1 fn:apply
PR 1288
Additional error conditions have been defined.
PR 1296
New in 4.0
PR 1333
A new option is provided to allow the content of the loaded module to be supplied as a string.
PR 1353
An option has been added to suppress the escaping of the solidus (forwards slash) character.
PR 1358
New in 4.0
PR 1361
The term atomic value has been replaced by atomic item.
See 1.9 Terminology
PR 1393
Changes the function to return a sequence of key-value pairs rather than a map.
PR 1409
This section now uses the term primitive type strictly to refer to the 20 atomic types that are not derived by restriction from another atomic type: that is, the 19 primitive atomic types defined in XSD, plus xs:untypedAtomic. The three types xs:integer, xs:dayTimeDuration, and xs:yearMonthDuration, which have custom casting rules but are not strictly-speaking primitive, are now handled in other subsections.
See 22.1 Casting from primitive types to primitive types
The rules for conversion of dates and times to strings are now defined entirely in terms of XSD 1.1 canonical mappings, since these deliver exactly the same result as the XPath 3.1 rules.
See 22.1.2.2 Casting date/time values to xs:string
The rules for conversion of durations to strings are now defined entirely in terms of XSD 1.1 canonical mappings, since the XSD 1.1 rules deliver exactly the same result as the XPath 3.1 rules.
PR 1455
Numbers now retain their original lexical form, except for any changes needed to satisfy JSON syntax rules (for example, stripping leading zero digits).
PR 1481
The function has been extended to handle other Gregorian types such as xs:gYearMonth.
See 10.5.1 fn:year-from-dateTime
See 10.5.2 fn:month-from-dateTime
The function has been extended to handle other Gregorian types such as xs:gMonthDay.
See 10.5.3 fn:day-from-dateTime
The function has been extended to handle other types including xs:time.
See 10.5.4 fn:hours-from-dateTime
See 10.5.5 fn:minutes-from-dateTime
The function has been extended to handle other types such as xs:gYearMonth.
PR 1504
New in 4.0
An optional $separator argument has been added.
PR 1523
New in 4.0
PR 1545
New in 4.0
PR 1570
New in 4.0
PR 1703
The order of entries in maps is retained.
Ordered maps are introduced.
Enhanced to allow for ordered maps.
See 18.4.7 map:find
See 18.4.17 map:put
PR 1727
For consistency with the new functions map:build and map:of-pairs, the handling of duplicates may now be controlled by supplying a user-defined callback function as an alternative to the fixed values for the earlier duplicates option.
PR 1856
Word boundaries can be matched. Lookahead and lookbehind assertions are supported. Assertions (including ^ and $) can no longer be followed by a quantifier.
See 6.1 Regular expression syntax
It is now permitted for the regular expression to match a zero-length string.
See 6.3.2 fn:replace
The output of the function is extended to allow the representation of captured groups found within lookahead assertions.
It is now permitted for the regular expression to match a zero-length string.