
This draft contains only sections that have differences from the version that it modified.

W3C

XPath and XQuery Functions and Operators 4.0

W3C Editor's Draft 23 February 2026

This version:
https://qt4cg.org/specifications/xpath-functions-40/
Latest version of XPath and XQuery Functions and Operators 4.0:
https://qt4cg.org/specifications/xpath-functions-40/
Most recent Recommendation of XPath and XQuery Functions and Operators:
https://www.w3.org/TR/2017/REC-xpath-functions-31-20170321/
Editor:
Michael Kay, Saxonica <http://www.saxonica.com/>

Please check the errata for any errors or issues reported since publication.

See also translations.

This document is also available in these non-normative formats: Specification in XML format and XML function catalog.


Abstract

This document defines constructor functions, operators, and functions on the datatypes defined in [XML Schema Part 2: Datatypes Second Edition] and the datatypes defined in [XQuery and XPath Data Model (XDM) 3.1]. It also defines functions and operators on nodes and node sequences as defined in the [XQuery and XPath Data Model (XDM) 3.1]. These functions and operators are defined for use in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0] and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions/.

A summary of changes since version 3.1 is provided at H Changes since 3.1.

Status of this Document

This version of the specification is work in progress. It is produced by the QT4 Working Group, officially the W3C XSLT 4.0 Extensions Community Group. Individual functions specified in the document may be at different stages of review, reflected in their History notes. Comments are invited, in the form of GitHub issues at https://github.com/qt4cg/qtspecs.

Dedication

The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).


4 Processing numerics

This section specifies arithmetic operators on the numeric datatypes defined in [XML Schema Part 2: Datatypes Second Edition].

4.2 Arithmetic operators on numeric values

The following functions define the semantics of arithmetic operators defined in [XQuery 4.0: An XML Query Language] and [XML Path Language (XPath) 4.0] on these numeric types.

Operator                     Meaning
op:numeric-add               Addition
op:numeric-subtract          Subtraction
op:numeric-multiply          Multiplication
op:numeric-divide            Division
op:numeric-integer-divide    Integer division
op:numeric-mod               Modulus
op:numeric-unary-plus        Unary plus
op:numeric-unary-minus       Unary minus (negation)

The parameters and return types for the above operators are in most cases declared to be of type xs:numeric, which permits the basic numeric types: xs:integer, xs:decimal, xs:float and xs:double, and types derived from them. In general the two-argument functions require that both arguments are of the same primitive type, and they return a value of this same type. The exceptions are op:numeric-divide, which returns an xs:decimal if called with two xs:integer operands, and op:numeric-integer-divide which always returns an xs:integer.

If the two operands of an arithmetic expression are not of the same type, they may be converted to a common type as described in [XML Path Language (XPath) 4.0], section 4.9 Arithmetic Expressions.

The result type of operations depends on their argument datatypes and is defined in the following table:

Operator                                Returns
op:operation(xs:integer, xs:integer)    xs:integer (except for op:numeric-divide(integer, integer), which returns xs:decimal)
op:operation(xs:decimal, xs:decimal)    xs:decimal
op:operation(xs:float, xs:float)        xs:float
op:operation(xs:double, xs:double)      xs:double
op:operation(xs:integer)                xs:integer
op:operation(xs:decimal)                xs:decimal
op:operation(xs:float)                  xs:float
op:operation(xs:double)                 xs:double
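
The table above can be read as a lookup from operator and operand type to result type. The following is a minimal, non-normative sketch in Python. The helper name result_type and the use of strings for type names are illustrative assumptions, not part of this specification. It follows the prose rule that op:numeric-integer-divide always returns xs:integer, and it assumes the operands have already been converted to a common primitive type.

```python
from typing import Optional

# Illustrative only: result_type() is a hypothetical helper, not an API of any processor.
PRIMITIVE_NUMERIC = ("xs:integer", "xs:decimal", "xs:float", "xs:double")


def result_type(operator: str, left: str, right: Optional[str] = None) -> str:
    """Return the result type of an op:numeric-* function.

    `right` is None for the unary operators. For binary operators both
    operand types must already be the same primitive numeric type.
    """
    if left not in PRIMITIVE_NUMERIC:
        raise ValueError(f"not a primitive numeric type: {left}")
    if right is None:
        # Unary plus and unary minus return the type of their operand.
        return left
    if right != left:
        raise ValueError("operands must first be converted to a common type")
    if operator == "op:numeric-integer-divide":
        return "xs:integer"   # always xs:integer, whatever the operand type
    if operator == "op:numeric-divide" and left == "xs:integer":
        return "xs:decimal"   # integer div integer yields a decimal
    return left               # otherwise the operand type is preserved
```

For example, result_type("op:numeric-divide", "xs:integer", "xs:integer") gives "xs:decimal", while the same call with "op:numeric-add" gives "xs:integer".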

The basic rules for addition, subtraction, and multiplication of ordinary numbers are not set out in this specification; they are taken as given. In the case of xs:double and xs:float the rules are as defined in [IEEE 754-2019]. The rules for handling division and modulus operations, as well as the rules for handling special values such as infinity and NaN, and exception conditions such as overflow and underflow, are described more explicitly since they are not necessarily obvious.

On overflow and underflow situations during arithmetic operations, conforming implementations must behave as follows:

  • For xs:float and xs:double operations, overflow behavior must be conformant with [IEEE 754-2019]. This specification allows the following options:

    • Raising a dynamic error [err:FOAR0002] via an overflow trap.

    • Returning INF or -INF.

    • Returning the largest (positive or negative) non-infinite number.

  • For xs:float and xs:double operations, underflow behavior must be conformant with [IEEE 754-2019]. This specification allows the following options:

    • Raising a dynamic error [err:FOAR0002] via an underflow trap.

    • Returning 0.0E0 or +/- 2**Emin or a denormalized value; where Emin is the smallest possible xs:float or xs:double exponent.

  • For xs:decimal operations, overflow behavior must raise a dynamic error [err:FOAR0002]. On underflow, 0.0 must be returned.

  • For xs:integer operations, implementations that support limited-precision integer operations must select from the following options:

    • They may choose to always raise a dynamic error [err:FOAR0002].

    • They may provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value. See [ISO 10967].
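
As a non-normative illustration of the xs:double options listed above, the following Python sketch shows two of the permitted overflow behaviors and two underflow outcomes. It assumes that Python floats are IEEE 754 binary64 values, which holds on common platforms. The class FOAR0002 and the function double_multiply are hypothetical names introduced only for this example.

```python
import math
import sys


class FOAR0002(ArithmeticError):
    """Hypothetical stand-in for the dynamic error err:FOAR0002."""


def double_multiply(a: float, b: float, trap_overflow: bool = False) -> float:
    """Multiply two xs:double-like values.

    trap_overflow=False: overflow returns INF or -INF (one permitted option).
    trap_overflow=True:  overflow raises FOAR0002 (another permitted option).
    """
    result = a * b
    # An infinite result from finite operands means the operation overflowed.
    overflowed = math.isinf(result) and not (math.isinf(a) or math.isinf(b))
    if overflowed and trap_overflow:
        raise FOAR0002("numeric operation overflow")
    return result


# Underflow without a trap: the result is either a denormalized value or zero.
DENORMALIZED = sys.float_info.min / 2   # below the smallest normal value, still > 0
FLUSHED_TO_ZERO = 5e-324 / 2            # half the smallest denormalized value rounds to 0.0
```

Here double_multiply(sys.float_info.max, 2.0) returns positive infinity, and the same call with trap_overflow=True raises the error instead.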

The functions op:numeric-add, op:numeric-subtract, op:numeric-multiply, op:numeric-divide, op:numeric-integer-divide and op:numeric-mod are each defined for pairs of numeric operands, each of which has the same type: xs:integer, xs:decimal, xs:float, or xs:double. The functions op:numeric-unary-plus and op:numeric-unary-minus are defined for a single operand whose type is one of those same numeric types.

For xs:float and xs:double arguments, if either argument is NaN, the result is NaN.

For xs:decimal values, let N be the number of digits of precision supported by the implementation, and let M (M <= N) be the minimum limit on the number of digits required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication operations, the returned result should be accurate to N digits of precision, and for division and modulus operations, the returned result should be accurate to at least M digits of precision. The actual precision is implementation-defined. If the number of digits in the mathematical result exceeds the number of digits that the implementation retains for that operation, the result is truncated or rounded in an implementation-defined manner.

Note:

This specification does not determine whether xs:decimal operations are fixed point or floating point. In an implementation using floating point it is possible for very simple operations to require more digits of precision than are available; for example, adding 1e100 to 1e-100 requires 200 digits of precision for an accurate representation of the result.
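
The effect described in this note can be reproduced with any floating-point decimal arithmetic. The following non-normative sketch uses Python's decimal module as a stand-in for an xs:decimal implementation. The function decimal_add and its digits parameter are illustrative assumptions, and the rounding mode used is whatever the active Python context specifies.

```python
from decimal import Decimal, localcontext


def decimal_add(a: Decimal, b: Decimal, digits: int) -> Decimal:
    """Add two decimals, retaining at most `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits   # N, the number of digits the "implementation" supports
        return a + b        # the result is rounded to ctx.prec digits if necessary


big = Decimal("1e100")
small = Decimal("1e-100")

# With 18 digits the small operand is lost entirely in rounding.
lossy = decimal_add(big, small, 18)

# With ample precision the sum is held exactly and is strictly greater than `big`.
exact = decimal_add(big, small, 250)
```

With 18 digits of precision the sum compares equal to 1e100. Only a much larger precision keeps the contribution of 1e-100.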

The [IEEE 754-2019] specification also describes handling of two exception conditions called divideByZero and invalidOperation. The IEEE divideByZero exception is raised not only by a direct attempt to divide by zero, but also by operations such as log(0). The IEEE invalidOperation exception is raised by attempts to call a function with an argument that is outside the function’s domain (for example, sqrt(-1) or log(-1)). Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in which the operation returns a defined result, typically positive or negative infinity, or NaN. With this function library, these IEEE exceptions do not cause a dynamic error at the application level; rather they result in the relevant function or operator returning the defined non-error result. The underlying IEEE exception may be notified to the application or to the user by some implementation-defined warning condition, but the observable effect on an application using the functions and operators defined in this specification is simply to return the defined result (typically -INF, +INF, or NaN) with no error.
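
The default non-stop handling described above, together with the rule that a NaN operand yields NaN, can be sketched as follows. This is a non-normative Python illustration. Python itself raises ZeroDivisionError for float division by zero, so the divideByZero and invalidOperation cases are written out by hand. The function name double_divide is a hypothetical helper, not a definition of op:numeric-divide.

```python
import math


def double_divide(a: float, b: float) -> float:
    """Divide two xs:double-like values, returning the IEEE default results."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                   # any NaN operand gives NaN
    if b == 0.0:
        if a == 0.0:
            return math.nan               # invalidOperation: 0 divided by 0
        # divideByZero: infinity, with the sign combining both operand signs
        # (copysign distinguishes positive zero from negative zero).
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b                          # ordinary case, no error raised
```

For instance, double_divide(1.0, -0.0) returns negative infinity and double_divide(0.0, 0.0) returns NaN. Neither call raises an error.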

The [IEEE 754-2019] specification distinguishes two NaN values: a quiet NaN and a signaling NaN. These two values are not distinguishable in the XDM model: the value spaces of xs:float and xs:double each include only a single NaN value. This does not prevent the implementation distinguishing them internally, and triggering different implementation-defined warning conditions, but such distinctions do not affect the observable behavior of an application using the functions and operators defined in this specification.

Note:

Although comparison of numeric values across heterogeneous types has changed to convert both values to xs:decimal, arithmetic operations continue to use xs:double as the common type.

6 Regular expressions

The functions described in this section make use of a regular expression syntax for pattern matching. The syntax and semantics of regular expressions are defined in this section.

6.3 Functions using regular expressions

FunctionMeaning
fn:matchesReturns true if the supplied string matches a given regular expression.
fn:replaceReturns a string produced from the input string by replacing any segments that match a given regular expression with a supplied replacement string, provided either literally, or by invoking a supplied function.
fn:tokenizeReturns a sequence of strings constructed by splitting the input wherever a separator is found; the separator is any substring that matches a given regular expression.
fn:analyze-stringAnalyzes a string using a regular expression, returning an XML structure that identifies which parts of the input string matched or failed to match the regular expression, and in the case of matched substrings, which substrings matched each capturing group in the regular expression.

6.3.4 fn:analyze-string

Changes in 4.0  

  1. The output of the function is extended to allow the representation of captured groups found within lookahead assertions. [PR 1856]

  2. It is now permitted for the regular expression to match a zero-length string. [PR 1856]

Summary

Analyzes a string using a regular expression, returning an XML structure that identifies which parts of the input string matched or failed to match the regular expression, and in the case of matched substrings, which substrings matched each capturing group in the regular expression.

Signature
fn:analyze-string(
  $value as xs:string?,
  $pattern as xs:string,
  $flags as xs:string? := ""
) as element(fn:analyze-string-result)
Properties

This function is nondeterministic, context-independent, and focus-independent.

Rules

If the $flags argument is omitted or if it is an empty sequence, the effect is the same as setting $flags to a zero-length string. Flags are defined in 6.2 Flags.

If $value is the empty sequence the function behaves as if $value were the zero-length string.

The function returns an element node whose local name is analyze-string-result. This element and all its descendant elements have the namespace URI http://www.w3.org/2005/xpath-functions. The namespace prefix is implementation-dependent. The children of this element are a sequence of fn:match and fn:non-match elements. This sequence is formed by breaking the $value string into a sequence of strings, returning any substring that matches $pattern as the content of an fn:match element, and any intervening substring as the content of an fn:non-match element.

More specifically, the function starts by matching the regular expression against the string, using the supplied $flags, to obtain the disjoint matching segments. For each such segment it constructs an fn:match child, whose string value is the string value of the segment. Before, between, or after these fn:match elements, as required to ensure that the string value of the fn:analyze-string-result element is the same as $value, it inserts fn:non-match elements. The content of an fn:non-match element is always a single (non-empty) text node, and two fn:non-match elements never appear as adjacent siblings.
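
The segmentation step described above can be sketched in a few lines. The following Python code is non-normative and makes several simplifying assumptions: it uses Python's re module, whose regular expression dialect differs from the one defined in this specification, and it ignores flags and captured groups. The function name analyze_string_flat is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

FN_NS = "http://www.w3.org/2005/xpath-functions"


def analyze_string_flat(value: str, pattern: str) -> ET.Element:
    """Build an analyze-string-result holding only match and non-match children."""
    root = ET.Element(f"{{{FN_NS}}}analyze-string-result")
    pos = 0  # end offset of the previous matching segment
    for m in re.finditer(pattern, value):
        if m.start() > pos:
            # Text between two matching segments becomes one non-match element.
            ET.SubElement(root, f"{{{FN_NS}}}non-match").text = value[pos:m.start()]
        ET.SubElement(root, f"{{{FN_NS}}}match").text = m.group(0)
        pos = m.end()
    if pos < len(value):
        # Trailing text after the last matching segment.
        ET.SubElement(root, f"{{{FN_NS}}}non-match").text = value[pos:]
    return root


result = analyze_string_flat("The cat sat on the mat.", r"\w+")
```

Because a non-match element is created only for a non-empty gap, its content is never empty and two non-match elements are never adjacent. Concatenating the text of all children reproduces the input string.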

The captured groups for each disjoint matching segment are represented using fn:group or fn:lookahead-group children of the corresponding fn:match element. Groups captured by a subexpression within a lookahead assertion are referred to as lookahead groups; those not within a lookahead assertion are called ordinary groups.

The content of an fn:match element is in general:

  • A sequence of text nodes and fn:group element children, whose string-values when concatenated comprise the string value of the matching segment, followed by

  • A sequence of zero or more fn:lookahead-group elements, representing the lookahead groups

The string value of an fn:match element may be empty.

An fn:group element with an nr attribute having the integer value N identifies the substring captured by an ordinary group, specifically the string value of the Nth captured group. For each ordinary capturing subexpression there will be at most one corresponding fn:group element in each fn:match element in the result.

By contrast, lookahead groups are represented by fn:lookahead-group elements, which (if they appear at all) must follow all text node and fn:group element children of the fn:match element. These groups may overlap the matching and non-matching substrings, and indeed may overlap each other. They must appear in ascending numerical order of group number. The attributes of the fn:lookahead-group element are as follows:

  • nr: the group number, based on the position of the capturing subexpression that captured the group;

  • value: the string value of the segment that was captured;

  • position: the one-based start position of the segment within the input string.
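
The three attributes listed above can be derived from a regular expression match as in the following non-normative Python sketch. Python's re module does not report which groups lie inside a lookahead assertion, so the caller supplies those group numbers explicitly. That parameter and the function name lookahead_attributes are assumptions made for the example, and Python's regular expression dialect is only an approximation of the one defined here.

```python
import re


def lookahead_attributes(value, pattern, lookahead_nrs):
    """For each matching segment, list nr, value and position of its lookahead groups."""
    results = []
    for m in re.finditer(pattern, value):
        groups = []
        for nr in sorted(lookahead_nrs):          # ascending order of group number
            if m.group(nr) is not None:           # the group captured something
                groups.append({
                    "nr": nr,
                    "value": m.group(nr),
                    "position": m.start(nr) + 1,  # convert zero-based offset to one-based
                })
        results.append(groups)
    return results


chapter = lookahead_attributes(
    "Chapter 5", r"(Chapter|Appendix)(?=\s+([0-9]+))", {2}
)
```

For the input "Chapter 5", group 2 is captured inside the lookahead, so the single matching segment reports nr 2, value "5" and position 9.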

If the function is called twice with the same arguments, it is implementation-dependent whether the two calls return the same element node or distinct (but deep equal) element nodes. In this respect it is nondeterministic with respect to node identity.

The base URI of the element nodes in the result is implementation-dependent.

A schema is defined for the structure of the returned element: see D.1 Schema for the result of fn:analyze-string.

The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema.

Error Conditions

A dynamic error is raised [err:FORX0002] if the value of $pattern is invalid according to the rules described in section 6.1 Regular expression syntax.

A dynamic error is raised [err:FORX0001] if the value of $flags is invalid according to the rules described in section 6.2 Flags.

Notes

It is recommended that a processor that implements schema awareness should return typed nodes. The concept of “schema awareness”, however, is a matter for host languages to define and is outside the scope of the function library specification.

The declarations and definitions in the schema are not automatically available in the static context of the fn:analyze-string call (or of any other expression). The contents of the static context are host-language defined, and in some host languages are implementation-defined.

The schema defines the outermost element, analyze-string-result, in such a way that mixed content is permitted. In fact the element will only have element nodes (match and non-match) as its children, never text nodes. Although this might have originally been an oversight, defining the analyze-string-result element with mixed="true" allows it to be atomized, which is potentially useful (the atomized value will be the original input string), and the capability has therefore been retained for compatibility with the 3.0 version of this specification.

The rules for disjoint matching segments allow a zero-length matching segment to immediately follow a non-zero-length matching segment (they are not considered to overlap). This means, for example, that the regular expression .* will typically produce two matches: one matching segment containing all the characters in the input string, and a second zero-length matching segment at the end position of the string.
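
The same behavior can be observed in other regular expression engines. As a non-normative illustration, Python's re module (version 3.7 or later) treats an empty match immediately after a non-empty one in the same way. The helper matching_segments is a hypothetical name, and Python's dialect is not the one defined in this specification.

```python
import re


def matching_segments(value, pattern):
    """Return zero-based (start, end) offsets of the disjoint matching segments."""
    return [m.span() for m in re.finditer(pattern, value)]


# The first segment covers the whole string; the second is empty and sits at the end.
segments = matching_segments("abc", r".*")
```

For the input "abc" this gives the segments (0, 3) and (3, 3). An empty input string gives the single zero-length segment (0, 0).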

Examples

In the following examples, the result document is shown in serialized form, with whitespace between the element nodes. This whitespace is not actually present in the result.

Expression:

analyze-string("The cat sat on the mat.", "\w+")

Result:
<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
  <match>The</match>
  <non-match> </non-match>
  <match>cat</match>
  <non-match> </non-match>
  <match>sat</match>
  <non-match> </non-match>
  <match>on</match>
  <non-match> </non-match>
  <match>the</match>
  <non-match> </non-match>
  <match>mat</match>
  <non-match>.</non-match>
</analyze-string-result>

(with whitespace added for legibility)

Expression:
analyze-string("08-12-03", "^(\d+)\-(\d+)\-(\d+)$")
Result:
<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
  <match>
    <group nr="1">08</group>-<group nr="2">12</group>-<group nr="3">03</group>
  </match>
</analyze-string-result>

(with whitespace added for legibility)

Expression:
analyze-string("A1,C15,,D24, X50,", "([A-Z])([0-9]+)")
Result:
<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
  <match>
    <group nr="1">A</group>
    <group nr="2">1</group>
  </match>
  <non-match>,</non-match>
  <match>
    <group nr="1">C</group>
    <group nr="2">15</group>
  </match>
  <non-match>,,</non-match>
  <match>
    <group nr="1">D</group>
    <group nr="2">24</group>
  </match>
  <non-match>, </non-match>
  <match>
    <group nr="1">X</group>
    <group nr="2">50</group>
  </match>
  <non-match>,</non-match>
</analyze-string-result>

(with whitespace added for legibility)

Expression:
analyze-string("Chapter 5", "(Chapter|Appendix)(?=\s+([0-9]+))")
Result:
<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
  <match>
    <group nr="1">Chapter</group>
    <lookahead-group nr="2" value="5" position="9"/>
  </match>
  <non-match> 5</non-match>
</analyze-string-result>

(with whitespace added for legibility)

Expression:
analyze-string("There we go", "\b(?=(\w+))")
Result:
<analyze-string-result xmlns="http://www.w3.org/2005/xpath-functions">
  <match><lookahead-group nr="1" value="There" position="1"/></match>
  <non-match>There </non-match>
  <match><lookahead-group nr="1" value="we" position="7"/></match>
  <non-match>we </non-match>
  <match><lookahead-group nr="1" value="go" position="10"/></match>
  <non-match>go</non-match>
</analyze-string-result>

(with whitespace added for legibility)