This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery and XPath Data Model (XDM) 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [XQuery and XPath Functions and Operators 4.0].
XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in Appendix [I Change Log].
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XPath 4.0 Grammar].
The highest-level symbol in the XPath grammar is XPath.
XPath ::= Expr
Expr ::= (ExprSingle ++ ",")
ExprSingle ::= ForExpr | LetExpr | QuantifiedExpr | IfExpr | OrExpr
ForExpr ::= ForClause ForLetReturn
LetExpr ::= LetClause ForLetReturn
QuantifiedExpr ::= ("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle
IfExpr ::= "if" "(" Expr ")" (UnbracedActions | BracedAction)
OrExpr ::= AndExpr ("or" AndExpr)*
The XPath 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator. (Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.)
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
After the comma, the expressions that have next lowest precedence are ForExpr, LetExpr, QuantifiedExpr, IfExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
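The way precedence is encoded in the grammar can be illustrated with a small recursive-descent parser. The Python sketch below is not part of this specification and covers only a toy subset: it assumes that operands are bare names, that ExprSingle has only the OrExpr alternative, and that AndExpr is a chain of operands joined by "and" (a simplification made for this example). All names in it (`Parser`, `parse`, the `"seq"` tag) are hypothetical. Each parsing function corresponds to one production and calls the function for the next-higher precedence level, so the comma binds most loosely, then "or", then "and".

```python
import re

# One token: a parenthesis, a comma, or a run of letters (keyword or operand name).
TOKEN = re.compile(r"\s*(\(|\)|,|[A-Za-z]+)")
KEYWORDS = {"or", "and"}


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.index = 0

    def peek(self):
        return self.tokens[self.index] if self.index < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        self.index += 1
        return token

    def expr(self):
        # Expr ::= (ExprSingle ++ ",")  -- the comma has the lowest precedence
        items = [self.expr_single()]
        while self.peek() == ",":
            self.take(",")
            items.append(self.expr_single())
        return items[0] if len(items) == 1 else ("seq", *items)

    def expr_single(self):
        # Toy assumption: only the OrExpr alternative of ExprSingle is modeled.
        return self.or_expr()

    def or_expr(self):
        # OrExpr ::= AndExpr ("or" AndExpr)*
        node = self.and_expr()
        while self.peek() == "or":
            self.take("or")
            node = ("or", node, self.and_expr())
        return node

    def and_expr(self):
        # Toy assumption: AndExpr ::= Primary ("and" Primary)*
        node = self.primary()
        while self.peek() == "and":
            self.take("and")
            node = ("and", node, self.primary())
        return node

    def primary(self):
        token = self.take()
        if token == "(":
            node = self.expr()  # a full Expr, commas included, is allowed in parentheses
            self.take(")")
            return node
        if token in KEYWORDS or token in {")", ","}:
            raise SyntaxError(f"unexpected token {token!r}")
        return token


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

Under these assumptions, `parse("a, b or c")` groups `b or c` as a single operand of the comma, because `expr_single` never consumes a top-level comma; a comma can reappear only inside parentheses, where `primary` calls `expr` again.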
Functions in XPath 4.0 arise in two ways:
A function definition contains information about a family of functions with the same name and a defined arity range. These functions are in most cases known statically (they appear in the statically known function definitions), but there may be further function definitions that are known only dynamically (appearing in the dynamically known function definitions).
Function items are XDM items that can be called using a dynamic function call. They are values that can be bound to variables, passed as arguments, returned as function results, and generally manipulated in the same way as other XDM values.
The functions defined by a statically known function definition can be invoked using a static function call. Function items corresponding to these definitions can also be obtained, as dynamic values, by evaluating a named function reference. Function items can also be obtained using the fn:function-lookup function: in this case the function name and arity do not need to be known statically, and the function definition need not be present in the static context, so long as it is in the dynamic context.
Static and dynamic function calls are described in the following sections.
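The relationship between a function definition (a named family with an arity range) and the function items obtained from it can be modeled loosely in Python. This is an analogy only, not XPath and not the normative behavior of fn:function-lookup: the class `FunctionDefinition`, the helper `function_lookup`, and the function name `my:join` are all hypothetical, and returning `None` stands in for the empty sequence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class FunctionDefinition:
    """Hypothetical model: one name, one arity range, one implementation."""
    name: str
    min_arity: int
    max_arity: int
    body: Callable


def function_lookup(definitions: Dict[str, FunctionDefinition],
                    name: str, arity: int) -> Optional[Callable]:
    """Return a fixed-arity callable for (name, arity), or None if there is none."""
    definition = definitions.get(name)
    if definition is None:
        return None
    if not (definition.min_arity <= arity <= definition.max_arity):
        return None

    def function_item(*args):
        # A function item has exactly one arity, unlike the definition it came from.
        if len(args) != arity:
            raise TypeError(f"{name}#{arity} called with {len(args)} argument(s)")
        return definition.body(*args)

    return function_item


# Assumed example definition: the separator is optional, so the arity range is 1 to 2.
definitions = {
    "my:join": FunctionDefinition(
        name="my:join",
        min_arity=1,
        max_arity=2,
        body=lambda items, separator="": separator.join(items),
    )
}

join1 = function_lookup(definitions, "my:join", 1)
join2 = function_lookup(definitions, "my:join", 2)
```

In this model a static function call corresponds to resolving the name and arity before execution, while the lookup shown here resolves them at run time from whatever definitions are available.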
A function item is an XDM value that can be bound to a variable, or manipulated in various ways by XPath 4.0 expressions. The most significant such expression is a dynamic function call, which supplies values of arguments and evaluates the function to produce a result.
The syntax of dynamic function calls is defined in 4.5.2.1 Dynamic Function Calls.
A number of constructs can be used to produce a function item, notably:
A named function reference (see 4.5.2.4 Named Function References) constructs a function item by reference to function definitions in the static context. For example, fn:node-name#1 returns a function item whose effect is to call the static fn:node-name function with one argument.
An inline function (see 4.5.2.5 Inline Function Expressions) constructs a function item whose body is defined locally. For example, the construct fn($x) { $x + 1 } returns a function item whose effect is to increment the value of the supplied argument.
A partial function application (see 4.5.2.3 Partial Function Application) derives one function item from another by supplying the values of some of its arguments. For example, fn:ends-with(?, ".txt") returns a function item with one argument that tests whether the supplied string ends with the substring ".txt".
Maps and arrays are also function items. See 4.13.1.1 Map Constructors and 4.13.2.1 Array Constructors.
The fn:function-lookup function can be called to discover functions that are present in the dynamic context.
The fn:load-xquery-module function can be called to load functions dynamically from an external XQuery library module.
Some system functions such as fn:random-number-generator and fn:op return a function item as their result.
These constructs are described in detail in the following sections, or in [XQuery and XPath Functions and Operators 4.0].
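For readers more familiar with general-purpose languages, the first three constructs have rough counterparts in Python. The sketch below is an analogy under stated assumptions, not XPath: `ends_with` is a hypothetical helper standing in for fn:ends-with, and `functools.partial` with a keyword argument stands in for the `?` placeholder.

```python
from functools import partial


def ends_with(value, suffix):
    """Hypothetical stand-in for fn:ends-with."""
    return value.endswith(suffix)


# Analogue of a named function reference such as count#1:
# the existing function is referred to by name, without calling it.
length_ref = len

# Analogue of the inline function fn($x) { $x + 1 }.
increment = lambda x: x + 1

# Analogue of the partial application fn:ends-with(?, ".txt"):
# the second argument is fixed and the first is left open.
is_txt = partial(ends_with, suffix=".txt")


def call_dynamically(function_item, *arguments):
    """Analogue of a dynamic function call: the callee is an ordinary value."""
    return function_item(*arguments)
```

Each of the three values can be bound to a variable, passed as an argument, and invoked later, for example `call_dynamically(is_txt, "notes.txt")`.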
It is sometimes useful to be able to establish whether two variables refer to the same function or to different functions. For this purpose, every function item has an identity. Functions with the same identity are indistinguishable in every way; in particular, any function call with identical arguments will produce an identical result.
In general, evaluation of an expression that returns a function item other than one that was present in its operands delivers a function item whose identity is unique, and thus distinct from any other function item. There are two exceptions to this rule:
Evaluating a function reference such as count#1 returns the same function every time. Specifically, if the function name identifies a function definition that is not context dependent (which is the most usual case), then all function references using this function name and arity return the same function. For more details see 4.5.2.4 Named Function References.
An optimizer is permitted to rewrite deterministic expressions (in the sense defined in [XQuery and XPath Functions and Operators 4.0]) in such a way that repeated evaluation is avoided, and this may be done without consideration of function identity. For example:
If the expression contains(?, "e") appears within the body of a for clause, or if the same expression is written repeatedly in a query, then an optimizer may decide to evaluate it once only, and thus return the same function item each time.
Similarly, if the expression fn($x) { $x + 1 } appears more than once, or is evaluated repeatedly, then it may return the same function each time.
Optimizers are allowed to replace any expression with an equivalent expression. For example, count(?) may be rewritten as count#1. Similarly, fn($x) { $x + 1 } may be rewritten as fn($y) { $y + 1 }. This may lead to different expressions returning identical function items.
In principle, two function items are not identical if they differ in their captured context. Optimizers, however, will often be able to eliminate parts of the captured context that a function does not actually use. For example, an inline function expression delivers a function item whose captured context includes the values of all nonlocal in-scope variables; but in practice the implementation is unlikely to retain the values of such variables unless they are actually referenced.
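The identity rules above can be pictured with Python's object identity (`is`). This is only an analogy and makes no claim about how any XPath processor behaves: `make_incrementer` and `make_incrementer_cached` are hypothetical names, and `functools.lru_cache` plays the role of an optimizer that chooses to evaluate an expression once.

```python
from functools import lru_cache


def make_incrementer():
    # Each call evaluates the lambda expression again and creates a new function object.
    return lambda x: x + 1


first = make_incrementer()
second = make_incrementer()

# Same behaviour, but two distinct identities (the general rule).
same_results = first(1) == second(1)
distinct_identity = first is not second

# Analogue of a named function reference: referring to an existing
# function by name yields the same function every time.
reference_a = len
reference_b = len
shared_identity = reference_a is reference_b


@lru_cache(maxsize=None)
def make_incrementer_cached():
    # Stands in for an optimizer that avoids repeated evaluation.
    return lambda x: x + 1


# With caching, repeated evaluation returns the very same function object.
cached_identity = make_incrementer_cached() is make_incrementer_cached()
```

Code that compares function identities should therefore not rely on two separately evaluated inline functions being distinct, since a processor is free to return the same item each time.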