Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery and XPath Data Model (XDM) 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [XQuery and XPath Functions and Operators 4.0].
XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in Appendix [I Change Log].
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XPath 4.0 Grammar].
The highest-level symbol in the XPath grammar is XPath.
XPath | ::= | Expr |
Expr | ::= | (ExprSingle ++ ",") |
ExprSingle | ::= | ForExpr | LetExpr | QuantifiedExpr | IfExpr | OrExpr |
ForExpr | ::= | ForClause ForLetReturn |
LetExpr | ::= | LetClause ForLetReturn |
QuantifiedExpr | ::= | ("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle |
IfExpr | ::= | "if" "(" Expr ")" (UnbracedActions | BracedAction) |
OrExpr | ::= | AndExpr ("or" AndExpr)* |
The XPath 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator. Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
After the comma, the expressions that have next lowest precedence are ForExpr, LetExpr, QuantifiedExpr, IfExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
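The role of the top-level comma can be illustrated with a small Python sketch. It is only a model, not part of XPath: the hypothetical helper `split_top_level` splits a string at commas that are not nested inside brackets, which mirrors how the operands of an Expr (or the arguments of a function call) are separated. It assumes the input contains no string literals or comments holding commas or brackets.

```python
def split_top_level(text):
    """Split text at commas that are not nested inside (), [] or {}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current operand.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


# The parenthesized operand contains a comma, but not at the top level,
# so it counts as a single operand.
print(split_top_level("1, (2, 3), 4"))  # ['1', '(2, 3)', '4']
```

In this model each returned part corresponds to one ExprSingle: the middle operand is a single expression even though it evaluates to a sequence of two items.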
Most modern programming languages have support for collections of key/value pairs, which may be called maps, dictionaries, associative arrays, hash tables, keyed lists, or objects (these are not the same thing as objects in object-oriented systems). In XPath 4.0, we call these maps. Most modern programming languages also support ordered lists of values, which may be called arrays, vectors, or sequences. In XPath 4.0, we have both sequences and arrays. Unlike sequences, an array is an item, and can appear as an item in a sequence.
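The difference between sequences and arrays can be modelled in a few lines of Python. This is a sketch under stated assumptions, not XPath itself: a sequence is modelled as a flat Python list, the hypothetical class `XArray` stands for an array item, and the hypothetical function `sequence` stands for the comma operator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XArray:
    """Models an array: a single item that holds a tuple of members."""
    members: tuple


def sequence(*operands):
    """Models the comma operator: concatenates operands into one flat sequence."""
    flat = []
    for operand in operands:
        if isinstance(operand, list):
            flat.extend(operand)   # a nested sequence is flattened
        else:
            flat.append(operand)   # a single item (atomic value or XArray)
    return flat


nested_sequences = sequence(sequence(1, 2), sequence(3, 4))
with_arrays = sequence(XArray((1, 2)), XArray((3, 4)))

print(len(nested_sequences))  # 4
print(len(with_arrays))       # 2
```

Combining two sequences yields four items because sequences never nest, whereas combining two arrays yields a sequence of two items, each of which is an array.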
Note:
The XPath 4.0 specification focuses on syntax provided for maps and arrays, especially constructors and lookup.
Some of the functionality typically needed for maps and arrays is provided by functions defined in Section 18 Processing maps and Section 19 Processing arrays of [XQuery and XPath Functions and Operators 4.0], including functions used to read JSON to create maps and arrays, serialize maps and arrays to JSON, combine maps to create a new map, remove map entries to create a new map, iterate over the keys of a map, convert an array to create a sequence, combine arrays to form a new array, and iterate over arrays in various ways.
The lookup operator ? can now be followed by a string literal, for cases where map keys are strings other than NCNames. It can also be followed by a variable reference.
A deep lookup operator ?? is provided for searching trees of maps and arrays. [Issue 297 PR 837 23 November 2023]
Lookup expressions can now take a modifier (such as keys, values, or pairs) enabling them to return structured results rather than a flattened sequence. [Issues 960 1094 PR 1125 23 April 2024]
An inline function may be annotated as a %method, giving it access to its containing map. [Issues 1800 1845 PRs 1817 1853 4 March 2025]
The key specifier can reference an item type or sequence type, to select values of that type only. This is especially useful when processing trees of maps and arrays, as encountered when processing JSON input. [Issues 1456 1866 PRs 1864 1877]
XPath 4.0 provides two lookup operators ? and ?? for maps and arrays. These provide a terse syntax for accessing the entries in a map or the members of an array.
The operator "?", known as the shallow lookup operator, returns values found immediately in the operand map or array. The operator "??", known as the deep lookup operator, also searches nested maps and arrays. The effect of the deep lookup operator "??" is explained in 4.13.3.3 Deep Lookup.
LookupExpr | ::= | PostfixExpr Lookup |
PostfixExpr | ::= | PrimaryExpr | FilterExpr | DynamicFunctionCall | LookupExpr | FilterExprAM |
Lookup | ::= | ("?" | "??") (Modifier "::")? KeySpecifier |
Modifier | ::= | "pairs" | "keys" | "values" | "items" |
KeySpecifier | ::= | NCName | IntegerLiteral | StringLiteral | VarRef | ParenthesizedExpr | LookupWildcard | TypeSpecifier |
IntegerLiteral | ::= | Digits |
| /* ws: explicit */ | ||
Digits | ::= | DecDigit ((DecDigit | "_")* DecDigit)? |
| /* ws: explicit */ | ||
DecDigit | ::= | [0-9] |
| /* ws: explicit */ | ||
StringLiteral | ::= | AposStringLiteral | QuotStringLiteral |
| /* ws: explicit */ | ||
VarRef | ::= | "$" EQName |
ParenthesizedExpr | ::= | "(" Expr? ")" |
LookupWildcard | ::= | "*" |
TypeSpecifier | ::= | "~[" SequenceType "]" |
SequenceType | ::= | ("empty-sequence" "(" ")") |
A Lookup has two parts: the KeySpecifier determines which entries (in a map) or members (in an array) are selected, and the Modifier determines how they are delivered in the result. The default modifier is items, which delivers the result as a flattened sequence of items.
To take a simple example, given $A as an array [ ("a", "b"), ("c", "d"), ("e", "f"), 42 ], some example Lookup expressions are:
| Expression | Result |
|---|---|
| $A?* (or $A?items::*) | ("a", "b", "c", "d", "e", "f", 42) |
| $A?pairs::* | ({ "key": 1, "value": ("a", "b") }, { "key": 2, "value": ("c", "d") }, { "key": 3, "value": ("e", "f") }, { "key": 4, "value": 42 }) |
| $A?values::* | ([ "a", "b" ], [ "c", "d" ], [ "e", "f" ], [ 42 ]) |
| $A?keys::* | (1, 2, 3, 4) |
| $A?2 (or $A?items::2) | ("c", "d") |
| $A?pairs::2 | ({ "key": 2, "value": ("c", "d") }) |
| $A?values::2 | ([ "c", "d" ]) |
| $A?keys::2 | (2) |
| $A?(3, 1) (or $A?items::(3, 1)) | ("e", "f", "a", "b") |
| $A?pairs::(3, 1) | ({ "key": 3, "value": ("e", "f") }, { "key": 1, "value": ("a", "b") }) |
| $A?values::(3, 1) | ([ "e", "f" ], [ "a", "b" ]) |
| $A?keys::(3, 1) | (3, 1) |
| $A?~[xs:integer] | 42 |
| $A?keys::~[xs:integer] | 4 |
| $A?keys::~[xs:string+] | (1, 2, 3) |
Similarly, given $M as a map { "X": ("a", "b"), "Y": ("c", "d"), "Z": ("e", "f"), "N": 42 }, some example lookup expressions are as follows.
| Expression | Result |
|---|---|
| $M?* (or $M?items::*) | ("a", "b", "c", "d", "e", "f", 42) |
| $M?pairs::* | ({ "key": "X", "value": ("a", "b") }, { "key": "Y", "value": ("c", "d") }, { "key": "Z", "value": ("e", "f") }, { "key": "N", "value": 42 }) |
| $M?values::* | ([ "a", "b" ], [ "c", "d" ], [ "e", "f" ], [ 42 ]) |
| $M?keys::* | ("X", "Y", "Z", "N") |
| $M?Y (or $M?items::Y) | ("c", "d") |
| $M?pairs::Y | ({ "key": "Y", "value": ("c", "d") }) |
| $M?values::Y | ([ "c", "d" ]) |
| $M?keys::Y | ("Y") |
| $M?("Z", "X") (or $M?items::("Z", "X")) | ("e", "f", "a", "b") |
| $M?pairs::("Z", "X") | ({ "key": "Z", "value": ("e", "f") }, { "key": "X", "value": ("a", "b") }) |
| $M?values::("Z", "X") | ([ "e", "f" ], [ "a", "b" ]) |
| $M?keys::("Z", "X") | ("Z", "X") |
| $M?~[xs:integer] | 42 |
| $M?keys::~[xs:integer] | "N" |
| $M?keys::~[xs:string+] | ("X", "Y", "Z") |
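The type-specifier rows of the two tables can be reproduced with a short Python model. This is a sketch, not an implementation of XPath: each value is modelled as a Python list of items, `int` and `str` stand in for xs:integer and xs:string, and the helper names `instance_of` and `pairs_by_type` are hypothetical.

```python
# $A and $M from the text; each member or entry value is a list of items.
A = [["a", "b"], ["c", "d"], ["e", "f"], [42]]
M = {"X": ["a", "b"], "Y": ["c", "d"], "Z": ["e", "f"], "N": [42]}


def instance_of(value, item_type, occurrence=""):
    """Simplified 'instance of' test for a value against a sequence type."""
    n = len(value)
    cardinality_ok = {"": n == 1, "?": n <= 1, "+": n >= 1, "*": True}[occurrence]
    return cardinality_ok and all(isinstance(item, item_type) for item in value)


def pairs_by_type(entries, item_type, occurrence=""):
    """Keep the key/value pairs whose value is an instance of the given type."""
    return [
        {"key": key, "value": value}
        for key, value in entries
        if instance_of(value, item_type, occurrence)
    ]


array_entries = list(enumerate(A, start=1))  # array keys are 1-based positions
map_entries = list(M.items())                # dict order models the entry order

print([p["key"] for p in pairs_by_type(array_entries, int)])       # [4]
print([p["key"] for p in pairs_by_type(array_entries, str, "+")])  # [1, 2, 3]
print([p["key"] for p in pairs_by_type(map_entries, int)])         # ['N']
print([p["key"] for p in pairs_by_type(map_entries, str, "+")])    # ['X', 'Y', 'Z']
```

The value 42 is the only one that matches a single integer, while the three two-string values match only when the occurrence indicator + is present.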
The semantics of a postfix lookup expression E?pairs::KS are defined as follows. The results with other modifiers can be derived from this result, as explained below.
E is evaluated to produce a value $V.
If $V is not a singleton (that is, if count($V) ne 1), then the result (by recursive application of these rules) is the value of for $v in $V return $v?pairs::KS.
If $V is a singleton array item (that is, if $V instance of array(*)) then:
If the KeySpecifier KS is a ParenthesizedExpr, then it is evaluated to produce a value $K and the result is:
data($K) ! { "key": ., "value": array:get($V, .) }
Note:
The focus for evaluating the key specifier expression is the same as the focus for the Lookup expression itself.
If the KeySpecifier KS is an IntegerLiteral with value $i, the result is the same as $V?pairs::($i).
If the KeySpecifier KS is an NCName or StringLiteral, the expression raises a type error [err:XPTY0004].
If the KeySpecifier KS is a wildcard (*), the result is the same as $V?pairs::(1 to array:size($V)).
Note:
Note that array items are returned in order.
If the KeySpecifier KS is a TypeSpecifier ~[T], the result is the same as $V?pairs::*[?value instance of T].
If $V is a singleton map item (that is, if $V instance of map(*)) then:
If the KeySpecifier KS is a ParenthesizedExpr, then it is evaluated to produce a value $K and the result is:
data($K) ! { "key": ., "value": map:get($V, .) }
Note:
The focus for evaluating the key specifier expression is the same as the focus for the Lookup expression itself.
If the KeySpecifier KS is an NCName or a StringLiteral, with value $S, the result is the same as $V?pairs::($S).
If the KeySpecifier KS is an IntegerLiteral with value $N, the result is the same as $V?pairs::($N).
If the KeySpecifier KS is a wildcard (*), the result is the same as $V?pairs::(map:keys($V)).
Note:
The order of entries in the result sequence reflects the entry order of the map, as defined in [XQuery and XPath Data Model (XDM) 4.0].
If the KeySpecifier KS is a TypeSpecifier ~[T], the result is the same as $V?pairs::*[?value instance of T]. Note that T is in general a sequence type, so it may carry an occurrence indicator: for example, ~[xs:string+] selects values consisting of one or more strings, while ~[xs:integer] selects values that are a single integer.
Otherwise (that is, if $V is neither a map nor an array) a type error is raised [err:XPTY0004].
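The rules above can be sketched as a small Python model. It is an illustration under stated assumptions rather than a conforming implementation: a sequence is a Python list of items, a map is a `dict` whose values are lists, and the names `XArray`, `XPathTypeError`, `WILDCARD`, and `lookup_pairs` are hypothetical. Only wildcard and already-evaluated parenthesized key specifiers are modelled.

```python
class XPathTypeError(Exception):
    """Stands in for the type error err:XPTY0004."""


class XArray:
    """Models an array item; each member is a value (a list of items)."""
    def __init__(self, *members):
        self.members = [list(m) for m in members]


WILDCARD = "*"


def lookup_pairs(V, ks):
    """Model of E?pairs::KS, where V is the value of E.

    ks is WILDCARD or a list of atomic keys (the atomized result of a
    ParenthesizedExpr).
    """
    if len(V) != 1:
        # Not a singleton: apply the rules to each item in turn.
        return [pair for v in V for pair in lookup_pairs([v], ks)]
    item = V[0]
    if isinstance(item, XArray):
        keys = range(1, len(item.members) + 1) if ks == WILDCARD else ks
        result = []
        for k in keys:
            if not isinstance(k, int):
                raise XPathTypeError("array lookup needs an integer key")
            if not 1 <= k <= len(item.members):
                raise IndexError("models err:FOAY0001 raised by array:get")
            result.append({"key": k, "value": item.members[k - 1]})
        return result
    if isinstance(item, dict):
        keys = list(item) if ks == WILDCARD else ks
        # map:get returns the empty sequence for an absent key.
        return [{"key": k, "value": item.get(k, [])} for k in keys]
    raise XPathTypeError("lookup operand is neither a map nor an array")


two_arrays = [XArray([1], [2], [3]), XArray([4], [5], [6])]
print(lookup_pairs(two_arrays, [2]))
# [{'key': 2, 'value': [2]}, {'key': 2, 'value': [5]}]
```

The call models ([ 1, 2, 3 ], [ 4, 5, 6 ])?pairs::2: because the operand is a sequence of two arrays, the lookup is applied to each array in turn.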
For modifiers other than pairs, the resulting key-value pair is post-processed as follows:
If the modifier is items (explicitly or by default), and the key specifier is an NCName or StringLiteral, then the result of $V?items::KS is the result of the expression:
for $KVP in $V?pairs::KS
let $value := map:get($KVP, 'value')
return if ($value instance of %method function(*))
then bind-focus($value, $V)
else $value
where bind-focus($F, $V) is a function that takes a function item $F and returns a modified function item whose captured context has the focus set to $V. For more detail see 4.5.6.1 Methods.
Note:
The effect of this is that if any of the selected values is a singleton method, the selected function item is modified by binding the context value to the containing map $V. In other cases the result is the sequence concatenation of the value parts.
If the modifier is items (explicitly or by default), and the key specifier is neither an NCName nor a StringLiteral, then the result of $V?items::KS is the result of the expression $V?pairs::KS ! map:get(., "value"). This returns the sequence concatenation of the selected values.
If the modifier is values, the result of $V?values::KS is the same as the result of $V?pairs::KS ! array { map:get(., "value") }. This returns each value as an array.
If the modifier is keys, the result of $V?keys::KS is the same as the result of $V?pairs::KS ! map:get(., "key"). This returns the keys (integer indexes in the case of an array) without the values.
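The derivation of the other modifiers from the pairs result can be sketched in Python for the map case. This is a model under stated assumptions: map values are Python lists, the hypothetical class `Method` stands for a function item annotated as %method, `functools.partial` stands in for bind-focus, and the map `rect` is invented for illustration.

```python
from functools import partial


class Method:
    """Models a %method function item; fn receives the containing map."""
    def __init__(self, fn):
        self.fn = fn


def bind_focus(method, containing_map):
    """Models bind-focus($F, $V): fixes the focus of the method to the map."""
    return partial(method.fn, containing_map)


def pairs(m, keys):
    """Models $V?pairs::(keys) for a map whose values are lists of items."""
    return [{"key": k, "value": m.get(k, [])} for k in keys]


def items(m, keys, by_name=True):
    """Models $V?items::KS; by_name=True stands for an NCName or StringLiteral."""
    result = []
    for pair in pairs(m, keys):
        value = pair["value"]
        if by_name and len(value) == 1 and isinstance(value[0], Method):
            result.append(bind_focus(value[0], m))  # singleton method: bind it
        else:
            result.extend(value)                    # otherwise concatenate
    return result


def values(m, keys):
    """Models $V?values::KS: each value is wrapped as one array."""
    return [list(p["value"]) for p in pairs(m, keys)]


def keys_of(m, keys):
    """Models $V?keys::KS: the keys without the values."""
    return [p["key"] for p in pairs(m, keys)]


rect = {
    "width": [3],
    "height": [4],
    "area": [Method(lambda self: self["width"][0] * self["height"][0])],
}

area = items(rect, ["area"])[0]
print(area())                                           # 12
print(items(rect, ["width", "height"], by_name=False))  # [3, 4]
print(values(rect, ["width", "height"]))                # [[3], [4]]
print(keys_of(rect, ["width", "height"]))               # ['width', 'height']
```

Selecting the area entry by name returns a function already bound to the map, so it can be called without passing the map again; the other modifiers are plain projections of the pairs.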
Examples:
{ "first" : "Jenna", "last" : "Scott" }?first evaluates to "Jenna"
{ "first name" : "Jenna", "last name" : "Scott" }?"first name" evaluates to "Jenna"
[ 4, 5, 6 ]?2 evaluates to 5.
({ "first": "Tom" }, { "first": "Dick" }, { "first": "Harry" })?first evaluates to the sequence ("Tom", "Dick", "Harry").
([ 1, 2, 3 ], [ 4, 5, 6 ])?2 evaluates to the sequence (2, 5).
([ 1, [ "a", "b" ], [ 4, 5, [ "c", "d" ] ] ])??*[. instance of array(xs:string)] evaluates to the sequence ([ "a", "b" ], [ "c", "d" ]).
[ "a", "b" ]?3 raises a dynamic error [err:FOAY0001], an error code defined in [XQuery and XPath Functions and Operators 4.0].