Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery and XPath Data Model (XDM) 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [XQuery and XPath Functions and Operators 4.0].
XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in I Change Log.
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XPath 4.0 Grammar].
The highest-level symbol in the XPath grammar is XPath.
XPath | ::= | Expr |
Expr | ::= | (ExprSingle ++ ",") |
ExprSingle | ::= | ForExpr | LetExpr | QuantifiedExpr | IfExpr | OrExpr |
ForExpr | ::= | ForClauseForLetReturn |
LetExpr | ::= | LetClauseForLetReturn |
QuantifiedExpr | ::= | ("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle |
IfExpr | ::= | "if" "(" Expr ")" (UnbracedActions | BracedAction) |
OrExpr | ::= | AndExpr ("or" AndExpr)* |
The XPath 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator. (Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.)
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
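As an informal illustration of the distinction (not part of the grammar), the two calls below differ only in their parentheses. The choice of fn:count is arbitrary; any one-argument function would serve.

```xpath
(: One argument: the inner parentheses enclose an Expr whose value
   is a sequence of three items, so the call returns 3. :)
count((1, 2, 3))

(: Three arguments: each of 1, 2 and 3 is a separate ExprSingle.
   fn:count has no three-argument form, so a processor is expected
   to reject this call. :)
count(1, 2, 3)
```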
After the comma, the expressions that have next lowest precedence are ForExpr, LetExpr, QuantifiedExpr, IfExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
Most modern programming languages have support for collections of key/value pairs, which may be called maps, dictionaries, associative arrays, hash tables, keyed lists, or objects (these are not the same thing as objects in object-oriented systems). In XPath 4.0, we call these maps. Most modern programming languages also support ordered lists of values, which may be called arrays, vectors, or sequences. In XPath 4.0, we have both sequences and arrays. Unlike sequences, an array is an item, and can appear as an item in a sequence.
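The difference between sequences and arrays can be sketched with two small expressions. These use the square-bracket array constructor and are offered only as an informal illustration.

```xpath
(: A sequence containing two arrays: each array is a single item,
   so the count is 2. :)
count(([1, 2], [3]))

(: Sequences do not nest: the inner sequences are flattened into
   one sequence of three items, so the count is 3. :)
count(((1, 2), (3)))
```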
Note:
The XPath 4.0 specification focuses on syntax provided for maps and arrays, especially constructors and lookup.
Some of the functionality typically needed for maps and arrays is provided by functions defined in Section 18 Processing maps [FO] and Section 19 Processing arrays [FO], including functions used to read JSON to create maps and arrays, serialize maps and arrays to JSON, combine maps to create a new map, remove map entries to create a new map, iterate over the keys of a map, convert an array to create a sequence, combine arrays to form a new array, and iterate over arrays in various ways.
[Definition: A map is a function that associates a set of keys with values, resulting in a collection of key / value pairs.] [Definition: Each key / value pair in a map is called an entry.] [Definition: The value associated with a given key is called the associated value of the key.]
Maps and their properties are defined in the data model: see Section 7.2 Map Items [DM]. For an overview of the functions available for processing maps, see Section 18 Processing maps [FO].
Note:
Maps in XPath 4.0 are ordered. The effect of this property is explained in Section 7.2 Map Items [DM]. In an ordered map, the order of entries is predictable and depends on the order in which they were added to the map.
In map constructors, the keyword map is now optional, so map { 0: false(), 1: true() } can now be written { 0: false(), 1: true() }, provided it is used in a context where this creates no ambiguity. [Issue 1070 PR 1071 26 March 2024]
The order of key-value pairs in the map constructor is now retained in the constructed map. [Issue 1651 PR 1703 14 January 2025]
A general expression is allowed within a map constructor; this facilitates the creation of maps in which the presence or absence of particular keys is decided dynamically. [Issue 2003 PR 2094 13 July 2025]
A map can be created using a MapConstructor.
Examples are:
{ "a": 1, "b": 2 }
which constructs a map with two entries, and
{ "a": 1, if ($condition) { map{ "b": 2 } } }
which constructs a map having either one or two entries depending on the value of $condition.
Both the keys and the values in a map constructor can be supplied as expressions rather than as constants.
MapConstructor | ::= | "map"? "{" (MapConstructorEntry ** ",") "}" |
MapConstructorEntry | ::= | ExprSingle (":" ExprSingle)? |
ExprSingle | ::= | ForExpr | LetExpr | QuantifiedExpr | IfExpr | OrExpr |
Note:
The keyword map was required in earlier versions of the language; in XPath 4.0 it becomes optional. There may be cases where using the keyword improves readability.
In order to allow the map keyword to be omitted, an incompatible change has been made to XQuery computed element and attribute constructors: if the name of the constructed element or attribute is a language keyword, it must now be written using the QNameLiteral syntax, for example element #div {}.
Although the grammar allows a MapConstructor to appear within an EnclosedExpr (that is, between curly brackets), this may be confusing to readers, and using the map keyword in such cases may improve clarity. The keyword map is used in the second example above to avoid any confusion between the braces required for the then part of the conditional expression, and the braces required for the inner map constructor.
If the EnclosedExpr appears in a context such as a StringTemplate, the two adjacent left opening braces must at least be separated by whitespace.
When a MapConstructorEntry is written as two instances of ExprSingle separated by a colon, the first expression is evaluated and atomized to form a key, and the second expression is evaluated to form the corresponding value. The result is a single-entry map [DM] which will be merged into the constructed map, as described below. A type error [err:XPTY0004] occurs if the result of the first expression (after atomization) is not a single atomic item. The result of the second expression is used as is.
When the MapConstructorEntry is written as a single instance of ExprSingle with no colon, it must evaluate to a sequence of zero or more map items ([err:XPTY0004]). These map items will be merged into the constructed map, as described below.
Each contained MapConstructorEntry thus delivers zero or more maps, and the result of the map constructor is a new map obtained by merging these component maps, in order, as if by the map:merge function.
[Definition: Two atomic items K1 and K2 have the same key value if fn:atomic-equal(K1, K2) returns true, as specified in Section 14.2.1 fn:atomic-equal [FO].] If two or more entries have the same key value then a dynamic error is raised [err:XQDY0137]. The error may be raised statically if two or more entries can be determined statically to have the same key value.
The entry order [DM] of the entries in the constructed map retains the order of the MapConstructorEntry entries in the input.
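The rules above can be illustrated informally with a few small constructors. The keys and values are arbitrary, and each expression should be read as a separate example.

```xpath
(: One entry with a colon, one without. The second entry supplies a
   map that is merged in, giving three entries in the order a, b, c. :)
{ "a": 1, map { "b": 2, "c": 3 } }

(: An entry without a colon that evaluates to an empty sequence
   contributes no maps, so the result has a single entry. :)
{ "a": 1, () }

(: Two entries with the same key value: err:XQDY0137. :)
{ "a": 1, "a": 2 }

(: The key expression atomizes to two items rather than a single
   atomic item: err:XPTY0004. :)
{ (1, 2): "x" }
```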
The following expression constructs a map with either five or seven entries, depending on a supplied condition:
{
"Mo" : "Monday",
"Tu" : "Tuesday",
"We" : "Wednesday",
"Th" : "Thursday",
"Fr" : "Friday",
if ($include-weekends) {
{ "Sa" : "Saturday",
"Su" : "Sunday"
}
}
}
The following expression (which uses two nested map constructors) constructs a map that indexes employees by the value of their @id attribute:
{ //employee ! {@id: .} }
Maps can nest, and can contain any XDM value. Here is an example of a nested map with values that can be string values, numeric values, or arrays:
{
"book": {
"title": "Data on the Web",
"year": 2000,
"author": [
{
"last": "Abiteboul",
"first": "Serge"
},
{
"last": "Buneman",
"first": "Peter"
},
{
"last": "Suciu",
"first": "Dan"
}
],
"publisher": "Morgan Kaufmann Publishers",
"price": 39.95
}
}
Note:
The syntax deliberately mimics JSON, but there are a few differences. JSON constructs that are not accepted in XPath 4.0 map constructors include the keywords true, false, and null, and backslash-escaped characters such as "\n" in string literals. In an XPath 4.0 map constructor, of course, any literal value can be replaced with an expression.
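As a rough sketch of how the JSON-only constructs might be expressed instead, the map below uses function calls and string concatenation. The key names are invented for illustration, and representing JSON null as an empty sequence is only one possible choice (it is the one fn:parse-json makes).

```xpath
{
  "active":   true(),    (: JSON true  :)
  "deleted":  false(),   (: JSON false :)
  "nickname": (),        (: JSON null, represented here as an empty sequence :)
  (: JSON "1 Main St\nSpringfield": the newline is built from its codepoint :)
  "address":  "1 Main St" || codepoints-to-string(10) || "Springfield"
}
```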
Note:
In some circumstances, it is necessary to include whitespace before or after the colon of a MapConstructorEntry to ensure that it is parsed as intended.
For instance, consider the expression {a:b}. Although it matches the EBNF for MapConstructor (with a matching MapKeyExpr and b matching MapValueExpr), the "longest possible match" rule requires that a:b be parsed as a QName, which results in a syntax error. Changing the expression to {a :b} or {a: b} will prevent this, resulting in the intended parse.
Similarly, consider these three expressions:
{a:b:c}
{a:*:c}
{*:b:c}
In each case, the expression matches the EBNF in two different ways, but the “longest possible match” rule forces the parse in which the MapKeyExpr is a:b, a:*, or *:b (respectively) and the MapValueExpr is c. To achieve the alternative parse (in which the MapKeyExpr is merely a or *), insert whitespace before and/or after the first colon.
Note:
There are also several functions that can be used to construct maps with a variable number of entries:
map:build takes any sequence as input, and for each item in the sequence, it computes a key and a value, by calling user-supplied functions.
map:merge takes a sequence of maps (often but not necessarily single-entry maps [DM]) and merges them into a single map.
map:of-pairs takes a sequence of key-value pair maps [FO] and merges them into a single map.
Any of these functions can be used to build an index of employee elements using the value of the @id attribute as a key:
map:build(//employee, fn { @id })
map:merge(//employee ! { @id: . })
map:of-pairs(//employee ! { 'key': @id, 'value': . })
All three functions also provide control over:
The way in which duplicate keys are handled, and
The ordering of entries in the resulting map.
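As a hedged illustration of duplicate handling, the call below uses the duplicates option of map:merge as it is defined in the 3.1 function library. It assumes that this option is still accepted in the same form; the options actually available for each of the three functions are defined in the function library, not here.

```xpath
(: Both input maps contain the key "a". With "use-last", the value
   from the last map wins, so the result has one entry, a: 2. :)
map:merge(
  ({ "a": 1 }, { "a": 2 }),
  { "duplicates": "use-last" }
)
```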
Sections with significant changes are marked Δ in the table of contents.
See 1 Introduction
Setting the default namespace for elements and types to the special value ##any causes an unprefixed element name to act as a wildcard, matching by local name regardless of namespace.
The terms FunctionType, ArrayType, MapType, and RecordType replace FunctionTest, ArrayTest, MapTest, and RecordTest, with no change in meaning.
Record types are added as a new kind of ItemType, constraining the value space of maps.
Function coercion now allows a function with arity N to be supplied where a function of arity greater than N is expected. For example, this allows the function true#0 to be supplied where a predicate function is required.
PR 1817 1853
An inline function may be annotated as a %method, giving it access to its containing map.
See 4.5.6 Inline Function Expressions
See 4.5.6.1 Methods
The symbols × and ÷ can be used for multiplication and division.
The rules for value comparisons when comparing values of different types (for example, decimal and double) have changed to be transitive. A decimal value is no longer converted to double; instead, the double is converted to a decimal without loss of precision. This may affect compatibility in edge cases involving comparison of values that are numerically very close.
Operators such as < and > can use the full-width forms ＜ and ＞ to avoid the need for XML escaping.
The lookup operator ? can now be followed by a string literal, for cases where map keys are strings other than NCNames. It can also be followed by a variable reference.
PR 1864 1877
The key specifier can reference an item type or sequence type, to select values of that type only. This is especially useful when processing trees of maps and arrays, as encountered when processing JSON input.
PR 1763 1830
The syntax on the right-hand side of an arrow operator has been relaxed: a dynamic function call no longer needs to start with a variable reference or a parenthesized expression; it can also be (for example) an inline function expression or a map or array constructor.
The arrow operator => is now complemented by a “mapping arrow” operator =!> which applies the supplied function to each item in the input sequence independently.
PR 1023 1128
It has been clarified that function coercion applies even when the supplied function item matches the required function type. This is to ensure that arguments supplied when calling the function are checked against the signature of the required function type, which might be stricter than the signature of the supplied function item.
A dynamic function call can now be applied to a sequence of functions, and in particular to an empty sequence. This makes it easier to chain a sequence of calls.
The syntax document-node(N), where N is a NameTestUnion, is introduced as an abbreviation for document-node(element(N)). For example, document-node(*) matches any well-formed XML document (as distinct from a document fragment).
See 3.2.7 Node Types
QName literals are new in 4.0.
A general expression is allowed within a map constructor; this facilitates the creation of maps in which the presence or absence of particular keys is decided dynamically.
PR 28
Multiple for and let clauses can be combined in an expression without an intervening return keyword.
PR 159
Keyword arguments are allowed on static function calls, as well as positional arguments.
PR 202
The presentation of the rules for the subtype relationship between sequence types and item types has been substantially rewritten to improve clarity; no change to the semantics is intended.
PR 230
The rules for “errors and optimization” have been tightened up to disallow many cases of optimizations that alter error behavior. In particular there are restrictions on reordering the operands of and and or, and of predicates in filter expressions, in a way that might allow the processor to raise dynamic errors that the author intended to prevent.
PR 254
The term "function conversion rules" used in 3.1 has been replaced by the term "coercion rules".
The coercion rules allow “relabeling” of a supplied atomic item where the required type is a derived atomic type: for example, it is now permitted to supply the value 3 when calling a function that expects an instance of xs:positiveInteger.
PR 284
Alternative syntax for conditional expressions is available: if (condition) { X }.
PR 286
Element and attribute tests can include alternative names: element(chapter|section), attribute(role|class).
See 3.2.7 Node Types
The NodeTest in an AxisStep now allows alternatives: ancestor::(section|appendix)
See 3.2.7 Node Types
Element and attribute tests of the form element(N) and attribute(N) now allow N to be any NameTest, including a wildcard.
PR 324
String templates provide a new way of constructing strings: for example `{$greeting}, {$planet}!` is equivalent to $greeting || ', ' || $planet || '!'
PR 326
Support for higher-order functions is now a mandatory feature (in 3.1 it was optional).
See 5 Conformance
PR 344
A for member clause is added to FLWOR expressions to allow iteration over an array.
PR 368
The concept of the context item has been generalized, so it is now a context value. That is, it is no longer constrained to be a single item.
PR 433
Numeric literals can now be written in hexadecimal or binary notation; and underscores can be included for readability.
PR 519
The rules for tokenization have been largely rewritten. In some cases the revised specification may affect edge cases that were handled in different ways by different 3.1 processors, which could lead to incompatible behavior.
PR 521
New abbreviated syntax is introduced (focus function) for simple inline functions taking a single argument. An example is fn { ../@code }
PR 603
The rules for reporting type errors during static analysis have been changed so that a processor has more freedom to report errors in respect of constructs that are evidently wrong, such as @price/@value, even though dynamic evaluation is defined to return an empty sequence rather than an error.
PR 606
Element and attribute tests of the form element(A|B) and attribute(A|B) are now allowed.
PR 691
Enumeration types are added as a new kind of ItemType, constraining the value space of strings.
PR 728
The syntax record(*) is allowed; it matches any map.
PR 815
The coercion rules now allow conversion in either direction between xs:hexBinary and xs:base64Binary.
PR 837
A deep lookup operator ?? is provided for searching trees of maps and arrays.
PR 911
The coercion rules now allow any numeric type to be implicitly converted to any other, for example an xs:double is accepted where the required type is xs:decimal.
PR 996
The value of a predicate in a filter expression can now be a sequence of integers.
PR 1031
An otherwise operator is introduced: A otherwise B returns the value of A, unless it is an empty sequence, in which case it returns the value of B.
PR 1071
In map constructors, the keyword map is now optional, so map { 0: false(), 1: true() } can now be written { 0: false(), 1: true() }, provided it is used in a context where this creates no ambiguity.
PR 1125
Lookup expressions can now take a modifier (such as keys, values, or pairs) enabling them to return structured results rather than a flattened sequence.
PR 1131
A positional variable can be defined in a for expression.
The type of a variable used in a for expression can be declared.
The type of a variable used in a let expression can be declared.
PR 1132
Choice item types (an item type allowing a set of alternative item types) are introduced.
PR 1163
Filter expressions for maps and arrays are introduced.
PR 1181
The default namespace for elements and types can be set to the value ##any, allowing unprefixed names in axis steps to match elements with a given local name in any namespace.
If the default namespace for elements and types has the special value ##any, then an unprefixed name in a NameTest acts as a wildcard, matching names in any namespace or none.
PR 1197
The keyword fn is allowed as a synonym for function in function types, to align with changes to inline function declarations.
In inline function expressions, the keyword function may be abbreviated as fn.
PR 1212
XPath 3.0 included empty-sequence and item as reserved function names, and XPath 3.1 added map and array. This was unnecessary since these names never appear followed by a left parenthesis at the start of an expression. They have therefore been removed from the list. New keywords introducing item types, such as record and enum, have not been included in the list.
PR 1217
Predicates in filter expressions for maps and arrays can now be numeric.
PR 1249
A for key/value clause is added to FLWOR expressions to allow iteration over maps.
PR 1250
Several decimal format properties, including minus sign, exponent separator, percent, and per-mille, can now be rendered as arbitrary strings rather than being confined to a single character.
PR 1265
The rules regarding the document-uri property of nodes returned by the fn:collection function have been relaxed.
PR 1344
Parts of the static context that were there purely to assist in static typing, such as the statically known documents, were no longer referenced and have therefore been dropped.
The static typing option has been dropped.
The static typing feature has been dropped.
See 5 Conformance
PR 1361
The term atomic value has been replaced by atomic item.
See 2.1.2 Values
PR 1384
If a type declaration is present, the supplied values in the input sequence are now coerced to the required type. Type declarations are now permitted in XPath as well as XQuery.
PR 1496
The context value static type, which was there purely to assist in static typing, has been dropped.
PR 1498
The EBNF operators ++ and ** have been introduced, for more concise representation of sequences using a character such as "," as a separator. The notation is borrowed from Invisible XML.
See 2.1 Terminology
The EBNF notation has been extended to allow the constructs (A ++ ",") (one or more occurrences of A, comma-separated) and (A ** ",") (zero or more occurrences of A, comma-separated).
See A.1 EBNF
See A.1.1 Notation
PR 1501
The coercion rules now apply recursively to the members of an array and the entries in a map.
PR 1532
Four new axes have been defined: preceding-or-self, preceding-sibling-or-self, following-or-self, and following-sibling-or-self.
See 4.6.4.1 Axes
PR 1577
The syntax record() is allowed; the only thing it matches is an empty map.
PR 1686
With the pipeline operator ->, the result of an expression can be bound to the context value before evaluating another expression.
PR 1696
Parameter names may be included in a function signature; they are purely documentary.
PR 1703
Ordered maps are introduced.
See 4.13.1 Maps
The order of key-value pairs in the map constructor is now retained in the constructed map.
PR 1874
The coercion rules now reorder the entries in a map when the required type is a record type.
PR 1898
The rules for subtyping of document node types have been refined.
PR 1991
Named record types used in the signatures of built-in functions are now available as standard in the static context.
PR 2026
The module feature is no longer an optional feature; processing of library modules is now required.
See 5 Conformance
PR 2055
Sequences, arrays, and maps can be destructured in a let expression to extract their components into multiple variables.
PR 2094
A general expression is allowed within a map constructor; this facilitates the creation of maps in which the presence or absence of particular keys is decided dynamically.