This draft contains only sections that have differences from the version that it modified.

W3C

XML Path Language (XPath) 4.0 WG Review Draft

W3C Editor's Draft 23 February 2026

This version:
https://qt4cg.org/specifications/xpath-40/
Most recent version of XPath:
https://qt4cg.org/specifications/xpath-40/
Most recent Recommendation of XPath:
https://www.w3.org/TR/2017/REC-xpath-31-20170321/
Editor:
Michael Kay, Saxonica <mike@saxonica.com>

Please check the errata for any errors or issues reported since publication.

See also translations.

This document is also available in these non-normative formats: XML.


Abstract

XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery and XPath Data Model (XDM) 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [XQuery and XPath Functions and Operators 4.0].

XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in Appendix [I Change Log].

Status of this Document

This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.

Dedication

The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).

Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.


4 Expressions

This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.

The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XPath 4.0 Grammar].

The highest-level symbol in the XPath grammar is XPath.

XPath::=Expr
Expr::=(ExprSingle ++ ",")
ExprSingle::=ForExpr
| LetExpr
| QuantifiedExpr
| IfExpr
| OrExpr
ForExpr::=ForClause ForLetReturn
LetExpr::=LetClause ForLetReturn
QuantifiedExpr::=("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle
IfExpr::="if" "(" Expr ")" (UnbracedActions | BracedAction)
OrExpr::=AndExpr ("or" AndExpr)*

The XPath 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.

The name ExprSingle denotes an expression that does not contain a top-level comma operator. (Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.)

The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.

After the comma, the expressions that have next lowest precedence are ForExpr, LetExpr, QuantifiedExpr, IfExpr, and OrExpr. Each of these expressions is described in a separate section of this document.

4.13 Maps and Arrays

Most modern programming languages have support for collections of key/value pairs, which may be called maps, dictionaries, associative arrays, hash tables, keyed lists, or objects (these are not the same thing as objects in object-oriented systems). In XPath 4.0, we call these maps. Most modern programming languages also support ordered lists of values, which may be called arrays, vectors, or sequences. In XPath 4.0, we have both sequences and arrays. Unlike sequences, an array is an item, and can appear as an item in a sequence.

Note:

The XPath 4.0 specification focuses on syntax provided for maps and arrays, especially constructors and lookup.

Some of the functionality typically needed for maps and arrays is provided by functions defined in Section 18 Processing maps and Section 19 Processing arrays of [XQuery and XPath Functions and Operators 4.0], including functions used to read JSON to create maps and arrays, serialize maps and arrays to JSON, combine maps to create a new map, remove map entries to create a new map, iterate over the keys of a map, convert an array to create a sequence, combine arrays to form a new array, and iterate over arrays in various ways.

4.13.3 Lookup Expressions

Changes in 4.0  

  1. The lookup operator ? can now be followed by an arbitrary literal, for cases where keys are items other than integers or NCNames. It can also be followed by a variable reference or a context value reference.   [Issue 1996 PR 2134 29 July 2025]

The operator "?", known as the lookup operator, returns values found in the operand map or array.

4.13.3.1 Postfix Lookup Expressions
LookupExpr::=PostfixExpr Lookup
PostfixExpr::=PrimaryExpr | FilterExpr | DynamicFunctionCall | LookupExpr | MethodCall | FilterExprAM
Lookup::="?" KeySpecifier
KeySpecifier::=NCName | Literal | ContextValueRef | VarRef | ParenthesizedExpr | LookupWildcard
Literal::=NumericLiteral | StringLiteral | QNameLiteral
ContextValueRef::="."
VarRef::="$" EQName
ParenthesizedExpr::="(" Expr? ")"
LookupWildcard::="*"

A postfix Lookup has two parts: the left-hand operand selects maps or arrays to be searched, and the KeySpecifier defines the search criteria.

First a simple example: given an array $array of maps:

[ { "John": 3, "Jill": 5}, {"Peter": 8, "Mary": 6} ]
  • $array?1?John returns 3

  • $array?2?Mary returns 6

  • $array?*?* returns (3, 5, 8, 6)

  • $array?2?* returns (8, 6)

  • $array?*?Peter returns 8

  • 'Peter' -> $array?*?. returns 8

The value of the left-hand operand must be a sequence of maps or arrays (but if it includes JNodes, these will be coerced to maps or arrays by extracting the ·content· property of the JNode). The lookup operation is applied independently to each of these maps or arrays, and the final expression result is the sequence concatenation of the individual results.

The semantics of a postfix lookup expression E?KS are defined by the following rules:

  1. E is evaluated to produce a value $V.

  2. If $V is not a singleton (that is, if count($V) ne 1), then the result (by recursive application of these rules) is the value of for $v in $V return $v?KS.

  3. If $V is a JNode then it is coerced to the required type (map(*)|array(*)): see coercion rules.

  4. If $V (after coercion) is a singleton array item (that is, if $V instance of array(*)) then:

    1. If the KeySpecifier KS is either a Literal, a ContextValueRef, a VarRef, or a ParenthesizedExpr, then it is evaluated as an expression to produce a value $K and the result is:

      data($K) ! array:get($V, .)

      Note:

      The focus for evaluating the key specifier expression is the same as the focus for the Lookup expression itself.

      The order of items in the result reflects the order of subscripts in $K: [10, 20, 30]?(3, 1) returns (30, 10).

      This rule implies that a type error ([err:XPTY0004]) is raised if an item in the atomized value of $K cannot be coerced to the type xs:integer.

      This rule also implies that a dynamic error ([err:FOAY0001], defined in [XQuery and XPath Functions and Operators 4.0]) is raised if an integer in the atomized value of $K is outside the range 1 to array:size($V).

    2. If the KeySpecifier KS is an NCName then it is evaluated in the same way as if the NCName were written in quotation marks as a StringLiteral. In consequence, the expression raises a type error [err:XPTY0004], because a string cannot be coerced to the type xs:integer.

    3. If the KeySpecifier KS is a wildcard (*), the result is the same as $V?(1 to array:size($V)).

      Note:

      Note that array items are returned in order.

  5. If $V is a singleton map item (that is, if $V instance of map(*)) then:

    1. If the KeySpecifier KS is either a Literal, a ContextValueRef, a VarRef, or a ParenthesizedExpr, then it is evaluated as an expression to produce a value $K and the result is:

      data($K) ! map:get($V, .)

      Note:

      The focus for evaluating the key specifier expression is the same as the focus for the Lookup expression itself.

      The order of items in the result reflects the order of keys in $K: {'a':10, 'b':20, 'c':30}?('c', 'a') returns (30, 10).

      There is no error when $K includes a key that is not present in the map.

    2. If the KeySpecifier KS is an NCName, then the result is the same as if it were written in quotes as a StringLiteral: for example, $map?name returns the same result as $map?"name".

    3. If the KeySpecifier KS is a wildcard (*), the result is the same as $V?(map:keys($V)).

      Note:

      The order of entries in the result sequence reflects the entry order of the map, as defined in [XQuery and XPath Data Model (XDM) 4.0].

  6. Otherwise (that is, if $V is neither a map nor an array) a type error is raised [err:XPTY0004].
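The rules above can be approximated by the following non-normative sketch, written in Python rather than XPath. It rests on several assumptions that are not part of this specification: an XPath sequence is modeled as a Python list, a map as a Python dict, an array by a hypothetical class XArray, and the wildcard by a hypothetical marker WILDCARD. Atomization of the key and coercion of JNodes (rule 3) are not modeled.

```python
WILDCARD = object()  # hypothetical marker standing in for the "*" key specifier


class XArray:
    """Hypothetical stand-in for an array: a list of members, each a sequence."""

    def __init__(self, *members):
        self.members = [as_seq(m) for m in members]


def as_seq(value):
    """Treat a bare Python value as a singleton sequence (a one-item list)."""
    return value if isinstance(value, list) else [value]


def lookup(value, key):
    """Model E?KS: apply the lookup to each item of E and concatenate the results."""
    result = []
    for item in as_seq(value):                      # rule 2: one lookup per item
        if isinstance(item, XArray):                # rule 4: arrays
            size = len(item.members)
            keys = range(1, size + 1) if key is WILDCARD else as_seq(key)
            for k in keys:
                if isinstance(k, bool) or not isinstance(k, int):
                    raise TypeError("XPTY0004: array subscript must be an integer")
                if not 1 <= k <= size:
                    raise IndexError("FOAY0001: array subscript out of bounds")
                result.extend(item.members[k - 1])
        elif isinstance(item, dict):                # rule 5: maps
            keys = list(item) if key is WILDCARD else as_seq(key)
            for k in keys:
                if k in item:                       # absent keys contribute nothing
                    result.extend(as_seq(item[k]))
        else:                                       # rule 6: anything else
            raise TypeError("XPTY0004: operand is neither a map nor an array")
    return result


array = XArray({"John": 3, "Jill": 5}, {"Peter": 8, "Mary": 6})
print(lookup(lookup(array, 1), "John"))           # models $array?1?John   -> [3]
print(lookup(lookup(array, WILDCARD), WILDCARD))  # models $array?*?*      -> [3, 5, 8, 6]
print(lookup(lookup(array, WILDCARD), "Peter"))   # models $array?*?Peter  -> [8]
print(lookup(XArray(10, 20, 30), [3, 1]))         # models [10, 20, 30]?(3, 1) -> [30, 10]
```

The sketch relies on Python dicts preserving insertion order to imitate the entry order of a map. It rejects any subscript that is not a Python int, which is stricter than the coercion rules referred to above.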

Examples:

  • [ 1, 2, 5, 7 ]?* evaluates to (1, 2, 5, 7).

  • [ [ 1, 2, 3 ], [ 4, 5, 6 ] ]?* evaluates to ([ 1, 2, 3 ], [ 4, 5, 6 ])

  • [ [ 1, 2, 3 ], 4, 5 ]?*[. instance of array(xs:integer)] evaluates to ([ 1, 2, 3 ])

  • [ [ 1, 2, 3 ], [ 4, 5, 6 ], 7 ]?*[. instance of array(*)]?2 evaluates to (2, 5)

  • [ [ 1, 2, 3 ], 4, 5 ]?*[. instance of xs:integer] evaluates to (4, 5).

4.13.3.2 Unary Lookup
UnaryLookup::=Lookup
Lookup::="?" KeySpecifier
KeySpecifier::=NCName | Literal | ContextValueRef | VarRef | ParenthesizedExpr | LookupWildcard
Literal::=NumericLiteral | StringLiteral | QNameLiteral
ContextValueRef::="."
VarRef::="$" EQName
ParenthesizedExpr::="(" Expr? ")"
LookupWildcard::="*"

Unary lookup is most commonly used in predicates (for example, $map[?name = 'Mike']) or with the simple map operator (for example, avg($maps ! (?price - ?discount))).

The unary lookup expression ?KS is defined to be equivalent to the postfix lookup expression .?KS, which has the context value (.) as the implicit first operand. See 4.13.3.1 Postfix Lookup Expressions for the postfix lookup operator.
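The equivalence between ?KS and .?KS can be illustrated with a non-normative Python sketch. Everything in it is an assumption made for illustration: the helper names lookup and unary are hypothetical, maps are modeled as Python dicts, the context value is passed explicitly as an argument, and the sample data is invented.

```python
from statistics import mean


def lookup(context, key):
    """Model the postfix lookup context?key on a single map (a Python dict)."""
    return context.get(key)


def unary(key):
    """Model ?key as a lookup still waiting for its context value, that is, .?key."""
    return lambda context: lookup(context, key)


# Invented sample data, for illustration only.
maps = [
    {"name": "Mike", "price": 10.0, "discount": 2.0},
    {"name": "Ann", "price": 20.0, "discount": 5.0},
]

# Models $maps[?name = 'Mike']: each map in turn becomes the context value.
selected = [m for m in maps if unary("name")(m) == "Mike"]

# Models avg($maps ! (?price - ?discount)).
average_net = mean(unary("price")(m) - unary("discount")(m) for m in maps)

print(selected[0]["price"])  # 10.0
print(average_net)           # 11.5
```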

Note:

Although the grammar allows the key specifier to be a context value reference, this is of no practical use with a unary lookup. The expression [1, 2, 3] -> ?. expands to [1, 2, 3]?(1, 2, 3) which returns (1, 2, 3); but a more likely result is a type error or array bounds error.

Examples:

  • ?name is equivalent to .("name"), an appropriate lookup for a map.

  • ?2 is equivalent to .(2), an appropriate lookup for an array or an integer-valued map.

  • ?"first name" is equivalent to .("first name")

  • ?#code is equivalent to .(#code)

  • ?($a) and ?$a are equivalent to for $k in $a return .($k), allowing keys for an array or map to be passed using a variable.

  • ?(3e0) and ?3e0 return the same result as ?3, because xs:double(3e0) and xs:integer(3) compare equal under the rules of the atomic-equal function.

  • ?(2 to 4) is equivalent to for $k in (2, 3, 4) return .($k), a convenient way to return a range of values from an array.

  • ([ 1, 2, 3 ], [ 1, 2, 5 ], [ 1, 2 ])[?3 = 5] raises an error, because ?3 fails when applied to the third array, [ 1, 2 ], which has only two members.

  • If $m is bound to the weekdays map described in 4.13.1 Maps, then $m?* returns the values ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"), in implementation-dependent order.

4.13.3.3 Comparing Lookup and Path Expressions

Lookup expressions are retained in this specification with only minor changes from the previous version 3.1. They remain a convenient solution for simple lookups of entries in maps and arrays.

For more complex queries into trees of maps and arrays, XPath 4.0 introduces a generalization of path expressions (see 4.6 Path Expressions) which can now handle JTrees as well as XTrees.

For simple expressions, the capabilities of the two constructs overlap. For example, if $m is a map, then the expressions $m?code = 3 and $m/code = 3 have the same effect. Path expressions, however, have more power, and with it, more complexity. The expression $m/code = 3 (unless simplified by an optimizer) effectively expands to (jtree($m)/child::get("code") => jnode-value()) = 3: that is, the supplied map is wrapped in a JNode, the child axis returns a sequence of JNodes, and the ·content· properties of these JNodes are compared with the supplied value 3.

Whereas simple lookups of specific entries in maps and arrays work well, experience has shown that the ?* wildcard lookup can be problematic. This is because of the flattening effect: for example, given the array [(1,2), (3,4), (), 5] bound to the variable $A, the result of the expression $A?* is the sequence (1, 2, 3, 4, 5), which loses information that might be needed for further processing. By contrast, the path expression $A/* (or $A/child::*) returns a sequence of four JNodes, whose ·content· properties are respectively (1,2), (3,4), (), and 5.
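The flattening effect can be seen in a small non-normative Python sketch. It assumes a hypothetical model in which the array is a Python list of members and each member is itself a list. JNodes are not modeled: only their contents are shown.

```python
# Hypothetical model of the array [(1,2), (3,4), (), 5]:
# a list of members, where each member is a sequence (a Python list).
A = [[1, 2], [3, 4], [], [5]]

# Models $A?*: the members are concatenated, so member boundaries disappear.
flattened = [item for member in A for item in member]

# Models $A/*: one result per member; only the content of each is kept here.
contents = [list(member) for member in A]

print(flattened)      # [1, 2, 3, 4, 5]
print(len(contents))  # 4
print(contents[2])    # []
```

In the flattened result there is no trace of the empty third member, and no way to tell that 1 and 2 came from the same member.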

The result of a lookup expression is a simple value (the value of an entry in a map or a member of an array, or the sequence concatenation of several such values). By contrast, the result of a path expression applied to maps or arrays is always a sequence of JNodes. These JNodes can be used for further navigation. If only the ·content· properties of the JNodes are needed, these will usually be extracted automatically by virtue of the coercion rules: for example if the value is used in an arithmetic expression or a value comparison, atomization of the JNode automatically extracts its ·content·. In other cases the value can be extracted explicitly by a call of the jnode-value function.

Lookup expressions on arrays result in a dynamic error if the subscript is out of bounds, whereas the equivalent path expression succeeds, returning an empty sequence. For example, array{1 to 5}?10 raises [err:FOAY0001] (defined in [XQuery and XPath Functions and Operators 4.0]), whereas array{1 to 5}/get(10) returns an empty sequence.
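The difference in out-of-bounds behavior can be sketched, non-normatively, in Python. The function names lookup_member and path_get are hypothetical, the array is modeled as a list of members, and the path expression is simplified to return member contents rather than JNodes.

```python
def lookup_member(members, index):
    """Model array?N: a subscript outside 1..size is an error (FOAY0001)."""
    if not 1 <= index <= len(members):
        raise IndexError("FOAY0001: array subscript out of bounds")
    return members[index - 1]


def path_get(members, index):
    """Model array/get(N): an out-of-range subscript selects nothing."""
    return members[index - 1] if 1 <= index <= len(members) else []


# Models array{1 to 5}: five members, each a singleton sequence.
members = [[n] for n in range(1, 6)]

print(path_get(members, 10))  # []
try:
    lookup_member(members, 10)
except IndexError as error:
    print(error)              # FOAY0001: array subscript out of bounds
```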