Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery and XPath Data Model (XDM) 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [XQuery and XPath Functions and Operators 4.0].
XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in Appendix [I Change Log].
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XPath 4.0 Grammar].
The highest-level symbol in the XPath grammar is XPath.
XPath | ::= | Expr |
Expr | ::= | (ExprSingle ++ ",") |
ExprSingle | ::= | ForExpr | LetExpr | QuantifiedExpr | IfExpr | OrExpr |
ForExpr | ::= | ForClause ForLetReturn |
LetExpr | ::= | LetClause ForLetReturn |
QuantifiedExpr | ::= | ("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle |
IfExpr | ::= | "if" "(" Expr ")" (UnbracedActions | BracedAction) |
OrExpr | ::= | AndExpr ("or" AndExpr)* |
The XPath 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator. Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
After the comma, the expressions that have next lowest precedence are ForExpr, LetExpr, QuantifiedExpr, IfExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
Arrow expressions apply a function to a value, using the value of the left-hand expression as the first argument to the function.
ArrowExpr | ::= | UnaryExpr (SequenceArrowTarget | MappingArrowTarget)* |
UnaryExpr | ::= | ("-" | "+")* ValueExpr |
SequenceArrowTarget | ::= | "=>" ArrowTarget |
ArrowTarget | ::= | FunctionCall | RestrictedDynamicCall |
FunctionCall | ::= | EQName ArgumentList |
| /* xgc: reserved-function-names */ | ||
| /* gn: parens */ | ||
RestrictedDynamicCall | ::= | (VarRef | ParenthesizedExpr | FunctionItemExpr | MapConstructor | ArrayConstructor) PositionalArgumentList |
VarRef | ::= | "$" EQName |
ParenthesizedExpr | ::= | "(" Expr? ")" |
FunctionItemExpr | ::= | NamedFunctionRef | InlineFunctionExpr |
NamedFunctionRef | ::= | EQName "#" IntegerLiteral |
| /* xgc: reserved-function-names */ | ||
InlineFunctionExpr | ::= | MethodAnnotation* ("function" | "fn") FunctionSignature? FunctionBody |
MapConstructor | ::= | "map"? "{" (MapConstructorEntry ** ",") "}" |
ArrayConstructor | ::= | SquareArrayConstructor | CurlyArrayConstructor |
PositionalArgumentList | ::= | "(" PositionalArguments? ")" |
PositionalArguments | ::= | (Argument ++ ",") |
MappingArrowTarget | ::= | "=!>" ArrowTarget |
The arrow syntax is particularly helpful when applying multiple functions to a value in turn. For example, the following expression invites syntax errors due to misplaced parentheses:
tokenize((normalize-unicode(upper-case($string))),"\s+")
In the following reformulation, it is easier to see that the parentheses are balanced:
$string => upper-case() => normalize-unicode() => tokenize("\s+")
When the operator is written as =!>, the function is applied to each item in the sequence in turn. Assuming that $string is a single string, the above example could equally be written:
$string =!> upper-case() =!> normalize-unicode() =!> tokenize("\s+")
The difference between the two operators is seen when the left-hand operand evaluates to a sequence:
(1, 2, 3) => avg()
returns a value of only one item, 2, the average of all three items.
This example could also be written using the pipeline operator as:
(1, 2, 3) -> avg(.)
By contrast, an expression using the mapping arrow operator:
(1, 2, 3) =!> avg()
would return the original sequence of three items, (1, 2, 3), each item being the average of itself.
There are two significant differences between the pipeline operator -> and the sequence arrow operator =>:
The -> operator takes an arbitrary expression as its right-hand operand, whereas the => operator only accepts a function call.
When the right hand operand is a function call, the first argument is omitted in the case of the => operator, but is included explicitly (as a context value expression, .) in the case of the -> operator.
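As a rough, non-normative illustration, the behavior of the three operators on the avg example can be modeled in Python, with an XDM sequence represented as a list. The helper names seq_arrow, map_arrow, pipeline, and avg are hypothetical and exist only for this sketch; they are not part of XPath or of any implementation.

```python
from statistics import mean

def avg(seq):
    """Stand-in for fn:avg: returns a one-item sequence."""
    return [mean(seq)]

def seq_arrow(value, func, *args):
    """Model of `value => func(args)`: the whole sequence becomes the first argument."""
    return func(value, *args)

def map_arrow(value, func, *args):
    """Model of `value =!> func(args)`: func is applied to each item in turn."""
    result = []
    for item in value:
        result.extend(func([item], *args))
    return result

def pipeline(value, expr):
    """Model of `value -> expr`: expr is arbitrary and receives value as its context value."""
    return expr(value)

whole = seq_arrow([1, 2, 3], avg)                  # (1, 2, 3) => avg()
piped = pipeline([1, 2, 3], lambda ctx: avg(ctx))  # (1, 2, 3) -> avg(.)
each = map_arrow([1, 2, 3], avg)                   # (1, 2, 3) =!> avg()
```

In this model the first argument is implicit for seq_arrow and map_arrow, whereas pipeline requires the caller to say where the context value is used, which mirrors the two differences listed above.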
The following example:
"The cat sat on the mat"
=> tokenize()
=!> concat(".")
=!> upper-case()
  => string-join(" ")
returns "THE. CAT. SAT. ON. THE. MAT.". The first arrow could be written either as => or =!> because the operand is a singleton; the next two arrows have to be =!> because the function is applied to each item in the tokenized sequence individually; the final arrow must be => because the string-join function applies to the sequence as a whole.
Note:
It may be useful to think of this as a map/reduce pipeline. The functions introduced by =!> are mapping operations; the function introduced by => is a reduce operation.
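The map/reduce reading of this example can be sketched in Python. This is only an analogy under stated assumptions: the tokenize helper is a hypothetical stand-in for the one-argument form of fn:tokenize (splitting on whitespace), and list comprehensions play the role of the =!> steps.

```python
def tokenize(s):
    # Stand-in for one-argument fn:tokenize: split on whitespace.
    return s.split()

words = tokenize("The cat sat on the mat")  # => tokenize()      (singleton input)
dotted = [w + "." for w in words]           # =!> concat(".")    (map over each item)
shouted = [w.upper() for w in dotted]       # =!> upper-case()   (map over each item)
result = " ".join(shouted)                  # => string-join(" ") (reduce the whole sequence)
```

The two comprehensions touch one item at a time, while the final join consumes the sequence as a whole, which is why only the last step corresponds to =>.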
The following example introduces an inline function to the pipeline:
(1 to 5) =!> xs:double() =!> math:sqrt() =!> fn($a) { $a + 1 }() => sum()
This is equivalent to sum((1 to 5) ! (math:sqrt(xs:double(.)) + 1)).
The same effect can be achieved using a focus function:
(1 to 5) =!> xs:double() =!> math:sqrt() =!> fn { . + 1 }() => sum()
It could also be expressed using the mapping operator !:
(1 to 5) ! xs:double(.) ! math:sqrt(.) ! (. + 1) => sum()
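The equivalence between the step-by-step formulations and the single nested expression can be checked informally with a Python model. This sketch assumes that Python floats are an adequate stand-in for xs:double and that each =!> step can be modeled as a list comprehension; the variable names are illustrative only.

```python
import math

items = range(1, 6)  # (1 to 5)

# Step-by-step form, one comprehension per =!> step.
doubles = [float(n) for n in items]               # =!> xs:double()
roots = [math.sqrt(d) for d in doubles]           # =!> math:sqrt()
plus_one = [(lambda a: a + 1)(r) for r in roots]  # =!> fn($a) { $a + 1 }()
chained = sum(plus_one)                           # => sum()

# Nested form, as in sum((1 to 5) ! (math:sqrt(xs:double(.)) + 1)).
fused = sum(math.sqrt(float(n)) + 1 for n in items)
```

Both computations apply the same per-item operations in the same order, so they agree; the formulations differ only in how the per-item step is written.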
Note:
The ArgumentList may include PlaceHolders, though this is not especially useful. For example, the expression "$" => concat(?) is equivalent to concat("$", ?): its value is a function that prepends a supplied string with a $ symbol.
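The effect of a placeholder on the right-hand side of an arrow resembles partial application. A loose Python analogy, assuming functools.partial as the model and a hypothetical concat helper standing in for fn:concat:

```python
from functools import partial

def concat(*parts):
    # Stand-in for fn:concat: joins the string values of its arguments.
    return "".join(str(p) for p in parts)

# "$" => concat(?) is equivalent to concat("$", ?): the first argument is
# fixed, and the placeholder leaves the second argument to be supplied later.
prepend_dollar = partial(concat, "$")

price = prepend_dollar("9.99")
```

The analogy is not exact: the XPath expression yields a function of exactly one argument, whereas a Python partial object accepts any number of further arguments.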
Note:
The ArgumentList may include keyword arguments if the function is identified statically (that is, by name). For example, the following is valid: $xml => xml-to-json(indent := true()) => parse-json(escape := false()).
The sequence arrow operator thus applies the supplied function to the left-hand operand as a whole, while the mapping arrow operator applies the function to each item in the value of the left-hand operand individually. In the case where the result of the left-hand operand is a single item, the two operators have the same effect.
Note:
The mapping arrow symbol =!> is intended to suggest a combination of function application (=>) and sequence mapping (!) combined in a single operation.
The construct on the right-hand side of the arrow operator (=>) can either be a static function call, or a restricted form of dynamic function call. The restrictions are there to ensure that the two forms can be distinguished by the parser with limited lookahead. For a dynamic call, the function item to be called can be expressed as a variable reference, an inline function expression, a named function reference, a map constructor, or an array constructor. Any other expression used to return the required function item must be enclosed in parentheses.