Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery and XPath Data Model (XDM) 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [XQuery and XPath Functions and Operators 4.0].
XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in I Change Log.
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XPath 4.0 Grammar].
The highest-level symbol in the XPath grammar is XPath.
XPath | ::= | Expr |
Expr | ::= | (ExprSingle ++ ",") |
ExprSingle | ::= | ForExpr | LetExpr | QuantifiedExpr | IfExpr | OrExpr |
ForExpr | ::= | ForClause ForLetReturn |
LetExpr | ::= | LetClause ForLetReturn |
QuantifiedExpr | ::= | ("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle |
IfExpr | ::= | "if" "(" Expr ")" (UnbracedActions | BracedAction) |
OrExpr | ::= | AndExpr ("or" AndExpr)* |
The XPath 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator. (Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.)
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
After the comma, the expressions that have next lowest precedence are ForExpr, LetExpr, QuantifiedExpr, IfExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
Path expressions are extended to handle JNodes (found in trees of maps and arrays) as well as XNodes (found in trees representing parsed XML). [Issue 2054]
PathExpr | ::= | ("/" RelativePathExpr?) | ("//" RelativePathExpr) | RelativePathExpr |
| | /* xgc: leading-lone-slash */ |
RelativePathExpr | ::= | StepExpr (("/" | "//") StepExpr)* |
[Definition: A path expression consists of a series of one or more steps, separated by / or //, and optionally beginning with / or //. A path expression is typically used to locate GNodes within GTrees.]
Note:
Note the terminology:
The following definitions are copied from the data model specification, for convenience:
[Definition: A tree that is rooted at a parentless JNode is referred to as a JTree.]
[Definition: A tree that is rooted at a parentless XNode is referred to as an XTree.]
[Definition: The term generic node or GNode is a collective term for XNodes (more commonly called simply nodes) representing the parts of an XML document, and JNodes, often used to represent the parts of a JSON document.]
[Definition: A JNode is a kind of item used to represent a value within the context of a tree of maps and arrays. A root JNode represents a map or array; a non-root JNode represents a member of an array or an entry in a map.]
[Definition: The term GTree means JTree or XTree.]
Absolute path expressions (those starting with an initial / or //) start their selection from the root GNode of a GTree; relative path expressions (those without a leading / or //) start from the context value.
A path expression consisting of a single step is evaluated as described in 4.6.4 Steps.
A path expression consisting of / on its own is treated as an abbreviation for /. (a slash followed by a single dot), and is evaluated accordingly as an expression of the form /PP.
An expression of the form /PP (that is, a path expression with a leading /) is treated as an abbreviation for the expression self::gnode()/(fn:root(.) treat as (document-node()|jnode-type()))/PP. The effect of this expansion is that for every item J in the context value V:
A type error occurs if J is not a GNode [err:XPTY0020].
The root GNode R of the GTree containing J is selected.
A dynamic error occurs if R is neither a JNode nor a document node [err:XPDY0050].
The expression that follows the leading / is evaluated with R as the context value.
Note:
If the context value includes a map or array, it is not converted implicitly to a JNode; rather, a type error occurs.
The results of these multiple evaluations are then combined into a single sequence; if the result is a set of GNodes, the GNodes are delivered in document order with duplicates eliminated.
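The steps above can be modelled outside XPath. The following non-normative Python sketch uses the standard `xml.dom.minidom` module as a stand-in for XTrees only; JNodes are not modelled. The names `root_of`, `number_in_document_order`, and `eval_leading_slash` are hypothetical helpers introduced for this illustration, and `pp` is assumed to be a callable that plays the role of PP and returns element or text nodes from the same tree.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def root_of(node):
    """Follow parent links up to the parentless node at the top of the tree."""
    while node.parentNode is not None:
        node = node.parentNode
    return node


def number_in_document_order(root, order):
    """Record a pre-order position for every node under root, keyed by identity."""
    stack = [root]
    while stack:
        node = stack.pop()
        order[id(node)] = len(order)
        stack.extend(reversed(node.childNodes))


def eval_leading_slash(context_value, pp):
    """Model of /PP: evaluate pp at the root of each context item's tree."""
    results, order = [], {}
    for j in context_value:
        if not isinstance(j, Node):
            raise TypeError("XPTY0020: context item is not a node")
        r = root_of(j)
        if r.nodeType != Node.DOCUMENT_NODE:
            raise ValueError("XPDY0050: root is not a document node")
        if id(r) not in order:
            number_in_document_order(r, order)
        results.extend(pp(r))
    if all(isinstance(x, Node) for x in results):
        # Node results: eliminate duplicates by identity, then sort into document order.
        unique = {id(n): n for n in results}
        return sorted(unique.values(), key=lambda n: order[id(n)])
    return results


doc = minidom.parseString("<a><b/><c><b/></c></a>")
first_b, c = doc.documentElement.childNodes
inner_b = c.firstChild
# Assumption: this callable stands in for PP, here roughly descendant::b.
select_b = lambda root: root.getElementsByTagName("b")
selected = eval_leading_slash([inner_b, doc.documentElement], select_b)
```

Both context items share the same root, so `select_b` runs twice against the same document node; the duplicates are then removed and the two `b` elements come back in document order.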
Note:
The / character can be used either as a complete path expression or as the beginning of a longer path expression such as /*. Also, * is both the multiply operator and a wildcard in path expressions. This can cause parsing difficulties when / appears on the left-hand side of *. This is resolved using the leading-lone-slash constraint. For example, /* and / * are valid path expressions containing wildcards, but /*5 and / * 5 raise syntax errors. Parentheses must be used when / is used on the left-hand side of an operator that could be confused with a node test, as in (/) * 5. Similarly, 4 + / * 5 raises a syntax error, but 4 + (/) * 5 is a valid expression. The expression 4 + / is also valid, because / does not occur on the left-hand side of the operator.
Similarly, in the expression / union /*, union is interpreted as an element name rather than an operator. For it to be parsed as an operator, the expression should be written (/) union /*.
An expression of the form //PP (that is, a path expression with a leading //) is treated as an abbreviation for the expression self::gnode()/(fn:root(.) treat as (document-node()|jnode-type()))/descendant-or-self::gnode()/PP. The effect of this expansion is that for every item J in the context value V:
A type error occurs if J is not a GNode [err:XPTY0020].
The root GNode R of the GTree containing J is selected.
A dynamic error occurs if R is neither a JNode nor a document node [err:XPDY0050].
The descendants of R are selected, along with R itself.
For every GNode D in this set of GNodes, the expression that follows the leading // is evaluated with D as the context value.
The results of these multiple evaluations are then combined into a single sequence; if the result is a set of GNodes, the GNodes are delivered in document order with duplicates eliminated.
Any map or array that is present in the context value is first coerced to a JNode by applying the fn:jnode function.
If (after this coercion) the context value is not a sequence of GNodes, a type error is raised [err:XPTY0020]. At evaluation time, if the root GNode of any item in the context value is not a document node or a JNode, a dynamic error is raised [err:XPDY0050].
Note:
The descendants of an XNode do not include attribute nodes or namespace nodes. However, the rules for expanding // ensure that .//@* selects all attributes of all descendants, and similarly .//namespace::* selects all namespaces of all descendants.
Note:
// on its own is not a valid expression.
The path operator / is primarily used for locating GNodes within GTrees. The value of the left-hand operand may include maps and arrays; such items are implicitly converted to JNodes as if by a call on the fn:jnode function. After this conversion, the left-hand operand must return a sequence of GNodes. The result of the operator is either a sequence of GNodes (in document order, with no duplicates), or a sequence of non-GNodes.
The operation E1/E2 is evaluated as follows: Expression E1 is evaluated. Any maps or arrays in the result are converted to JNodes by applying the fn:jnode function. If the result is not a (possibly empty) sequence S of GNodes, a type error is raised [err:XPTY0019]. Each GNode in S then serves in turn to provide an inner focus (the GNode as the context value, its position in S as the context position, the length of S as the context size) for an evaluation of E2, as described in 2.2.2 Dynamic Context. The sequences resulting from all the evaluations of E2 are combined as follows:
If every evaluation of E2 returns a (possibly empty) sequence of GNodes, these sequences are combined, and duplicate GNodes are eliminated based on GNode identity. The resulting GNode sequence is returned in document order.
If every evaluation of E2 returns a (possibly empty) sequence of non-GNodes, these sequences are concatenated, in order, and returned. The returned sequence preserves the orderings within and among the subsequences generated by the evaluations of E2.
Note:
The use of path expressions to select values other than GNodes is for backwards compatibility. Generally it is preferable to use the simple mapping operator ! for this purpose. For example, write $nodes!node-name() in preference to $nodes/node-name().
If the multiple evaluations of E2 return at least one GNode and at least one non-GNode, a type error is raised [err:XPTY0018].
Note:
The semantics of the path operator can also be defined using the simple map operator (!) as follows (the function fn:distinct-ordered-nodes($R) has the effect of eliminating duplicates and sorting nodes into document order):
let $R := E1 ! E2
return if (every $r in $R satisfies $r instance of gnode())
       then (fn:distinct-ordered-nodes($R))
       else if (every $r in $R satisfies not($r instance of gnode()))
       then $R
       else error()
For a table comparing the step operator to the map operator, see 4.19 Simple map operator (!).
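The combination rules for E1/E2 can also be sketched in a general-purpose language. The Python model below is non-normative and makes several simplifying assumptions: `GNode` is a hypothetical toy class whose `position` field stands in for document order, `e2` is a callable evaluated once per GNode, and the context position, context size, and the conversion of maps and arrays to JNodes are all omitted.

```python
class GNode:
    """Toy stand-in for a GNode; equality is object identity."""

    def __init__(self, name, position):
        self.name = name
        self.position = position  # assumed to encode document order


def path_operator(e1_result, e2):
    """Model of E1/E2 given the already-evaluated result of E1."""
    for item in e1_result:
        if not isinstance(item, GNode):
            raise TypeError("XPTY0019: left operand must yield GNodes")
    combined = []
    for node in e1_result:
        combined.extend(e2(node))
    if all(isinstance(r, GNode) for r in combined):
        # All GNodes: eliminate duplicates by identity, return in document order.
        unique = {id(n): n for n in combined}
        return sorted(unique.values(), key=lambda n: n.position)
    if not any(isinstance(r, GNode) for r in combined):
        # No GNodes: plain concatenation, preserving evaluation order.
        return combined
    raise TypeError("XPTY0018: result mixes GNodes and non-GNodes")


a, b, c = GNode("a", 1), GNode("b", 2), GNode("c", 3)
nodes_result = path_operator([c, a], lambda n: [b, n])
names_result = path_operator([c, a], lambda n: [n.name])
```

In the first call the GNode results are deduplicated and reordered; in the second the string results keep the order in which the evaluations of `e2` produced them.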
StepExpr | ::= | PostfixExpr | AxisStep |
PostfixExpr | ::= | PrimaryExpr | FilterExpr | DynamicFunctionCall | LookupExpr | FilterExprAM |
AxisStep | ::= | (AbbreviatedStep | FullStep) Predicate* |
[Definition: A step is a part of a path expression that generates a sequence of items and then filters the sequence by zero or more predicates. The value of the step consists of those items that satisfy each of the predicates, working from left to right. A step may be either an axis step or a postfix expression.] Postfix expressions are described in 4.3 Postfix Expressions.
[Definition: An axis step returns a sequence of GNodes that are reachable from a starting GNode via a specified axis. Such a step has two parts: an axis, which defines the direction of movement for the step, and a node test, which selects GNodes based on their properties.]
If the context value for an axis step includes a map or array, this is implicitly converted to a JNode as if by applying the fn:jnode function. If, after this conversion, the context value contains an item that is not a GNode, a type error is raised [err:XPTY0020]. Otherwise, the result of evaluating the axis step is a sequence of zero or more GNodes.
The step expression S is equivalent to ./S. Thus, if the context value is a sequence containing multiple GNodes, the semantics of a step expression are equivalent to a path expression in which the step is always applied to a single GNode. The following description therefore explains the semantics for the case where the context value is a single GNode, called the origin.
Note:
The equivalence of a step S to the path expression ./S means that the resulting GNode sequence is returned in document order.
In the abbreviated syntax for a step, the axis can be omitted and other shorthand notations can be used as described in 4.6.7 Abbreviated Syntax.
The unabbreviated syntax for an axis step consists of the axis name and node test separated by a double colon. The result of the step consists of the GNodes reachable from the origin via the specified axis that match the node test. For example, the step child::para selects the para element children of the origin XNode: child is the name of the axis, and para is the name of the element nodes to be selected on this axis. The available axes are described in 4.6.4.1 Axes. The available node tests are described in 4.6.4.2 Node Tests. Examples of steps are provided in 4.6.6 Unabbreviated Syntax and 4.6.7 Abbreviated Syntax.
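The two parts of an axis step can be pictured as a pair of functions: one that enumerates the GNodes reachable from the origin, and one that filters them. The following non-normative Python sketch models the step child::para over an XML tree using the standard `xml.dom.minidom` module. The helper names `child_axis`, `name_test`, and `axis_step` are hypothetical, namespaces are ignored, and predicates are not modelled.

```python
from xml.dom import minidom
from xml.dom.minidom import Node


def child_axis(origin):
    """The child axis: the children of the origin, in document order."""
    return list(origin.childNodes)


def name_test(name):
    """A node test matching element nodes with the given (no-namespace) name."""
    return lambda n: n.nodeType == Node.ELEMENT_NODE and n.tagName == name


def axis_step(origin, axis, node_test):
    """Nodes reachable from the origin via the axis that match the node test."""
    return [n for n in axis(origin) if node_test(n)]


doc = minidom.parseString("<doc><para>one</para><note/><para>two</para></doc>")
# Models child::para evaluated with the doc element as the origin.
paras = axis_step(doc.documentElement, child_axis, name_test("para"))
```

Here `paras` holds the two para elements and skips the note element, mirroring how the node test narrows the GNodes delivered by the axis.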