Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XDM 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [Functions and Operators 4.0].
XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in I Change Log.
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XPath 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see A XPath 4.0 Grammar.
The highest-level symbol in the XPath grammar is XPath.
XPath ::= Expr
Expr ::= (ExprSingle ++ ",")
ExprSingle ::= ForExpr | LetExpr | QuantifiedExpr | IfExpr | OrExpr
ForExpr ::= ForClause ForLetReturn
LetExpr ::= LetClause ForLetReturn
QuantifiedExpr ::= ("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle
IfExpr ::= "if" "(" Expr ")" (UnbracedActions | BracedAction)
OrExpr ::= AndExpr ("or" AndExpr)*
The XPath 4.0 operator that has the lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator. (Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.)
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
After the comma, the expressions that have the next lowest precedence are ForExpr, LetExpr, QuantifiedExpr, IfExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
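Because the comma operator always yields a flat sequence, it can help to see it modelled outside XPath. The following Python sketch is illustrative only: it assumes that an XDM sequence is represented as a flat tuple and a single item as a one-item tuple, and the function name `comma` is hypothetical.

```python
def comma(*operands: tuple) -> tuple:
    """Hypothetical model of the comma operator: concatenate operand sequences.

    Each operand is assumed to be a flat tuple (an XDM sequence), so the
    result is also flat: sequences never contain other sequences.
    """
    result: list = []
    for sequence in operands:
        result.extend(sequence)  # splice the items in, preserving order
    return tuple(result)


# Models the XPath expression (1, (2, 3), ()), which yields 1, 2, 3
print(comma((1,), comma((2,), (3,)), ()))  # (1, 2, 3)
```

The empty sequence () contributes nothing, and the inner (2, 3) is spliced in rather than nested.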
Path expressions are extended to handle JNodes (found in trees of maps and arrays) as well as XNodes (found in trees representing parsed XML). [Issue 2054]
PathExpr ::= AbsolutePathExpr | RelativePathExpr /* xgc: leading-lone-slash */
AbsolutePathExpr ::= ("/" RelativePathExpr?) | ("//" RelativePathExpr)
RelativePathExpr ::= StepExpr (("/" | "//") StepExpr)*
[Definition: A path expression is either an absolute path expression or a relative path expression.]
[Definition: An absolute path expression is an instance of the production AbsolutePathExpr: it consists of either (a) the operator / followed by zero or more operands separated by / or // operators, or (b) the operator // followed by one or more operands separated by / or // operators.]
[Definition: A relative path expression is a non-trivial instance of the production RelativePathExpr: it consists of two or more operand expressions separated by / or // operators.]
[Definition: The operands of a path expression are conventionally referred to as steps.]
Note:
The term step must not be confused with axis step. A step can be any kind of expression, often but not necessarily an axis step, while an axis step can be used in any expression context, not necessarily as a step in a path expression.
A path expression is typically used to locate GNodes within GTrees.
Note:
The following terminology definitions are copied from the data model specification for convenience:
[Definition: A tree that is rooted at a parentless JNode is referred to as a JTree.]
[Definition: A tree that is rooted at a parentless XNode is referred to as an XTree.]
[Definition: The term generic node or GNode is a collective term for XNodes (more commonly called simply nodes) representing the parts of an XML document, and JNodes, often used to represent the parts of a JSON document.]
[Definition: A JNode is a kind of item used to represent a value within the context of a tree of maps and arrays. A root JNode represents a map or array; a non-root JNode represents a member of an array or an entry in a map.]
[Definition: The term GTree means JTree or XTree.]
Absolute path expressions (those starting with an initial / or //) start their selection from the root GNode of a GTree; relative path expressions (those without a leading / or //) start from the context value.
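The difference in starting point can be sketched in Python. This is a simplified model under stated assumptions: `Node` is a hypothetical class standing in for a GNode with a parent link, and `child_step` models only a child-axis name test.

```python
class Node:
    """Hypothetical stand-in for a GNode: a name, a parent link, children."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)


def tree_root(node):
    """Follow parent links up to the parentless root GNode of the GTree."""
    while node.parent is not None:
        node = node.parent
    return node


def child_step(nodes, name):
    """Model of the step child::name applied to each node in turn."""
    return [c for n in nodes for c in n.children if c.name == name]


doc = Node("#document")
book = Node("book", doc)
chapter = Node("chapter", book)
title = Node("title", chapter)

context = chapter
# Relative path "title": starts from the context value
relative = child_step([context], "title")
# Absolute path "/book/chapter": starts from the root of the context's tree
absolute = child_step(child_step([tree_root(context)], "book"), "chapter")
print([n.name for n in relative], [n.name for n in absolute])  # ['title'] ['chapter']
```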
AxisStep ::= (AbbreviatedStep | FullStep) Predicate*
AbbreviatedStep ::= ".." | ("@" NodeTest) | SimpleNodeTest
FullStep ::= Axis NodeTest
Axis ::= ("ancestor" | "ancestor-or-self" | "attribute" | "child" | "descendant" | "descendant-or-self" | "following" | "following-or-self" | "following-sibling" | "following-sibling-or-self" | "namespace" | "parent" | "preceding" | "preceding-or-self" | "preceding-sibling" | "preceding-sibling-or-self" | "self") "::"
NodeTest ::= UnionNodeTest | SimpleNodeTest
Predicate ::= "[" Expr "]"
Expr ::= (ExprSingle ++ ",")
[Definition: An axis step is an instance of the production AxisStep: it is an expression that returns a sequence of GNodes that are reachable from a starting GNode via a specified axis. An axis step has three parts: an axis, which defines the direction of movement for the step, a node test, which selects GNodes based on their properties, and zero or more predicates which are used to filter the results.]
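The three parts of an axis step can be sketched as a small Python function. This is a hypothetical model, not an implementation of XPath: `Node`, `child_axis` and `axis_step` are illustrative names, the node test is reduced to a name test, and each predicate is modelled as a function of a candidate GNode and its 1-based position.

```python
class Node:
    """Hypothetical element-like node with a name, attributes and children."""

    def __init__(self, name, parent=None, **attrs):
        self.name, self.attrs, self.children = name, attrs, []
        if parent is not None:
            parent.children.append(self)


def child_axis(origin):
    """Axis: the GNodes reachable from the origin (here, its children).

    This is a forward axis, so positions follow document order."""
    return origin.children


def axis_step(origin, axis, name_test, *predicates):
    """Model of axis::name_test[p1][p2]... evaluated from one origin."""
    # Axis, then node test: keep nodes whose name matches ("*" matches all)
    selected = [n for n in axis(origin) if name_test in ("*", n.name)]
    # Predicates: applied left to right; each sees 1-based positions
    # within the sequence produced by the previous filter
    for predicate in predicates:
        selected = [n for pos, n in enumerate(selected, start=1)
                    if predicate(n, pos)]
    return selected


section = Node("section")
p1 = Node("para", section, type="warning")
Node("note", section)
p2 = Node("para", section)
p3 = Node("para", section, type="warning")

# Models child::para[@type = 'warning'][2]
result = axis_step(section, child_axis, "para",
                   lambda n, pos: n.attrs.get("type") == "warning",
                   lambda n, pos: pos == 2)
```

The second predicate counts positions among the paragraphs that survived the first one, so the result is the last para, not the second para child.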
Note:
An axis step is an expression in its own right. While axis steps are often used as the operands of path expressions, they can also appear in other contexts (without a / or // operator); equally, the operands of a path expression can be any expression, not restricted to an axis step.
If the context value for an axis step includes any maps or arrays, each of them is implicitly converted to a JNode, as if by applying the fn:jnode function. If, after this conversion, the context value contains an item that is not a GNode, a type error is raised [err:XPTY0020]. The result of evaluating the axis step is a sequence of zero or more GNodes.
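The conversion rule can be modelled in a few lines of Python. This sketch rests on assumptions labelled here and in the code: a Python `dict` stands in for an XDM map, a `list` for an array, and the classes `GNode`, `XNode`, `JNode` and `XPathTypeError` and the function `prepare_context` are hypothetical.

```python
class GNode:
    """Hypothetical base class for XNodes and JNodes."""


class XNode(GNode):
    def __init__(self, name):
        self.name = name


class JNode(GNode):
    def __init__(self, content):
        self.content = content  # models the ·content· property


class XPathTypeError(TypeError):
    """Hypothetical exception carrying an XPath error code."""


def prepare_context(context_value):
    """Model of how an axis step treats its context value."""
    prepared = []
    for item in context_value:
        if isinstance(item, (dict, list)):  # assumption: a map or an array
            prepared.append(JNode(item))    # as if by fn:jnode
        elif isinstance(item, GNode):
            prepared.append(item)           # GNodes are used unchanged
        else:
            raise XPathTypeError(f"err:XPTY0020: {item!r} is not a GNode")
    return prepared


items = prepare_context([{"name": "x"}, XNode("para"), [1, 2, 3]])
print([type(i).__name__ for i in items])  # ['JNode', 'XNode', 'JNode']
```

An atomic item such as the integer 42 is neither a map, an array, nor a GNode, so it triggers the type error.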
The axis step S is equivalent to ./S. Thus, if the context value is a sequence containing multiple GNodes, the semantics of an axis step are equivalent to those of a path expression in which the step is always applied to a single GNode. The following description therefore explains the semantics for the case where the context value is a single GNode, called the origin.
Note:
The equivalence of an axis step S to the path expression ./S means that the resulting GNode sequence is returned in document order.
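The / operator applies its right-hand operand once for each GNode selected by its left-hand operand and, when the results are GNodes, removes duplicates and sorts them into document order. The Python sketch below models that behaviour for a step S applied to several origins. It is a simplified illustration: `Node`, `document_order`, `apply_step`, `parent_step` and `child_step` are hypothetical, and preorder position stands in for document order.

```python
class Node:
    """Hypothetical GNode with a name, parent link and ordered children."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)


def document_order(root):
    """Map each node in the tree to its preorder (document-order) position."""
    order, stack = {}, [root]
    while stack:
        node = stack.pop()
        order[node] = len(order)
        stack.extend(reversed(node.children))  # first child is visited next
    return order


def apply_step(context_nodes, step, order):
    """Model of ./S: evaluate S once per origin, drop duplicates, sort."""
    found = {n for origin in context_nodes for n in step(origin)}
    return sorted(found, key=order.__getitem__)


def parent_step(node):  # models the abbreviated step ..
    return [node.parent] if node.parent is not None else []


def child_step(node):   # models child::node()
    return list(node.children)


r = Node("r")
a = Node("a", r)
a1, a2 = Node("a1", a), Node("a2", a)
b = Node("b", r)
order = document_order(r)

# The context value (a2, a1, b) is deliberately out of document order
print([n.name for n in apply_step([a2, a1, b], parent_step, order)])  # ['r', 'a']
print([n.name for n in apply_step([b, a], child_step, order)])        # ['a1', 'a2']
```

Although a1 and a2 share the parent a, it appears only once, and r precedes a because it comes first in document order.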
In the abbreviated syntax for a step, the axis can be omitted and other shorthand notations can be used as described in 4.6.7 Abbreviated Syntax.
The unabbreviated syntax for an axis step consists of the axis name and node test separated by a double colon. The result of the step consists of the GNodes reachable from the origin via the specified axis that match the node test. For example, the step child::para selects the para element children of the origin XNode: child is the name of the axis, and para is the name of the element nodes to be selected on this axis. The available axes are described in 4.6.4.1 Axes. The available node tests are described in 4.6.4.2 Node Tests. Examples of steps are provided in 4.6.6 Unabbreviated Syntax and 4.6.7 Abbreviated Syntax.
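Python's standard `xml.etree.ElementTree` module supports a small subset of path syntax that is enough to illustrate the difference between the child and descendant axes. It is not an XPath 4.0 processor; the sketch relies only on `findall("para")` selecting element children named para (like child::para) and `findall(".//para")` selecting descendant elements named para in document order.

```python
import xml.etree.ElementTree as ET

chapter = ET.fromstring(
    "<chapter><para>One</para><note><para>Nested</para></note>"
    "<para>Two</para></chapter>"
)

# Like child::para (abbreviated: para): direct children of the origin only
children = chapter.findall("para")
print([p.text for p in children])     # ['One', 'Two']

# Like descendant::para: every para below the origin, in document order
descendants = chapter.findall(".//para")
print([p.text for p in descendants])  # ['One', 'Nested', 'Two']
```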
[Definition: An alternative form of a node test, called a type test, can select XNodes based on their type or, in the case of JNodes, based on the type of their ·content·.]
The most general form of type test uses the syntax type(SequenceType). This selects:
XNodes that are instances of the supplied SequenceType;
JNodes whose ·content· property is an instance of the supplied SequenceType.
For the most commonly encountered types, this syntax can be abbreviated: for example node(), text(), array(*), and record(x, y) can be written directly without the enclosing type(...).
If the origin is an XNode, the type used will normally be a NodeKindTest such as node() or comment(). Specifying a type that cannot select nodes, such as map(*), is allowed but pointless.
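The dispatch performed by type(SequenceType) can be sketched in Python. This is a hypothetical model: the `XNode` and `JNode` classes and the predicates `is_array`, `is_comment` and `is_map` are illustrative stand-ins for real node kinds and SequenceTypes, with arrays modelled as lists and maps as dicts.

```python
class XNode:
    """Hypothetical XNode: just a node kind and an optional name."""

    def __init__(self, kind, name=None):
        self.kind, self.name = kind, name


class JNode:
    """Hypothetical JNode wrapping a value as its ·content· property."""

    def __init__(self, content):
        self.content = content


def type_test(sequence_type):
    """Model of type(T): sequence_type is a predicate standing in for T."""
    def matches(gnode):
        if isinstance(gnode, JNode):
            return sequence_type(gnode.content)  # JNode: test its content
        return sequence_type(gnode)              # XNode: test the node itself
    return matches


def is_array(value):    # stand-in for array(*); arrays modelled as lists
    return isinstance(value, list)


def is_comment(value):  # stand-in for comment()
    return isinstance(value, XNode) and value.kind == "comment"


def is_map(value):      # stand-in for map(*); maps modelled as dicts
    return isinstance(value, dict)


print(type_test(is_array)(JNode([1, 2])))              # True
print(type_test(is_comment)(XNode("comment")))         # True
print(type_test(is_comment)(JNode(XNode("comment"))))  # True: content is an XNode
print(type_test(is_map)(XNode("element", "p")))        # False: never matches an XNode
```

The last line mirrors the remark above: a map type can be written in a type test, but it never selects an XNode.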
Note:
If T is a NodeKindTest, there is a subtle difference between the expressions $N/T and $N/type(T). When no explicit axis is specified and T has the form attribute(N) or schema-attribute(N), the default axis for the step $N/T becomes the attribute axis rather than the child axis; likewise, tests that implicitly use the namespace axis change the default axis to the namespace axis. This rule does not apply when the step is written as type(attribute(N)) or type(schema-attribute(N)). Such constructs make sense, for example, when selecting members of an array that are XNodes: the expressions $array/type(element(*)) and $array/type(attribute(*)) can be used to select the members of an array that are element nodes or attribute nodes respectively.
Such expressions return a sequence of JNodes, whose ·content· property is an XNode. Note that it is not directly possible to start with a JNode, select a contained XNode, and then navigate from the XNode: a path such as $array/type(element(p))/@id will not work. This is because the first step, $array/type(element(p)), does not select an element node, it selects a JNode whose ·content· property is that element node, and use of the attribute axis starting from a JNode has no effect.
Instead, the required effect can be achieved by adding a step that explicitly extracts the content of the JNode: $array / type(element(p)) / jnode-content() / @id.
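The need for jnode-content() can be illustrated with a small Python model. Everything here is hypothetical: `XElement`, `JNode`, `array_members`, `is_element_p`, `attribute_id` and `jnode_content` are illustrative names, attribute nodes are reduced to their string values, and the statement that the attribute axis "has no effect" from a JNode is modelled as selecting nothing.

```python
class XElement:
    """Hypothetical element XNode with a name and attribute values."""

    def __init__(self, name, **attrs):
        self.name, self.attrs = name, attrs


class JNode:
    """Hypothetical JNode whose ·content· property is any value."""

    def __init__(self, content):
        self.content = content


def array_members(array):
    """Model of the JNodes for the members of an array (one per member)."""
    return [JNode(member) for member in array]


def is_element_p(gnode):
    """Model of the type test type(element(p)) applied to a JNode."""
    return isinstance(gnode.content, XElement) and gnode.content.name == "p"


def attribute_id(gnode):
    """Model of the step @id; assumed to select nothing from a JNode."""
    if isinstance(gnode, XElement) and "id" in gnode.attrs:
        return [gnode.attrs["id"]]
    return []


def jnode_content(gnode):
    """Model of jnode-content(): unwrap the JNode's ·content·."""
    return [gnode.content]


array = [XElement("p", id="a1"), "text", XElement("q", id="b1"), XElement("p", id="a2")]
selected = [j for j in array_members(array) if is_element_p(j)]

# $array/type(element(p))/@id : the @id step starts from JNodes
print([v for j in selected for v in attribute_id(j)])  # []
# $array/type(element(p))/jnode-content()/@id
print([v for j in selected for x in jnode_content(j) for v in attribute_id(x)])  # ['a1', 'a2']
```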
The syntax and semantics of a kind test are described in 3.1 Sequence Types and 3.1.2 Sequence Type Matching.
Shown below are several examples of type tests that might be used in path expressions selecting within an XTree:
node() matches any XNode.
text() matches any text node.
comment() matches any comment node.
namespace-node() matches any namespace node.
element() matches any element node.
schema-element(person) matches any element node whose name is person (or is in the substitution group headed by person), and whose type annotation is the same as (or is derived from) the declared type of the person element in the in-scope element declarations.
element(person) matches any element node whose name is person, regardless of its type annotation.
element(doctor|nurse) matches any element node whose name is doctor or nurse, regardless of its type annotation.
element(person, surgeon) matches any non-nilled element node whose name is person, and whose type annotation is surgeon or is derived from surgeon.
element(doctor|nurse, medical-staff) matches any non-nilled element node whose name is doctor or nurse, and whose type annotation is medical-staff or is derived from medical-staff.
element(*, surgeon) matches any non-nilled element node whose type annotation is surgeon (or is derived from surgeon), regardless of its name.
attribute() matches any attribute node.
attribute(price) matches any attribute whose name is price, regardless of its type annotation.
attribute(*, xs:decimal) matches any attribute whose type annotation is xs:decimal (or is derived from xs:decimal), regardless of its name.
document-node() matches any document node.
document-node(element(book)) matches any document node whose children consist of a single element node that satisfies the kind test element(book), interleaved with zero or more comments and processing instructions, and no text nodes.
document-node(book) is an abbreviation for document-node(element(book)).
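Several of the name-based kind tests above can be approximated on a DOM tree built by Python's standard `xml.dom.minidom`. This is a sketch under clear limitations: minidom carries no schema type annotations, so tests such as element(person, surgeon) or schema-element(person) cannot be modelled, and the functions `text_test`, `comment_test`, `element_test`, `attribute_test` and `document_node_test` are hypothetical names.

```python
from xml.dom import Node, minidom


def text_test(n):               # text()
    return n.nodeType == Node.TEXT_NODE


def comment_test(n):            # comment()
    return n.nodeType == Node.COMMENT_NODE


def element_test(*names):       # element(), element(person), element(doctor|nurse)
    return lambda n: (n.nodeType == Node.ELEMENT_NODE
                      and (not names or n.tagName in names))


def attribute_test(name=None):  # attribute(), attribute(price)
    return lambda n: (n.nodeType == Node.ATTRIBUTE_NODE
                      and (name is None or n.name == name))


def document_node_test(element_name):
    """document-node(element(name)): one element child with that name,
    optionally mixed with comments and PIs, and no text node children."""
    allowed = {Node.ELEMENT_NODE, Node.COMMENT_NODE,
               Node.PROCESSING_INSTRUCTION_NODE}

    def matches(n):
        if n.nodeType != Node.DOCUMENT_NODE:
            return False
        elements = [c for c in n.childNodes if c.nodeType == Node.ELEMENT_NODE]
        return (len(elements) == 1 and elements[0].tagName == element_name
                and all(c.nodeType in allowed for c in n.childNodes))

    return matches


doc = minidom.parseString('<!--catalog--><book price="9.99"><title>XPath</title></book>')
book = doc.documentElement
title_text = book.firstChild.firstChild  # the text node inside <title>

print(document_node_test("book")(doc))                          # True
print(element_test("doctor", "nurse")(book))                    # False
print(attribute_test("price")(book.getAttributeNode("price")))  # True
print(text_test(title_text))                                    # True
```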
The following examples show type tests that might be used in path expressions selecting within a JTree:
array(*) matches any JNode whose ·content· is an array.
record(longitude, latitude, *) matches any JNode whose ·content· is a map having entries with keys "longitude" and "latitude".
type(empty-sequence()) matches any JNode whose ·content· is an empty sequence.
type(xs:date) matches any JNode whose ·content· is an instance of xs:date.
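The four JTree examples can be mirrored by predicates over a JNode's ·content·. This Python sketch relies on modelling assumptions that are not part of XPath: lists stand in for arrays, dicts for maps, the empty tuple for the empty sequence, and `datetime.date` for xs:date. The class `JNode` and the test functions are hypothetical.

```python
import datetime


class JNode:
    """Hypothetical JNode: only the ·content· property is modelled."""

    def __init__(self, content):
        self.content = content


def array_test(j):                # array(*): content is an array (a list here)
    return isinstance(j.content, list)


def record_test(*required_keys):  # record(k1, k2, *): extensible record
    return lambda j: (isinstance(j.content, dict)
                      and all(k in j.content for k in required_keys))


def empty_sequence_test(j):       # type(empty-sequence()): content is ()
    return j.content == ()


def date_test(j):                 # type(xs:date)
    # datetime.datetime subclasses date in Python, but xs:dateTime is not
    # derived from xs:date, so exclude it explicitly
    return (isinstance(j.content, datetime.date)
            and not isinstance(j.content, datetime.datetime))


point = JNode({"longitude": -0.13, "latitude": 51.5, "label": "London"})
print(record_test("longitude", "latitude")(point))   # True
print(array_test(JNode([1, 2, 3])))                  # True
print(empty_sequence_test(JNode([])))                # False: an empty array is not ()
print(date_test(JNode(datetime.date(2024, 1, 31))))  # True
```

The trailing * in record(longitude, latitude, *) is why the extra "label" entry does not prevent a match.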