Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XPath 4.0 is an expression language that allows the processing of values conforming to the data model defined in [XQuery and XPath Data Model (XDM) 4.0]. The name of the language derives from its most distinctive feature, the path expression, which provides a means of hierarchic addressing of the nodes in an XML tree. As well as modeling the tree structure of XML, the data model also includes atomic items, function items, maps, arrays, and sequences. This version of XPath supports JSON as well as XML, and adds many new functions in [XQuery and XPath Functions and Operators 4.0].
XPath 4.0 is a superset of XPath 3.1. A detailed list of changes made since XPath 3.1 can be found in I Change Log.
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
Michael was central to the development of XML and many related technologies. He brought a polymathic breadth of knowledge and experience to everything he did. This, combined with his indefatigable curiosity and appetite for learning, made him an invaluable contributor to our project, along with many others. We have lost a brilliant thinker, a patient teacher, and a loyal friend.
As noted in 2.1.3 Values, every value in XPath 4.0 is regarded as a sequence of zero, one, or more items. The type system of XPath 4.0, described in this section, classifies the kinds of value that the language can handle, and the operations permitted on different kinds of value.
The type system of XPath 4.0 is related to the type system of [XML Schema 1.0] or [XML Schema 1.1] in two ways:
atomic items in XPath 4.0 (which are one kind of item) have atomic types such as xs:string, xs:boolean, and xs:integer. These types are taken directly from their definitions in [XML Schema 1.0] or [XML Schema 1.1].
XNodes (which are another kind of item) have a property called a type annotation which determines the type of their content. The type annotation is a schema type. The type annotation of a node must not be confused with the item type of the node. For example, an element <age>23</age> might have been validated against a schema that defines this element as having xs:integer content. If this is the case, the type annotation of the node will be xs:integer, and in the XPath 4.0 type system, the node will match the item typeelement(age, xs:integer).
This chapter of the specification starts by defining sequence types and item types, which describe the range of values that can be bound to variables, used in expressions, or passed to functions. It then describes how these relate to schema types, that is, the simple and complex types defined in an XSD schema.
Note:
In many situations the terms item type and sequence type are used interchangeably to refer either to the type itself, or to the syntactic construct that designates the type: so in the expression $x instance of xs:string*, the construct xs:string* uses the SequenceType syntax to designate a sequence type whose instances are sequences of strings. When more precision is required, the specification is careful to use the terms item type and sequence type to refer to the actual types, while using the production names ItemType and SequenceType to refer to the syntactic designators of these types.
[Definition: An item type is a type that can be expressed using the ItemType syntax, which forms part of the SequenceType syntax. Item types match individual items.]
Note:
While this definition is adequate for the purpose of defining the syntax of XPath 4.0, it ignores the fact that there are also item types that cannot be expressed using XPath 4.0 syntax: specifically, item types that reference an anonymous simple type or complex type defined in a schema. Such types can appear as type annotations on nodes following schema validation.
In most cases, the set of items matched by an item type consists either exclusively of atomic items, exclusively of nodes, or exclusively of function itemsDM. Exceptions include the generic types item(), which matches all items, xs:error, which matches no items, and choice item types, which can match any combination of types.
[Definition: An item type designator is a syntactic construct conforming to the grammar rule ItemType. An item type designator is said to designate an item type.]
Note:
Two item type designators may designate the same item type. For example, element() and element(*) are equivalent, as are attribute(A) and attribute(A, xs:anySimpleType).
Lexical QNames appearing in an item type designator are expanded using the default type namespace rule. Equality of QNames is defined by the eq operator.
This section defines the syntax and semantics of different ItemTypes in terms of the values that they match.
Note:
For an explanation of the EBNF grammar notation (and in particular, the operators ++ and **), see A.1 EBNF.
An item type designator written simply as an EQName (that is, a TypeName) is interpreted as follows:
If the name is written as a lexical QName, then it is expanded using the default type namespace rule.
If the expanded name matches a named item type in the static context, then it is taken as a reference to the corresponding item type. The rules that apply are the rules for the expanded item type definition.
Otherwise, it must match the name of a type in the in-scope schema types in the static context: specifically, an atomic type or a pure union type. See 3.5 Schema Types for details.
Note:
A name in the xs namespace will always fall into this category, since the namespace is reserved. See 2.1.4 Namespaces and QNames.
If the name cannot be resolved to a type, a static error is raised [err:XPST0051].
[Definition: An EnumerationType accepts a fixed set of string values.]
EnumerationType | ::= | "enum" "(" (StringLiteral ++ ",") ")" |
StringLiteral | ::= | AposStringLiteral | QuotStringLiteral |
| /* ws: explicit */ |
An enumeration type has a value space consisting of a set of xs:string values. When matching strings against an enumeration type, strings are always compared using the Unicode codepoint collation.
For example, if an argument of a function declares the required type as enum("red", "green", "blue"), then the string "green" is accepted, while "yellow" is rejected with a type error.
Technically, enumeration types are defined as follows:
[Definition: An enumeration type with a single enumerated value E (such as enum("red")) matches an item S if and only if (a) S is an instance of xs:string, and (b) S is equal to E when compared using Unicode codepoint collation. This is referred to as a singleton enumeration type.]
Note:
When matching a string S against an enumeration type, then apart from the requirement that S is an instance of xs:string, the type annotation of S is immaterial.
A singleton enumeration type whose enumerated value is E is a subtype of xs:string and of every subtype of xs:string that has E in its value space.
Two singleton enumeration types are the same type if and only if they have the same (single) enumerated value, as determined using the Unicode codepoint collation.
An enumeration type with multiple enumerated values is a union of singleton enumeration types, so enum("red", "green", "blue") is equivalent to (enum("red") | enum("green") | enum("blue")).
In consequence, an enumeration type T is a subtype of an enumeration type U if the enumerated values of T are a subset of the enumerated values of U: see 3.3.2 Subtypes of Item Types.
An enumeration type is a generalized atomic type.
It follows from these rules that the expression "red" instance of enum("red", "green", "blue") returns true. By contrast, xs:untypedAtomic("red") instance of enum("red", "green", "blue") returns false; but the coercion rules ensure that where a variable or function declaration specifies an enumeration type as the required type, an xs:untypedAtomic or xs:anyURI value equal to one of the enumerated values will be accepted.
Note:
Some consequences of these rules may not be immediately apparent.
Suppose that an XQuery query contains the declarations:
declare type my:color := enum("red", "green", "orange");
declare type my:fruit := enum("apple", "orange", "banana");
declare variable $orange-color as my:color := "orange";
declare variable $orange-fruit as my:fruit := "orange";The same applies with the equivalent XSLT syntax:
<xsl:item-type name="my:color" as="enum('red', 'green', 'orange')"/>
<xsl:item-type name="my:fruit" as="enum('apple', 'orange', 'banana')"/>
<xsl:variable name="orange-color" as="my:color" select="'orange'"/>
<xsl:variable name="orange-fruit" as="my:fruit" select="'orange'"/>Now, the value of $orange-color is an atomic item whose datum is the string "orange", and whose type annotation is xs:string. Similarly, the value of $orange-fruit is an atomic item whose datum is the string "orange", and whose type annotation is xs:string. That is, the values of the two variables are indistinguishable and interchangeable in every way. In particular, both values are instances of my:color, and both are instances of my:fruit.
This way of handling enumeration values has advantages and disadvantages. On the positive side, it means that enumeration subsets and supersets work cleanly: a value that is an instance of enum("red", "green", "orange") can be used where an instance of enum("red", "orange", "yellow", "green", "blue", "indigo", "violet") is expected. The downside is that labeling a string as an instance of an enumeration type does not provide type safety: a function that expects an instance of my:color can be called with any string that matches one of the required colors, whether or not it has an appropriate type annotation. A function that expects a color can be successfully called passing a fruit, if they happen to have the same name.
In the terminology of computer science, XDM atomic types derived by restriction are nominative types, allowing two types with identical properties but different names to be treated as different types with different instances. By contrast, enumeration types are structural types, where membership of the type is determined purely by a predicate applied to the value.
In consequence, instances of an enumeration type are not annotated as such. The type annotation of such an instance may be xs:string or any type derived by restriction from xs:string, but it will not be the enumeration type itself, which is anonymous.
[Definition: Given two sequence types or item types, the rules in this section determine if one is a subtype of the other. If a type A is a subtype of type B, it follows that every value matched by A is also matched by B.]
Note:
The relationship subtype(A, A) is always true: every type is a subtype of itself.
Note:
The converse is not necessarily true: we cannot infer that if every value matched by A is also matched by B, then A is a subtype of type B. For example, A might be defined as the set of strings matching the regular expression [A-Z]*, while B is the set of strings matching the regular expression [A-Za-z]*; no subtype relationship holds between these types.
The rules for deciding whether one sequence type is a subtype of another are given in 3.3.1 Subtypes of Sequence Types. The rules for deciding whether one item type is a subtype of another are given in 3.3.2 Subtypes of Item Types.
Note:
The subtype relationship is not acyclic. There are cases where subtype(A, B) and subtype(B, A) are both true. This implies that A and B have the same value space, but they can still be different types. For example this applies when A is a union type with member types xs:string and xs:integer, while B is a union type with member types xs:integer and xs:string. These are different types ("23" cast as A produces a string, while "23" cast as B produces an integer, because casting is attempted to each member type in order) but both types have the same value space.
We use the notation A ⊆ B, or itemtype-subtype(A, B) to indicate that an item typeA is a subtype of an item type B. This section defines the rules for deciding whether any two item types have this relationship.
The rules in this section apply to item types, not to item type designators. For example, if the name STR has been defined in the static context as a named item type referring to the type xs:string, then anything said here about the type xs:string applies equally whether it is designated as xs:string or as STR, or indeed as the parenthesized forms (xs:string) or (STR).
References to named item types are handled as described in 3.3.2.10 Subtyping of Named Item Types.
The relationship A ⊆ B is true if and only if at least one of the conditions listed in the following subsections applies:
If A is a singleton enumeration type permitting the string value V, then A ⊆ B is true if anyB ofis the xs:stringfollowing apply:.
B is xs:string.
B is any subtype of xs:string whose value space includes the Section DM of V (regardless of the Section DM of V).
For example, enum("Z") is a subtype of each of the types xs:string, xs:token, and xs:NCName.
Note:
Because a non-singleton enumeration type is defined as a choice type, A ⊆ B also holds if A is enum("red") and B is enum("red", "green"). See 3.3.2.2 Subtyping of Choice Item Types.
Note:
The type enum("red", "green") is not a subtype of xs:NCName, despite the fact that all the enumerated values are valid NCNames. This is because instances of xs:NCName must have a type annotation of xs:NCName or a subtype thereof, whereas instances of enum("red", "green") are not subject to this constraint.
Note:
A type T derived by restriction from xs:string, for example a type with the facet length="0" (which permits only the zero-length string), is not a subtype of any enumeration type, even if every string in the value space of T is an instance of the enumeration type.
Use the arrows to browse significant changes since the 3.1 version of this specification.
See 1 Introduction
Sections with significant changes are marked Δ in the table of contents.
See 1 Introduction
PR 691 2154
Enumeration types are added as a new kind of ItemType, constraining the value space of strings.
Setting the default namespace for elements and types to the special value ##any causes an unprefixed element name to act as a wildcard, matching by local name regardless of namespace.
The terms FunctionType, ArrayType, MapType, and RecordType replace FunctionTest, ArrayTest, MapTest, and RecordTest, with no change in meaning.
Record types are added as a new kind of ItemType, constraining the value space of maps.
Function coercion now allows a function with arity N to be supplied where a function of arity greater than N is expected. For example this allows the function true#0 to be supplied where a predicate function is required.
PR 1817 1853
An inline function may be annotated as a %method, giving it access to its containing map.
See 4.5.6 Inline Function Expressions
See 4.5.6.1 Methods
The symbols × and ÷ can be used for multiplication and division.
The rules for value comparisons when comparing values of different types (for example, decimal and double) have changed to be transitive. A decimal value is no longer converted to double, instead the double is converted to a decimal without loss of precision. This may affect compatibility in edge cases involving comparison of values that are numerically very close.
Operators such as < and > can use the full-width forms < and > to avoid the need for XML escaping.
Operator is-not is introduced, as a complement to the operator is.
Operators precedes and follows are introduced as synonyms for operators << and >>.
The lookup operator ? can now be followed by a string literal, for cases where map keys are strings other than NCNames. It can also be followed by a variable reference.
PR 1763 1830
The syntax on the right-hand side of an arrow operator has been relaxed; a dynamic function call no longer needs to start with a variable reference or a parenthesized expression, it can also be (for example) an inline function expression or a map or array constructor.
The arrow operator => is now complemented by a “mapping arrow” operator =!> which applies the supplied function to each item in the input sequence independently.
PR 1023 1128
It has been clarified that function coercion applies even when the supplied function item matches the required function type. This is to ensure that arguments supplied when calling the function are checked against the signature of the required function type, which might be stricter than the signature of the supplied function item.
A dynamic function call can now be applied to a sequence of functions, and in particular to an empty sequence. This makes it easier to chain a sequence of calls.
The syntax document-node(N), where N is a NameTestUnion, is introduced as an abbreviation for document-node(element(N)). For example, document-node(*) matches any well-formed XML document (as distinct from a document fragment).
See 3.2.7 Node Types
QName literals are new in 4.0.
Path expressions are extended to handle JNodes (found in trees of maps and arrays) as well as XNodes (found in trees representing parsed XML).
PR 28
Multiple for and let clauses can be combined in an expression without an intervening return keyword.
PR 159
Keyword arguments are allowed on static function calls, as well as positional arguments.
PR 202
The presentation of the rules for the subtype relationship between sequence types and item types has been substantially rewritten to improve clarity; no change to the semantics is intended.
PR 230
The rules for “errors and optimization” have been tightened up to disallow many cases of optimizations that alter error behavior. In particular there are restrictions on reordering the operands of and and or, and of predicates in filter expressions, in a way that might allow the processor to raise dynamic errors that the author intended to prevent.
PR 254
The term "function conversion rules" used in 3.1 has been replaced by the term "coercion rules".
The coercion rules allow “relabeling” of a supplied atomic item where the required type is a derived atomic type: for example, it is now permitted to supply the value 3 when calling a function that expects an instance of xs:positiveInteger.
PR 284
Alternative syntax for conditional expressions is available: if (condition) { X }.
PR 286
Element and attribute tests can include alternative names: element(chapter|section), attribute(role|class).
See 3.2.7 Node Types
The NodeTest in an AxisStep now allows alternatives: ancestor::(section|appendix)
See 3.2.7 Node Types
Element and attribute tests of the form element(N) and attribute(N) now allow N to be any NameTest, including a wildcard.
PR 324
String templates provide a new way of constructing strings: for example `{$greeting}, {$planet}!` is equivalent to $greeting || ', ' || $planet || '!'
PR 326
Support for higher-order functions is now a mandatory feature (in 3.1 it was optional).
See 5 Conformance
PR 344
A for member clause is added to FLWOR expressions to allow iteration over an array.
PR 368
The concept of the context item has been generalized, so it is now a context value. That is, it is no longer constrained to be a single item.
PR 433
Numeric literals can now be written in hexadecimal or binary notation; and underscores can be included for readability.
PR 519
The rules for tokenization have been largely rewritten. In some cases the revised specification may affect edge cases that were handled in different ways by different 3.1 processors, which could lead to incompatible behavior.
PR 521
New abbreviated syntax is introduced (focus function) for simple inline functions taking a single argument. An example is fn { ../@code }
PR 603
The rules for reporting type errors during static analysis have been changed so that a processor has more freedom to report errors in respect of constructs that are evidently wrong, such as @price/@value, even though dynamic evaluation is defined to return an empty sequence rather than an error.
PR 606
Element and attribute tests of the form element(A|B) and attribute(A|B) are now allowed.
PR 691
Enumeration types are added as a new kind of ItemType, constraining the value space of strings.
PR 728
The syntax record(*) is allowed; it matches any map.
PR 815
The coercion rules now allow conversion in either direction between xs:hexBinary and xs:base64Binary.
PR 911
The coercion rules now allow any numeric type to be implicitly converted to any other, for example an xs:double is accepted where the required type is xs:decimal.
PR 996
The value of a predicate in a filter expression can now be a sequence of integers.
PR 1031
An otherwise operator is introduced: A otherwise B returns the value of A, unless it is an empty sequence, in which case it returns the value of B.
PR 1071
In map constructors, the keyword map is now optional, so map { 0: false(), 1: true() } can now be written { 0: false(), 1: true() }, provided it is used in a context where this creates no ambiguity.
PR 1131
A positional variable can be defined in a for expression.
The type of a variable used in a for expression can be declared.
The type of a variable used in a let expression can be declared.
PR 1132
Choice item types (an item type allowing a set of alternative item types) are introduced.
PR 1163
Filter expressions for maps and arrays are introduced.
PR 1181
The default namespace for elements and types can be set to the value ##any, allowing unprefixed names in axis steps to match elements with a given local name in any namespace.
If the default namespace for elements and types has the special value ##any, then an unprefixed name in a NameTest acts as a wildcard, matching names in any namespace or none.
PR 1197
The keyword fn is allowed as a synonym for function in function types, to align with changes to inline function declarations.
In inline function expressions, the keyword function may be abbreviated as fn.
PR 1212
XPath 3.0 included empty-sequence and item as reserved function names, and XPath 3.1 added map and array. This was unnecessary since these names never appear followed by a left parenthesis at the start of an expression. They have therefore been removed from the list. New keywords introducing item types, such as record and enum, have not been included in the list.
PR 1217
Predicates in filter expressions for maps and arrays can now be numeric.
PR 1249
A for key/value clause is added to FLWOR expressions to allow iteration over maps.
PR 1250
Several decimal format properties, including minus sign, exponent separator, percent, and per-mille, can now be rendered as arbitrary strings rather than being confined to a single character.
PR 1265
The rules regarding the document-uri property of nodes returned by the fn:collection function have been relaxed.
PR 1344
Parts of the static context that were there purely to assist in static typing, such as the statically known documents, were no longer referenced and have therefore been dropped.
The static typing option has been dropped.
The static typing feature has been dropped.
See 5 Conformance
PR 1361
The term atomic value has been replaced by atomic item.
See 2.1.3 Values
PR 1384
If a type declaration is present, the supplied values in the input sequence are now coerced to the required type. Type declarations are now permitted in XPath as well as XQuery.
PR 1496
The context value static type, which was there purely to assist in static typing, has been dropped.
PR 1498
The EBNF operators ++ and ** have been introduced, for more concise representation of sequences using a character such as "," as a separator. The notation is borrowed from Invisible XML.
See 2.1 Terminology
The EBNF notation has been extended to allow the constructs (A ++ ",") (one or more occurrences of A, comma-separated, and (A ** ",") (zero or more occurrences of A, comma-separated.
The EBNF operators ++ and ** have been introduced, for more concise representation of sequences using a character such as "," as a separator. The notation is borrowed from Invisible XML.
See A.1 EBNF
See A.1.1 Notation
PR 1501
The coercion rules now apply recursively to the members of an array and the entries in a map.
PR 1532
Four new axes have been defined: preceding-or-self, preceding-sibling-or-self, following-or-self, and following-sibling-or-self.
See 4.6.4.1 Axes
PR 1577
The syntax record() is allowed; the only thing it matches is an empty map.
PR 1686
With the pipeline operator ->, the result of an expression can be bound to the context value before evaluating another expression.
PR 1696
Parameter names may be included in a function signature; they are purely documentary.
PR 1703
Ordered maps are introduced.
See 4.13.1 Maps
The order of key-value pairs in the map constructor is now retained in the constructed map.
PR 1874
The coercion rules now reorder the entries in a map when the required type is a record type.
PR 1898
The rules for subtyping of document node types have been refined.
PR 1991
Named record types used in the signatures of built-in functions are now available as standard in the static context.
PR 2026
The module feature is no longer an optional feature; processing of library modules is now required.
See 5 Conformance
PR 2031
The terms XNode and JNode are introduced; the existing term node remains in use as a synonym for XNode where the context does not specify otherwise.
See 2.1.3 Values
JNodes are introduced
PR 2055
Sequences, arrays, and maps can be destructured in a let expression to extract their components into multiple variables.
PR 2094
A general expression is allowed within a map constructor; this facilitates the creation of maps in which the presence or absence of particular keys is decided dynamically.
PR 2115
This section describes and formalizes a convention that was already in use, but not explicitly stated, in earlier versions of the specification.