Please check the errata for any errors or issues reported since publication.
See also translations.
This document is also available in these non-normative formats: XML.
Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XML is a versatile markup language, capable of labeling the information content of diverse data sources, including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.
A list of changes made since XQuery 3.1 can be found in J Change Log.
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XQuery 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XQuery 4.0 Grammar].
[Definition: A query consists of one or more modules.] If a query is executable, one of its modules has a Query Body containing an expression whose value is the result of the query. An expression is represented in the XQuery grammar by the symbol Expr.
Expr | ::= | (ExprSingle ++ ",") |
ExprSingle | ::= | FLWORExpr |
ExprSingle | ::= | FLWORExpr |
FLWORExpr | ::= | InitialClauseIntermediateClause* ReturnClause |
QuantifiedExpr | ::= | ("some" | "every") (QuantifierBinding ++ ",") "satisfies" ExprSingle |
SwitchExpr | ::= | "switch" SwitchComparand (SwitchCases | BracedSwitchCases) |
TypeswitchExpr | ::= | "typeswitch" "(" Expr ")" (TypeswitchCases | BracedTypeswitchCases) |
IfExpr | ::= | "if" "(" Expr ")" (UnbracedActions | BracedAction) |
TryCatchExpr | ::= | TryClauseCatchClause+ |
OrExpr | ::= | AndExpr ("or" AndExpr)* |
The XQuery 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator (despite its name, an ExprSingle may evaluate to a sequence containing more than one item.)
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be a ExprSingle, because commas are used to separate the arguments of a function call.
After the comma, the expressions that have next lowest precedence are FLWORExpr,QuantifiedExpr, SwitchExpr, TypeswitchExpr, IfExpr, TryCatchExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
An arrow operator may now be followed by any dynamic function call; theThe syntax on the right-hand side of an arrow operator has been relaxed; a dynamic function call no longer needs to start with a variable reference or a parenthesized expression, it can also be (for example) an inline function expression or a map or array constructor. [IssueIssues 1716 1829 ]
Arrow expressions apply a function to a value, using the value of the left-hand expression as the first argument to the function.
The arrow syntax is particularly helpful when applying multiple functions to a value in turn. For example, the following expression invites syntax errors due to misplaced parentheses:
tokenize((normalize-unicode(upper-case($string))),"\s+")
In the following reformulation, it is easier to see that the parentheses are balanced:
$string => upper-case() => normalize-unicode() => tokenize("\s+")When the operator is written as =!>, the function is applied to each item in the sequence in turn. Assuming that $string is a single string, the above example could equally be written:
$string =!> upper-case() =!> normalize-unicode() =!> tokenize("\s+")The difference between the two operators is seen when the left-hand operand evaluates to a sequence:
(1, 2, 3) => avg()
returns a value of only one item, 2, the average of all three items, whereas
(1, 2, 3) =!> avg()
returns the original sequence of three items, (1, 2, 3), each item being the average of itself. The following example:
"The cat sat on the mat"
=> tokenize()
=!> concat(".")
=!> upper-case()
=> string-join(" ")returns "THE. CAT. SAT. ON. THE. MAT.". The first arrow could be written either as => or =!> because the operand is a singleton; the next two arrows have to be =!> because the function is applied to each item in the tokenized sequence individually; the final arrow must be => because the string-join function applies to the sequence as a whole.
Note:
It may be useful to think of this as a map/reduce pipeline. The functions introduced by =!> are mapping operations; the function introduced by => is a reduce operation.
The following example introduces an inline function to the pipeline:
(1 to 5) =!> xs:double() =!> math:sqrt() =!> fn($a) { $a + 1 }() => sum()This is equivalent to sum((1 to 5) ! (math:sqrt(xs:double(.)) + 1)).
The same effect can be achieved using a focus function:
(1 to 5) =!> xs:double() =!> math:sqrt() =!> fn { . + 1 }() => sum()It could also be expressed using the mapping operator !:
(1 to 5) ! xs:double(.) ! math:sqrt(.) ! (. + 1) => sum()
Where the value of an expression is a map containing functions, simulating the behavior of objects in object-oriented languages, then the lookup arrow operator=?> can be used to retrieve a function from the map and to invoke the function with the map as its first argument. For example, if my:rectangle returns a map with entries width, height, expand, and area, then it becomes possible to write:
my:rectangle(3, 5) =?> expand(2) =?> area()
Note:
The ArgumentList of the function call may include PlaceHolders, though this is not especially useful. For example, the expression "$" => concat(?) is equivalent to concat("$", ?): its value is a function that prepends a supplied string with a $ symbol.
Note:
The ArgumentList may include keyword arguments if the function is identified statically (that is, by name). For example, the following is valid: $xml => xml-to-json(indent := true()) => parse-json(escape := false()).
The sequence arrow operator thus applies the supplied function to the left-hand operand as a whole, while the mapping arrow operator applies the function to each item in the value of the left-hand operand individually. In the case where the result of the left-hand operand is a single item, the two operators have the same effect.
Note:
The mapping arrow symbol =!> is intended to suggest a combination of function application (=>) and sequence mapping (!) combined in a single operation.
Similarly, the lookup arrow symbol =?> is intended to suggest a combination of function application (=>) and map lookup (?) in a single operation.
Note:
The expression U => $V(X)(Y) satisfies the grammar, but its meaning might not be obvious. It is interpreted as being equivalent to $V(X)(U, Y). The same applies to the mapping arrow operator: U =!> $V(X)(Y) is interpreted as for $u in U return $V(X)($u, Y). This follows from the way that the syntax of a DynamicFunctionCall is defined.
The construct on the right-hand side of the arrow operator (=>) can either be a static function call, or a restricted form of dynamic function call. The restrictions are there to ensure that the two forms can be distinguished by the parser with limited lookahead. For a dynamic call, the function item to be called can be expressed as a variable reference, an inline function expression, a named function reference, a map constructor, or an array constructor. Any other expression used to return the required function item must be enclosed in parentheses.
[Definition: The sequence arrow operator=> applies a function to a supplied sequence.] It is defined as follows:
If the arrow is followed by a static FunctionCall:
Given a UnaryExprU and a FunctionCallF(A, B, C...), the expression U => F(A, B, C...) is equivalent to the expression F(U, A, B, C...).
If the arrow is followed by a DynamicFunctionCallRestrictedDynamicCall:
Given a UnaryExprU, and a DynamicFunctionCallRestrictedDynamicCallE(A, B, C...), the expression U => E(A, B, C...) is equivalent to the expressiondynamic function call E(U, A, B, C...).
Note:
Although the syntax of an arrow expression makes use of the grammatical productions FunctionCall and DynamicFunctionCall, these are not evaluated in the same way as a function call that appears as a free-standing expression.
The arrow operator => is now complemented by a “mapping arrow” operator =!> which applies the supplied function to each item in the input sequence independently.
[Definition: The mapping arrow operator=!> applies a function to each item in a sequence.] It is defined as follows:
If the arrow is followed by a static FunctionCall:
Given a UnaryExprU and a FunctionCallF(A, B, C...), the expression U =!> F(A, B, C...) is equivalent to the expression for $u in U return F($u, A, B, C...).
If the arrow is followed by a DynamicFunctionCallRestrictedDynamicCall:
Given a UnaryExprU, and a DynamicFunctionCallRestrictedDynamicCallE(A, B, C...), the expression U => E(A, B, C...) is equivalent to the expression for $u in U return E($u, A, B, C...).
The grammar of XQuery 4.0 uses the same simple Extended Backus-Naur Form (EBNF) notation as [XML 1.0] with the following differences.
The notation XYZ ** "," indicates a sequence of zero or more occurrences of XYZ, with a single comma between adjacent occurrences.
The notation XYZ ++ "," indicates a sequence of one or more occurrences of XYZ, with a single comma between adjacent occurrences.
All named symbols have a name that begins with an uppercase letter.
It adds a notation for referring to productions in external specifications.
Comments or extra-grammatical constraints on grammar productions are between '/*' and '*/' symbols.
A 'xgc:' prefix is an extra-grammatical constraint, the details of which are explained in A.1.2 Extra-grammatical Constraints
A 'ws:' prefix explains the whitespace rules for the production, the details of which are explained in A.3.5 Whitespace Rules
A 'gn:' prefix means a 'Grammar Note', and is meant as a clarification for parsing rules, and is explained in A.1.3 Grammar Notes. These notes are not normative.
The terminal symbols for this grammar include the quoted strings used in the production rules below, and the terminal symbols defined in section A.3.1 Terminal Symbols. The grammar is a little unusual in that parsing and tokenization are somewhat intertwined: for more details see A.3 Lexical structure.
The EBNF notation is described in more detail in A.1.1 Notation.
Use the arrows to browse significant changes since the 3.1 version of this specification.
See 1 Introduction
Sections with significant changes are marked Δ in the table of contents.
See 1 Introduction
Setting the default namespace for elements and types to the special value ##any causes an unprefixed element name to act as a wildcard, matching by local name regardless of namespace.
The terms FunctionType, ArrayType, MapType, and RecordType replace FunctionTest, ArrayTest, MapTest, and RecordTest, with no change in meaning.
Record types are added as a new kind of ItemType, constraining the value space of maps.
Function coercion now allows a function with arity N to be supplied where a function of arity greater than N is expected. For example this allows the function true#0 to be supplied where a predicate function is required.
The symbols × and ÷ can be used for multiplication and division.
The rules for value comparisons when comparing values of different types (for example, decimal and double) have changed to be transitive. A decimal value is no longer converted to double, instead the double is converted to a decimal without loss of precision. This may affect compatibility in edge cases involving comparison of values that are numerically very close.
Operators such as < and > can use the full-width forms < and > to avoid the need for XML escaping.
The lookup operator ? can now be followed by a string literal, for cases where map keys are strings other than NCNames. It can also be followed by a variable reference.
The syntax on the right-hand side of an arrow operator has been relaxed; a dynamic function call no longer needs to start with a variable reference or a parenthesized expression, it can also be (for example) an inline function expression or a map or array constructor.
The arrow operator => is now complemented by a “mapping arrow” operator =!> which applies the supplied function to each item in the input sequence independently.
All implementations must now predeclare the namespace prefixes math, map, array, and err. In XQuery 3.1 it was permitted but not required to predeclare these namespaces.
Function definitions in the static context may now have optional parameters, provided this does not cause ambiguity across multiple function definitions with the same name. Optional parameters are given a default value, which can be any expression, including one that depends on the context of the caller (so an argument can default to the context value).
$err:map contains entries for all values that are bound to the single variables.
$err:stack-trace provides information about the current state of execution.
PR 1023 1128
It has been clarified that function coercion applies even when the supplied function item matches the required function type. This is to ensure that arguments supplied when calling the function are checked against the signature of the required function type, which might be stricter than the signature of the supplied function item.
Parameter names may be included in a function signature; they are purely documentary.
PR tba
Predicates in filter expressions for maps and arrays can now be numeric.
The ordered { E } and unordered { E } expressions are retained for backwards compatibility reasons, but in XQuery 4.0 they are deprecated and have no useful effect.
See 4.15 Ordered and Unordered Expressions
The ordering mode declaration is retained for backwards compatibility reasons, but in XQuery 4.0 it is deprecated and has no useful effect.
The static typing feature has been dropped.
See 6 Conformance
Parts of the static context that were there purely to assist in static typing, such as the statically known documents, were no longer referenced and have therefore been dropped.
The syntax record() is allowed; the only thing it matches is an empty map.
The context value static type, which was there purely to assist in static typing, has been dropped.
Four new axes have been defined: preceding-or-self, preceding-sibling-or-self, following-or-self, and following-sibling-or-self.
See 4.6.4.1 Axes
The syntax document-node(N), where N is a NameTestUnion, is introduced as an abbreviation for document-node(element(N)). For example, document-node(*) matches any well-formed XML document (as distinct from a document fragment).
See 3.2.7 Node Types
An arrow operator may now be followed by any dynamic function call; the dynamic function call no longer needs to start with a variable reference or a parenthesized expression.
PR 159
Keyword arguments are allowed on static function calls, as well as positional arguments.
PR 202
The presentation of the rules for the subtype relationship between sequence types and item types has been substantially rewritten to improve clarity; no change to the semantics is intended.
PR 230
The rules for “errors and optimization” have been tightened up to disallow many cases of optimizations that alter error behavior. In particular there are restrictions on reordering the operands of and and or, and of predicates in filter expressions, in a way that might allow the processor to raise dynamic errors that the author intended to prevent.
PR 254
The term "function conversion rules" used in 3.1 has been replaced by the term "coercion rules".
The coercion rules allow “relabeling” of a supplied atomic item where the required type is a derived atomic type: for example, it is now permitted to supply the value 3 when calling a function that expects an instance of xs:positiveInteger.
The value bound to a variable in a let clause is now converted to the declared type by applying the coercion rules.
The coercion rules are now used when binding values to variables (both global variable declarations and local variable bindings). This aligns XQuery with XSLT, and means that the rules for binding to variables are the same as the rules for binding to function parameters.
PR 284
Alternative syntax for conditional expressions is available: if (condition) { X } else { Y }, with the else part being optional.
PR 286
Element and attribute tests can include alternative names: element(chapter|section), attribute(role|class).
See 3.2.7 Node Types
The NodeTest in an AxisStep now allows alternatives: ancestor::(section|appendix)
See 3.2.7 Node Types
Element and attribute tests of the form element(N) and attribute(N) now allow N to be any NameTest, including a wildcard.
PR 324
String templates provide a new way of constructing strings: for example `{$greeting}, {$planet}!` is equivalent to $greeting || ', ' || $planet || '!'
PR 326
Support for higher-order functions is now a mandatory feature (in 3.1 it was optional).
See 6 Conformance
PR 344
A for member clause is added to FLWOR expressions to allow iteration over an array.
PR 364
Switch expressions now allow a case clause to match multiple atomic items.
PR 368
The concept of the context item has been generalized, so it is now a context value. That is, it is no longer constrained to be a single item.
PR 433
Numeric literals can now be written in hexadecimal or binary notation; and underscores can be included for readability.
PR 483
The start clause in window expressions has become optional, as well as the when keyword and its associated expression.
PR 493
A new variable err:map is available, capturing all error information in one place.
PR 519
The rules for tokenization have been largely rewritten. In some cases the revised specification may affect edge cases that were handled in different ways by different 3.1 processors, which could lead to incompatible behavior.
PR 521
New abbreviated syntax is introduced (focus function) for simple inline functions taking a single argument. An example is fn { ../@code }
PR 587
Switch and typeswitch expressions can now be written with curly brackets, to improve readability.
PR 603
The rules for reporting type errors during static analysis have been changed so that a processor has more freedom to report errors in respect of constructs that are evidently wrong, such as @price/@value, even though dynamic evaluation is defined to return an empty sequence rather than an error.
PR 606
Element and attribute tests of the form element(A|B) and attribute(A|B) are now allowed.
PR 635
The rules for the consistency of schemas imported by different query modules, and for consistency between imported schemas and those used for validating input documents, have been defined with greater precision. It is now recognized that these schemas will not always be identical, and that validation with respect to different schemas may produce different outcomes, even if the components of one are a subset of the components of the other.
PR 659
In previous versions the interpretation of location hints in import schema declarations was entirely at the discretion of the processor. To improve interoperability, XQuery 4.0 recommends (but does not mandate) a specific strategy for interpreting these hints.
PR 678
The comparand expression in a switch expression can be omitted, allowing the switch cases to be provided as arbitrary boolean expressions.
PR 682
The values true() and false() are allowed in function annotations, and negated numeric literals are also allowed.
PR 691
Enumeration types are added as a new kind of ItemType, constraining the value space of strings.
PR 728
The syntax record(*) is allowed; it matches any map.
PR 753
The default namespace for elements and types can now be declared to be fixed for a query module, meaning it is unaffected by a namespace declaration appearing on a direct element constructor.
PR 815
The coercion rules now allow conversion in either direction between xs:hexBinary and xs:base64Binary.
PR 820
The value bound to a variable in a for clause is now converted to the declared type by applying the coercion rules.
PR 837
A deep lookup operator ?? is provided for searching trees of maps and arrays.
PR 911
The coercion rules now allow any numeric type to be implicitly converted to any other, for example an xs:double is accepted where the required type is xs:double.
PR 943
A FLWOR expression may now include a while clause, which causes early exit from the iteration when a condition is encountered.
PR 985
With the lookup arrow expression and the =?> operator, a function in a map can be looked up and called with the map as first argument.
PR 996
The value of a predicate in a filter expression can now be a sequence of integers.
PR 1031
An otherwise operator is introduced: A otherwise B returns the value of A, unless it is an empty sequence, in which case it returns the value of B.
PR 1071
In map constructors, the keyword map is now optional, so map { 0: false(), 1: true() } can now be written { 0: false(), 1: true() }, provided it is used in a context where this creates no ambiguity.
PR 1125
Lookup expressions can now take a modifier (such as keys, values, or pairs) enabling them to return structured results rather than a flattened sequence.
PR 1132
Choice item types (an item type allowing a set of alternative item types) are introduced.
PR 1163
Filter expressions for maps and arrays are introduced.
PR 1181
The default namespace for elements and types can be set to the value ##any, allowing unprefixed names in axis steps to match elements with a given local name in any namespace.
If the default namespace for elements and types has the special value ##any, then an unprefixed name in a NameTest acts as a wildcard, matching names in any namespace or none.
The default namespace for elements and types can be set to the value ##any, allowing unprefixed names in axis steps to match elements with a given local name in any namespace.
PR 1197
The keyword fn is allowed as a synonym for function in function types, to align with changes to inline function declarations.
In inline function expressions, the keyword function may be abbreviated as fn.
PR 1212
XQuery 3.0 included empty-sequence and item as reserved function names, and XQuery 3.1 added map and array. This was unnecessary since these names never appear followed by a left parenthesis at the start of an expression. They have therefore been removed from the list. New keywords introducing item types, such as record and enum, have not been included in the list.
PR 1249
A for key/value clause is added to FLWOR expressions to allow iteration over a map.
PR 1250
Several decimal format properties, including minus sign, exponent separator, percent, and per-mille, can now be rendered as arbitrary strings rather than being confined to a single character.
PR 1254
The rules concerning the interpretation of xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes have been tightened up.
PR 1265
The rules regarding the document-uri property of nodes returned by the fn:collection function have been relaxed.
PR 1344
Parts of the static context that were there purely to assist in static typing, such as the statically known documents, were no longer referenced and have therefore been dropped.
The static typing option has been dropped.
PR 1361
The term atomic value has been replaced by atomic item.
See 2.1.2 Values
PR 1384
If a type declaration is present, the supplied values in the input sequence are now coerced to the required type. Type declarations are now permitted in XPath as well as XQuery.
PR 1432
In earlier versions, the static context for the initializing expression excluded the variable being declared. This restriction has been lifted.
PR 1480
When the element name matches a language keyword such as div or value, it must now be written in quotes as a string literal. This is a backwards incompatible change.
See 4.12.3.1 Computed Element Constructors
When the attribute name matches a language keyword such as by or of, it must now be written in quotes as a string literal. This is a backwards incompatible change.
PR 1498
The EBNF operators ++ and ** have been introduced, for more concise representation of sequences using a character such as "," as a separator. The notation is borrowed from Invisible XML.
See 2.1 Terminology
The EBNF notation has been extended to allow the constructs (A ++ ",") (one or more occurrences of A, comma-separated, and (A ** ",") (zero or more occurrences of A, comma-separated.
The EBNF operators ++ and ** have been introduced, for more concise representation of sequences using a character such as "," as a separator. The notation is borrowed from Invisible XML.
See A.1 EBNF
See A.1.1 Notation
PR 1513
When the processing instruction name matches a language keyword such as try or validate, it must now be written in quotes as a string literal. This is a backwards incompatible change.
See 4.12.3.5 Computed Processing Instruction Constructors
When the namespace prefix matches a language keyword such as as or at, it must now be written in quotes as a string literal. This is a backwards incompatible change.
PR 1686
With the pipeline operator ->, the result of an expression can be bound to the context value before evaluating another expression.
PR 1703
Ordered maps are introduced.
See 4.14.1 Maps
The order of key-value pairs in the map constructor is now retained in the constructed map.