Copyright © 2000 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
XML is a versatile markup language, capable of labeling the information content of diverse data sources, including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.
A list of changes made since XQuery 3.1 can be found in K Change Log.
This is a draft prepared by the QT4CG (officially registered in W3C as the XSLT Extensions Community Group). Comments are invited.
This section discusses each of the basic kinds of expression. Each kind of expression has a name such as PathExpr, which is introduced on the left side of the grammar production that defines the expression. Since XQuery 4.0 is a composable language, each kind of expression is defined in terms of other expressions whose operators have a higher precedence. In this way, the precedence of operators is represented explicitly in the grammar.
The order in which expressions are discussed in this document does not reflect the order of operator precedence. In general, this document introduces the simplest kinds of expressions first, followed by more complex expressions. For the complete grammar, see Appendix [A XQuery 4.0 Grammar].
[Definition: A query consists of one or more modules.] If a query is executable, one of its modules has a Query Body containing an expression whose value is the result of the query. An expression is represented in the XQuery grammar by the symbol Expr.
Expr ::= (ExprSingle ++ ",")
ExprSingle ::= FLWORExpr
The XQuery 4.0 operator that has lowest precedence is the comma operator, which is used to combine two operands to form a sequence. As shown in the grammar, a general expression (Expr) can consist of multiple ExprSingle operands, separated by commas.
The name ExprSingle denotes an expression that does not contain a top-level comma operator. Despite its name, an ExprSingle may evaluate to a sequence containing more than one item.
The symbol ExprSingle is used in various places in the grammar where an expression is not allowed to contain a top-level comma. For example, each of the arguments of a function call must be an ExprSingle, because commas are used to separate the arguments of a function call.
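As an illustration of the distinction between Expr and ExprSingle, the following sketch passes a sequence as a function argument. It assumes only the standard functions fn:count and fn:string-join; the variable names are illustrative.

```xquery
(: The parenthesized sequence (10, 20, 30) is a single ExprSingle,
   so fn:count receives exactly one argument. :)
let $total := fn:count((10, 20, 30))

(: Here the top-level comma separates two arguments:
   the sequence ("a", "b", "c") and the separator "-". :)
let $joined := fn:string-join(("a", "b", "c"), "-")

return ($total, $joined)
```

This query returns the sequence 3, "a-b-c". Writing fn:count(10, 20, 30) instead would be read as a call with three arguments, because the commas are no longer enclosed in their own parentheses.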
After the comma, the expressions that have next lowest precedence are FLWORExpr, QuantifiedExpr, SwitchExpr, TypeswitchExpr, IfExpr, TryCatchExpr, and OrExpr. Each of these expressions is described in a separate section of this document.
Switch expressions now allow a case clause to match multiple atomic items. [Issue 328 PR 364 7 March 2023]
Switch and typeswitch expressions can now be written with curly brackets, to improve readability. [Issue 365 PR 587 7 November 2023]
The comparand expression in a switch expression can be omitted, allowing the switch cases to be provided as arbitrary boolean expressions. [Issue 671 PR 678 12 September 2023]
SwitchExpr ::= "switch" SwitchComparand? (SwitchCases | BracedSwitchCases)
SwitchComparand ::= "(" Expr? ")"
SwitchCases ::= SwitchCaseClause+ "default" "return" ExprSingle
BracedSwitchCases ::= "{" SwitchCases "}"
SwitchCaseClause ::= ("case" SwitchCaseOperand)+ "return" ExprSingle
SwitchCaseOperand ::= Expr
The switch expression chooses one of several expressions to evaluate based on the input value.
In a switch expression, the switch keyword is followed by an expression enclosed in parentheses, called the switch comparand. This is the expression whose value is being compared. This expression is optional, and defaults to true(). The remainder of the switch expression consists of one or more case clauses, with one or more case operand expressions each, and a default clause.
The first step in evaluating a switch expression is to apply atomization to the value of the switch comparand. Call the result the switch value. If the switch value is a sequence of length greater than one, a type error is raised [err:XPTY0004]. In the absence of a switch comparand, the switch value is the xs:boolean value true.
The switch value is compared to each SwitchCaseOperand in turn until a match is found or the list is exhausted. The matching is performed as follows:
The SwitchCaseOperand is evaluated.
The resulting value is atomized: call this the case value.
If the case value is an empty sequence, then a match occurs if and only if the switch value is an empty sequence.
Otherwise, the singleton switch value is compared individually with each item in the case value in turn, and a match occurs if and only if these two atomic items compare equal under the rules of the fn:deep-equal function with default options, using the default collation in the static context.
[Definition: The effective case of a switch expression is the first case clause that matches, using the rules given above, or the default clause if no such case clause exists.] The value of the switch expression is the value of the return expression in the effective case.
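The matching rules above can be sketched with a small example. It assumes an XQuery 4.0 processor (a case operand that is a multi-item sequence is not accepted by earlier versions); the function name local:classify is hypothetical and used only for illustration.

```xquery
declare function local:classify($x as xs:anyAtomicType?) as xs:string {
  switch ($x)
    (: An empty case value matches only an empty switch value. :)
    case () return "empty"
    (: The switch value is compared with each item of the case value in turn. :)
    case (1, 2, 3) return "small"
    default return "other"
};

local:classify(()), local:classify(2), local:classify(7)
```

Under the rules given above, this query returns "empty", "small", "other": the first call selects the case clause with the empty operand, the second matches the item 2 in the second case clause, and the third falls through to the default clause.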
Switch expressions have rules regarding the propagation of dynamic errors: see 2.4.5 Guarded Expressions. These rules mean that the return clauses of a switch expression must not raise any dynamic errors except in the effective case. Dynamic errors raised in the operand expressions of the switch or the case clauses are propagated; however, an implementation must not raise dynamic errors in the operand expressions of case clauses that occur after the effective case. An implementation is permitted to raise dynamic errors in the operand expressions of case clauses that occur before the effective case, but not required to do so.
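A minimal sketch of the guarding rule for return clauses follows. The variable $d is an assumption introduced for the example.

```xquery
declare variable $d as xs:integer := 0;

switch ($d)
  (: This is the effective case, so only its return expression is evaluated. :)
  case 0 return "undefined"
  (: 10 idiv 0 would raise a dynamic error, but this return clause
     is not in the effective case and must not raise it. :)
  default return 10 idiv $d
```

The result is "undefined". The division in the default clause cannot cause the query to fail, because the default clause is not the effective case when $d is 0.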
The following example shows how a switch expression might be used:
switch ($animal) {
  case "Cow" return "Moo"
  case "Cat" return "Meow"
  case "Duck", "Goose" return "Quack"
  default return "What's that odd noise?"
}
The curly brackets in a switch expression are optional. The above example can equally be written:
switch ($animal)
  case "Cow" return "Moo"
  case "Cat" return "Meow"
  case "Duck", "Goose" return "Quack"
  default return "What's that odd noise?"
The following example illustrates a switch expression where the comparand is defaulted to true:
switch {
  case ($a lt $b) return "lesser"
  case ($a gt $b) return "greater"
  case ($a eq $b) return "equal"
  default return "not comparable"
}
The same expression can be written with empty parentheses in place of the comparand:
switch () {
  case ($a lt $b) return "lesser"
  case ($a gt $b) return "greater"
  case ($a eq $b) return "equal"
  default return "not comparable"
}
Note:
The comparisons are performed using the fn:deep-equal function, after atomization. This means that a case expression such as @married tests fn:data(@married) rather than fn:boolean(@married). If the effective boolean value of the expression is wanted, this can be achieved with an explicit call of fn:boolean.
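As a sketch of the explicit fn:boolean call mentioned in the note, the following example uses a hypothetical person element with a married attribute. The comparand is omitted, so the switch value is true.

```xquery
declare variable $person := <person married="false"/>;

switch {
  (: fn:boolean yields the effective boolean value of the attribute node,
     which is true whenever the attribute exists, whatever its content. :)
  case fn:boolean($person/@married) return "married attribute present"
  default return "married attribute absent"
}
```

This returns "married attribute present" even though the attribute's string value is "false", because the effective boolean value of a sequence starting with a node is true. A case operand written as $person/@married without the fn:boolean call would instead be atomized and compared with the switch value using fn:deep-equal, which is a different test.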
The grammar of XQuery 4.0 uses the same simple Extended Backus-Naur Form (EBNF) notation as [XML 1.0], with the following differences:
The notation XYZ ** "," indicates a sequence of zero or more occurrences of XYZ, with a single comma between adjacent occurrences.
The notation XYZ ++ "," indicates a sequence of one or more occurrences of XYZ, with a single comma between adjacent occurrences.
All named symbols have a name that begins with an uppercase letter.
It adds a notation for referring to productions in external specifications.
Comments or extra-grammatical constraints on grammar productions are between '/*' and '*/' symbols.
An 'xgc:' prefix marks an extra-grammatical constraint, the details of which are explained in A.1.2 Extra-grammatical Constraints.
A 'ws:' prefix explains the whitespace rules for the production, the details of which are explained in A.3.5 Whitespace Rules.
A 'gn:' prefix means a 'Grammar Note', and is meant as a clarification for parsing rules, and is explained in A.1.3 Grammar Notes. These notes are not normative.
The terminal symbols for this grammar include the quoted strings used in the production rules below, and the terminal symbols defined in section A.3.1 Terminal Symbols. The grammar is a little unusual in that parsing and tokenization are somewhat intertwined: for more details see A.3 Lexical structure.
The EBNF notation is described in more detail in A.1.1 Notation.