XPath and XQuery Functions and Operators 4.0

5 Processing strings

This section specifies functions and operators on the [XML Schema Part 2: Datatypes Second Edition] xs:string datatype and the datatypes derived from it.

5.5 Functions based on substring matching

The functions described in this section examine a string $value to see whether it contains another string $substring as a substring. The result depends on whether $substring is a substring of $value, and if so, on the range of characters in $value which $substring matches.

When the Unicode codepoint collation is used, this simply involves determining whether $value contains a contiguous sequence of characters whose codepoints are the same, one for one, with the codepoints of the characters in $substring.
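As a rough illustration of the codepoint case only, the following Python sketch compares codepoint sequences window by window. It is not part of this specification: the function name `contains_codepoint` is hypothetical, and the handling of the empty sequence is left out.

```python
def contains_codepoint(value: str, substring: str) -> bool:
    """Toy model of substring matching under the Unicode codepoint collation."""
    v = [ord(c) for c in value]      # codepoints of the containing string
    s = [ord(c) for c in substring]  # codepoints of the candidate substring
    # Slide a window of len(s) over v and compare codepoint for codepoint.
    # An empty s matches at position 0; an s longer than v yields no windows.
    return any(v[i:i + len(s)] == s for i in range(len(v) - len(s) + 1))


if __name__ == "__main__":
    print(contains_codepoint("tattoo", "t"))
    print(contains_codepoint("tattoo", "ttt"))
```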

When a collation is specified, the rules are more complex.

All collations support the capability of deciding whether two strings are considered equal, and if not, which of the strings should be regarded as preceding the other. For functions such as fn:compare, this is all that is required. For other functions, such as fn:contains, the collation needs to support an additional property: it must be able to decompose the string into a sequence of collation units, each unit consisting of one or more characters, such that two strings can be compared by pairwise comparison of these units.

[Definition] The term collation unit as used in this specification is equivalent to the term collation element used in [UTS #10].

The string Q is then considered to contain P as a substring if the sequence of collation units corresponding to P is a subsequence of the sequence of collation units corresponding to Q. The characters in Q that match are the characters corresponding to these collation units.

This rule may occasionally lead to surprises. For example, consider a collation that treats "Jaeger" and "Jäger" as equal. It might do this by treating "ä" as representing two collation units, in which case the expression fn:contains("Jäger", "eg") will return true. Alternatively, a collation might treat "ae" as a single collation unit, in which case the expression fn:contains("Jaeger", "eg") will return false. The results of these functions thus depend strongly on the properties of the collation that is used.

In addition, collations may specify that some collation units should be ignored during matching. If hyphen is an ignored collation unit, then fn:contains("code-point", "codepoint") will be true, and fn:contains("codepoint", "-") will also be true.
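The effect of these choices can be sketched with two toy collations in Python. Everything here is an assumption made for illustration: `units_expanding`, `units_contracting` and `contains` are hypothetical names, the two decompositions do not correspond to any real UCA tailoring, and both simply drop the hyphen to model an ignored collation unit.

```python
def units_expanding(text):
    """Toy collation 1: 'ä' contributes two units ('a', 'e'); '-' is ignored."""
    out = []
    for ch in text:
        if ch == "ä":
            out += ["a", "e"]
        elif ch != "-":
            out.append(ch)
    return out


def units_contracting(text):
    """Toy collation 2: 'ae' and 'ä' both form the single unit 'ä'; '-' is ignored."""
    out, i = [], 0
    while i < len(text):
        if text[i:i + 2] == "ae":
            out.append("ä")
            i += 2
        elif text[i] == "-":
            i += 1
        else:
            out.append(text[i])
            i += 1
    return out


def contains(value, substring, units):
    """True if the units of substring occur contiguously within the units of value."""
    v, s = units(value), units(substring)
    return any(v[i:i + len(s)] == s for i in range(len(v) - len(s) + 1))


if __name__ == "__main__":
    # Both toy collations treat the two spellings as equal ...
    print(units_expanding("Jaeger") == units_expanding("Jäger"))
    print(units_contracting("Jaeger") == units_contracting("Jäger"))
    # ... but disagree about whether "eg" is a substring.
    print(contains("Jäger", "eg", units_expanding))
    print(contains("Jaeger", "eg", units_contracting))
```

Under the first decomposition "eg" is found inside "Jäger"; under the second it is not found even in "Jaeger", because the "e" has been absorbed into a larger unit.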

In the rules for the functions defined in this section, we use the following terms taken from [UTS #10]:

[Definition] The term match is used in the sense of definition DS2 from [UTS #10].
[Definition] The term minimal match is used in the sense of definition DS4 from [UTS #10].

In the definitions in [UTS #10], these rules involve a number of parameters. In the context of the functions defined in this section, these parameters are interpreted as follows:

C is the collation; that is, the value of the $collation argument if specified, otherwise the default collation.
P is the (candidate) substring, the value of the $substring argument to the function.
Q is the (candidate) containing string, the value of the $value argument to the function.
The boundary condition B is satisfied at the start and end of a string, and between any two characters that belong to different collation units (“collation elements” in the language of [UTS #10]). It is not satisfied between two characters that belong to the same collation unit.

It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property.

Function	Meaning
`fn:contains`	Returns `true` if the string `$value` contains `$substring` as a substring, taking collations into account.
`fn:starts-with`	Returns `true` if the string `$value` contains `$substring` as a leading substring, taking collations into account.
`fn:ends-with`	Returns `true` if the string `$value` contains `$substring` as a trailing substring, taking collations into account.
`fn:substring-before`	Returns the part of `$value` that precedes the first occurrence of `$substring`, taking collations into account.
`fn:substring-after`	Returns the part of `$value` that follows the first occurrence of `$substring`, taking collations into account.

5.5.1 fn:contains

Summary

Returns true if the string $value contains $substring as a substring, taking collations into account.

Signature

`fn:contains`(
`$value`	`as` `xs:string?`,
`$substring`	`as` `xs:string?`,
`$collation`	`as` `xs:string?`	`:=` `fn:default-collation()`
) `as` `xs:boolean`

Properties

The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations.

The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and static base URI.

Rules

If $value or $substring is the empty sequence, or contains only ignorable collation units, it is interpreted as the zero-length string.

If $substring is the zero-length string, then the function returns true.

If $value is the zero-length string, the function returns false.

The collation used by this function is determined according to the rules in 5.3.7 Choosing a collation.

The function returns an xs:boolean indicating whether or not $value contains (at the beginning, at the end, or anywhere within) at least one sequence of collation units that provides a minimal match to the collation units in $substring, according to the collation that is used.

Note:

Minimal match is defined in [UTS #10].

Error Conditions

A dynamic error may be raised [err:FOCH0004] if the specified collation does not support collation units.

Examples

Variables
let $coll := "http://www.w3.org/2013/collation/UCA?lang=en;alternate=blanked;strength=primary"

Expression	Result
The collation used in some of these examples, `$coll`, is a collation in which both `-` and `*` are ignorable collation units.
“Ignorable collation unit” is equivalent to “ignorable collation element” in [UTS #10].
`contains("tattoo", "t")`	true()
`contains("tattoo", "ttt")`	false()
`contains("", ())`	true() (The first rule is applied, followed by the second rule.)
contains( "abcdefghi", "-d-e-f-", $coll )	true()
contains( "abcdefghi*", "d-ef-", $coll )	true()
contains( "abcd**e---f--*ghi", "def", $coll )	true()
contains( (), "--**----", $coll )	true() (The second argument contains only ignorable collation units and is equivalent to the zero-length string.)
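The rules and examples above can be approximated by a small Python model. This is a sketch under stated assumptions, not the normative algorithm: `None` stands for the empty sequence, each non-ignorable character is treated as one collation unit, and the hypothetical `IGNORABLE` set mimics only the ignorable `-` and `*` of `$coll` (the real collation also ignores case and accent differences, which this model does not).

```python
from typing import List, Optional

IGNORABLE = {"-", "*"}  # assumption: the ignorable units of the example collation


def units(text: Optional[str]) -> List[str]:
    """Collation units of text; None (empty sequence) and ignorables yield none."""
    return [c for c in (text or "") if c not in IGNORABLE]


def contains(value: Optional[str], substring: Optional[str]) -> bool:
    v, s = units(value), units(substring)
    if not s:        # $substring is (equivalent to) the zero-length string
        return True
    if not v:        # $value is (equivalent to) the zero-length string
        return False
    # Look for the units of $substring as a contiguous run within $value.
    return any(v[i:i + len(s)] == s for i in range(len(v) - len(s) + 1))


if __name__ == "__main__":
    print(contains("abcd**e---f--*ghi", "def"))
    print(contains(None, "--**----"))
```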

5.5.2 fn:starts-with

Summary

Returns true if the string $value contains $substring as a leading substring, taking collations into account.

Signature

`fn:starts-with`(
`$value`	`as` `xs:string?`,
`$substring`	`as` `xs:string?`,
`$collation`	`as` `xs:string?`	`:=` `fn:default-collation()`
) `as` `xs:boolean`

Properties

The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations.

The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and static base URI.

Rules

If $value or $substring is the empty sequence, or contains only ignorable collation units, it is interpreted as the zero-length string.

If $substring is the zero-length string, then the function returns true. If $value is the zero-length string and $substring is not the zero-length string, then the function returns false.

The collation used by this function is determined according to the rules in 5.3.7 Choosing a collation.

The function returns an xs:boolean indicating whether or not $value starts with a sequence of collation units that provides a match to the collation units of $substring according to the collation that is used.

Note:

Match is defined in [UTS #10].

Error Conditions

A dynamic error may be raised [err:FOCH0004] if the specified collation does not support collation units.

Examples

Variables
let $coll := "http://www.w3.org/2013/collation/UCA?lang=en;alternate=blanked;strength=primary"

Expression	Result
The collation used in some of these examples, `$coll`, is a collation in which both `-` and `*` are ignorable collation units.
“Ignorable collation unit” is equivalent to “ignorable collation element” in [UTS #10].
`starts-with("tattoo", "tat")`	true()
`starts-with("tattoo", "att")`	false()
`starts-with((), ())`	true()
starts-with( "abcdefghi", "-a-b-c-", $coll )	true()
starts-with( "abcdefghi*", "a-bc-", $coll )	true()
starts-with( "abcd**e---f--*ghi", "abcdef", $coll )	true()
starts-with( (), "--**----", $coll )	true() (The second argument contains only ignorable collation units and is equivalent to the zero-length string.)
starts-with( "-abcdefghi", "-abc", $coll )	true()
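A toy Python model reproduces the examples above. It rests on assumptions: `None` stands for the empty sequence, every non-ignorable character is one collation unit, and the hypothetical `IGNORABLE` set covers only `-` and `*`. Because ignorable characters produce no units, a leading `-` in `$value` does not prevent a match.

```python
from typing import List, Optional

IGNORABLE = {"-", "*"}  # assumption: the ignorable units of the example collation


def units(text: Optional[str]) -> List[str]:
    """Collation units of text; None (empty sequence) and ignorables yield none."""
    return [c for c in (text or "") if c not in IGNORABLE]


def starts_with(value: Optional[str], substring: Optional[str]) -> bool:
    v, s = units(value), units(substring)
    # The leading len(s) units of $value must equal the units of $substring.
    # An empty s always matches; a longer s than v never does.
    return v[:len(s)] == s


if __name__ == "__main__":
    print(starts_with("-abcdefghi", "-abc"))
    print(starts_with("tattoo", "att"))
```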

5.5.3 fn:ends-with

Summary

Returns true if the string $value contains $substring as a trailing substring, taking collations into account.

Signature

`fn:ends-with`(
`$value`	`as` `xs:string?`,
`$substring`	`as` `xs:string?`,
`$collation`	`as` `xs:string?`	`:=` `fn:default-collation()`
) `as` `xs:boolean`

Properties

The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations.

The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and static base URI.

Rules

If $value or $substring is the empty sequence, or contains only ignorable collation units, it is interpreted as the zero-length string.

If $substring is the zero-length string, then the function returns true. If $value is the zero-length string and the value of $substring is not the zero-length string, then the function returns false.

The collation used by this function is determined according to the rules in 5.3.7 Choosing a collation.

The function returns an xs:boolean indicating whether or not $value ends with a sequence of collation units that provides a match to the collation units of $substring according to the collation that is used.

Note:

Match is defined in [UTS #10].

Error Conditions

A dynamic error may be raised [err:FOCH0004] if the specified collation does not support collation units.

Examples

Variables
let $coll := "http://www.w3.org/2013/collation/UCA?lang=en;alternate=blanked;strength=primary"

Expression	Result
The collation used in some of these examples, `$coll`, is a collation in which both `-` and `*` are ignorable collation units.
“Ignorable collation unit” is equivalent to “ignorable collation element” in [UTS #10].
`ends-with("tattoo", "tattoo")`	true()
`ends-with("tattoo", "atto")`	false()
`ends-with((), ())`	true()
ends-with( "abcdefghi", "-g-h-i-", $coll )	true()
ends-with( "abcd**e---f--*ghi", "defghi", $coll )	true()
ends-with( (), "--**----", $coll )	true() (The second argument contains only ignorable collation units and is equivalent to the zero-length string.)
ends-with( "abcdefghi", "ghi-", $coll )	true()
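The trailing case can be modelled in Python along the same lines. As before this is only a sketch: `None` stands for the empty sequence, each non-ignorable character counts as one collation unit, and `IGNORABLE` is a hypothetical stand-in for the ignorable units of `$coll`.

```python
from typing import List, Optional

IGNORABLE = {"-", "*"}  # assumption: the ignorable units of the example collation


def units(text: Optional[str]) -> List[str]:
    """Collation units of text; None (empty sequence) and ignorables yield none."""
    return [c for c in (text or "") if c not in IGNORABLE]


def ends_with(value: Optional[str], substring: Optional[str]) -> bool:
    v, s = units(value), units(substring)
    if len(s) > len(v):
        return False
    # Compare the trailing len(s) units of $value with the units of $substring.
    return v[len(v) - len(s):] == s


if __name__ == "__main__":
    print(ends_with("abcdefghi", "ghi-"))
    print(ends_with("tattoo", "atto"))
```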

5.5.4 fn:substring-before

Summary

Returns the part of $value that precedes the first occurrence of $substring, taking collations into account.

Signature

`fn:substring-before`(
`$value`	`as` `xs:string?`,
`$substring`	`as` `xs:string?`,
`$collation`	`as` `xs:string?`	`:=` `fn:default-collation()`
) `as` `xs:string`

Properties

The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations.

The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and static base URI.

Rules

If $value or $substring is the empty sequence, or contains only ignorable collation units, it is interpreted as the zero-length string.

If $substring is the zero-length string, then the function returns the zero-length string.

If $value does not contain a string that is equal to $substring, then the function returns the zero-length string.

The collation used by this function is determined according to the rules in 5.3.7 Choosing a collation.

The function returns the substring of $value that precedes in $value the first occurrence of a sequence of collation units that provides a minimal match to the collation units of $substring according to the collation that is used.

Note:

Minimal match is defined in [UTS #10].

Error Conditions

A dynamic error may be raised [err:FOCH0004] if the specified collation does not support collation units.

Examples

Variables
let $coll := "http://www.w3.org/2013/collation/UCA?lang=en;alternate=blanked;strength=primary"

Expression	Result
The collation used in some of these examples, `$coll`, is a collation in which both `-` and `*` are ignorable collation units.
“Ignorable collation unit” is equivalent to “ignorable collation element” in [UTS #10].
`substring-before("tattoo", "attoo")`	"t"
`substring-before("tattoo", "tatto")`	""
`substring-before((), ())`	""
substring-before( "abcdefghi", "--d-e-", $coll )	"abc"
substring-before( "abc--d-e-fghi", "--d-e-", $coll )	"abc--"
substring-before( "abcdefghi", "**cde", $coll )	"ab"
substring-before( "Eureka!", "--**----", $coll )	"" (The second argument contains only ignorable collation units and is equivalent to the zero-length string.)
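To see why ignorable characters next to the match stay in the result, the following Python sketch records the character position of each collation unit and cuts `$value` at the first matched unit. It is a toy model under assumptions: `None` stands for the empty sequence, each non-ignorable character is one unit, `IGNORABLE` is hypothetical, and edge cases such as a `$value` made up only of ignorable characters are not modelled faithfully.

```python
from typing import List, Optional, Tuple

IGNORABLE = {"-", "*"}  # assumption: the ignorable units of the example collation


def units(text: Optional[str]) -> List[Tuple[str, int]]:
    """(unit, character index) pairs for every non-ignorable character of text."""
    return [(c, i) for i, c in enumerate(text or "") if c not in IGNORABLE]


def substring_before(value: Optional[str], substring: Optional[str]) -> str:
    text = value or ""
    v = units(text)
    keys = [key for key, _ in v]
    s = [key for key, _ in units(substring)]
    if not s:
        return ""
    for i in range(len(keys) - len(s) + 1):
        if keys[i:i + len(s)] == s:
            # A minimal match starts at the first matched unit, so any
            # ignorable characters just before it belong to the result.
            return text[:v[i][1]]
    return ""  # no occurrence of $substring


if __name__ == "__main__":
    print(repr(substring_before("abc--d-e-fghi", "--d-e-")))
```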

5.5.5 fn:substring-after

Summary

Returns the part of $value that follows the first occurrence of $substring, taking collations into account.

Signature

`fn:substring-after`(
`$value`	`as` `xs:string?`,
`$substring`	`as` `xs:string?`,
`$collation`	`as` `xs:string?`	`:=` `fn:default-collation()`
) `as` `xs:string`

Properties

The two-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations.

The three-argument form of this function is deterministic, context-dependent, and focus-independent. It depends on collations, and static base URI.

Rules

If $value or $substring is the empty sequence, or contains only ignorable collation units, it is interpreted as the zero-length string.

If $substring is the zero-length string, then the function returns the value of $value.

If $value does not contain a string that is equal to $substring, then the function returns the zero-length string.

The collation used by this function is determined according to the rules in 5.3.7 Choosing a collation.

The function returns the substring of $value that follows in $value the first occurrence of a sequence of collation units that provides a minimal match to the collation units of $substring according to the collation that is used.

Note:

Minimal match is defined in [UTS #10].

Error Conditions

A dynamic error may be raised [err:FOCH0004] if the specified collation does not support collation units.

Examples

Variables
let $coll := "http://www.w3.org/2013/collation/UCA?lang=en;alternate=blanked;strength=primary"

Expression	Result
The collation used in some of these examples, `$coll`, is a collation in which both `-` and `*` are ignorable collation units.
“Ignorable collation unit” is equivalent to “ignorable collation element” in [UTS #10].
`substring-after("tattoo", "tat")`	"too"
`substring-after("tattoo", "tattoo")`	""
`substring-after((), ())`	""
substring-after( "abcdefghi", "--d-e-", $coll )	"fghi"
substring-after( "abc--d-e-fghi", "--d-e-", $coll )	"-fghi"
substring-after( "abcdefghi", "cde*", $coll )	"fghi"
substring-after( "Eureka!", "--**----", $coll )	"Eureka!" (The second argument contains only ignorable collation units and is equivalent to the zero-length string.)
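The corresponding Python sketch cuts `$value` just after the last unit of the first match. It is a toy model under assumptions: `None` stands for the empty sequence, each non-ignorable character is one collation unit, `IGNORABLE` is hypothetical, and a `$value` consisting only of ignorable characters is not modelled faithfully.

```python
from typing import List, Optional, Tuple

IGNORABLE = {"-", "*"}  # assumption: the ignorable units of the example collation


def units(text: Optional[str]) -> List[Tuple[str, int]]:
    """(unit, character index) pairs for every non-ignorable character of text."""
    return [(c, i) for i, c in enumerate(text or "") if c not in IGNORABLE]


def substring_after(value: Optional[str], substring: Optional[str]) -> str:
    text = value or ""
    v = units(text)
    keys = [key for key, _ in v]
    s = [key for key, _ in units(substring)]
    if not s:
        return text  # zero-length $substring: return $value itself
    for i in range(len(keys) - len(s) + 1):
        if keys[i:i + len(s)] == s:
            # A minimal match ends at the last matched unit, so any
            # ignorable characters right after it belong to the result.
            last = v[i + len(s) - 1][1]
            return text[last + 1:]
    return ""  # no occurrence of $substring


if __name__ == "__main__":
    print(repr(substring_after("abc--d-e-fghi", "--d-e-")))
```

Because the result is always cut out of `$value`, it can never contain a character that `$value` itself does not contain.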

A References

A.1 Normative references

Character Model for the World Wide Web 1.0: Fundamentals: Character Model for the World Wide Web 1.0: Fundamentals, Martin J. Dürst, François Yergeau, et. al., Editors. World Wide Web Consortium, 15 February 2005. This version is http://www.w3.org/TR/2005/REC-charmod-20050215/. The latest version is available at https://www.w3.org/TR/charmod/.
HTML: Living Standard: HTML: Living Standard. WHATWG, 18 November 2022.
DOM: Living Standard: DOM: Living Standard. WHATWG, 26 October 2022.
IANA Timezone Database: The tz timezone database, available at http://www.iana.org/time-zones. It is implementation-defined which version of the database is used.
IEEE 754-2019: IEEE. IEEE Standard for Floating-Point Arithmetic.
IEEE 1003.1-2024: Open Group Base Specifications Issue 8. IEEE, 2024.
IEEE 802-3: “IEEE Standard for Ethernet,” in IEEE Std 802.3-2022 (Revision of IEEE Std 802.3-2018). 29 July 2022. doi: 10.1109/IEEESTD.2022.9844436.
ISO 3166-1: ISO (International Organization for Standardization) Codes for the representation of names of countries and their subdivisions - Part 1: Country codes ISO 3166-1:2013.
ISO 8601: ISO (International Organization for Standardization). Representations of dates and times. Third edition, 2004-12-01. ISO 8601:2004(E). Available from: http://www.iso.org/.
ISO 10967: ISO (International Organization for Standardization). ISO/IEC 10967-1:2012, Information technology—Language Independent Arithmetic—Part 1: Integer and floating point arithmetic [Geneva]: International Organization for Standardization, 2012. Available from: http://www.iso.org/.
ISO 15924: ISO (International Organization for Standardization) Information and documentation — Codes for the representation of names of scripts ISO 15924:2004, January 2004.
ISO 15924 Register: Unicode Consortium. Codes for the representation of names of scripts — Alphabetical list of four-letter script codes. See http://www.unicode.org/iso15924/iso15924-codes.html. Retrieved February 2013; continually updated.
Legacy extended IRIs for XML resource identification: Legacy extended IRIs for XML resource identification. Henry S. Thompson, Richard Tobin, and Norman Walsh (eds), World Wide Web Consortium. 3 November 2008. Available at http://www.w3.org/TR/leiri/.
RFC 1321: IETF. RFC 1321: The MD5 Message-Digest Algorithm. Available at: http://www.ietf.org/rfc/rfc1321.txt.
RFC 2376: IETF. RFC 2376: XML Media Types. Available at: http://www.ietf.org/rfc/rfc2376.txt.
RFC 3986: IETF. RFC 3986: Uniform Resource Identifiers (URI): Generic Syntax. Available at: http://www.ietf.org/rfc/rfc3986.txt.
RFC 3987: IETF. RFC 3987: Internationalized Resource Identifiers (IRIs). Available at: http://www.ietf.org/rfc/rfc3987.txt.
RFC 4180: IETF. RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files. Available at: http://www.ietf.org/rfc/rfc4180.txt.
RFC 6151: IETF. RFC 6151: Updated Security Considerations for the MD5 Message-Digest and the HMAC-MD5 Algorithms. Available at: http://www.ietf.org/rfc/rfc6151.txt.
RFC 7159: IETF. RFC 7159: The JavaScript Object Notation (JSON) Data Interchange Format. Available at: http://www.rfc-editor.org/rfc/rfc7159.txt.
RFC 7303: H. Thompson and C. Lilley. XML Media Types. IETF RFC 7303. See http://www.ietf.org/rfc/rfc7303.txt.
FIPS 180-4: National Institute of Standards and Technology. Secure Hash Standard (SHS). FIPS PUB 180-4. August 2015. See http://dx.doi.org/10.6028/NIST.FIPS.180-4.
UAX #15: Unicode Standard Annex #15: Unicode Normalization Forms. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-14. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr15/.
UAX #29: Unicode Standard Annex #29: Unicode Text Segmentation. Ed. Josh Hadley, Unicode Consortium. The current version is 16.0.0, dated 2024-08-28. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr29/.
The Unicode Standard: The Unicode Consortium, Reading, MA, Addison-Wesley, 2016. The Unicode Standard as updated from time to time by the publication of new versions. See http://www.unicode.org/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database. The version of Unicode to be used is implementation-defined, but implementations are recommended to use the latest Unicode version; currently, Version 9.0.0.
UTS #10: Unicode Technical Standard #10: Unicode Collation Algorithm. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-22. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr10/.
UTS #35: Unicode Technical Standard #35: Unicode Locale Data Markup Language. Ed. Mark Davis et al., Unicode Consortium. The current version is 47, dated 2025-03-11. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr35/.
XML Infoset: World Wide Web Consortium. XML Information Set (Second Edition). W3C Recommendation 4 February 2004. See http://www.w3.org/TR/xml-infoset/
Extensible Markup Language (XML) 1.0 (Fifth Edition): Extensible Markup Language (XML) 1.0 (Fifth Edition), Tim Bray, Jean Paoli, Michael Sperberg-McQueen, et. al., Editors. World Wide Web Consortium, 26 Nov 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126/. The latest version is available at http://www.w3.org/TR/xml.
Extensible Markup Language (XML) 1.1 Recommendation: Extensible Markup Language (XML) 1.1 (Second Edition), Tim Bray, Jean Paoli, Michael Sperberg-McQueen, et. al., Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml11-20060816. The latest version is available at http://www.w3.org/TR/xml11/.
XML Path Language (XPath) Version 1.0: XML Path Language (XPath) Version 1.0, James Clark and Steven DeRose, Editors. World Wide Web Consortium, 16 Nov 1999. This version is http://www.w3.org/TR/1999/REC-xpath-19991116. The latest version is available at http://www.w3.org/TR/xpath.
XML Path Language (XPath) 4.0: CITATION: T.B.D.
XSL Transformations (XSLT) Version 1.0: XSL Transformations (XSLT) Version 1.0, James Clark, Editor. World Wide Web Consortium, 16 November 1999. This version is https://www.w3.org/TR/1999/REC-xslt-19991116/. The latest version is available at https://www.w3.org/TR/xslt/.
XSL Transformations (XSLT) Version 2.0: XSL Transformations (XSLT) Version 2.0 (Second Edition), Michael Kay, Editor. World Wide Web Consortium, 23 January 2007. This version is https://www.w3.org/TR/2007/REC-xslt20-20070123/. The latest version is available at https://www.w3.org/TR/xslt20/.
XSL Transformations (XSLT) Version 4.0: CITATION: T.B.D.
XQuery and XPath Data Model (XDM) 3.0: XQuery and XPath Data Model (XDM) 3.0, Norman Walsh, Anders Berglund, John Snelson, Editors. World Wide Web Consortium, 08 April 2014. This version is https://www.w3.org/TR/2014/REC-xpath-datamodel-30-20140408/. The latest version is available at https://www.w3.org/TR/xpath-datamodel-30/.
XQuery and XPath Data Model (XDM) 3.1: XQuery and XPath Data Model (XDM) 3.1, Norman Walsh, John Snelson, Andrew Coleman, Editors. World Wide Web Consortium, 21 March 2017. This version is https://www.w3.org/TR/2017/REC-xpath-datamodel-31-20170321/. The latest version is available at https://www.w3.org/TR/xpath-datamodel-31/.
XQuery and XPath Data Model (XDM) 4.0: XQuery and XPath Data Model (XDM) 4.0, XSLT Extensions Community Group, World Wide Web Consortium.
XSLT and XQuery Serialization 3.1: XSLT and XQuery Serialization 3.1, Andrew Coleman and Michael Sperberg-McQueen, Editors. World Wide Web Consortium, 21 March 2017. This version is https://www.w3.org/TR/2017/REC-xslt-xquery-serialization-31-20170321/. The latest version is available at https://www.w3.org/TR/xslt-xquery-serialization-31/.
XQuery 1.0 and XPath 2.0 Formal Semantics: XQuery 1.0 and XPath 2.0 Formal Semantics (Second Edition), Jérôme Siméon, Denise Draper, Peter Fankhauser, et. al., Editors. World Wide Web Consortium, 14 December 2010. This version is https://www.w3.org/TR/2010/REC-xquery-semantics-20101214/. The latest version is available at https://www.w3.org/TR/xquery-semantics/.
XQuery 4.0: An XML Query Language: CITATION: T.B.D.
XML Inclusions (XInclude) Version 1.0 (Second Edition): XML Inclusions (XInclude) Version 1.0 (Second Edition), Jonathan Marsh, David Orchard, and Daniel Veillard, Editors. World Wide Web Consortium, 15 Nov 2006. This version is http://www.w3.org/TR/2006/REC-xinclude-20061115/. The latest version is available at http://www.w3.org/TR/xinclude/.
XML Schema Part 1: Structures Second Edition: XML Schema Part 1: Structures Second Edition, 28 October 2004. Available at: http://www.w3.org/TR/xmlschema-1/.
XML Schema Part 2: Datatypes Second Edition: XML Schema Part 2: Datatypes Second Edition, 28 October 2004. Available at: http://www.w3.org/TR/xmlschema-2/.
XSD 1.1 Part 1: W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures, Sandy Gao, Michael Sperberg-McQueen, Henry Thompson, et. al., Editors. World Wide Web Consortium, 05 Apr 2012. This version is http://www.w3.org/TR/2012/REC-xmlschema11-1-20120405/. The latest version is available at http://www.w3.org/TR/xmlschema11-1/.
XSD 1.1 Part 2: W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes, David Peterson, Sandy Gao, Ashok Malhotra, et. al., Editors. World Wide Web Consortium, 05 Apr 2012. This version is http://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/. The latest version is available at http://www.w3.org/TR/xmlschema11-2/.
Namespaces in XML: Namespaces in XML 1.0 (Third Edition), Tim Bray, Dave Hollander, Andrew Layman, et. al., Editors. World Wide Web Consortium, 08 Dec 2009. This version is http://www.w3.org/TR/2009/REC-xml-names-20091208/. The latest version is available at http://www.w3.org/TR/xml-names/.
Invisible XML: Invisible XML Specification, Steven Pemberton, editor. World Wide Web Consortium, 20 June 2020. This version is https://invisiblexml.org/1.0/. The latest version is available at https://invisiblexml.org/current/.

E Glossary (Non-Normative)

atomic item: An atomic item is a pair (T, D) where T (the type annotation) is an atomic type, and D (the datum) is a point in the value space of T.
capturing subexpression: A left parenthesis is recognized as a capturing left parenthesis provided it is not immediately followed by ? or * (see below), is not within a character group (square brackets), and is not escaped with a backslash. The sub-expression enclosed by a capturing left parenthesis and its matching right parenthesis is referred to as a capturing subexpression.
character: A character is an instance of the Char^XML production of [Extensible Markup Language (XML) 1.0 (Fifth Edition)].
character position: A string of length N has N+1 character positions: one immediately before each character in the string, and one after the last character. In interfaces where character positions are exposed, they are numbered from 1 to N+1.
codepoint: A codepoint is an integer assigned to a character by the Unicode consortium, or reserved for future assignment to a character.
collation: A collation is an algorithm that determines, for any two given strings S₁ and S₂, whether S₁ is less than, equal to, or greater than S₂. In this specification, a collation is identified by an absolute URI.
collation unit: The term collation unit as used in this specification is equivalent to the term collation element used in [UTS #10].
context-dependent: A function definition^XP may have the property of being context-dependent: the result of such a function depends on the values of properties in the static and dynamic evaluation context of the caller as well as on the actual supplied arguments (if any). A function definition may be context-dependent for some arities in its arity range, and context-independent for others: for example fn:name#0 is context-dependent while fn:name#1 is context-independent.
context-independent: A function definition^XP that is not context-dependent is called context-independent.
CSV: The term comma separated values or CSV refers to a wide variety of plain-text tabular data formats with fields and records separated by standard character delimiters (often, but not invariably, commas).
date formatting function: The three functions fn:format-dateTime, fn:format-date, and fn:format-time are referred to collectively as the date formatting functions.
datum: The datum of an atomic item is a point in the value space of its type, which is also a point in the value space of the primitive type from which that type is derived.
deterministic: A function that is guaranteed to produce identical results from repeated calls within a single execution scope if the explicit and implicit arguments are identical is referred to as deterministic.
digit family: The decimal digit family of a decimal format is the sequence of ten digits with consecutive Unicode codepoints starting with the character that is the value of the zero-digit^XP31 property.
disjoint matching segments: The disjoint matching segments obtained by applying a regular expression R to a string S in the presence of a set of flags F are the segments of S that match R (using flags F), after elimination of overlapping segments.
end position: The end position of a segment is the start position of the segment plus its length.
execution scope: An execution scope is a sequence of calls to the function library during which certain aspects of the state are required to remain invariant. For example, two calls to fn:current-dateTime within the same execution scope will return the same result. The execution scope is defined by the host language that invokes the function library.
expanded-QName: An expanded-QName is a value in the value space of the xs:QName datatype as defined in the XDM data model (see [XQuery and XPath Data Model (XDM) 4.0]): that is, a triple containing namespace prefix (optional), namespace URI (optional), and local name. Two expanded QNames are equal if the namespace URIs are the same (or both absent) and the local names are the same. The prefix plays no part in the comparison, but is used only if the expanded QName needs to be converted back to a string.
focus-dependent: A function is focus-dependent if its result depends on the focus^XP31 (that is, the context item, position, or size) of the caller.
focus-independent: A function that is not focus-dependent is called focus-independent.
Gregorian: The eight primitive types xs:dateTime, xs:date, xs:time, xs:gYearMonth, xs:gYear, xs:gMonthDay, xs:gMonth, xs:gDay are referred to collectively as the Gregorian types.
higher-order: Functions that accept functions among their arguments, or that return functions in their result, are described in this specification as higher-order functions.
identical: Two values $V1 and $V2 are defined to be identical if they contain the same number of items and the items are pairwise identical. Two items are identical if and only if one of the following conditions applies:
implementation-defined: Where behavior is described as implementation-defined, variations between processors are permitted, but a conformant implementation must document the choices it has made.
implementation-dependent: Where behavior is described as implementation-dependent, variations between processors are permitted, and conformant implementations are not required to document the choices they have made.
key-value pair map: A key-value pair map is a map containing two entries, one (with the key "key") containing the key part of a key-value pair, the other (with the key "value") containing the value part of a key-value pair.
map: A map consists of a sequence of entries, also known as key-value pairs. Each entry comprises a key which is an arbitrary atomic item, and an arbitrary sequence called the associated value.
match: The term match is used in the sense of definition DS2 from [UTS #10].
minimal match: The term minimal match is used in the sense of definition DS4 from [UTS #10].
nondeterministic: A function that is not deterministic is referred to as nondeterministic.
nondeterministic with respect to ordering: Some functions (such as fn:distinct-values, fn:unordered, map:keys, and map:for-each) produce results in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering.
optional digit character: The optional digit character is the character that is the value of the digit^XP31 property.
option parameter conventions: Functions that take an options parameter adopt common conventions on how the options are used. These are referred to as the option parameter conventions. These rules apply only to functions that explicitly refer to them.
permitted character: A permitted character is one within the repertoire accepted by the implementation.
picture string: The formatting of a number is controlled by a picture string. The picture string is a sequence of characters, in which the characters assigned to the properties decimal-separator^XP31, exponent-separator^XP31, grouping-separator^XP31, digit^XP31, and pattern-separator^XP31, and the members of the decimal digit family, are classified as active characters, and all other characters (including the values of the properties percent^XP31 and per-mille^XP31) are classified as passive characters.
primitive type: A primitive type is one of the 19 primitive atomic types defined in Section 3.2 Primitive datatypes^XS2 of [XML Schema Part 2: Datatypes Second Edition], or the type xs:untypedAtomic defined in [XQuery and XPath Data Model (XDM) 4.0].
same key: Within a map, no two entries have the same key. Two atomic items K1 and K2 are the same key for this purpose if the function call fn:atomic-equal($K1, $K2) returns true.
segment: A segment of a string S is a sequence of zero or more contiguous characters starting at a given character position within S.
single-entry map: A single-entry map is a map containing a single entry.
string: A string is a sequence of zero or more characters, or equivalently, a value in the value space of the xs:string datatype.
type annotation: The type annotation of an atomic item is the most specific atomic type that it is an instance of (it is also an instance of every type from which that type is derived).
Unicode codepoint collation: The collation URI http://www.w3.org/2005/xpath-functions/collation/codepoint identifies a collation which must be recognized by every implementation: it is referred to as the Unicode codepoint collation (not to be confused with the Unicode collation algorithm).
URI: Within this specification, the term URI refers to Uniform Resource Identifiers as defined in [RFC 3986] and extended in [RFC 3987] with a new name IRI. The term URI Reference, unless otherwise stated, refers to a string in the lexical space of the xs:anyURI datatype as defined in [XML Schema Part 2: Datatypes Second Edition].
variadic: The function fn:concat is defined to be variadic: it accepts any number of arguments. No other function has this property.
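
The definitions of segment, end position, and disjoint matching segments given in this glossary can be illustrated with a minimal sketch. It is written in Python rather than XPath and is illustrative only: the helper name disjoint_segments is hypothetical, Python's regular expression dialect differs from the one defined in this specification, and 1-based start positions are an assumption. Overlap is eliminated here by left-to-right scanning, which is one possible strategy and not necessarily the rule applied elsewhere in this specification.

```python
import re


def disjoint_segments(pattern, s, flags=0):
    """Return (start, length, end) for each non-overlapping match of pattern in s.

    start is a 1-based character position (an assumption of this sketch);
    end follows the glossary definition: start position plus length.
    """
    segments = []
    # re.finditer scans left to right and never yields overlapping matches.
    for m in re.finditer(pattern, s, flags):
        start = m.start() + 1            # convert Python's 0-based index to 1-based
        length = m.end() - m.start()     # number of characters in the segment
        segments.append((start, length, start + length))
    return segments


# "ana" could match "bananas" at positions 2 and 4, but those segments overlap,
# so only the first is retained.
print(disjoint_segments("ana", "bananas"))
print(disjoint_segments("a", "bananas"))
```

Under these assumptions, the end position of one segment is the earliest position at which the next disjoint segment can start.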

G Implementation-defined features (Non-Normative)

It is implementation-defined which version of Unicode is supported, but it is recommended that the most recent version of Unicode be used. (See Conformance.)
It is implementation-defined whether the type system is based on XML Schema 1.0 or XML Schema 1.1. (See Conformance.)
It is implementation-defined whether definitions that rely on XML (for example, the set of valid XML characters) should use the definitions in XML 1.0 or XML 1.1. (See Conformance.)
Implementations may attach an implementation-defined meaning to options in the map that are not described in this specification. These options should use values of type xs:QName as the option names, using an appropriate namespace. (See Options.)
It is implementation-defined which version of [The Unicode Standard] is supported, but it is recommended that the most recent version of Unicode be used. (See Strings, characters, and codepoints.)
[Definition] Some functions (such as fn:distinct-values, fn:unordered, map:keys, and map:for-each) produce results in an implementation-defined or implementation-dependent order. In such cases two calls with the same arguments are not guaranteed to produce the results in the same order. These functions are said to be nondeterministic with respect to ordering. (See Properties of functions.)
Where the results of a function are described as being (to a greater or lesser extent) implementation-defined or implementation-dependent, this does not by itself remove the requirement that the results should be deterministic: that is, that repeated calls with the same explicit and implicit arguments must return identical results. (See Properties of functions.)
In addition, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:trace.)
Consider a situation in which a user wants to investigate the actual value passed to a function. Assume that in a particular execution, $v is an xs:decimal with value 124.84. Writing fn:trace($v, 'the value of $v is:') will return $v. The processor may output "124.84" and "the value of $v is:" to an implementation-defined destination. (See fn:trace.)
Similar to fn:trace, the values of $input, typically serialized and converted to an xs:string, and $label (if supplied and non-empty) may be output to an implementation-defined destination. (See fn:message.)
They may provide an implementation-defined mechanism that allows users to choose between raising an error and returning a result that is modulo the largest representable integer value. See [ISO 10967]. (See Arithmetic operators on numeric values.)
For xs:decimal values, let N be the number of digits of precision supported by the implementation, and let M (M <= N) be the minimum limit on the number of digits required for conformance (18 digits for XSD 1.0, 16 digits for XSD 1.1). Then for addition, subtraction, and multiplication operations, the returned result should be accurate to N digits of precision, and for division and modulus operations, the returned result should be accurate to at least M digits of precision. The actual precision is implementation-defined. If the number of digits in the mathematical result exceeds the number of digits that the implementation retains for that operation, the result is truncated or rounded in an implementation-defined manner. (See Arithmetic operators on numeric values.)
The [IEEE 754-2019] specification also describes handling of two exception conditions called divideByZero and invalidOperation. The IEEE divideByZero exception is raised not only by a direct attempt to divide by zero, but also by operations such as log(0). The IEEE invalidOperation exception is raised by attempts to call a function with an argument that is outside the function’s domain (for example, sqrt(-1) or log(-1)). Although IEEE defines these as exceptions, it also defines “default non-stop exception handling” in which the operation returns a defined result, typically positive or negative infinity, or NaN. With this function library, these IEEE exceptions do not cause a dynamic error at the application level; rather they result in the relevant function or operator returning the defined non-error result. The underlying IEEE exception may be notified to the application or to the user by some implementation-defined warning condition, but the observable effect on an application using the functions and operators defined in this specification is simply to return the defined result (typically -INF, +INF, or NaN) with no error. (See Arithmetic operators on numeric values.)
The [IEEE 754-2019] specification distinguishes two NaN values: a quiet NaN and a signaling NaN. These two values are not distinguishable in the XDM model: the value spaces of xs:float and xs:double each include only a single NaN value. This does not prevent the implementation distinguishing them internally, and triggering different implementation-defined warning conditions, but such distinctions do not affect the observable behavior of an application using the functions and operators defined in this specification. (See Arithmetic operators on numeric values.)
The implementation may adopt a different algorithm provided that it is equivalent to this formulation in all cases where implementation-dependent or implementation-defined behavior does not affect the outcome, for example, the implementation-defined precision of the result of xs:decimal division. (See op:numeric-integer-divide.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:round-half-to-even.)
There may be implementation-defined limits on the precision available. If the requested $precision is outside this range, it should be adjusted to the nearest value supported by the implementation. (See fn:divide-decimals.)
XSD 1.1 allows the string +INF as a representation of positive infinity; XSD 1.0 does not. It is implementation-defined whether XSD 1.1 is supported. (See fn:number.)
Any other format token, which indicates a numbering sequence in which that token represents the number 1 (one) (but see the note below). It is implementation-defined which numbering sequences, additional to those listed above, are supported. If an implementation does not support a numbering sequence represented by the given token, it must use a format token of 1. (See fn:format-integer.)
For all format tokens other than a digit-pattern, there may be implementation-defined lower and upper bounds on the range of numbers that can be formatted using this format token; indeed, for some numbering sequences there may be intrinsic limits. For example, the format token U+2460 (CIRCLED DIGIT ONE, ①) has a range imposed by the Unicode character repertoire — zero to 20 in Unicode versions prior to 3.2, or zero to 50 in subsequent versions. For the numbering sequences described above any upper bound imposed by the implementation must not be less than 1000 (one thousand) and any lower bound must not be greater than 1. Numbers that fall outside this range must be formatted using the format token 1. (See fn:format-integer.)
The set of languages for which numbering is supported is implementation-defined. If the $language argument is absent, or is set to an empty sequence, or is invalid, or is not a language supported by the implementation, then the number is formatted using the default language from the dynamic context. (See fn:format-integer.)
...either a or t, to indicate alphabetic or traditional numbering respectively, the default being implementation-defined. (See fn:format-integer.)
The string of characters between the parentheses, if present, is used to select between other possible variations of cardinal or ordinal numbering sequences. The interpretation of this string is implementation-defined. No error occurs if the implementation does not define any interpretation for the defined string. (See fn:format-integer.)
It is implementation-defined what combinations of values of the format token, the language, and the cardinal/ordinal modifier are supported. If ordinal numbering is not supported for the combination of the format token, the language, and the string appearing in parentheses, the request is ignored and cardinal numbers are generated instead. (See fn:format-integer.)
The use of the a or t modifier disambiguates between numbering sequences that use letters. In many languages there are two commonly used numbering sequences that use letters. One numbering sequence assigns numeric values to letters in alphabetic sequence, and the other assigns numeric values to each letter in some other manner traditional in that language. In English, these would correspond to the numbering sequences specified by the format tokens a and i. In some languages, the first member of each sequence is the same, and so the format token alone would be ambiguous. In the absence of the a or t modifier, the default is implementation-defined. (See fn:format-integer.)
The static context provides a set of decimal formats. One of the decimal formats is unnamed, the others (if any) are identified by a QName. There is always an unnamed decimal format available, but its contents are implementation-defined. (See Defining a decimal format.)
IEEE states that the preferred quantum is language-defined. In this specification, it is implementation-defined. (See Trigonometric and exponential functions.)
IEEE defines various rounding algorithms for inexact results, and states that the choice of rounding direction, and the mechanisms for influencing this choice, are language-defined. In this specification, the rounding direction and any mechanisms for influencing it are implementation-defined. (See Trigonometric and exponential functions.)
The map returned by the fn:random-number-generator function may contain additional entries beyond those specified here, but it must match the record type defined above. The meaning of any additional entries is implementation-defined. To avoid conflict with any future version of this specification, the keys of any such entries should start with an underscore character. (See fn:random-number-generator.)
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:codepoints-to-string.)
If two query parameters use the same keyword then the last one wins. If a query parameter uses a keyword or value which is not defined in this specification then the meaning is implementation-defined. If the implementation recognizes the meaning of the keyword and value then it should interpret it accordingly; if it does not recognize the keyword or value then if the fallback parameter is present with the value no it should reject the collation as unsupported, otherwise it should ignore the unrecognized parameter. (See The Unicode Collation Algorithm.)
The following query parameters are defined. If any parameter is absent, the default is implementation-defined except where otherwise stated. The meaning given for each parameter is non-normative; the normative specification is found in [UTS #35]. (See The Unicode Collation Algorithm.)
Because the set of collations that are supported is implementation-defined, an implementation has the option to support all collation URIs, in which case it will never raise this error. (See Choosing a collation.)
The properties available are as defined for the Unicode Collation Algorithm (see 5.3.4 The Unicode Collation Algorithm). Additional implementation-defined properties may be specified as described in the rules for UCA collation URIs. (See fn:collation.)
It is possible to define collations that do not have the ability to generate collation keys. Supplying such a collation will cause the function to fail. The ability to generate collation keys is an implementation-defined property of the collation. (See fn:collation-key.)
Conforming implementations must support normalization form NFC and may support normalization forms NFD, NFKC, NFKD, and FULLY-NORMALIZED. They may also support other normalization forms with implementation-defined semantics. (See fn:normalize-unicode.)
It is implementation-defined which version of Unicode (and therefore, of the normalization algorithms and their underlying data) is supported by the implementation. See [UAX #15] for details of the stability policy regarding changes to the normalization rules in future versions of Unicode. If the input string contains codepoints that are unassigned in the relevant version of Unicode, or for which no normalization rules are defined, the fn:normalize-unicode function leaves such codepoints unchanged. If the implementation supports the requested normalization form then it must be able to handle every input string without raising an error. (See fn:normalize-unicode.)
It is possible to define collations that do not have the ability to decompose a string into units suitable for substring matching. An argument to a function defined in this section may be a URI that identifies a collation that is able to compare two strings, but that does not have the capability to split the string into collation units. Such a collation may cause the function to fail, or to give unexpected results, or it may be rejected as an unsuitable argument. The ability to decompose strings into collation units is an implementation-defined property of the collation. The fn:collation-available function can be used to ask whether a particular collation has this property. (See Functions based on substring matching.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:analyze-string.)
Some URI schemes are hierarchical and some are non-hierarchical. Implementations must treat the following schemes as non-hierarchical: jar, mailto, news, tag, tel, and urn. Whether additional schemes are known to be non-hierarchical is implementation-defined. If a scheme is not known to be non-hierarchical, it must be treated as hierarchical. (See Parsing and building URIs.)
If the omit-default-ports option is true, the port is discarded and set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:parse-uri.)
If the omit-default-ports option is true then the $port is set to the empty sequence if the port number is the same as the default port for the given scheme. Implementations should recognize the default ports for http (80), https (443), ftp (21), and ssh (22). Exactly which ports are recognized is implementation-defined. (See fn:build-uri.)
Processors may support a greater range and/or precision. The limits are implementation-defined. (See Limits and precision.)
Similarly, a processor may be unable accurately to represent the result of dividing a duration by 2, or multiplying a duration by 0.5. A processor that limits the precision of the seconds component of duration values must deliver a result that is as close as possible to the mathematically precise result, given these limits; if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
All conforming processors must support year values in the range 1 to 9999, and a minimum fractional second precision of 1 millisecond or three digits (i.e., s.sss). However, processors may set larger implementation-defined limits on the maximum number of digits they support in these two situations. Processors may also choose to support the year 0 and years with negative values. The results of operations on dates that cross the year 0 are implementation-defined. (See Limits and precision.)
Similarly, a processor that limits the precision of the seconds component of date and time or duration values may need to deliver a rounded result for arithmetic operations. Such a processor must deliver a result that is as close as possible to the mathematically precise result, given these limits: if two values are equally close, the one that is chosen is implementation-defined. (See Limits and precision.)
...the format token n, N, or Nn, indicating that the value of the component is to be output by name, in lower-case, upper-case, or title-case respectively. Components that can be output by name include (but are not limited to) months, days of the week, timezones, and eras. If the processor cannot output these components by name for the chosen calendar and language then it must use an implementation-defined fallback representation. (See The picture string.)
...indicates alphabetic or traditional numbering respectively, the default being implementation-defined. This has the same meaning as in the second argument of fn:format-integer. (See The picture string.)
The sequence of characters in the (adjusted) first presentation modifier is reversed (for example, 999'### becomes ###'999). If the result is not a valid decimal digit pattern, then the output is implementation-defined. (See Formatting Fractional Seconds.)
The output for these components is entirely implementation-defined. The default presentation modifier for these components is n, indicating that they are output as names (or conventional abbreviations), and the chosen names will in many cases depend on the chosen language: see 10.8.4.8 The language, calendar, and place arguments. (See Formatting Other Components.)
The set of languages, calendars, and places that are supported in the date formatting functions is implementation-defined. When any of these arguments is omitted or is an empty sequence, an implementation-defined default value is used. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language is implementation-defined. For example, one implementation might abbreviate July as Jul while another uses Jly. In German, one implementation might represent Saturday as Samstag while another uses Sonnabend. Implementations may provide mechanisms allowing users to control such choices. (See The language, calendar, and place arguments.)
The choice of the names and abbreviations used in any given language for calendar units such as days of the week and months of the year is implementation-defined. (See The language, calendar, and place arguments.)
The calendar value if present must be a valid EQName (dynamic error: [err:FOFD1340]). If it is a lexical QName then it is expanded into an expanded QName using the statically known namespaces; if it has no prefix then it represents an expanded-QName in no namespace. If the expanded QName is in no namespace, then it must identify a calendar with a designator specified below (dynamic error: [err:FOFD1340]). If the expanded QName is in a namespace then it identifies the calendar in an implementation-defined way. (See The language, calendar, and place arguments.)
At least one of the above calendars must be supported. It is implementation-defined which calendars are supported. (See The language, calendar, and place arguments.)
The requirement to deliver a deterministic result has performance implications, and for this reason implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call of the function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:doc.)
Various aspects of this processing are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:doc.)
It is implementation-defined whether DTD validation and/or schema validation is applied to the source document. (See fn:doc.)
The effect of a fragment identifier in the supplied URI is implementation-defined. One possible interpretation is to treat the fragment identifier as an ID attribute value, and to return a document node having the element with the selected ID value as its only child. (See fn:doc.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:collection.)
By default, this function is deterministic. This means that repeated calls on the function with the same argument will return the same result. However, for performance reasons, implementations may provide a user option to evaluate the function without a guarantee of determinism. The manner in which any such option is provided is implementation-defined. If the user has not selected such an option, a call to this function must either return a deterministic result or must raise a dynamic error [err:FODC0003]. (See fn:uri-collection.)
It is no longer automatically an error if the resource (after decoding) contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:unparsed-text.)
The processor may use implementation-defined heuristics to determine the likely encoding. (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-text.)
The fact that the resolution of URIs is defined by a mapping in the dynamic context means that in effect, various aspects of the behavior of this function are implementation-defined. Implementations may provide external configuration options that allow any aspect of the processing to be controlled by the user. In particular:... (See fn:unparsed-binary.)
The collation used for matching names is implementation-defined, but must be the same as the collation used to ensure that the names of all environment variables are unique. (See fn:environment-variable.)
Except to the extent defined by these options, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml.)
Options set in $options may be supplemented or modified based on configuration options defined externally using implementation-defined mechanisms. (See fn:parse-xml.)
Except as explicitly defined, the precise process used to construct the XDM instance is implementation-defined. In particular, it is implementation-defined whether an XML 1.0 or XML 1.1 parser is used. (See fn:parse-xml-fragment.)
If the second argument is omitted, or is supplied in the form of an output:serialization-parameters element, then the values of any serialization parameters that are not explicitly specified are implementation-defined, and may depend on the context. (See fn:serialize.)
A list of target namespaces identifying schema components to be used for validation. The way in which the processor locates schema components for the specified target namespaces is implementation-defined. A zero-length string denotes a no-namespace schema.... (See fn:xsd-validator.)
Set to the decimal value 1.0 or 1.1 to indicate which version of XSD is to be used. The default is implementation-defined. A processor may use a later version of XSD than the version requested, but must not use an earlier version.... (See fn:xsd-validator.)
The XSD specification allows a schema to be used for validation even when it contains unresolved references to absent schema components. It is implementation-defined whether this function allows the schema to be incomplete in this way. For example, some processors might allow validation using a schema in which an element declaration contains a reference to a type declaration that is not present in the schema, provided that the element declaration is never needed in the course of a particular validation episode. (See fn:xsd-validator.)
...error-details as map(*)*. This field is present only when (a) the option return-error-details was set to true, and (b) the supplied document was found to be invalid. The value is a sequence of maps, each containing details of one invalidity that was found. The precise details of the invalidities are implementation-defined, but they may include the following fields, if the information is available:... (See fn:xsd-validator.)
Because the [DOM: Living Standard] and [HTML: Living Standard] are not fixed, it is implementation-defined which versions are used. (See XDM Mapping from HTML DOM Nodes.)
If an implementation allows these nodes to be passed in via an API or similar mechanism, their behaviour is implementation-defined. (See XDM Mapping from HTML DOM Nodes.)
If the local name contains a character that is not a valid XML NameStartChar or NameChar, then an implementation-defined replacement string is used. The result must be a valid NCName. (See node-name Accessor.)
The default behaviour is implementation-defined. (See fn:parse-html.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised [err:FOJS0001] if the input does not conform to the grammar. (See fn:parse-json.)
The supplied function is called to process the string value of any JSON number in the input. By default, numbers are processed by converting to xs:double using the XPath casting rules. Supplying the value xs:decimal#1 will instead convert to xs:decimal (which potentially retains more precision, but disallows exponential notation), while supplying a function that casts to (xs:decimal | xs:double) will treat the value as xs:decimal if there is no exponent, or as xs:double otherwise. Supplying the value fn:identity#1 causes the value to be retained unchanged as an xs:untypedAtomic. If the liberal option is false (the default), then the supplied number-parser is called if and only if the value conforms to the JSON grammar for numbers (for example, a leading plus sign and redundant leading zeroes are not allowed). If the liberal option is true then it is also called if the value conforms to an implementation-defined extension of this grammar. (See fn:parse-json.)
It is no longer automatically an error if the input contains a codepoint that is not valid in XML. Instead, the codepoint must be a permitted character. The set of permitted characters is implementation-defined, but it is recommended that all Unicode characters should be accepted. (See fn:json-doc.)
The input may contain deviations from the grammar of [RFC 7159], which are handled in an implementation-defined way. (Note: some popular extensions include allowing quotes on keys to be omitted, allowing a comma to appear after the last item in an array, allowing leading zeroes in numbers, and allowing control characters such as tab and newline to be present in unescaped form.) Since the extensions accepted are implementation-defined, an error may be raised (see below) if the input does not conform to the grammar. (See fn:json-to-xml.)
Default: Implementation-defined. (See fn:json-to-xml.)
Indicates that the resulting XDM instance must be typed; that is, the element and attribute nodes must carry the type annotations that result from validation against the schema given at D.2 Schema for the result of fn:json-to-xml, or against an implementation-defined schema if the liberal option has the value true. (See fn:json-to-xml.)
The result of the function will always be such that validation against this schema would succeed. However, it is implementation-defined whether the result is typed or untyped, that is, whether the elements and attributes in the returned tree have type annotations that reflect the result of validating against this schema. (See fn:csv-to-xml.)
Additional, implementation-defined options may be available, for example, to control aspects of the XML serialization, to specify the grammar start symbol, or to produce output formats other than XML. (See fn:invisible-xml.)
If the arguments to fn:function-lookup identify a function that is present in the static context of the function call, the function will always return the same function that a static reference to this function would bind to. If there is no such function in the static context, then the results depend on what is present in the dynamic context, which is implementation-defined. (See fn:function-lookup.)
Default: The version given in the prolog of the library module; or implementation-defined if this is absent. (See fn:load-xquery-module.)
A sequence of URIs (in the form of xs:string values) which may be used or ignored in an implementation-defined way.... (See fn:load-xquery-module.)
Values for vendor-defined configuration options for the XQuery processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XQuery processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. The option parameter conventions do not apply to this contained map.... (See fn:load-xquery-module.)
It is implementation-defined whether constructs in the library module are evaluated in the same execution scope as the calling module. (See fn:load-xquery-module.)
The library module that is loaded may import schema declarations using an import schema declaration. It is implementation-defined whether schema components in the in-scope schema definitions of the calling module are automatically added to the in-scope schema definitions of the dynamically loaded module. The in-scope schema definitions of the calling and called modules must be consistent, according to the rules defined in Section 2.2.5 Consistency Constraints ^XQ31. (See fn:load-xquery-module.)
Default: Implementation-defined. (See fn:transform.)
Default: Implementation-defined. (See fn:transform.)
If the implementation provides a way of writing or invoking functions with side-effects, this post-processing function might be used to save a copy of the result document to persistent storage. For example, if the implementation provides access to the EXPath File library [EXPath], then a serialized document might be written to filestore by calling the file:write function. Similar mechanisms might be used to issue an HTTP POST request that posts the result to an HTTP server, or to send the document to an email recipient. The semantics of calling functions with side-effects are entirely implementation-defined. (See fn:transform.)
Calls to fn:transform can potentially have side-effects even in the absence of the post-processing option, because the XSLT specification allows a stylesheet to invoke extension functions that have side-effects. The semantics in this case are implementation-defined. (See fn:transform.)
A string intended to be used as the static base URI of the principal stylesheet module. This value must be used if no other static base URI is available. If the supplied stylesheet already has a base URI (which will generally be the case if the stylesheet is supplied using stylesheet-node or stylesheet-location) then it is implementation-defined whether this parameter has any effect. If the value is a relative reference, it is resolved against the executable base URI^XP of the fn:transform function call.... (See fn:transform.)
Values for vendor-defined configuration options for the XSLT processor used to process the request. The key is the name of an option, expressed as a QName: the namespace URI of the QName should be a URI controlled by the vendor of the XSLT processor. The meaning of the associated value is implementation-defined. Implementations should ignore options whose names are in an unrecognized namespace. Default is an empty map.... (See fn:transform.)
It is implementation-defined whether the XSLT transformation is executed within the same execution scope as the calling code. (See fn:transform.)
XSLT 1.0 does not define any error codes, so this is the likely outcome with an XSLT 1.0 processor. XSLT 2.0 and 3.0 do define error codes, but some APIs do not expose them. If multiple errors are signaled by the transformation (which is most likely to happen with static errors) then the error code should where possible be that of one of these errors, chosen arbitrarily; the processor may make details of additional errors available to the application in an implementation-defined way. (See fn:transform.)
It is to some extent implementation-defined whether two maps or arrays have the same function identity. Processors should ensure as a minimum that when a variable $m is bound to a map or array, calling jtree($m) more than once (with the same variable reference) will deliver the same JNode each time. (See fn:jtree.)
If ST is xs:float or xs:double, then TV is the xs:decimal value, within the set of xs:decimal values that the implementation is capable of representing, that is numerically closest to SV. If two values are equally close, then the one that is closest to zero is chosen. If SV is too large to be accommodated as an xs:decimal (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values), a dynamic error is raised [err:FOCA0001]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:decimal.)
In casting to xs:decimal or to a type derived from xs:decimal, if the value is not too large or too small but nevertheless cannot be represented accurately with the number of decimal digits available to the implementation, the implementation may round to the nearest representable value or may raise a dynamic error [err:FOCA0006]. The choice of rounding algorithm and the choice between rounding and error behavior is implementation-defined. (See Casting from xs:string and xs:untypedAtomic.)
If ST is xs:decimal, xs:float or xs:double, then TV is SV with the fractional part discarded and the value converted to xs:integer. Thus, casting 3.1456 returns 3 while casting -17.89 returns -17. Casting 3.124E1 returns 31. If SV is too large to be accommodated as an integer (see [XML Schema Part 2: Datatypes Second Edition] for implementation-defined limits on numeric values), a dynamic error is raised [err:FOCA0003]. If SV is one of the special xs:float or xs:double values NaN, INF, or -INF, a dynamic error is raised [err:FOCA0002]. (See Casting to xs:integer.)
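The xs:integer casting rule in the entry above (discard the fractional part, reject NaN and infinities, reject values beyond the implementation's limit) can be sketched in Python. This is an informal model, not the specification's definition: the function cast_to_integer, the CastError class, and the 64-bit bounds are assumptions made for illustration, since the actual limits are implementation-defined.

```python
import math
from decimal import Decimal


class CastError(Exception):
    """Hypothetical carrier for the error codes named in the text."""

    def __init__(self, code, message):
        super().__init__(code + ": " + message)
        self.code = code


# Assumed limits for illustration only; real limits are implementation-defined.
MAX_INTEGER = 2**63 - 1
MIN_INTEGER = -(2**63)


def cast_to_integer(sv):
    """Model casting a decimal (Decimal) or double (float) to an integer."""
    if isinstance(sv, float) and (math.isnan(sv) or math.isinf(sv)):
        raise CastError("FOCA0002", "NaN, INF and -INF cannot be cast")
    tv = int(sv)  # int() discards the fractional part (truncates toward zero)
    if not MIN_INTEGER <= tv <= MAX_INTEGER:
        raise CastError("FOCA0003", "value too large for integer")
    return tv
```

With this sketch, the three values from the text give cast_to_integer(Decimal("3.1456")) == 3, cast_to_integer(Decimal("-17.89")) == -17, and cast_to_integer(3.124e1) == 31. Truncation toward zero, rather than flooring, is what makes the negative case return -17.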
The tz timezone database, available at http://www.iana.org/time-zones. It is implementation-defined which version of the database is used. (See IANA Timezone Database.)
Unicode Standard Annex #15: Unicode Normalization Forms. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-14. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr15/. (See UAX #15.)
Unicode Standard Annex #29: Unicode Text Segmentation. Ed. Josh Hadley, Unicode Consortium. The current version is 16.0.0, dated 2024-08-28. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr29/. (See UAX #29.)
The Unicode Consortium, Reading, MA, Addison-Wesley, 2016. The Unicode Standard as updated from time to time by the publication of new versions. See http://www.unicode.org/standard/versions/ for the latest version and additional information on versions of the standard and of the Unicode Character Database. The version of Unicode to be used is implementation-defined, but implementations are recommended to use the latest Unicode version; currently, Version 9.0.0. (See The Unicode Standard.)
Unicode Technical Standard #10: Unicode Collation Algorithm. Ed. Mark Davis and Ken Whistler, Unicode Consortium. The current version is 16.0.0, dated 2024-08-22. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr10/. (See UTS #10.)
Unicode Technical Standard #35: Unicode Locale Data Markup Language. Ed. Mark Davis et al., Unicode Consortium. The current version is 47, dated 2025-03-11. As with [The Unicode Standard], the version to be used is implementation-defined. Available at: http://www.unicode.org/reports/tr35/. (See UTS #35.)


W3C Editor's Draft 23 February 2026
