This draft contains only sections that have differences from the version that it modified.

W3C

XPath and XQuery Functions and Operators 4.0

W3C Editor's Draft 23 February 2026

This version:
https://qt4cg.org/specifications/xpath-functions-40/
Latest version of XPath and XQuery Functions and Operators 4.0:
https://qt4cg.org/specifications/xpath-functions-40/
Most recent Recommendation of XPath and XQuery Functions and Operators:
https://www.w3.org/TR/2017/REC-xpath-functions-31-20170321/
Editor:
Michael Kay, Saxonica <http://www.saxonica.com/>

Please check the errata for any errors or issues reported since publication.

See also translations.

This document is also available in these non-normative formats: Specification in XML format and XML function catalog.


Abstract

This document defines constructor functions, operators, and functions on the datatypes defined in [XML Schema Part 2: Datatypes Second Edition] and the datatypes defined in [XQuery and XPath Data Model (XDM) 3.1]. It also defines functions and operators on nodes and node sequences as defined in the [XQuery and XPath Data Model (XDM) 3.1]. These functions and operators are defined for use in [XML Path Language (XPath) 4.0] and [XQuery 4.0: An XML Query Language] and [XSL Transformations (XSLT) Version 4.0] and other related XML standards. The signatures and summaries of functions defined in this document are available at: http://www.w3.org/2005/xpath-functions/.

A summary of changes since version 3.1 is provided at G Changes since 3.1.

Status of this Document

This version of the specification is work in progress. It is produced by the QT4 Working Group, officially the W3C XSLT 4.0 Extensions Community Group. Individual functions specified in the document may be at different stages of review, reflected in their History notes. Comments are invited, in the form of GitHub issues at https://github.com/qt4cg/qtspecs.

Dedication

The publications of this community group are dedicated to our co-chair, Michael Sperberg-McQueen (1954–2024).


18 Processing maps

Maps were introduced as a new datatype in XDM 3.1. This section describes functions that operate on maps.

A map is a kind of item.

[Definition] A map consists of a sequence of entries, also known as key-value pairs. Each entry comprises a key which is an arbitrary atomic item, and an arbitrary sequence called the associated value.

[Definition] Within a map, no two entries have the same key. Two atomic items K1 and K2 are the same key for this purpose if the function call fn:atomic-equal($K1, $K2) returns true.

It is not necessary that all the keys in a map should be of the same type (for example, they can include a mixture of integers and strings).

Maps are immutable, and have no identity separate from their content. For example, the map:remove function returns a map that differs from the supplied map by the omission (typically) of one entry, but the supplied map is not changed by the operation. Two calls on map:remove with the same arguments return maps that are indistinguishable from each other; there is no way of asking whether these are “the same map”.

A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicType) as item()*. Calling the function has the same effect as calling the map:get function: the expression $map($key) returns the same result as map:get($map, $key). For example, if $books-by-isbn is a map whose keys are ISBNs and whose associated values are book elements, then the expression $books-by-isbn("0470192747") returns the book element with the given ISBN. The fact that a map is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.

18.5 Converting elements to maps

Changes in 4.0  

  1. A new function fn:element-to-map is provided for converting XDM trees to maps suitable for serialization as JSON. Unlike the fn:xml-to-json function retained from 3.1, this can handle arbitrary XML as input.   [Issues 528 1645 1646 1647 1648 1658 1658 1797 PRs 1575 1906 19 November 2024]

The fn:element-to-map function converts a tree rooted at an XML element node to a corresponding tree of maps, in a form suitable for serialization as JSON. In effect it provides a mechanism for converting XML to JSON.

This section describes the mappings used by this function.

This mapping is designed with three objectives:

  • It should be possible to represent any XML element as a map suitable for JSON serialization.

  • The resulting JSON should be intuitive and easy to use.

  • The JSON should be consistent and stable: small variations in the input should not result in large variations in the output.

Achieving all three objectives requires design compromises. It also requires sacrificing some other desiderata. In consequence:

  • The conversion is not lossless (see 18.5.8 Lost XDM Information for details).

  • The conversion is not streamable.

  • The results are not necessarily compatible with those produced by other popular libraries.

The requirement for consistency and stability is particularly challenging. An element such as <name>John</name> maps naturally to the map { "name": "John" }; but adding an attribute (so it becomes <name role="first">John</name>) then requires an incompatible change in the JSON representation. The format could be made extensible by converting <name>John</name> to { "name": {"#content":"John"} } and <name role="first">John</name> to { "name": { "@role":"first", "#content":"John" } }, but this imposes unwanted complexity on the simplest cases. The solution adopted is threefold:

  • It is possible to analyze a corpus of XML documents to develop a conversion plan, which can then be applied consistently to individual input documents, whether or not these documents were present in the corpus. The conversion plan can be serialized and subsequently reused, so that it can be applied to input documents that might not have existed at the time the conversion plan was formulated.

  • Alternatively, the function can make use of schema information where available, so it considers not just the structure of an individual element instance, but the rules governing the element type.

  • It is possible to override the choices made by the system, and explicitly specify the format to be used for elements or attributes having a given name.

18.5.1 Element Layouts

The key challenge in mapping XML to JSON is in deciding how element content is to be represented. To illustrate the variety of mappings that are possible, the following table lists some examples of typical XML elements and their JSON equivalents:

XML element
<hr/>
JSON equivalent
"hr": ""

XML element
<date-of-birth>2023-05-18</date-of-birth>
JSON equivalent
"date-of-birth": "2023-05-18"

XML element
<box width="5" height="10"/>
JSON equivalent
"box": { "@width": "5", "@height": "10" }

XML element
<label id="t41">Warning!</label>
JSON equivalent
"label": { "@id": "t41", "#content": "Warning!" }

XML element
<box>
    <width>5</width>
    <height>10</height>
</box>
JSON equivalent
"box": {
    "width": 5,
    "height": 10
}

XML element
<polygon>
    <point x="0" y="0"/>
    <point x="1" y="0"/>
    <point x="1" y="1"/>
    <point x="0" y="1"/>
</polygon>
JSON equivalent
"polygon": [
    { "x": 0, "y": 0 },
    { "x": 1, "y": 0 },
    { "x": 1, "y": 1 },
    { "x": 0, "y": 1 }
]

This specification defines a number of named mappings, called layouts, and allows the layout for a particular element to be selected in a number of different ways:

  • The layout to be used for specific elements can be explicitly selected by supplying a conversion plan as input to the fn:element-to-map function.

  • It is possible to construct a conversion plan by analyzing a corpus of documents using the fn:element-to-map-plan function.

  • It is also possible to construct a conversion plan manually, or to modify the conversion plan produced by the fn:element-to-map-plan function before use.

  • In the absence of an explicit conversion plan, if the data has been schema-validated, the layout is inferred from the content model for the element type as defined in the schema.

  • When the data is untyped and no specific layout has been selected, a default layout is chosen based on the properties of the individual element instance.

The advantage of using schema information is that it gives a consistent representation for all elements of a particular type, even if they vary in content: for example if an element type allows optional attributes, the JSON representation will be consistent between those elements that have attributes and those without. In the absence of a schema, consistency can be achieved by supplying a conversion plan that applies uniformly to multiple documents.

The different layouts available are defined in the following sections. For each layout there is a table showing:

  • Layout name: the name to be used to select this layout in a conversion plan supplied to the fn:element-to-map function.

  • Usage: the situations for which this layout is designed.

  • Example input: an example of a typical element for which this layout is appropriate, shown as serialized XML.

  • Example output: the result of converting this example, shown as serialized JSON. The result is always shown as a singleton map, which is how it will appear when the layout is used for the top-level elements supplied in the $elements argument; when used to convert a descendant element, the corresponding key-value pair may appear as part of a larger map, depending on the layout chosen for its parent element.

    Note:

    The fn:element-to-map function produces a map as its result, but it is convenient to illustrate the form of the map by showing the effect of serializing the map as JSON.

  • Mapping rules: The rules for mapping the XML element to an XDM map representation.

  • Mapping for nilled elements: special rules that apply to an element having the attribute xsi:nil="true". These rules only apply if the element has been schema-validated.

  • Errors: situations where the layout cannot be used, and where attempting to use it will fail. For example, the empty layout cannot be used for an element that is not empty. In such a situation the recovery action is as follows, in order:

    1. Attributes are dropped, and if this is sufficient to enable the layout to be used, then the element is converted without its attributes.

    2. If the type of an element or attribute in the conversion plan is given as boolean or numeric, but the actual value of the element or attribute is not castable to xs:boolean or xs:numeric respectively, then the node is output ignoring the type property, that is, as an instance of xs:untypedAtomic.

    3. If the conversion plan supplies a fallback layout (an entry with key "*"), then the fallback layout is used.

    4. The element-to-map function fails with a dynamic error.

  • Notes: General observations, especially concerning what information is retained by this mapping and what information is lost.

The rules for selecting the layout for a particular element are given later, in 18.5.5 Selecting an element layout.

Note that it is possible to request any layout for any element. If an inappropriate layout is chosen for a particular element (for example, empty layout for an element that is not empty), then the rules for that layout specify what happens. It is possible to specify a fallback layout for use when the selected layout fails: this will typically be a layout such as xml or mixed that can handle any element.

Note:

Acknowledgements for this categorization: see [Goessner]. Although Goessner's categories have been used, the detailed mappings vary from his proposal.

18.5.1.3 Layout: Simple Content
Layout name

simple

Usage

Intended for XML elements that have simple content and no attributes.

Example input
<date>2023-05-30</date>
Example output
{ "date": "2023-05-30" }
Mapping rules

The element is atomized and the resulting atomized value is handled as described in 18.5.7 Element and Attribute Content. If atomization fails, the element is treated as if it were untyped.

Note:

If the element is untyped, the atomized value will always appear in the result as an instance of xs:untypedAtomic.

Mapping for nilled elements

The content is represented by the value xs:QName("fn:null"), which is serialized as the JSON value null. For example, <name xsi:nil="true"/> becomes { "name": xs:QName("fn:null") }.

Errors

Attributes are discarded, along with child comment nodes and processing instructions; whitespace is retained.

If any child elements are present, this layout fails.

18.5.1.7 Layout: Record
Layout name

record

Usage

Intended primarily for XML elements that contain multiple child elements, with different names, where the order of the child elements is not significant. Also used for elements whose content is a single element node child. The element may or may not have attributes.

Example input (1)
<employee id="x">
  <date-of-birth>1984-03-20</date-of-birth>
  <location>Germany</location>
  <position>Janitor</position>
</employee>
Example output (1)
{ "employee": { "@id": "x", 
                "date-of-birth": "1984-03-20", 
                "location": "Germany", 
                "position": "Janitor"
              }
}
Example input (2)
<employee id="x">
  <date-of-birth>1984-03-20</date-of-birth>
  <location>Germany</location>
  <position>Janitor</position>
  <position>Gardener</position>
</employee>
Example output (2)
{ "employee": { "@id": "x", 
                "date-of-birth": "1984-03-20", 
                "location": "Germany", 
                "position": [ "Janitor", "Gardener" ]
              }
}
Mapping rules

The content is represented by a map containing one entry for each attribute in the XML element, plus one entry for each child element, whose value is formatted according to the rules for that element.

If two or more child elements have the same name, or names that are represented by the same string (taking into account the chosen name-format option), then they are combined into a single entry containing all the corresponding values as members of an array. For example, if there are two children <author>Mills</author> and <author>Boon</author>, they are combined into a single entry "author": ["Mills", "Boon"].

In the entry order of the resulting map, entries derived from attributes come first (in unpredictable order), followed by entries derived from child elements, in order of first appearance.

Mapping for nilled elements

Alongside any attributes, the value includes the additional entry "#content": xs:QName("fn:null"), which will be serialized in JSON as "#content": null.

Errors

Although this layout is intended primarily for elements whose children are unordered and uniquely named, it is also viable to use it in cases where elements can repeat, so long as order relative to other elements is not significant.

Comments, processing instructions, and whitespace text nodes in the content are discarded.

This layout fails if there are non-whitespace text node children.
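
As an illustration only, the record mapping rules above can be approximated in Python. This sketch is not the fn:element-to-map function: the helper name record_layout is hypothetical, every child element is assumed to have simple content, and namespaces, type annotations, and nilled elements are ignored.

```python
import xml.etree.ElementTree as ET


def record_layout(elem):
    """Hypothetical sketch of the record layout for a single element.

    Assumes each child has simple content; ignores namespaces and typing.
    """
    # The layout fails if there are non-whitespace text node children.
    texts = [elem.text] + [child.tail for child in elem]
    if any(t and t.strip() for t in texts):
        raise ValueError("record layout fails: non-whitespace text children")

    content = {}
    # Entries derived from attributes come first.
    for name, value in elem.attrib.items():
        content["@" + name] = value
    # Then entries derived from child elements, in order of first appearance.
    for child in elem:
        value = child.text or ""
        if child.tag not in content:
            content[child.tag] = value
        elif isinstance(content[child.tag], list):
            content[child.tag].append(value)
        else:
            # A repeated name: combine the values into a single array.
            content[child.tag] = [content[child.tag], value]
    return {elem.tag: content}


employee = ET.fromstring(
    '<employee id="x">'
    "<date-of-birth>1984-03-20</date-of-birth>"
    "<location>Germany</location>"
    "<position>Janitor</position>"
    "<position>Gardener</position>"
    "</employee>"
)
result = record_layout(employee)
```

Python dictionaries preserve insertion order, which is used here to mimic the entry order described in the mapping rules.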

18.5.1.8 Layout: Sequence
Layout name

sequence

Usage

Intended for XML elements that contain a sequence of element node children, whose order is significant. The element may or may not have attributes.

Example input
<section id="x">
   <head>Introduction</head>
   <p>Lorem ipsum.</p>
   <p>Dolor sit amet.</p>
</section>
Example output
{ "section": [
      { "@id": "x" },                        
      { "head": "Introduction" },
      { "p": "Lorem ipsum." },
      { "p": "Dolor sit amet." }
   ] }
Mapping rules

The mapping rules are identical to the rules for the mixed layout (see 18.5.1.9 Layout: Mixed) except that whitespace-only text nodes are discarded.

Mapping for nilled elements

A nilled element is indicated by including an additional map { "#content" : xs:QName("fn:null")} in the array, after any attributes.

Errors

This layout fails if there are non-whitespace text node children.
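
The shape of the example output can be reproduced with a short Python sketch. It is an approximation under stated assumptions, not the normative mapping: sequence_layout is a hypothetical helper, child elements are assumed to have simple content, and each attribute is assumed to become its own singleton map (the rules for mixed layout, which govern this detail, are not reproduced in this section).

```python
import xml.etree.ElementTree as ET


def sequence_layout(elem):
    """Hypothetical sketch of the sequence layout for a single element."""
    # The layout fails if there are non-whitespace text node children;
    # whitespace-only text nodes are simply discarded.
    texts = [elem.text] + [child.tail for child in elem]
    if any(t and t.strip() for t in texts):
        raise ValueError("sequence layout fails: non-whitespace text children")

    # Assumption: one singleton map per attribute, ahead of the children.
    members = [{"@" + name: value} for name, value in elem.attrib.items()]
    # One singleton map per child element, preserving document order.
    members += [{child.tag: child.text or ""} for child in elem]
    return {elem.tag: members}


section = ET.fromstring(
    '<section id="x">\n'
    "   <head>Introduction</head>\n"
    "   <p>Lorem ipsum.</p>\n"
    "   <p>Dolor sit amet.</p>\n"
    "</section>"
)
converted = sequence_layout(section)
```

Unlike the record layout, repeated names are not merged, so the relative order of head and p elements survives the conversion.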

18.5.2 Creating a conversion plan

It is possible to create a conversion plan by analyzing a collection of sample input documents. The function fn:element-to-map-plan is supplied with a collection of nodes (which will normally be element or document nodes), and it examines all the elements within the trees rooted at these nodes, looking for commonalities among like-named elements.

The output of this function (the conversion plan) holds information about how elements and attributes (identified by name) should be converted.

For elements, the information is primarily a mapping from element names (xs:QName instances) to layout names. In some cases additional information beyond the layout name is also included. The conversion plan is represented as an XDM map, whose structure is defined in this specification. A conversion plan can be constructed directly, or the plan produced by calling fn:element-to-map-plan can be modified before use. The plan can be serialized using the JSON output method and reloaded so that the same plan is used whenever a query or stylesheet is executed.

The fn:element-to-map-plan function selects a layout for a given element name N by applying the following rules:

  1. Let $EE be the set of all elements named N, specifically $input/descendant-or-self::*[node-name(.) eq N].

  2. If empty($EE/(* | text())) (that is, if there are no child elements or text nodes) then:

    1. If empty($EE/@*) (that is, if there are no attributes), then the layout is empty: see 18.5.1.1 Layout: Empty Content.

    2. Otherwise, the layout is empty-plus: see 18.5.1.2 Layout: Empty Content with Attributes.

  3. If empty($EE/*) (that is, if there are no child elements) then:

    1. If empty($EE/@*) (that is, if there are no attributes) then the layout is simple: see 18.5.1.3 Layout: Simple Content.

    2. Otherwise, simple-plus: see 18.5.1.4 Layout: Simple Content with Attributes.

    3. The plan also includes the property type. If all the elements in $EE are castable as xs:boolean, then the type is boolean; otherwise, if all the elements in $EE are castable as xs:numeric, then the type is numeric; otherwise, the type is string.

  4. If empty($EE/text()[normalize-space()]) (that is, there are no text node children other than whitespace), then:

    1. If all-equal($EE/*/node-name()) and exists($EE/*[2]) (that is, if all child elements have the same name, and at least one element has multiple child elements), then:

      1. If empty($EE/@*) (that is, if there are no attributes) then list: see 18.5.1.5 Layout: Simple List.

      2. Otherwise, list-plus: see 18.5.1.6 Layout: List with Attributes.

    2. If every $e in $EE satisfies all-different($e/*/node-name()) (that is, the child elements are uniquely named among their siblings), then record: see 18.5.1.7 Layout: Record.

    3. Otherwise, sequence: see 18.5.1.8 Layout: Sequence.

  5. Otherwise, mixed: see 18.5.1.9 Layout: Mixed.
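
The selection rules above can be sketched in Python over xml.etree.ElementTree. This is a rough approximation under stated assumptions, not the fn:element-to-map-plan function: choose_layout is a hypothetical helper, it receives the elements that share one name (the set $EE), it does not compute the type property described in rule 3, and comments and processing instructions are assumed to be absent (the default ElementTree parser drops them).

```python
import xml.etree.ElementTree as ET


def choose_layout(elements):
    """Hypothetical sketch: pick a layout name for like-named elements ($EE)."""

    def texts(e):
        # Text node children: the leading text plus the tail of each child.
        return [t for t in [e.text] + [c.tail for c in e] if t]

    has_children = any(len(e) for e in elements)
    has_text = any(texts(e) for e in elements)
    has_attrs = any(e.attrib for e in elements)
    has_real_text = any(t.strip() for e in elements for t in texts(e))

    # Rule 2: no child elements and no text nodes.
    if not has_children and not has_text:
        return "empty-plus" if has_attrs else "empty"
    # Rule 3: no child elements.
    if not has_children:
        return "simple-plus" if has_attrs else "simple"
    # Rule 4: no text node children other than whitespace.
    if not has_real_text:
        child_names = {c.tag for e in elements for c in e}
        if len(child_names) == 1 and any(len(e) > 1 for e in elements):
            return "list-plus" if has_attrs else "list"
        if all(len({c.tag for c in e}) == len(e) for e in elements):
            return "record"
        return "sequence"
    # Rule 5: anything else.
    return "mixed"


def layout_of(xml_text):
    """Convenience wrapper: apply the rules to one parsed element."""
    return choose_layout([ET.fromstring(xml_text)])
```

Passing several elements at once shows why a corpus matters: a name that looks like simple in one instance becomes simple-plus as soon as any instance in the set carries an attribute.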

For elements with simple content (more specifically, elements where the chosen layout is simple or simple-plus) the conversion plan also includes an entry indicating whether the content should be represented as a boolean, a number, or a string. If every instance of the element name has content that is castable to xs:boolean, the plan indicates "type": "boolean". If every instance of the element name has content that is castable to xs:numeric, the plan indicates "type": "numeric". In other cases, the plan indicates "type": "string"; however, this may be omitted because it is the default.

For attributes, the conversion plan identifies whether attributes (with a given name) should be represented as booleans, numbers, or strings; alternatively, it may indicate that attributes with a given name should be discarded. For every distinct attribute name present in the input, an entry may be output associating the attribute name with one of the types boolean or numeric; the entry is generally omitted when the values are to be represented as strings, though the type can also be given explicitly as string. An entry with type boolean is generated for an attribute name if all the attributes with that name are castable as xs:boolean. Similarly, an entry with type numeric is generated for an attribute name if all the attributes with that name are castable as xs:numeric. In other cases, the attributes are treated as being of type string. The entry for an attribute may also specify "type": "skip" to indicate that the attribute should be discarded.

A plan that is produced by analyzing a corpus of input documents can then be customized by the user if required. For example:

  • If simple layout is chosen for a particular element name, but it is known that some documents might be encountered in which that element has attributes, then simple might be changed to simple-plus.

  • If record layout is chosen for a particular element name, but it is known that some documents might be encountered in which child elements can be repeated, then record might be changed to sequence.

  • If a generated plan determines that phone numbers should be represented as numbers, it might be modified to treat them as strings.

The conversion plan is a map of type map(xs:string, record(*)). Each key is an element or attribute name: element names are written in the form Q{uri}local, and attribute names in the form @Q{uri}local. In both cases the Q{uri} part must be omitted for a name in no namespace. Strings are used as keys in preference to xs:QName instances to allow the plan to be serialized in JSON format.

A more detailed definition of the structure is given in 18.5.3 Structure of the conversion plan.

A small example might be (in its JSON serialization):

{ "bookList": { "layout": "list", "child": "book" },
  "book": { "layout": "record" },
  "author": { "layout": "simple" },
  "title": { "layout": "simple" },
  "price": { "layout": "simple", "type": "numeric" },
  "hardback": { "layout": "simple", "type": "boolean" },
  "@out-of-print": { "type": "boolean" },
  "@Q{http://www.w3.org/2001/XMLSchema-instance}nil": { "type": "skip" }
}

18.5.3 Structure of the conversion plan

This section provides a definition of the structure of the conversion plan that is output by the fn:element-to-map-plan function, and used as input to the fn:element-to-map function.

The structure is defined by the following item type:

map( xs:string,
     record ( layout? as enum("empty", "empty-plus", "simple", "simple-plus",
                              "list", "list-plus",
                              "record", "sequence", "mixed",
                              "xml", "error", "deep-skip"),
              child? as xs:string,
              type? as enum("boolean", "numeric", "string", "skip")
              * )
)

The rules relating to this structure are as follows:

  1. The keys of the map entries are strings of the form:

    1. local-name representing the name of an element in no namespace.

    2. Q{uri}local-name representing the name of an element in a namespace.

    3. * representing a fallback rule for use with elements where either (a) there is no more specific rule, or (b) processing using the selected layout fails.

    4. @local-name representing the name of an attribute in no namespace.

    5. @Q{uri}local-name representing the name of an attribute in a namespace.

    Any entries whose keys are not in this format will be ignored.

  2. The layout entry is present if and only if the key represents the name of an element.

  3. The child entry is present if and only if the value of layout is list or list-plus. It represents an element name in the format local-name for a name in no namespace, or Q{uri}local-name for a name in a namespace.

  4. The type entry is present if, and only if, one of the following conditions applies:

    1. The key represents the name of an attribute.

    2. The layout is simple or simple-plus. In this case the value must not be "skip".

If additional entries (beyond those described above) are present in any of the maps, they are ignored, provided that the map is coercible to the given type definition.
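
The structural rules above lend themselves to a simple consistency check. The following Python sketch is illustrative only: check_plan is a hypothetical helper, the plan is assumed to have been loaded from its JSON serialization into dictionaries, the fallback key "*" is treated like an element key, the type entry is treated as optional (on the basis that "string" is the default), and the syntax of the keys themselves is not checked.

```python
LAYOUTS = {
    "empty", "empty-plus", "simple", "simple-plus", "list", "list-plus",
    "record", "sequence", "mixed", "xml", "error", "deep-skip",
}
TYPES = {"boolean", "numeric", "string", "skip"}


def check_plan(plan):
    """Hypothetical sketch: return a list of problems found in a plan."""
    problems = []
    for key, entry in plan.items():
        is_attribute = key.startswith("@")
        layout = entry.get("layout")
        # Rule 2: layout belongs to element entries only.
        if is_attribute and layout is not None:
            problems.append(f"{key}: attribute entries take no layout")
        if not is_attribute and layout not in LAYOUTS:
            problems.append(f"{key}: missing or unknown layout")
        # Rule 3: child accompanies list and list-plus, and nothing else.
        if ("child" in entry) != (layout in {"list", "list-plus"}):
            problems.append(f"{key}: child goes with list or list-plus only")
        # Rule 4: type is for attributes and for simple or simple-plus.
        if "type" in entry:
            if entry["type"] not in TYPES:
                problems.append(f"{key}: unknown type")
            elif not is_attribute and layout not in {"simple", "simple-plus"}:
                problems.append(f"{key}: type needs simple or simple-plus")
            elif not is_attribute and entry["type"] == "skip":
                problems.append(f"{key}: skip applies to attributes only")
    return problems


plan = {
    "bookList": {"layout": "list", "child": "book"},
    "book": {"layout": "record"},
    "price": {"layout": "simple", "type": "numeric"},
    "@out-of-print": {"type": "boolean"},
    "*": {"layout": "xml"},
}
problems = check_plan(plan)  # an empty list for this plan
```

A check of this kind is most useful after a generated plan has been edited by hand, since a misplaced child or type entry is otherwise easy to overlook.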

The fallback rule (with key "*") is used to process elements whose name has no specific entry, and also for elements where normal processing fails (for example when the selected layout is "empty", but the element has children). If no fallback rule is present then "error" is assumed: this causes processing to fail with a dynamic error. The fallback rule will typically set the layout property to one of the following:

  • error: this causes the function to fail with a dynamic error.

  • deep-skip: this causes the element and its content (recursively) to be omitted from the output.

  • mixed: this causes the element to be output using layout mixed.

  • xml: this causes the element to be output using layout xml, which represents the content as a string containing serialized XML.

However, any layout may be used as the fallback; if it fails, the error is unrecoverable.

18.5.5 Selecting an element layout

The various layouts available for elements are described in 18.5.1 Element Layouts. This section defines the rules for selecting an element layout for a given element E. The rules are applied in order.

  1. If an explicit layout is given for the element name of E in the conversion plan supplied to the fn:element-to-map function call, then that layout is used. If the selected layout is deep-skip, then no output is produced for that element. If the selected layout is error, then the function fails with a dynamic error. If the selected layout fails for the element instance, then the fallback layout (identified with the key "*" in the conversion plan) is used; in the absence of a fallback layout, the function fails with a dynamic error.

  2. Otherwise (when no explicit layout is given for E), if the type annotation of the element is something other than xs:untyped or xs:anyType, then a schema-determined layout is used as defined in 18.5.4 Schema-based conversion.

  3. Otherwise, if the conversion plan supplies a fallback layout (identified with the key "*"), then the fallback layout is used.

  4. If the above rules do not provide a layout for E, then a conversion plan for E is determined by applying the rules in 18.5.2 Creating a conversion plan, with an input that contains the single element E and no others. (Only the element E itself is considered, not its descendants.)

18.5.6 Element and Attribute Names

The name-format option gives control over how element and attribute names are formatted. There are four options:

  • The default option (which may be explicitly requested by specifying "name-format": "default") retains the namespace URI for any element that is either (a) the top-level element of a tree being converted, or (b) has a name that is in a different namespace from its parent element. In such cases the format "Q{uri}local" is used. For other elements, the name is output using the local part of the element name alone. For attributes, the form "Q{uri}local" is used for an attribute in a namespace, and the local name alone is used for a no-namespace name. Namespace prefixes are not retained.

  • The option eqname uses the format "Q{uri}local" for all element and attribute names that are in a namespace, or the local name alone for all names that are not in a namespace.

  • The option local discards all namespace information: all elements and attributes are output using the local name alone.

  • The option lexical outputs element and attribute names in the form obtained by calling the function fn:name. If the name has a prefix, the prefix is retained in the output. However, the output contains no information that enables the prefix to be associated with a namespace URI, so this format is suitable only when prefixes in the input documents are used predictably.
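
The four options can be compared side by side with a small Python sketch. It is an approximation, not the normative behavior: format_name and its parameters are hypothetical, the default attribute marker "@" is hard-coded, and one detail is an assumption of this sketch, namely that under the default option a no-namespace element is always written as its local name alone. The special treatment of attributes in the xml namespace is included; the discarding of xsi attributes is not.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"


def format_name(local, uri="", prefix="", name_format="default",
                is_attribute=False, top_level=False, parent_uri=""):
    """Hypothetical sketch of the name-format option for one name."""
    if name_format not in {"default", "eqname", "local", "lexical"}:
        raise ValueError(f"unknown name-format: {name_format}")

    def eqname():
        # Q{uri}local for a namespaced name, the local name otherwise.
        return f"Q{{{uri}}}{local}" if uri else local

    if is_attribute and uri == XML_NS:
        # Attributes in the xml namespace always use the xml prefix.
        name = "xml:" + local
    elif name_format == "local":
        name = local
    elif name_format == "lexical":
        name = f"{prefix}:{local}" if prefix else local
    elif name_format == "eqname" or is_attribute:
        name = eqname()
    elif top_level or uri != parent_uri:
        # default: keep the URI at the top level or on a namespace change.
        # Assumption: a no-namespace name is still written as local only.
        name = eqname()
    else:
        name = local
    return "@" + name if is_attribute else name


NS = "http://example.com/ns"  # hypothetical namespace URI
top = format_name("book", NS, "b", top_level=True)
child = format_name("title", NS, "b", parent_uri=NS)
```

Here top is "Q{http://example.com/ns}book" and child is "title", which shows how the default option avoids repeating a namespace URI that the parent already carries.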

Attribute names in the output are typically prefixed with the character "@". The option attribute-marker allows this to be changed to a different prefix or none.

Whichever format of names is chosen, if the rules for the selected layout would result in an output map having two entries with the same key, the conflict is resolved by combining these entries into an array. For example if name-format is set to local then the element <data x:val="3" y:val="4"/> becomes either { "data": { "@val": ["3", "4"] } } or (because attribute order is unpredictable) { "data": { "@val": ["4", "3"] } }.

Regardless of the chosen name-format, and regardless of the above rules:

  • Attributes in the xsi namespace (http://www.w3.org/2001/XMLSchema-instance) are discarded.

    Note:

    This is because these attributes can appear even when the schema does not allow the element to have attributes, which means that a layout might be chosen that does not accommodate attributes.

  • Attributes in the xml namespace (http://www.w3.org/XML/1998/namespace) are output using a lexical QName, with the prefix xml.

18.5.7 Element and Attribute Content

The conversion plan may indicate that element content is to be output as type string, numeric, or boolean: the default is string. In the case of untyped elements and attributes, the value is output as an instance of a string, numeric, or boolean type, according to this prescription. Specifically:

  • If the prescribed type is boolean and the value is castable as xs:boolean, then it is output as an instance of xs:boolean.

  • If the prescribed type is numeric and the value is castable as xs:numeric, then it is output as an instance of xs:integer, xs:decimal, or xs:double depending on the lexical form of the value, following the same rules as for XPath numeric literals. For example, "-1" becomes an xs:integer, 12.00 becomes an xs:decimal, and 1e-3 becomes an xs:double. The special xs:double values NaN and INF (which cannot be used as numeric literals) are also recognized.

  • In all other cases the value is output as an instance of xs:untypedAtomic, retaining its original lexical form.
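The three cases above for untyped values can be approximated as follows. This is an illustrative sketch only: local:typed-value is a hypothetical helper, castability as xs:double stands in for the xs:numeric test, and the regular expressions are an assumed approximation of the integer and decimal lexical forms (an optional leading sign is accepted).

```xquery
(: Hypothetical helper, not defined by the specification. :)
declare function local:typed-value(
  $value as xs:string,
  $type  as xs:string   (: "string", "numeric", or "boolean" :)
) as xs:anyAtomicType {
  let $v := normalize-space($value)
  return
    if ($type eq "boolean" and $v castable as xs:boolean)
    then xs:boolean($v)
    else if ($type eq "numeric" and $v castable as xs:double)
    then (
      (: Choose the numeric type from the lexical form. :)
      if (matches($v, "^[+\-]?[0-9]+$")) then xs:integer($v)
      else if (matches($v, "^[+\-]?([0-9]+\.[0-9]*|\.[0-9]+)$")) then xs:decimal($v)
      else xs:double($v)   (: exponent notation, NaN, INF, -INF :)
    )
    (: All other cases: keep the original lexical form. :)
    else xs:untypedAtomic($value)
};

for $s in ("-1", "12.00", "1e-3", "NaN", "red")
let $r := local:typed-value($s, "numeric")
return
  if ($r instance of xs:integer) then "xs:integer"
  else if ($r instance of xs:decimal) then "xs:decimal"
  else if ($r instance of xs:double) then "xs:double"
  else "xs:untypedAtomic"
```

With these inputs the sketch reports xs:integer, xs:decimal, xs:double, xs:double, and xs:untypedAtomic respectively.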

Where the element or attribute is schema-validated, however:

  1. If an element has the nilled property (that is, xsi:nil="true"), then the mapping for nilled elements with the chosen layout is used.

  2. Let AV be the typed value of the node (that is, the result of atomization).

  3. If, however, an element is annotated with a type that does not allow atomization (specifically, a complex type with element-only content) then let AV be the string value of the element, as an atomic item of type xs:untypedAtomic.

  4. If an attribute is annotated as having a simple type of {variety} list, or if an element using layout simple or simple-plus is annotated as having either a simple type of {variety} list or a complex type with simple content of {variety} list, then the atomized value AV is represented in the result as the array represented by the XPath expression array{AV}. This applies whether or not the atomized value actually contains multiple atomic items. The individual atomic items in the array retain their type; for example, items of type xs:date remain items of type xs:date in the result.

  5. In all other cases AV will be a single atomic item, and this value is used as is, retaining its type.

Note:

Atomic items in the result of the fn:element-to-map function may thus be of any atomic type. The type information is lost if the result is subsequently serialized as JSON.
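The loss of type information on JSON serialization can be seen without calling fn:element-to-map at all. The following sketch uses a hand-built map (the keys "when" and "count" are invented for the example) in place of a real conversion result, and round-trips it through the JSON output method.

```xquery
(: A hand-built stand-in for a conversion result holding a typed value. :)
let $m    := map { "when": xs:date("2026-02-23"), "count": 3 }
let $json := serialize($m, map { "method": "json" })
(: After the round trip the date is an ordinary string. :)
return parse-json($json)?when instance of xs:string
```

The expression returns true: the JSON output method writes the xs:date as a JSON string, so fn:parse-json can only deliver an xs:string.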

18.5.8 Lost XDM Information

This section is non-normative. Its purpose is to explain what information available in the XDM nodes supplied as input to the fn:element-to-map function is missing from the output.

  • Element and attribute names: If the chosen name-format is default or eqname, then local names and namespace URIs of elements and attributes are retained, but namespace prefixes are lost. If the chosen name-format is lexical, then prefixes are retained but namespace URIs are lost. If the chosen name-format is local then only local names are retained; namespace URIs and prefixes are lost.

    In addition, element names are lost when the parent element is mapped using list layout: see 18.5.1.5 Layout: Simple List.

  • In-scope namespaces: All information about in-scope namespaces (and in particular, bindings for namespaces that are declared but not used in element and attribute names) is lost.

  • The xsi namespace: All attributes in the xsi namespace (http://www.w3.org/2001/XMLSchema-instance) are lost, except when xml layout is selected.

  • Comments and processing instructions: Comments and processing instructions are lost except when they appear as children of elements that are mapped using the sequence, mixed or xml layouts.

  • Text nodes: Whitespace text nodes are discarded when they appear as children of elements that are mapped using the empty, empty-plus, list, list-plus, record, or sequence layouts. Non-whitespace text nodes are never discarded.

  • Additional node properties: The values of the is-id, is-idref, and is-nilled properties of a node are lost.

  • Type annotations: The values of type annotations on elements are lost. Type annotations on atomized values of schema-validated nodes, however, are retained.

  • Element order: The order of child elements is lost when record layout is used and the element has multiple children with the same name.

18.5.10 fn:element-to-map-plan

Changes in 4.0  

  1. New in 4.0  [Issue 1797 PR 1906]

Summary

Analyzes sample data to generate a conversion plan suitable for use by the element-to-map function.

Signature
fn:element-to-map-plan(
  $input as (document-node() | element(*))*
) as map(xs:string, record(*))
Properties

This function is deterministic, context-independent, and focus-independent.

Rules

The function takes as input a collection of document and element nodes and analyzes the trees rooted at these nodes to determine a conversion plan for converting elements in these trees to maps, suitable for serialization in JSON format. The conversion plan can be used as-is by supplying it directly to the element-to-map function; alternatively it can be amended before use. The plan can also be serialized to a file (in JSON format) allowing the same plan to be used repeatedly for transforming documents with a similar structure to those in the sample provided.

The rules followed by the function, and the detailed format of the conversion plan, are described in 18.5.2 Creating a conversion plan.
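As an illustration of amending a plan before use, the sketch below generates a plan from a small sample and then replaces the entry for element b, so that its content falls back to the default string type. The sample data is invented for the example, and the choice to amend b is only one possibility.

```xquery
let $sample  := <a><b>3</b><b>4</b></a>
let $plan    := element-to-map-plan($sample)
(: Replace the generated entry for b; omitting 'type' leaves the
   content to be output with the default type, string. :)
let $amended := map:put($plan, "b", map { "layout": "simple" })
return element-to-map($sample, map { "plan": $amended })
```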

Formal Equivalent

The effect of the function is equivalent to the result of the following XQuery expression.

let $data-type := fn($nodes as node()*) {
  if (every($nodes ! (. castable as xs:boolean))) then "boolean"
  else if (every($nodes ! (. castable as xs:numeric))) then "numeric"
  else ()
}
let $name := fn($node as node()) {
  if (namespace-uri($node)) 
  then expanded-QName(node-name($node))
  else local-name($node)
}  
return (
  for $ee in $input/descendant-or-self::*
  group by $n := $name($ee)
  return { $n :
           if (empty($ee/(*|text())))
             then { 'layout' : if (empty($ee/@*)) 
                               then 'empty' 
                               else 'empty-plus' } 
           else if (empty($ee/*)) 
             then map:merge((
                    if (empty($ee/@*)) 
                      then {'layout': 'simple'}
                      else {'layout': 'simple-plus'},
                    $data-type($ee) ! { 'type': . }
                 ))
           else if (empty($ee/text()[normalize-space()])) 
             then if (all-equal($ee/*/node-name()) and exists($ee/*[2]))
                    then { 'layout': if (empty($ee/@*)) 
                                     then 'list' 
                                     else 'list-plus',
                           'child': $name(head($ee/*))
                         }
                    else { 'layout' : if (every($ee ! all-different(*/node-name())))
                                      then 'record'
                                      else 'sequence'
                         }             
           else {'layout': 'mixed'}
        },
  for $a in $input//@*
  group by $n := $name($a)
  let $t := $data-type($a)
  return $t ! { `@{$n}`: { 'type': $t } }
) => map:merge()
Notes

The conversion plan is organized by element and attribute name, so its effectiveness depends on the $input collection being homogeneous in its structure, and representative of the documents that will subsequently be converted using the element-to-map function.

This function is separate from the element-to-map function for a number of reasons:

  • The collection of documents that need to be analyzed to establish an effective conversion plan might be much smaller than the set of documents actually being converted.

  • Conversely, it might be that only a small number of documents need to be converted at a particular time, but the conversion plan used needs to take into account variations that might exist within a larger corpus.

  • If JSON output is required in a particular format, it might be necessary to fine-tune the automatically generated conversion plan to take account of these requirements.

  • It might be necessary to devise a conversion plan that can be used to convert individual documents as they arrive over a period of time, and to ensure that the same conversion rules are applied to each document even though documents might exhibit variations in structure.

  • The conversion plan is human-readable, which can help in understanding why the output of element-to-map is in a particular form.
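A possible workflow for reusing one plan across documents is sketched below. It assumes that a plan restored with fn:parse-json is acceptable as the plan option. The sample elements are invented, and writing the JSON text to a file is left out because it depends on the host environment.

```xquery
(: Derive the plan once from representative sample data. :)
let $plan     := element-to-map-plan(<a><b>3</b><b>4</b></a>)
(: The plan can be inspected, edited, or stored in this textual form. :)
let $json     := serialize($plan, map { "method": "json", "indent": true() })
(: Later, possibly in a different query, restore the plan and apply it. :)
let $restored := parse-json($json)
return element-to-map(<a><b>5</b><b>6</b></a>, map { "plan": $restored })
```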

Examples
Expression:
element-to-map-plan(<a><b>3</b><b>4</b></a>)
Result:
{ 'a': { 'layout': 'list', 'child': 'b' },
  'b': { 'layout': 'simple', 'type': 'numeric' }
}
Expression:
element-to-map-plan((<a x="2">red</a>, <a x="3">blue</a>))
Result:
{ 'a': { 'layout': 'simple-plus' },
  '@x': { 'type': 'numeric' }
}
Expression:
element-to-map-plan(
   <a xmlns="http://example.ns">H<sub>2</sub>SO<sub>4</sub></a>
)
Result:
{ 'Q{http://example.ns}a': { 'layout': 'mixed' },
  'Q{http://example.ns}sub': { 'layout': 'simple', 'type': 'numeric' }
}
Expression:
element-to-map-plan((<a><b/><b/></a>, <a><b/><c/></a>))
Result:
{ 'a': { 'layout': 'sequence' },
  'b': { 'layout': 'empty' },
  'c': { 'layout': 'empty' }
}

18.5.11 fn:element-to-map

Changes in 4.0  

  1. New in 4.0  [PR 1906]

Summary

Converts an element node into a map that is suitable for JSON serialization.

Signature
fn:element-to-map(
  $element as element()?,
  $options as map(*) := map{}
) as map(xs:string, item()?)?
Properties

This function is deterministic, context-independent, and focus-independent.

Rules

This function returns a map derived from the element node supplied in $element. The map is in a form that is suitable for JSON serialization, thus providing a mechanism for conversion of arbitrary XML to JSON.

The map that is returned will always be a single-entry map; the key of this entry will be a string representing the element name, and the value of the entry will be a representation of the element's attributes and children.

The entries that may appear in the $options map are as follows. The option parameter conventions apply.

record(
  plan? as map(xs:string, record(layout?, child?, type?, *)),
  attribute-marker? as xs:string,
  name-format? as xs:string
)
Key    Value    Meaning

plan?

A conversion plan, supplied as a map whose keys represent element and attribute names. The plan might be generated using the function element-to-map-plan, or it might be constructed in some other way. The format of the plan is described in 18.5.2 Creating a conversion plan.
  • Type: map(xs:string, record(layout?, child?, type?, *))

  • Default: {}

attribute-marker?

A string that is prepended to any key value in the output that represents an XDM attribute node in the input. The string may be empty. If, after applying the requested prefix (or no prefix), there is a conflict between the names of attributes and child elements, then the requested prefix (or lack thereof) is ignored and the default prefix "@" is used.
  • Type: xs:string

  • Default: "@"

name-format?

Indicates how the names of element and attribute nodes are handled.
  • Type: xs:string

  • Default: "default"

Permitted values:

  • lexical: Names are output in the form produced by the fn:name function.

  • local: Names are output in the form produced by the fn:local-name function.

  • eqname: Names in a namespace are output in the form "Q{uri}local". Names in no namespace are output using the local name alone.

  • default: An element name is output as a local name alone if either (a) it is a top-level element and is in no namespace, or (b) it is in the same namespace as its parent element. An attribute name is output as a local name alone if it is in no namespace. All other names are output in the format "Q{uri}local" if in a namespace, or "Q{}local" if in no namespace. "Top-level" here means that the element is the one passed explicitly in the $element argument, as distinct from a descendant of that element.
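The four name formats can be compared with a short sketch. local:format-name is a hypothetical helper, not a function defined by this specification. It assumes that rule (b) of the default format is applied only to elements below the top-level element, and it ignores the key conflicts and the xsi and xml namespace exceptions described elsewhere.

```xquery
(: Hypothetical helper; $top is the element supplied as $element. :)
declare function local:format-name(
  $node   as node(),      (: an element or attribute node :)
  $top    as element(),
  $format as xs:string
) as xs:string {
  let $uri := string(namespace-uri($node))
  let $eq  := "Q{" || $uri || "}" || local-name($node)
  return
    switch ($format)
      case "lexical" return name($node)
      case "local"   return local-name($node)
      case "eqname"  return if ($uri eq "") then local-name($node) else $eq
      default return
        let $local-only :=
          if ($node instance of attribute()) then $uri eq ""
          else if ($node is $top) then $uri eq ""
          else $uri eq string(namespace-uri($node/..))
        return if ($local-only) then local-name($node) else $eq
};

let $top := <name xmlns="http://example.ns"><first>Jane</first></name>
for $format in ("lexical", "local", "eqname", "default")
return string-join(
  ($top, $top/*) ! local:format-name(., $top, $format), ", "
)
```

For this input the sketch gives "name, first" for lexical and local, "Q{http://example.ns}name, Q{http://example.ns}first" for eqname, and "Q{http://example.ns}name, first" for default.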

If $element is an empty sequence, the result is an empty sequence.

The principles for conversion from elements to maps are described in 18.5.1 Element Layouts, and the rules for selecting an element layout for each element are given in 18.5.5 Selecting an element layout.

In general, every descendant element within the tree rooted at the supplied $element maps to a key-value pair in which the key represents the element name, and the corresponding value represents the attributes and children of the element. This key-value pair will be added to the content representing its parent element, in a way that depends on the parent element's layout.

The representation of a node of any other kind depends on the layout chosen for its parent element.

Error Conditions

A dynamic error [err:FOJS0008] occurs if any element cannot be processed using the selected layout for that element, unless fallback processing is defined; or if error action is explicitly requested for an element.

Any error in the conversion plan is treated as a type error [err:XPTY0004] (an XPath error code), whether or not it is technically a contravention of the defined type for the value. This relieves users and implementers of the burden of distinguishing different kinds of error in the plan.
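A caller that prefers to recover from a layout failure rather than abandon the query might wrap the call as sketched here. The external variables $element and $plan are placeholders to be bound by the caller, and the shape of the recovery map is an arbitrary choice made for the example.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Both variables are placeholders to be bound by the caller. :)
declare variable $element as element() external;
declare variable $plan as map(*) external;

try {
  element-to-map($element, map { "plan": $plan })
} catch err:FOJS0008 {
  (: Report the failure instead of abandoning the whole query. :)
  map { "error": string($err:code), "description": $err:description }
}
```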

Examples
Expression:

element-to-map(())

Result:
()
Expression:

element-to-map(<foo>bar</foo>)

Result:
{ "foo": "bar" }
Expression:
element-to-map(
    <list>
      <item value='1'/>
      <item value='2'/>
    </list>, { 'attribute-marker': '' }
  )
Result:
{ "list": [ 
    { "value": "1" },
    { "value": "2" }
  ] }
Expression:
element-to-map(
    <name>
      <first>Jane</first>
      <last>Smith</last>
    </name>
  )
Result:
{ "name": { 
  "first": "Jane",
  "last": "Smith" 
} }
Expression:
element-to-map(
    <name xmlns="http://example.ns">
      <first>Jane</first>
      <middle>Elizabeth</middle>
      <middle>Mary</middle>
      <last>Smith</last>
    </name>, 
    { 'plan': {'name': { 'layout': 'record' }},
      'name-format' : 'local'
    }
  )
Result:
{ "name": { 
    "first": "Jane",
    "middle": ["Elizabeth", "Mary"],
    "last": "Smith" 
  } 
}
Expression:
element-to-map(
    <name xmlns="http://example.ns">
      <first>Jane</first>
      <middle>Elizabeth</middle>
      <middle>Mary</middle>
      <last>Smith</last>
    </name>, 
    { 'plan': { 'name': { 'layout': 'record' },
                'middle': { 'layout': 'deep-skip' } },
      'name-format' : 'local'
    }
  )
Result:
{ "name": { 
    "first": "Jane",
    "last": "Smith" 
  } 
}