XML DOM - Traverse Node Tree

Traversing means looping through or traveling across the node tree.

Traversing the Node Tree

Often you want to loop through an XML document, for example when you want to extract the value of each element.

This is called "traversing the node tree".

The example below loops through all child nodes of <book>, and displays their names and values:

Example

<!DOCTYPE html>
<html>
<body>

<p id="demo"></p>

<script>
var x, i, xmlDoc, parser;
var txt = "";
var text = "<book>" +
"<title>Everyday Italian</title>" +
"<author>Giada De Laurentiis</author>" +
"<year>2005</year>" +
"</book>";

// Parse the XML string into an XML DOM document
parser = new DOMParser();
xmlDoc = parser.parseFromString(text, "text/xml");

// documentElement is the root element (<book>); childNodes lists its children
x = xmlDoc.documentElement.childNodes;
for (i = 0; i < x.length; i++) {
    // The first child of each element is the text node that holds its value
    txt += x[i].nodeName + ": " + x[i].childNodes[0].nodeValue + "<br>";
}
document.getElementById("demo").innerHTML = txt;
</script>

</body>
</html>

Output:

title: Everyday Italian
author: Giada De Laurentiis
year: 2005

Example explained:

  • Load the XML string into xmlDoc
  • Get the child nodes of the root element
  • For each child node, output the node name and the node value of the text node
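
The example above visits only the direct children of the root element. To traverse every level of the tree, the same loop can be applied recursively. The following is a minimal sketch, not a definitive implementation: walkTree and listElements are hypothetical helper names, and the code assumes only the standard nodeType, nodeName, nodeValue and childNodes properties.

```javascript
// Hypothetical helper (not part of the DOM API): depth-first walk of a node tree.
// Calls visit(node, depth) for the node itself and then for every descendant.
function walkTree(node, visit, depth) {
    depth = depth || 0;
    visit(node, depth);
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        walkTree(children[i], visit, depth + 1);
    }
}

// Hypothetical helper: build one line per element, indented by its depth.
function listElements(root) {
    var lines = [];
    walkTree(root, function (node, depth) {
        if (node.nodeType !== 1) { return; }              // 1 = element node
        var indent = new Array(depth + 1).join("  ");     // two spaces per level
        var first = node.childNodes && node.childNodes[0];
        // Append the text only when the first child is a non-blank text node (type 3)
        var value = (first && first.nodeType === 3 && /\S/.test(first.nodeValue))
            ? ": " + first.nodeValue
            : "";
        lines.push(indent + node.nodeName + value);
    });
    return lines;
}
```

In a browser, calling listElements(xmlDoc.documentElement).join("<br>") on the document parsed above should give one line for <book> followed by an indented line for each of its children.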

Browser Differences in DOM Parsing

    All modern browsers support the W3C DOM specification.

    However, there are some differences between browsers. One important difference is:

  • The way they handle white-spaces and new lines

DOM - White Spaces and New Lines

    XML often contains newline or whitespace characters between nodes. This is often the case when the document is edited with a simple editor like Notepad.

    The following example (edited in Notepad) has a CR/LF (new line) at the end of each line and two spaces in front of each child node:

    <book>
      <title>Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
    </book>

    Internet Explorer 9 and earlier do NOT treat whitespace-only text, such as the indentation and new lines between elements, as text nodes, while other browsers do.

    The following example outputs the number of child nodes of the root element of books.xml. IE9 and earlier output 4 child nodes (the elements only), while IE10 and later, and other browsers, output 9 child nodes (the 4 elements plus 5 whitespace-only text nodes):

    Example

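    function myFunction(xml) {
        // xml is expected to be an XMLHttpRequest whose response has loaded
        var xmlDoc = xml.responseXML;
        // childNodes includes whitespace-only text nodes in most browsers
        var x = xmlDoc.documentElement.childNodes;
        document.getElementById("demo").innerHTML =
            "Number of child nodes: " + x.length;
    }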
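
To get a count that does not depend on how a browser treats whitespace, one option is to count only element nodes (nodeType 1). The following is a minimal sketch under that assumption; elementChildren is a hypothetical helper name, not a DOM method.

```javascript
// Hypothetical helper: return only the element children of a node,
// skipping text nodes (including whitespace-only ones), comments, and so on.
function elementChildren(node) {
    var result = [];
    var children = node.childNodes;
    for (var i = 0; i < children.length; i++) {
        if (children[i].nodeType === 1) {   // 1 = element node
            result.push(children[i]);
        }
    }
    return result;
}
```

With this helper, elementChildren(xmlDoc.documentElement).length counts the child elements only, whether or not the parser created text nodes for the whitespace between them.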

    PCDATA - Parsed Character Data

    XML parsers normally parse all the text in an XML document.

    When an XML element is parsed, the text between the XML tags is also parsed:

    <message>This text is also parsed</message>

    The parser does this because XML elements can contain other elements, as in this example, where the <name> element contains two other elements (first and last):

    <name><first>Bill</first><last>Gates</last></name>

    The parser will break it up into sub-elements like this:

    <name>
      <first>Bill</first>
      <last>Gates</last>
    </name>

    Parsed Character Data (PCDATA) is the term for text data that will be parsed by the XML parser.

    CDATA - (Unparsed) Character Data

    The term CDATA is used for text data that should not be parsed by the XML parser.

    The characters "<" and "&" are illegal as literal text inside XML elements; they must be written as the entity references "&lt;" and "&amp;".

    "<" will generate an error because the parser interprets it as the start of a new element.

    "&" will generate an error because the parser interprets it as the start of a character entity.

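When element text is produced by a script, the two characters can be replaced with their entity references before the text is placed in the XML. The following is a minimal sketch; escapeXmlText is a hypothetical helper name, and it handles only the two characters discussed above.

```javascript
// Hypothetical helper: make a string safe to use as XML element text
// by replacing "&" and "<" with their entity references.
function escapeXmlText(text) {
    return String(text)
        .replace(/&/g, "&amp;")   // first, so the "&" of "&lt;" is not escaped again
        .replace(/</g, "&lt;");
}
```

For example, escapeXmlText("a < b && a < 0") returns "a &lt; b &amp;&amp; a &lt; 0", which a parser reads back as the original text.
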
    Some text, like JavaScript code, contains a lot of "<" or "&" characters. To avoid errors, script code can be defined as CDATA.

    Everything inside a CDATA section is treated as literal character data: the parser does not interpret it as markup.

    A CDATA section starts with "<![CDATA[" and ends with "]]>":

    <script>
    <![CDATA[
    function matchwo(a,b) {
        if (a < b && a < 0) {
            return 1;
        } else {
            return 0;
        }
    }
    ]]>
    </script>

    In the example above, the parser treats the "<" and "&&" inside the CDATA section as plain text, not as markup.

    Notes on CDATA sections:

    A CDATA section cannot contain the string "]]>". Nested CDATA sections are not allowed.

    The "]]>" that marks the end of the CDATA section cannot contain spaces or line breaks.
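
Because a CDATA section cannot contain "]]>", text that includes this string has to be split across two adjacent sections. The following is a minimal sketch of that commonly used workaround; wrapInCdata is a hypothetical helper name.

```javascript
// Hypothetical helper: wrap text in a CDATA section.
// Any "]]>" in the text is split so that "]]" ends one section
// and ">" begins the next; neither section then contains "]]>".
function wrapInCdata(text) {
    var safe = String(text).split("]]>").join("]]]]><![CDATA[>");
    return "<![CDATA[" + safe + "]]>";
}
```

A parser reads the adjacent sections back as one continuous run of character data, so the original text, including "]]>", is preserved.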