Chapter: XML and Web Services : Building XML-Based Applications : Parsing XML Using Document Object Model

DOM Traversal and Range

Traversal and range are features added in DOM Level 2. They are supported by Apache Xerces. You can determine whether traversal is supported by calling the hasFeature() method of the DOMImplementation interface. For traversal, you can use the arguments “Traversal” and “2.0” for the feature and version parameters of the hasFeature() method.

DOM Traversal and Range

Traversal

Traversal is a convenient way to walk through a DOM tree and select specific nodes. This is useful when you want to find certain elements and perform operations on them.

Traversal Interfaces

The traversal interfaces are listed in Table 7.4, along with a brief description of each.

TABLE 7.4 Summary of Traversal Interfaces

Traversal Example

Let’s look at an example in which traversal is used. Let’s say we want to print out just the names of books in our library. One way to do this is to write code to iterate through every node recursively and look for book elements. This will work, but we don’t need to do all that work ourselves. Instead, we can use NodeIterator to iterate through all the nodes and define a NodeFilter to select only the nodes with the name “book.” When we find a book node, we can get the value of the text content and print it out.

There are two classes we need to define. The first one, IteratorApp.java, contains the application code. The second one, NameNodeFilter.java, selects nodes with a given name. The source code for IteratorApp.java is shown in Listing 7.9, and the source code for NameNodeFilter.java is shown in Listing 7.10. Both source files must import org.w3c.dom.traversal in order to reference the traversal interfaces.

LISTING 7.9 IteratorApp.java

package com.madhu.xml;

import java.io.*;

import org.w3c.dom.*;

import org.w3c.dom.traversal.*;

import javax.xml.parsers.*;

public class IteratorApp {

protected DocumentBuilder docBuilder;

protected Document document;

protected Element root;

public IteratorApp() throws Exception {

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

docBuilder = dbf.newDocumentBuilder();

DOMImplementation domImp = docBuilder.getDOMImplementation();

if (domImp.hasFeature(“Traversal”, “2.0”)) {

System.out.println(“Parser supports Traversal”);

}

public void parse(String fileName) throws Exception {

document = docBuilder.parse(new FileInputStream(fileName));

root = document.getDocumentElement();

System.out.println(“Root element is “ + root.getNodeName());

}

public void iterate() { NodeIterator iter =

((DocumentTraversal)document).createNodeIterator(

root, NodeFilter.SHOW_ELEMENT,

new NameNodeFilter(“book”), true);

Node n = iter.nextNode(); while (n != null) {

System.out.println(n.getFirstChild().getNodeValue()); n = iter.nextNode();

}

public static void main(String args[]) throws Exception {

IteratorApp ia = new IteratorApp();

ia.parse(args[0]);

ia.iterate();

}

LISTING 7.10 NameNodeFilter.java

package com.madhu.xml;

import org.w3c.dom.*;

import org.w3c.dom.traversal.*;

public class NameNodeFilter implements NodeFilter {

protected String name;

public NameNodeFilter(String inName) {

name = inName;

}

public short acceptNode(Node n) {

if (n.getNodeName().equals(name)) {

return FILTER_ACCEPT;

} else {

return FILTER_REJECT;

}

Looking at IteratorApp.java, you’ll see that the traversal code is found in the iterate() method. We can create an instance of NodeIterator from the DocumentTraversal interface. But how do we get an instance of a DocumentTraversal interface? It turns out that if traversal is supported, the Document instance will also implement DocumentTraversal. If you look carefully at the iterate() method, you will see that the document is downcast into DocumentTraversal. The cast succeeds because traversal is supported by our implementation (Xerces). If it wasn’t supported, a ClassCastException would be raised at runtime.

The method for creating a NodeIterator is createNodeIterator(...), which accepts four parameters: the root node, a flag determining which nodes to show, a possible NodeFilter, and a flag determining whether entity references are to be expanded. In our example, we start at the document root, because we want to search the entire document. Constants in the NodeFilter interface define which nodes will be visible. You can choose options such as elements, attributes, text, and so on. The NodeFilter is optional. If you don’t want to use a NodeFilter, just supply “null” and no filter will be applied.

In our example, we define a node filter that looks for nodes with a given name. To define a node filter, we need to implement NodeFilter and fill in one method: acceptNode(). As we iterate through nodes, the traversal API will call our acceptNode() method, which can return either FILTER_ACCEPT, FILTER_REJECT, or FILTER_SKIP. For node iterators, FILTER_REJECT and FILTER_SKIP do the same thing. The behavior is slightly different for TreeWalker interfaces (refer to the documentation for the details). In our acceptNode() method, we just compare the name of the node and return FILTER_ACCEPT if the node name matches the name supplied when NameNodeFilter was created. We created an instance of NameNodeFilter with the name “book,” so we should find only book elements.

Going back to the iterate() method in the IteratorApp class, we can use a while loop to go through the nodes. The method nextNode() will return null when we get to the end of the list. Only element nodes with name “book” are returned. Once we find a book ele-ment, we can obtain the text content node by calling getFirstChild() and then calling getNodeValue() on that node. The input XML file is shown in Listing 7.11.

LISTING 7.11 library.xml—Input XML Document

<?xml version=”1.0” encoding=”UTF-8”?> <library>

<book>The Last Trail</book>

</fiction>

<book>The Last Lion, Winston Spencer Churchill</book>

</biography>

</library>

Here’s the output from IteratorApp:

Parser supports MutationEvents

Root element is library

Moby Dick

The Last Trail

The Last Lion, Winston Spencer Churchill

The TreeWalker interface provides many of the same benefits as NodeIterator. The main difference is that TreeWalker presents a tree-oriented view of the nodes instead of a list-oriented view. An iterator allows you to move forward and backward, but a TreeWalker interface allows you to also move to the parent of a node, to one of its chil-dren, or to a sibling. The DOM specification explains this in greater detail.

Range

Range interfaces provide a convenient way to select, delete, extract, and insert content. You can determine whether range is supported by calling the hasFeature(...) method of the DOMImplementation interface. You can use the arguments “Range” and “2.0” for feature and version. There are a number of applications for which the range interfaces are useful.

A range consists of two boundary points corresponding to the start and the end of the range. A boundary point’s position in a Document or DocumentFragment tree can be char-acterized by a node and an offset. The node is the container of the boundary point and its position. The container and its ancestors are the ancestor containers of the boundary point and its position. The offset within the node is the offset of the boundary point and its position. If the container is an Attr, Document, DocumentFragment, Element, or EntityReference node, the offset is between its child nodes. If the container is a

CharacterData, Comment, or ProcessingInstruction node, the offset is between the 16-bit units of the UTF-16 encoded string contained by it.

The boundary points of a range must have a common ancestor container that is either a Document, DocumentFragment, or Attr node. That is, the content of a range must be entirely within the subtree rooted by a single Document, DocumentFragment, or Attr node. This common ancestor container is known as the root container of the range. The tree rooted by the root container is known as the range’s context tree.

The container of a boundary point of a range must be an Element, Comment,

ProcessingInstruction, EntityReference, CDATASection, Document,

DocumentFragment, Attr, or Text node. None of the ancestor containers of the boundary point of a range can be a DocumentType, Entity, or Notation node.

Range Interfaces

The range interfaces are listed in Table 7.5, along with a brief description of each.

TABLE 7.5 Summary of Range Interfaces

Range Example

Let’s look at an example in which range is used. Let’s say we want to delete the first child node under the root. One way to do this is to write code to iterate through every node under the first child and remove it. However, we can accomplish the same operation with less code using ranges.

The source code for RangeApp.java is shown in Listing 7.12. We must import org.w3c. dom.range in order to refer to the range interfaces.

LISTING 7.12 RangeApp.java

package com.madhu.xml;

import java.io.*;

import org.w3c.dom.*;

import org.w3c.dom.ranges.*;

import javax.xml.parsers.*;

public class RangeApp {

protected DocumentBuilder docBuilder;

protected Document document;

protected Element root;

public RangeApp() throws Exception {

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

docBuilder = dbf.newDocumentBuilder();

DOMImplementation domImp = docBuilder.getDOMImplementation();

if (domImp.hasFeature(“Range”, “2.0”)) {

System.out.println(“Parser supports Range”);

}

public void parse(String fileName) throws Exception {

document = docBuilder.parse(new FileInputStream(fileName));

root = document.getDocumentElement();

System.out.println(“Root element is “ + root.getNodeName());

}

public void deleteRange() {

Range r = ((DocumentRange)document).createRange();

r.selectNodeContents(root.getFirstChild());

r.deleteContents();

}

public static void main(String args[]) throws Exception {

RangeApp ra = new RangeApp();

ra.parse(args[0]);

ra.deleteRange();

}

Looking at RangeApp.java, you’ll see that the traversal code is found in the deleteRange() method. We can create an instance of a range from the DocumentRange interface. We obtain a DocumentRange instance similar to the traversal example. If range is supported, the Document instance will also implement DocumentRange. In the deleteRange() method, you will see that the document is downcast into DocumentRange. The cast succeeds because range is supported by our implementation (Xerces). If it wasn’t supported, a ClassCastException would be raised at runtime.

The method for creating a range is createRange(), with no arguments. A number of methods in the Range interface set the range. In the example, we used selectNodeContents() to select all the content under the first child node under he root. We can delete this content using deleteContents().

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

XML and Web Services : Building XML-Based Applications : Parsing XML Using Document Object Model : DOM Traversal and Range |