Home | | Service Oriented Architecture | DOM Traversal and Range

Chapter: XML and Web Services : Building XML-Based Applications : Parsing XML Using Document Object Model

DOM Traversal and Range

Traversal and range are features added in DOM Level 2. They are supported by Apache Xerces. You can determine whether traversal is supported by calling the hasFeature() method of the DOMImplementation interface. For traversal, you can use the arguments “Traversal” and “2.0” for the feature and version parameters of the hasFeature() method.

DOM Traversal and Range

 

Traversal and range are features added in DOM Level 2. They are supported by Apache Xerces. You can determine whether traversal is supported by calling the hasFeature() method of the DOMImplementation interface. For traversal, you can use the arguments “Traversal” and “2.0” for the feature and version parameters of the hasFeature() method.

 

 

Traversal

 

Traversal is a convenient way to walk through a DOM tree and select specific nodes. This is useful when you want to find certain elements and perform operations on them.

 

Traversal Interfaces

 

The traversal interfaces are listed in Table 7.4, along with a brief description of each.

 

TABLE 7.4     Summary of Traversal Interfaces


Traversal Example

 

Let’s look at an example in which traversal is used. Let’s say we want to print out just the names of books in our library. One way to do this is to write code to iterate through every node recursively and look for book elements. This will work, but we don’t need to do all that work ourselves. Instead, we can use NodeIterator to iterate through all the nodes and define a NodeFilter to select only the nodes with the name “book.” When we find a book node, we can get the value of the text content and print it out.

 

There are two classes we need to define. The first one, IteratorApp.java, contains the application code. The second one, NameNodeFilter.java, selects nodes with a given name. The source code for IteratorApp.java is shown in Listing 7.9, and the source code for NameNodeFilter.java is shown in Listing 7.10. Both source files must import org.w3c.dom.traversal in order to reference the traversal interfaces.

 

LISTING 7.9   IteratorApp.java

 

package  com.madhu.xml;

 

import  java.io.*;

 

import  org.w3c.dom.*;

 

import org.w3c.dom.traversal.*;

import javax.xml.parsers.*;

 

public  class  IteratorApp  {

 

protected DocumentBuilder docBuilder;

protected Document document;

protected Element root;

 

public  IteratorApp()  throws  Exception  {

 

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

docBuilder = dbf.newDocumentBuilder();

DOMImplementation domImp = docBuilder.getDOMImplementation();

if (domImp.hasFeature(“Traversal”, “2.0”)) {

System.out.println(“Parser  supports  Traversal”);

 

}

 

}

 

public  void  parse(String  fileName)  throws  Exception  {

 

document = docBuilder.parse(new FileInputStream(fileName));

root = document.getDocumentElement();

System.out.println(“Root element is “ + root.getNodeName());

 

}

 

public void iterate() { NodeIterator iter =

 

((DocumentTraversal)document).createNodeIterator(

root, NodeFilter.SHOW_ELEMENT,

 

new  NameNodeFilter(“book”),  true);

 

Node n = iter.nextNode(); while (n != null) {

 

System.out.println(n.getFirstChild().getNodeValue()); n = iter.nextNode();

 

}

 

}

 

public static void main(String args[]) throws Exception {

IteratorApp ia = new IteratorApp();

ia.parse(args[0]);

 

ia.iterate();

 

}

 

}

 

 

LISTING 7.10   NameNodeFilter.java

 

package  com.madhu.xml;

 

import  org.w3c.dom.*;

 

import  org.w3c.dom.traversal.*;

 

public class NameNodeFilter implements NodeFilter {

protected String name;

 

public NameNodeFilter(String inName) {

name = inName;

 

}

 

public  short  acceptNode(Node  n)  {

 

if (n.getNodeName().equals(name)) {

return FILTER_ACCEPT;

}  else  {

 

return  FILTER_REJECT;

 

}

 

}

 

}

 

Looking at IteratorApp.java, you’ll see that the traversal code is found in the iterate() method. We can create an instance of NodeIterator from the DocumentTraversal interface. But how do we get an instance of a DocumentTraversal interface? It turns out that if traversal is supported, the Document instance will also implement DocumentTraversal. If you look carefully at the iterate() method, you will see that the document is downcast into DocumentTraversal. The cast succeeds because traversal is supported by our implementation (Xerces). If it wasn’t supported, a ClassCastException would be raised at runtime.

 

The method for creating a NodeIterator is createNodeIterator(...), which accepts four parameters: the root node, a flag determining which nodes to show, a possible NodeFilter, and a flag determining whether entity references are to be expanded. In our example, we start at the document root, because we want to search the entire document. Constants in the NodeFilter interface define which nodes will be visible. You can choose options such as elements, attributes, text, and so on. The NodeFilter is optional. If you don’t want to use a NodeFilter, just supply “null” and no filter will be applied.

 

In our example, we define a node filter that looks for nodes with a given name. To define a node filter, we need to implement NodeFilter and fill in one method: acceptNode(). As we iterate through nodes, the traversal API will call our acceptNode() method, which can return either FILTER_ACCEPT, FILTER_REJECT, or FILTER_SKIP. For node iterators, FILTER_REJECT and FILTER_SKIP do the same thing. The behavior is slightly different for TreeWalker interfaces (refer to the documentation for the details). In our acceptNode() method, we just compare the name of the node and return FILTER_ACCEPT if the node name matches the name supplied when NameNodeFilter was created. We created an instance of NameNodeFilter with the name “book,” so we should find only book elements.

 

Going back to the iterate() method in the IteratorApp class, we can use a while loop to go through the nodes. The method nextNode() will return null when we get to the end of the list. Only element nodes with name “book” are returned. Once we find a book ele-ment, we can obtain the text content node by calling getFirstChild() and then calling getNodeValue() on that node. The input XML file is shown in Listing 7.11.

LISTING 7.11  library.xmlInput XML Document

 

<?xml version=”1.0” encoding=”UTF-8”?> <library>

 

<fiction>

 

<book>Moby Dick</book>

<book>The Last Trail</book>

 

</fiction>

 

<biography>

 

<book>The Last Lion, Winston Spencer Churchill</book>

</biography>

 

</library>

 

Here’s the output from IteratorApp:

 

Parser  supports  MutationEvents

 

Root  element  is  library

 

Moby  Dick

 

The  Last  Trail

 

The  Last  Lion,  Winston  Spencer  Churchill

 

The TreeWalker interface provides many of the same benefits as NodeIterator. The main difference is that TreeWalker presents a tree-oriented view of the nodes instead of a list-oriented view. An iterator allows you to move forward and backward, but a TreeWalker interface allows you to also move to the parent of a node, to one of its chil-dren, or to a sibling. The DOM specification explains this in greater detail.

 

Range

 

Range interfaces provide a convenient way to select, delete, extract, and insert content. You can determine whether range is supported by calling the hasFeature(...) method of the DOMImplementation interface. You can use the arguments “Range” and “2.0” for feature and version. There are a number of applications for which the range interfaces are useful.

 

A range consists of two boundary points corresponding to the start and the end of the range. A boundary point’s position in a Document or DocumentFragment tree can be char-acterized by a node and an offset. The node is the container of the boundary point and its position. The container and its ancestors are the ancestor containers of the boundary point and its position. The offset within the node is the offset of the boundary point and its position. If the container is an Attr, Document, DocumentFragment, Element, or EntityReference node, the offset is between its child nodes. If the container is a

 

CharacterData, Comment, or ProcessingInstruction node, the offset is between the 16-bit units of the UTF-16 encoded string contained by it.

The boundary points of a range must have a common ancestor container that is either a Document, DocumentFragment, or Attr node. That is, the content of a range must be entirely within the subtree rooted by a single Document, DocumentFragment, or Attr node. This common ancestor container is known as the root container of the range. The tree rooted by the root container is known as the range’s context tree.

 

The container of a boundary point of a range must be an Element, Comment,

 

ProcessingInstruction, EntityReference, CDATASection, Document,

 

DocumentFragment, Attr, or Text node. None of the ancestor containers of the boundary point of a range can be a DocumentType, Entity, or Notation node.

 

Range Interfaces

 

The range interfaces are listed in Table 7.5, along with a brief description of each.

 

TABLE 7.5     Summary of Range Interfaces


Range Example

 

Let’s look at an example in which range is used. Let’s say we want to delete the first child node under the root. One way to do this is to write code to iterate through every node under the first child and remove it. However, we can accomplish the same operation with less code using ranges.

 

The source code for RangeApp.java is shown in Listing 7.12. We must import org.w3c. dom.range in order to refer to the range interfaces.

 

LISTING 7.12  RangeApp.java

 

package  com.madhu.xml;

 

import  java.io.*;

 

import  org.w3c.dom.*;

 

import org.w3c.dom.ranges.*;

import javax.xml.parsers.*;

public  class  RangeApp  {

 

protected DocumentBuilder docBuilder;

protected Document document;

protected Element root;

 

public  RangeApp()  throws  Exception  {

 

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

docBuilder = dbf.newDocumentBuilder();

 

DOMImplementation domImp = docBuilder.getDOMImplementation();

if (domImp.hasFeature(“Range”, “2.0”)) {

 

System.out.println(“Parser  supports  Range”);

 

}

 

}

 

public  void  parse(String  fileName)  throws  Exception  {

 

document = docBuilder.parse(new FileInputStream(fileName));

root = document.getDocumentElement();

System.out.println(“Root element is “ + root.getNodeName());

 

}

 

public  void  deleteRange()  {

 

Range r = ((DocumentRange)document).createRange();

r.selectNodeContents(root.getFirstChild());

r.deleteContents();

 

}

 

public static void main(String args[]) throws Exception {

RangeApp ra = new RangeApp();

 

ra.parse(args[0]);

 

ra.deleteRange();

 

}

 

}

 

Looking at RangeApp.java, you’ll see that the traversal code is found in the deleteRange() method. We can create an instance of a range from the DocumentRange interface. We obtain a DocumentRange instance similar to the traversal example. If range is supported, the Document instance will also implement DocumentRange. In the deleteRange() method, you will see that the document is downcast into DocumentRange. The cast succeeds because range is supported by our implementation (Xerces). If it wasn’t supported, a ClassCastException would be raised at runtime.

 

The method for creating a range is createRange(), with no arguments. A number of methods in the Range interface set the range. In the example, we used selectNodeContents() to select all the content under the first child node under he root. We can delete this content using deleteContents().


Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail
XML and Web Services : Building XML-Based Applications : Parsing XML Using Document Object Model : DOM Traversal and Range |


Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.