DOM Traversal and Range
Traversal and range are features added in DOM Level 2. They are supported
by Apache Xerces. You can determine whether traversal is supported by calling
the hasFeature() method
of the DOMImplementation interface. For traversal, you can use the arguments “Traversal” and
“2.0” for the feature and version parameters of the hasFeature() method.
Traversal
Traversal is a convenient way to walk through a DOM tree and select
specific nodes. This is useful when you want to find certain elements and
perform operations on them.
Traversal Interfaces
The traversal interfaces are listed in Table 7.4, along with a brief
description of each.
TABLE 7.4 Summary of Traversal
Interfaces
Traversal Example
Let’s look at an example in which traversal is
used. Let’s say we want to print out just the names of books in our library.
One way to do this is to write code to iterate through every node recursively
and look for book elements. This will work, but we don’t need to do all that
work ourselves. Instead, we can use NodeIterator to iterate through all the nodes and define a NodeFilter to select only the nodes with
the name “book.” When we find a book node, we can get the value of the text
content and print it out.
There are two classes we need to define. The first
one, IteratorApp.java,
contains the application code. The second one, NameNodeFilter.java, selects nodes with a given name. The source code for IteratorApp.java is shown in Listing 7.9, and the
source code for NameNodeFilter.java is shown in Listing 7.10. Both source files must import org.w3c.dom.traversal in order to reference the
traversal interfaces.
LISTING 7.9 IteratorApp.java
package
com.madhu.xml;
import
java.io.*;
import
org.w3c.dom.*;
import org.w3c.dom.traversal.*;
import javax.xml.parsers.*;
public
class IteratorApp {
protected DocumentBuilder docBuilder;
protected Document document;
protected Element root;
public
IteratorApp() throws Exception
{
DocumentBuilderFactory dbf =
DocumentBuilderFactory.newInstance();
docBuilder = dbf.newDocumentBuilder();
DOMImplementation domImp =
docBuilder.getDOMImplementation();
if (domImp.hasFeature(“Traversal”, “2.0”)) {
System.out.println(“Parser supports
Traversal”);
}
}
public
void parse(String fileName)
throws Exception {
document = docBuilder.parse(new
FileInputStream(fileName));
root = document.getDocumentElement();
System.out.println(“Root element is “ +
root.getNodeName());
}
public void iterate() {
NodeIterator iter =
((DocumentTraversal)document).createNodeIterator(
root, NodeFilter.SHOW_ELEMENT,
new
NameNodeFilter(“book”), true);
Node n = iter.nextNode(); while (n != null) {
System.out.println(n.getFirstChild().getNodeValue());
n = iter.nextNode();
}
}
public static void main(String
args[]) throws Exception {
IteratorApp ia = new
IteratorApp();
ia.parse(args[0]);
ia.iterate();
}
}
LISTING 7.10 NameNodeFilter.java
package
com.madhu.xml;
import
org.w3c.dom.*;
import org.w3c.dom.traversal.*;
public class NameNodeFilter
implements NodeFilter {
protected String name;
public NameNodeFilter(String
inName) {
name = inName;
}
public
short acceptNode(Node n) {
if (n.getNodeName().equals(name)) {
return FILTER_ACCEPT;
} else {
return
FILTER_REJECT;
}
}
}
Looking at IteratorApp.java, you’ll see that the traversal code is found in the iterate() method. We can create an instance
of NodeIterator from the DocumentTraversal interface. But how do we get an
instance of a DocumentTraversal interface? It turns out that if traversal is supported, the Document instance will also implement DocumentTraversal. If you look carefully at the iterate() method, you will see that the
document is downcast into DocumentTraversal. The cast succeeds because traversal is supported by our implementation
(Xerces). If it wasn’t supported, a ClassCastException would be raised at runtime.
The method for creating a NodeIterator is createNodeIterator(...), which accepts four parameters: the root node, a flag determining which
nodes to show, a possible NodeFilter, and a flag determining whether entity references are to be expanded.
In our example,
we start at the document root, because we want to search the entire document.
Constants in the NodeFilter interface define which nodes will be visible. You can choose options
such as elements, attributes, text, and so on. The NodeFilter is optional. If you don’t want to use a NodeFilter, just supply “null” and no filter will be applied.
In our example, we define a node filter that looks for nodes with a
given name. To define a node filter, we need to implement NodeFilter and fill in one method: acceptNode(). As we iterate through nodes,
the traversal API will call our acceptNode() method, which can return either FILTER_ACCEPT, FILTER_REJECT, or FILTER_SKIP. For node iterators, FILTER_REJECT and FILTER_SKIP do the same thing. The behavior is slightly different for TreeWalker interfaces (refer to the
documentation for the details). In our acceptNode()
method, we just compare the name of the node and
return FILTER_ACCEPT if the node name matches the name supplied when NameNodeFilter was created. We created an
instance of NameNodeFilter with the name “book,” so we should find only book elements.
Going back to the iterate() method in the IteratorApp class, we can use a while loop to go through the nodes. The method nextNode() will return null when we get to the end of the list. Only element nodes
with name “book” are returned. Once we find a book ele-ment, we can obtain the
text content node by calling getFirstChild() and then calling getNodeValue() on that node. The input XML file is shown in Listing 7.11.
LISTING 7.11 library.xml—Input XML Document
<?xml version=”1.0” encoding=”UTF-8”?> <library>
<fiction>
<book>Moby Dick</book>
<book>The Last Trail</book>
</fiction>
<biography>
<book>The Last Lion,
Winston Spencer Churchill</book>
</biography>
</library>
Here’s the output from IteratorApp:
Parser supports MutationEvents
Root element is
library
Moby Dick
The Last Trail
The Last Lion,
Winston Spencer Churchill
The TreeWalker interface provides many of the same benefits as NodeIterator. The main difference is that TreeWalker presents a tree-oriented view of
the nodes instead of a list-oriented view. An iterator allows you to move
forward and backward, but a TreeWalker interface allows you to also move to the parent of a node, to one of its
chil-dren, or to a sibling. The DOM specification explains this in greater
detail.
Range
Range interfaces provide a convenient way to select, delete, extract,
and insert content. You can determine whether range is supported by calling the
hasFeature(...) method
of the DOMImplementation interface. You can use the arguments “Range” and “2.0” for feature and
version. There are a number of applications for which the range interfaces are
useful.
A range consists of two boundary points corresponding to the start and
the end of the range. A boundary point’s position in a Document or DocumentFragment tree can be char-acterized by a
node and an offset. The node is the container of the boundary point and its
position. The container and its ancestors are the ancestor containers of the
boundary point and its position. The offset within the node is the offset of
the boundary point and its position. If the container is an Attr, Document, DocumentFragment, Element, or EntityReference node, the offset is between its child nodes. If the container is a
CharacterData, Comment, or ProcessingInstruction node, the offset is between the 16-bit units of the UTF-16 encoded string contained by it.
The boundary points of a range must have a common ancestor container
that is either a Document, DocumentFragment, or Attr node. That is, the content of a range must be entirely within the subtree
rooted by a single Document, DocumentFragment, or Attr node. This common ancestor container is known as the root container of the range. The tree
rooted by the root container is known as the range’s context tree.
The container of a boundary point of a range must be an Element, Comment,
ProcessingInstruction,
EntityReference,
CDATASection, Document,
DocumentFragment, Attr, or Text node. None of the ancestor containers of the boundary point of a range can be
a DocumentType, Entity, or Notation node.
Range Interfaces
The range interfaces are listed in Table 7.5, along with a brief
description of each.
TABLE 7.5 Summary of Range Interfaces
Range Example
Let’s look at an example in which range is used. Let’s say we want to
delete the first child node under the root. One way to do this is to write code
to iterate through every node under the first child and remove it. However, we
can accomplish the same operation with less code using ranges.
The source code for RangeApp.java is shown in Listing 7.12. We must import org.w3c. dom.range in order to refer to the range interfaces.
LISTING 7.12 RangeApp.java
package com.madhu.xml;
import java.io.*;
import org.w3c.dom.*;
import org.w3c.dom.ranges.*;
import javax.xml.parsers.*;
public
class RangeApp {
protected DocumentBuilder docBuilder;
protected Document document;
protected Element root;
public
RangeApp() throws Exception
{
DocumentBuilderFactory dbf =
DocumentBuilderFactory.newInstance();
docBuilder = dbf.newDocumentBuilder();
DOMImplementation domImp =
docBuilder.getDOMImplementation();
if (domImp.hasFeature(“Range”, “2.0”)) {
System.out.println(“Parser supports
Range”);
}
}
public
void parse(String fileName)
throws Exception {
document = docBuilder.parse(new
FileInputStream(fileName));
root = document.getDocumentElement();
System.out.println(“Root element is “ +
root.getNodeName());
}
public
void deleteRange() {
Range r = ((DocumentRange)document).createRange();
r.selectNodeContents(root.getFirstChild());
r.deleteContents();
}
public static void main(String
args[]) throws Exception {
RangeApp ra = new RangeApp();
ra.parse(args[0]);
ra.deleteRange();
}
}
Looking at RangeApp.java, you’ll see that the traversal code is found in the deleteRange() method. We can create an instance
of a range from the DocumentRange interface. We obtain a DocumentRange instance similar to the traversal example. If range is supported, the Document instance will also implement DocumentRange. In the deleteRange() method, you will see that the
document is downcast into DocumentRange. The cast
succeeds because range is supported by our implementation (Xerces). If it
wasn’t supported, a ClassCastException would be raised at runtime.
The method for creating a range is createRange(), with no arguments. A number of methods in the Range interface set the range. In the
example, we used selectNodeContents() to select all the content under the first child node under he root. We can delete this
content using deleteContents().
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.