To illustrate how SAX works, let’s say you have a simple document, like this one:
<?xml version="1.0" encoding="UTF-8"?>
<fiction>
  <book author="Herman Melville">Moby Dick</book>
</fiction>
To parse this document using SAX, you build a content handler by creating a Java class that implements the ContentHandler interface in the org.xml.sax package. Convenience adapters, such as DefaultHandler in the org.xml.sax.helpers package, supply empty implementations of the handler methods so that you only need to write the ones you care about.
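As a rough illustration, a content handler for the sample document might look like the following minimal sketch. It assumes the DefaultHandler adapter rather than a full ContentHandler implementation, and the class name BookTitleHandler and its books field are hypothetical names chosen for this example. The main method drives the callbacks by hand, only to show that a handler is an ordinary object; in real use a parser makes these calls.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// BookTitleHandler is a hypothetical name used only for this sketch.
// DefaultHandler provides do-nothing versions of every callback, so only
// the three methods of interest are overridden here.
class BookTitleHandler extends DefaultHandler {
    // One entry per <book> element, in the form "title (author)".
    final List<String> books = new ArrayList<>();

    private final StringBuilder text = new StringBuilder();
    private String author;
    private boolean inBook;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("book".equals(qName)) {
            inBook = true;
            author = atts.getValue("author"); // null if the attribute is absent
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may report one run of text in several chunks, so append.
        if (inBook) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("book".equals(qName)) {
            books.add(text.toString().trim() + " (" + author + ")");
            inBook = false;
        }
    }

    // Simulates, by hand, the calls a parser would make for one <book> element.
    public static void main(String[] args) {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "author", "author", "CDATA", "Herman Melville");
        char[] title = "Moby Dick".toCharArray();

        BookTitleHandler handler = new BookTitleHandler();
        handler.startElement("", "book", "book", atts);
        handler.characters(title, 0, title.length);
        handler.endElement("", "book", "book");
        System.out.println(handler.books); // [Moby Dick (Herman Melville)]
    }
}
```

Buffering the text in a StringBuilder until the end tag arrives is a deliberate choice: SAX does not promise that all of an element's text comes in a single characters call.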
Once you have a content handler, you register it with a SAX XMLReader, set up the input source, and start the parser. As the parser encounters elements, text, and other data, it calls the corresponding methods in your content handler. Specifically, the events generated by the preceding example will look something like this:
start document
start element: fiction
start element: book (including attributes)
characters: Moby Dick
end element: book
end element: fiction
end document
As you can see, the events reported follow the content of the document in a linear sequence. There are a number of other events that might be generated in response to processing instructions, errors, and comments. We will look at these in the examples that follow.
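The event sequence above can be reproduced with a short program along the following lines. This is a sketch under a few assumptions: the class name SaxEventTrace and the method trace are hypothetical, the sample document is held in a string rather than a file, and the parser is whatever XMLReaderFactory returns on your system (the parser bundled with the JDK if nothing else is configured). XMLReaderFactory is marked as deprecated in recent JDKs but is still available, hence the suppressed warning. The handler also records the start of the document and skips the whitespace-only text between elements to keep the log short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// SaxEventTrace is a hypothetical name used only for this sketch.
class SaxEventTrace {

    // The sample document from the text, held in memory for convenience.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<fiction>\n"
        + "  <book author=\"Herman Melville\">Moby Dick</book>\n"
        + "</fiction>\n";

    // Parses the given XML text and returns one description per event.
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs
    static List<String> trace(String xml) throws SAXException, IOException {
        final List<String> events = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("start document");
            }

            @Override
            public void endDocument() {
                events.add("end document");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                StringBuilder line = new StringBuilder("start element: ").append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    line.append(' ').append(atts.getQName(i))
                        .append("=\"").append(atts.getValue(i)).append('"');
                }
                events.add(line.toString());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end element: " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace-only runs between elements are left out of the log.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("characters: " + text);
                }
            }
        };

        XMLReader reader = XMLReaderFactory.createXMLReader(); // obtain a parser
        reader.setContentHandler(handler);                     // register the handler
        reader.parse(new InputSource(new StringReader(xml)));  // set up the input and parse
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```

Returning the events as a list, rather than printing them from inside the callbacks, keeps the handler easy to check and reuse; main simply prints whatever trace collected.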
The SAX 2.0 API consists of two standard packages and one extension package. The standard packages are org.xml.sax and org.xml.sax.helpers. The org.xml.sax package contains the basic classes, interfaces, and exceptions needed for parsing documents. There, you will find most of the interfaces needed to create handlers for various types of events. We will use many of these classes and interfaces in the sample code later in this chapter. A summary of the org.xml.sax package is shown in Table 8.1.
TABLE 8.1 The org.xml.sax Package
The org.xml.sax.helpers package contains additional classes that can simplify some of your coding and make it more portable. You will find a number of adapters that implement many of the handler interfaces, so you don’t need to fill in all the methods defined in the interfaces. Factory classes provide a mechanism for obtaining a parser independent of the implementation. We will use many of these classes and interfaces in the sample code later in this chapter. A summary of the org.xml.sax.helpers package is shown in Table 8.2.
TABLE 8.2 The org.xml.sax.helpers Package
Class : Description
AttributeListImpl : Deprecated. This class implements a deprecated interface, AttributeList, which has been replaced by Attributes. Attributes is implemented by the AttributesImpl helper class.
AttributesImpl : Default implementation of the Attributes interface.
DefaultHandler : Default base class for SAX2 event handlers.
LocatorImpl : Provides an optional convenience implementation of Locator.
NamespaceSupport : Encapsulates namespace logic for use by SAX drivers.
ParserAdapter : Adapts a SAX1 Parser as a SAX2 XMLReader.
ParserFactory : Deprecated. This class works with the deprecated Parser interface.
XMLFilterImpl : Base class for deriving an XML filter.
XMLReaderAdapter : Adapts a SAX2 XMLReader as a SAX1 Parser.
XMLReaderFactory : Factory for creating an XML reader.
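To give a feel for how these helpers fit together, the following sketch combines three of the classes from Table 8.2: XMLReaderFactory to obtain a parser, XMLFilterImpl as the base class for a filter, and DefaultHandler for the final handler. The names FilterDemo, WhitespaceFilter, and filteredText are hypothetical and exist only for this example. The filter sits between the parser and the handler and drops character events that contain nothing but whitespace; every other event passes through untouched because XMLFilterImpl forwards it by default.

```java
import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

// FilterDemo and WhitespaceFilter are hypothetical names used only for this sketch.
class FilterDemo {

    // Returns the character data that reaches the handler once the filter is in place.
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs
    static String filteredText(String xml) throws SAXException, IOException {
        final StringBuilder seen = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.append(ch, start, length);
            }
        };

        XMLReader parser = XMLReaderFactory.createXMLReader();
        WhitespaceFilter filter = new WhitespaceFilter(parser); // the parser is the filter's parent
        filter.setContentHandler(handler);                      // handlers are registered on the filter
        filter.parse(new InputSource(new StringReader(xml)));   // parsing is started through the filter
        return seen.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fiction>\n  <book author=\"Herman Melville\">Moby Dick</book>\n</fiction>";
        System.out.println("[" + filteredText(xml) + "]");
    }
}

// Forwards every event unchanged except whitespace-only character runs.
class WhitespaceFilter extends XMLFilterImpl {
    WhitespaceFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                super.characters(ch, start, length); // pass the run on to the handler
                return;
            }
        }
        // Only whitespace was found, so the event is not forwarded.
    }
}
```

Because the filter is itself an XMLReader, the rest of the program treats it exactly like a parser, which is what makes filters easy to chain.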
The org.xml.sax.ext package is an extension that is not shipped with all implementations. It contains two handler interfaces for capturing declaration and lexical events. We will use some of these classes and interfaces in the sample code later in this chapter. A summary of the org.xml.sax.ext package is shown in Table 8.3.
TABLE 8.3 The org.xml.sax.ext Package
Interface : Description
DeclHandler : SAX2 extension handler for DTD declaration events
LexicalHandler : SAX2 extension handler for lexical events
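Extension handlers are not registered through a dedicated setter. Instead, they are passed to the XMLReader as a property. The sketch below collects the comments in a document with a LexicalHandler. It assumes that the parser supports the standard lexical-handler property (the parser bundled with the JDK does, as does Xerces); a parser without the extension would throw a SAXNotRecognizedException from setProperty. The names CommentDemo, CommentCollector, and comments are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// CommentDemo and CommentCollector are hypothetical names used only for this sketch.
class CommentDemo {

    // Standard SAX2 property id under which a LexicalHandler is registered.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    // Returns the text of every comment in the document, in document order.
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs
    static List<String> comments(String xml) throws SAXException, IOException {
        CommentCollector collector = new CommentCollector();
        XMLReader reader = XMLReaderFactory.createXMLReader();
        // Fails with a SAXNotRecognizedException if the parser lacks the extension.
        reader.setProperty(LEXICAL_HANDLER, collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.comments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fiction><!-- a classic --><book>Moby Dick</book></fiction>";
        System.out.println(comments(xml));
    }
}

// Records comments and ignores the other lexical events.
class CommentCollector implements LexicalHandler {
    final List<String> comments = new ArrayList<>();

    @Override
    public void comment(char[] ch, int start, int length) {
        comments.add(new String(ch, start, length));
    }

    // The remaining LexicalHandler methods are required by the interface
    // but are intentionally left empty in this sketch.
    @Override public void startDTD(String name, String publicId, String systemId) { }
    @Override public void endDTD() { }
    @Override public void startEntity(String name) { }
    @Override public void endEntity(String name) { }
    @Override public void startCDATA() { }
    @Override public void endCDATA() { }
}
```

A DeclHandler is registered the same way, under the property id http://xml.org/sax/properties/declaration-handler.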
As mentioned earlier, a number of SAX implementations exist. SAX implementations include all the underlying classes needed to parse documents. The SAX API by itself does not include these underlying classes, so you will need to obtain an implementation. You can find a list of implementations at http://www.megginson.com/SAX/applications.html. When choosing an implementation, consider factors such as which version of SAX it supports, whether it is validating or nonvalidating, and whether it supports DTDs, XML Schema, or both.
As in the case of DOM, several high-quality free implementations exist, so cost is not an issue. If you want to validate documents while parsing XML, you will need a validating SAX implementation. Most validating implementations support DTDs, and some even support XML Schema.
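With a validating implementation, validation is typically switched on through a standard SAX2 feature, and validity problems are reported to an ErrorHandler rather than stopping the parse. The following sketch assumes a parser that supports DTD validation (the parser bundled with the JDK and Xerces both do) and uses a small internal DTD invented for this example. The names ValidationDemo and validate are hypothetical, and the exact wording of the error messages depends on the parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// ValidationDemo is a hypothetical name used only for this sketch.
class ValidationDemo {

    // Standard SAX2 feature id that turns validation on or off.
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    // An internal DTD made up for this example: every book needs an author.
    static final String DTD =
          "<!DOCTYPE fiction [\n"
        + "  <!ELEMENT fiction (book+)>\n"
        + "  <!ELEMENT book (#PCDATA)>\n"
        + "  <!ATTLIST book author CDATA #REQUIRED>\n"
        + "]>\n";

    // Parses with validation enabled and returns the validity errors reported.
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated in newer JDKs
    static List<String> validate(String xml) throws SAXException, IOException {
        final List<String> errors = new ArrayList<>();
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setFeature(VALIDATION, true); // throws if the parser cannot validate
        reader.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable, so record them and carry on.
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = DTD + "<fiction><book author=\"Herman Melville\">Moby Dick</book></fiction>";
        String invalid = DTD + "<fiction><book>Moby Dick</book></fiction>";
        System.out.println("valid document:   " + validate(valid));
        System.out.println("invalid document: " + validate(invalid));
    }
}
```

Registering an error handler matters here: if none is set, a parser may quietly ignore validity errors, so the document would appear to parse cleanly.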
In terms of performance, there is not much hard data, so you might have to do some benchmarking yourself to determine whether a given implementation is fast enough for your application. For the examples in this chapter, we will use Xerces, developed by the Apache XML group. Xerces is a validating parser with full support for SAX 2.0. It is very popular and widely regarded as a high-quality parser. It is freely available at http://xml.apache.org.