SAX Basics
To illustrate how SAX works, let’s say you have a simple document, like
this one:
<?xml version="1.0" encoding="UTF-8"?>
<fiction>
  <book author="Herman Melville">Moby Dick</book>
</fiction>
To parse this document using SAX, you build a content handler by creating a Java class that implements the ContentHandler interface in the org.xml.sax package. Convenience adapters, such as the DefaultHandler class in org.xml.sax.helpers, simplify this by supplying default implementations of the callback methods, so you override only the ones you need.
Once you have a content handler, you register it with a SAX XMLReader, set up the input source, and start the parser. As the parser encounters elements, text, and other data, it calls the corresponding methods in your content handler. For the preceding example, the generated events look something like this:
start document
start element: fiction
start element: book (including attributes)
characters: Moby Dick
end element: book
end element: fiction
end document
As you can see, the events reported follow the content of the document
in a linear sequence. There are a number of other events that might be
generated in response to processing instructions, errors, and comments. We will
look at these in the examples that follow.
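The steps above can be sketched in Java as follows. This is a minimal example, assuming the SAX parser bundled with the JDK (obtained through the JAXP class javax.xml.parsers.SAXParserFactory) rather than a separately installed implementation; the class names SaxEventDemo and EventRecorder are hypothetical. The handler extends DefaultHandler, is registered with an XMLReader, and records each event in roughly the form listed above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventDemo {

    // Records each SAX event as a line of text. Extending the DefaultHandler
    // adapter means only the callbacks of interest need to be overridden.
    static class EventRecorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override public void startDocument() { events.add("start document"); }
        @Override public void endDocument()   { events.add("end document"); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            StringBuilder sb = new StringBuilder("start element: " + qName);
            for (int i = 0; i < atts.getLength(); i++) {
                sb.append(" [").append(atts.getQName(i)).append('=').append(atts.getValue(i)).append(']');
            }
            events.add(sb.toString());
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end element: " + qName);
        }

        // This simple version assumes short text arrives in a single call; parsers
        // may split character data across several calls, so real handlers often buffer it.
        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {           // skip whitespace-only runs between tags
                events.add("characters: " + text);
            }
        }
    }

    public static List<String> parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EventRecorder handler = new EventRecorder();
        reader.setContentHandler(handler);                    // register the handler
        reader.parse(new InputSource(new StringReader(xml))); // set the input and start parsing
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<fiction>\n"
                + "  <book author=\"Herman Melville\">Moby Dick</book>\n"
                + "</fiction>\n";
        parse(xml).forEach(System.out::println);
    }
}
```

Because the events are pushed to the handler in document order, the handler itself never holds the whole document; it only sees one event at a time.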
SAX Packages
The SAX 2.0 API consists of two standard packages and one extension package. The standard packages are org.xml.sax and org.xml.sax.helpers. The org.xml.sax package contains the basic classes, interfaces, and exceptions needed for parsing documents, including most of the interfaces needed to create handlers for various types of events. We will use many of these classes and interfaces in the sample code later in this chapter. A summary of the org.xml.sax package is shown in Table 8.1.
TABLE 8.1 The org.xml.sax Package
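As a small illustration of org.xml.sax types beyond ContentHandler, the sketch below implements the ErrorHandler interface and reads the position carried by each SAXParseException. It is a minimal example, assuming the JDK's bundled SAX parser; the class names ErrorReportDemo and ReportingErrorHandler are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class ErrorReportDemo {

    // ErrorHandler is one of the org.xml.sax callback interfaces; the parser
    // passes each problem to it as a SAXParseException with line/column data.
    static class ReportingErrorHandler implements ErrorHandler {
        String lastReport;

        private String describe(String level, SAXParseException e) {
            return level + " at line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage();
        }

        @Override public void warning(SAXParseException e) { lastReport = describe("warning", e); }
        @Override public void error(SAXParseException e)   { lastReport = describe("error", e); }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            lastReport = describe("fatal", e);
            throw e; // well-formedness errors end the parse
        }
    }

    // Returns "ok" for a well-formed document, otherwise the last report.
    public static String check(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ReportingErrorHandler handler = new ReportingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (SAXParseException e) {
            return handler.lastReport;
        }
    }

    public static void main(String[] args) throws Exception {
        // </bok> does not match <book>, so this document is not well formed.
        System.out.println(check("<fiction>\n<book>Moby Dick</bok>\n</fiction>"));
    }
}
```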
The org.xml.sax.helpers package contains additional classes that can simplify your coding and make it more portable. It includes a number of adapters that implement many of the handler interfaces, so you don't need to fill in every method the interfaces define. Its factory classes provide a way to obtain a parser independently of the underlying implementation. We will use many of these classes and interfaces in the sample code later in this chapter. A summary of the org.xml.sax.helpers package is shown in Table 8.2.
TABLE 8.2 The org.xml.sax.helpers Package
Class : Description
AttributeListImpl : Deprecated. This class implements a deprecated interface, AttributeList, that has been replaced by Attributes, which is implemented in the AttributesImpl helper class.
AttributesImpl : Default implementation of the Attributes interface.
DefaultHandler : Default base class for SAX2 event handlers.
LocatorImpl : Provides an optional convenience implementation of Locator.
NamespaceSupport : Encapsulates namespace logic for use by SAX drivers.
ParserAdapter : Adapts a SAX1 Parser as a SAX2 XMLReader.
ParserFactory : Deprecated. This class works with the deprecated Parser interface.
XMLFilterImpl : Base class for deriving an XML filter.
XMLReaderAdapter : Adapts a SAX2 XMLReader as a SAX1 Parser.
XMLReaderFactory : Factory for creating an XML reader.
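To show how the helper classes fit together, here is a minimal sketch that chains an XMLFilterImpl subclass between the parser and a DefaultHandler, using AttributesImpl to make a modifiable copy of an element's attributes. It assumes the JDK's bundled parser; the class names and the added "checked" attribute are hypothetical, chosen only to make the filter's effect visible.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterDemo {

    // XMLFilterImpl forwards every event unchanged; this subclass overrides
    // startElement only, adding a hypothetical "checked" attribute to each book.
    static class TagBooksFilter extends XMLFilterImpl {
        TagBooksFilter(XMLReader parent) { super(parent); }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if ("book".equals(qName)) {
                AttributesImpl copy = new AttributesImpl(atts); // mutable copy of the originals
                copy.addAttribute("", "checked", "checked", "CDATA", "yes");
                atts = copy;
            }
            super.startElement(uri, localName, qName, atts); // pass downstream
        }
    }

    // Returns one line per element: its name followed by name=value attribute pairs.
    public static List<String> run(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader filter = new TagBooksFilter(parser); // the filter is itself an XMLReader
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
                }
                seen.add(sb.toString());
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        run("<fiction><book author=\"Herman Melville\">Moby Dick</book></fiction>")
                .forEach(System.out::println);
    }
}
```

Because an XMLFilterImpl is itself an XMLReader, the application calls parse() on the filter, and several filters can be stacked the same way.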
The org.xml.sax.ext package is an extension that is not shipped with all implementations. It contains two handler interfaces for capturing declaration and lexical events. We will use some of these classes and interfaces in the sample code later in this chapter. A summary of the org.xml.sax.ext package is shown in Table 8.3.
TABLE 8.3 The org.xml.sax.ext Package
Interface : Description
DeclHandler : SAX2 extension handler for DTD declaration events
LexicalHandler : SAX2 extension handler for lexical events
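Unlike a content handler, a LexicalHandler is not registered through a dedicated setter; it is passed to the XMLReader as the standard SAX2 property http://xml.org/sax/properties/lexical-handler. The sketch below, assuming the JDK's bundled parser (which supports this property), collects comments and counts CDATA sections. The class names CommentDemo and CommentCollector are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;

public class CommentDemo {

    // Implements the LexicalHandler extension interface; only comment() and
    // startCDATA() do real work here, the other callbacks are no-ops.
    static class CommentCollector implements LexicalHandler {
        final List<String> comments = new ArrayList<>();
        int cdataSections;

        @Override public void comment(char[] ch, int start, int length) {
            comments.add(new String(ch, start, length).trim());
        }
        @Override public void startCDATA() { cdataSections++; }
        @Override public void endCDATA() {}
        @Override public void startDTD(String name, String publicId, String systemId) {}
        @Override public void endDTD() {}
        @Override public void startEntity(String name) {}
        @Override public void endEntity(String name) {}
    }

    public static CommentCollector collect(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CommentCollector lexical = new CommentCollector();
        // A parser without org.xml.sax.ext support would throw
        // SAXNotRecognizedException from this call.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", lexical);
        reader.parse(new InputSource(new StringReader(xml)));
        return lexical;
    }

    public static void main(String[] args) throws Exception {
        CommentCollector c = collect(
                "<fiction><!-- sample comment --><book><![CDATA[Moby Dick]]></book></fiction>");
        System.out.println("comments: " + c.comments + ", CDATA sections: " + c.cdataSections);
    }
}
```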
SAX Implementations
As mentioned earlier, a number of SAX implementations exist. An implementation includes all the underlying classes needed to parse documents; the SAX API by itself does not, so you will need to obtain one. You can find a list of implementations at http://www.megginson.com/SAX/applications.html. When choosing an implementation, consider factors such as which SAX version it supports, whether it validates, whether it supports DTDs, XML Schema, or both, and so on.
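One practical way to check what a given implementation offers is to query its SAX2 features. The sketch below, assuming the JDK's bundled parser, asks the XMLReader about a few feature URIs and reports whether each is recognized. The two http://xml.org/sax/features/ URIs are standard SAX2 features; the http://apache.org/... URI is Xerces-specific; the class name FeatureProbe is hypothetical.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class FeatureProbe {

    // Returns "true"/"false" for a recognized feature, or a note when the
    // parser does not recognize the URI or cannot report its value.
    public static String probe(XMLReader reader, String featureUri) {
        try {
            return String.valueOf(reader.getFeature(featureUri));
        } catch (SAXNotRecognizedException e) {
            return "not recognized";
        } catch (SAXNotSupportedException e) {
            return "not supported";
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("Implementation class: " + reader.getClass().getName());
        String[] features = {
            "http://xml.org/sax/features/namespaces",          // standard SAX2
            "http://xml.org/sax/features/validation",          // standard SAX2
            "http://apache.org/xml/features/validation/schema" // Xerces-specific
        };
        for (String f : features) {
            System.out.println(f + " -> " + probe(reader, f));
        }
    }
}
```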
As in the case of DOM, several high-quality free implementations exist,
so cost is not an issue. If you want to validate documents while parsing XML,
you will need a validating SAX implementation. Most validating implementations
support DTDs, and some even support XML Schema.
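A minimal sketch of DTD validation with SAX follows, assuming the JDK's bundled (validating) parser and an internal DTD written for the sample document; the class and method names are hypothetical. Validity errors are reported to ErrorHandler.error(), which DefaultHandler ignores by default, so the handler overrides it to collect them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ValidationDemo {

    // Collects validity errors instead of silently ignoring them.
    static class CollectingHandler extends DefaultHandler {
        final List<String> problems = new ArrayList<>();
        @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
    }

    // Hypothetical internal DTD: every book must carry an author attribute.
    static final String DTD = "<!DOCTYPE fiction [\n"
            + "  <!ELEMENT fiction (book*)>\n"
            + "  <!ELEMENT book (#PCDATA)>\n"
            + "  <!ATTLIST book author CDATA #REQUIRED>\n"
            + "]>\n";

    public static List<String> validate(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // request DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingHandler handler = new CollectingHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<fiction><book author=\"Herman Melville\">Moby Dick</book></fiction>"));
        System.out.println(validate(DTD + "<fiction><book>Moby Dick</book></fiction>"));
    }
}
```

Note that the second document is still well formed, so parsing completes; validity problems are simply reported rather than stopping the parse.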
In terms of performance, there is not much hard data, so you might have to do some benchmarking yourself to determine whether a given parser is fast enough for your application.
For the examples in this chapter, we will use Xerces, developed by the Apache
XML group. Xerces is a validating parser with full support for SAX 2.0. Xerces
is very popular and widely regarded as a high-quality parser. It is freely
available at http://xml.apache.org.
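For a first rough comparison of parsers, a simple timing loop is often enough. The sketch below, assuming the JDK's bundled parser and a synthetic document of hypothetical book elements, warms up the JIT and then averages several parses. It is not a rigorous benchmark; a dedicated harness such as JMH gives more trustworthy numbers.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTiming {

    // Builds a synthetic document with the given number of book elements.
    public static String makeDocument(int books) {
        StringBuilder sb = new StringBuilder("<fiction>");
        for (int i = 0; i < books; i++) {
            sb.append("<book author=\"Author ").append(i).append("\">Title ").append(i).append("</book>");
        }
        return sb.append("</fiction>").toString();
    }

    // Counts book elements so that each parse does a little real work.
    static class BookCounter extends DefaultHandler {
        int count;
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("book".equals(qName)) count++;
        }
    }

    public static int parseOnce(SAXParser parser, String xml) throws Exception {
        BookCounter counter = new BookCounter();
        parser.parse(new InputSource(new StringReader(xml)), counter);
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        String xml = makeDocument(100_000);
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        for (int i = 0; i < 3; i++) parseOnce(parser, xml); // warm-up runs for the JIT
        int runs = 5;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) parseOnce(parser, xml);
        long avgMillis = (System.nanoTime() - start) / runs / 1_000_000;
        System.out.println("average parse time: " + avgMillis + " ms");
    }
}
```

To compare implementations, the same loop can be run against each candidate parser on documents representative of your own data.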