Basics of Reading and Processing XML
Now that you have learned the
basics of how to write well-formed XML documents (writing valid XML documents
is covered in the chapters on DTDs and the XML Schema), the next step is to
learn how to process and handle these documents. After
all, the value of XML is not in its creation but in its use.
Processing XML involves
three major activities: parsing the XML document, processing and making
use of the parsed elements, and integrating with other systems and programming
languages. Because XML is just a text document format and not a programming
language, it provides no mechanism to instruct machines how to process the
content contained within it. That’s actually a good thing. Because there are no
specific processing requirements, XML documents can be processed by all types
of devices, operating systems, clients, servers, and other information
consumers, all of which need only understand how to read XML. XML has not only
separated presentation from data, it has also separated strict processing
requirements from data. In essence, XML is as pure a data format as possible.
The following sections
explore the various steps of processing XML and the tools available to
accomplish these tasks.
Parsers
The first step for any system
that plans to make use of XML documents is to read the documents into
memory. Although this may seem like a simple task, the structured nature of
XML imposes several requirements on parsers. In addition, the behavior of
parsing applications needs to be consistent so that XML documents can be
reliably exchanged between disparate systems. As a result, XML parsers must
adhere to an accepted level of compliance.
Because an XML document is
just a text file, any user can write his or her own program to read in the XML
text file and take it apart for use in a programming application. However, the
time and complexity involved in writing such an XML document
reader (which, by the way, would have to be written over and over again for the
different programs that need access to the information in XML documents) would
make the adoption of XML an onerous task. The W3C (the XML standardization
body) came to the realization that a standard mechanism was needed to parse
these XML documents and promoted the use of compliant XML parsers. As a result,
a number of widely available XML parsers exist that allow the application
developer to focus on application-specific code rather than on XML document
reading or processing.
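As a minimal sketch of what relying on an existing parser looks like, the following Python example uses the standard library's xml.etree.ElementTree module. The order document, its element and attribute names, and the read_order helper are all hypothetical and exist only for illustration; they do not come from any particular system.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
XML_TEXT = """<order id="1001">
  <item sku="A-1" quantity="2">Widget</item>
  <item sku="B-7" quantity="1">Gadget</item>
</order>"""


def read_order(xml_text):
    """Parse the document and return (order id, list of item tuples)."""
    # The parser deals with tags, attributes, and escaping for us.
    root = ET.fromstring(xml_text)
    items = [
        (item.get("sku"), int(item.get("quantity")), item.text)
        for item in root.findall("item")
    ]
    return root.get("id"), items


order_id, items = read_order(XML_TEXT)
print(order_id, items)
```

The application code here only names the elements and attributes it cares about; none of it deals with XML syntax directly, which is the division of labor described above.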
There are
two types of XML parsers: validating parsers and nonvalidating parsers.
Nonvalidating parsers merely read XML documents and verify that the documents
are well formed. Validating parsers check well-formedness and, in addition,
check the document’s compliance against a DTD, XML Schema, or other validation set.
Nonvalidating parsers are much easier to program and can be made
extremely efficient and space conserving. The first generation of XML parsers
was nonvalidating because the DTD and XML Schema proposals were far from
stable. As the specifications became more stable, the number of validating
parsers likewise increased. As a result, many of the parsers currently on the
market (commercial or open source) are validating parsers that have
progressively become more robust and efficient.
Because of the added
complexity of ensuring validity and compliance with a DTD or schema, validating
parsers tend to be much larger in memory and processing footprint than
nonvalidating parsers. If most of the XML in a particular system is well formed
and doesn’t need to be checked for validity, the use of a nonvalidating parser
may be a better idea.
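The distinction can be sketched with Python's xml.parsers.expat module, which is built on the nonvalidating expat parser. The is_well_formed helper and the sample documents are hypothetical names chosen for this illustration. The third document declares a DTD in its internal subset and then violates it; a nonvalidating parser is expected to accept it anyway, because it checks well-formedness only.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


good = "<note><to>Ann</to><body>Hello</body></note>"
bad = "<note><to>Ann</body></note>"  # mismatched closing tag

# Well formed, but not valid: the DTD allows only <to> inside <note>.
invalid_but_well_formed = (
    "<!DOCTYPE note [<!ELEMENT note (to)> <!ELEMENT to (#PCDATA)>]>"
    "<note><from>Bob</from></note>"
)

print(is_well_formed(good))
print(is_well_formed(bad))
print(is_well_formed(invalid_but_well_formed))
```

Catching the third case would require a validating parser, at the cost of the larger footprint discussed above.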
Examples of nonvalidating parsers
include James Clark’s expat, XP, and Lark. Examples of validating parsers
include IBM’s XML for Java, the DataChannel XML Parser (DXP), Daniel Veillard’s
libXML, and Apache’s Xerces. Microsoft’s MSXML includes both validating and
nonvalidating parsers that support a variety of platforms. These parsers run
the gamut from open source efforts to commercial products, from extremely tiny
implementations to large, robust efforts. Information about these tools and
links to find out more information are included in the chapters that cover them
in more detail.
Event-based parsers such as
SAX provide a view of XML documents that is data centric and event driven. When
a user reads an XML document using SAX, elements that are encountered by the
parser are read, processed, and then forgotten. The event-based parser reads
the elements from the document and returns them to the application with a list
of attributes and content. This approach lets a user process XML documents
more efficiently and with less memory, primarily because
an in-memory tree representation of the XML document is not required.
Event-based APIs merely report parsing events such as the start and end of XML
markup, which are processed by application event handlers through callbacks.
This mechanism is widely used in many “process-and-forget” systems and is
especially appropriate for XML-based messaging and transaction systems, where
keeping the XML tree in memory is simply not appropriate.
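The callback style can be sketched with Python's standard xml.sax module. The ItemCollector class and the sample order document are hypothetical and exist only for this illustration. The handler keeps just the text it is interested in, and the parser never builds a tree of the whole document.

```python
import xml.sax


class ItemCollector(xml.sax.ContentHandler):
    """Collects item names as events arrive; no document tree is built."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called when the parser reports the start of an element.
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        # Called at the end of an element; the item is now complete.
        if name == "item":
            self.names.append("".join(self._buffer).strip())
            self._in_item = False


handler = ItemCollector()
xml.sax.parseString(
    b"<order><item>Widget</item><item>Gadget</item></order>", handler
)
print(handler.names)
```

Once endElement returns, the parser holds nothing about that element, which is the "process-and-forget" behavior described above.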