Storing and Extracting XML Documents from Databases
Several approaches to organizing the contents of XML documents to
facilitate their subsequent querying and retrieval have been proposed. The
following are the most common approaches:
Using a DBMS to store the documents as text. A relational or object DBMS can
be used to store whole XML documents as text fields within the DBMS records or
objects. This approach can be used if the DBMS has a special module for
document processing, and would work for storing schemaless and document-centric
XML documents.
Using a DBMS to store the document contents as data elements. This approach would work for
storing a collection of documents that follow a specific XML DTD or XML schema.
Because all the documents have the same structure, one can design a relational
(or object) database to store the leaf-level data elements within the XML
documents. This approach would require mapping algorithms to design a database
schema that is compatible with the XML document structure as specified in the
XML schema or DTD and to recreate the XML documents from the stored data. These
algorithms can be implemented either as an internal DBMS module or as separate
middleware that is not part of the DBMS.
Designing a specialized system for storing native XML data. A new type of database system
based on the hierarchical (tree) model could be designed and implemented. Such
systems are being called Native XML DBMSs.
The system would include specialized indexing and querying techniques, and
would work for all types of XML documents. It could also include data
compression techniques to reduce the size of the documents for storage. Tamino
by Software AG and the Dynamic Application Platform of eXcelon are two popular
products that offer native XML DBMS capability. Oracle also offers a native XML
storage option.
Creating or publishing customized XML documents from preexisting relational
databases. Because there are enormous amounts of data already stored in
relational databases, parts of this data may need to be
formatted as documents for exchanging or displaying over the Web. This approach
would use a separate middleware software layer to handle the conversions needed
between the XML documents and the relational database. Section 12.6 discusses
this approach, in which datacentric XML documents are extracted from existing
databases, in more detail. In particular, we show how tree-structured documents
can be created from graph-structured databases. Section 12.6.2 discusses the
problem of cycles and how to deal with it.
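The second and fourth approaches both rest on a mapping between relational rows and XML trees. The following is a minimal sketch of such a mapping in Python, not a definitive implementation. It assumes a hypothetical collection of project documents that all share one fixed structure, a hypothetical table named project, and the standard sqlite3 and xml.etree.ElementTree modules standing in for the DBMS and the middleware layer. The function names shred and publish are illustrative only.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical documents that all follow the same structure:
# <project><name>...</name><location>...</location></project>
DOCS = [
    "<project><name>ProductX</name><location>Bellaire</location></project>",
    "<project><name>ProductY</name><location>Sugarland</location></project>",
]


def shred(conn, xml_text):
    """Store the leaf-level data elements of one document as one row."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO project (name, location) VALUES (?, ?)",
        (root.findtext("name"), root.findtext("location")),
    )


def publish(conn):
    """Recreate one XML document from the rows stored in the table."""
    root = ET.Element("projects")
    rows = conn.execute("SELECT name, location FROM project ORDER BY name")
    for name, location in rows:
        project = ET.SubElement(root, "project")
        ET.SubElement(project, "name").text = name
        ET.SubElement(project, "location").text = location
    return ET.tostring(root, encoding="unicode")


# An in-memory database plays the role of the relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (name TEXT PRIMARY KEY, location TEXT)")
for doc in DOCS:
    shred(conn, doc)

xml_out = publish(conn)
print(xml_out)
```

In this sketch the table design is written by hand. A real mapping algorithm would derive it from the XML schema or DTD, and it would also have to handle nested and repeating elements, which this flat example avoids.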
All of these approaches have received considerable attention. We focus
on the fourth approach in Section 12.6, because it gives a good conceptual
understanding of the differences between the XML tree data model and the
traditional database models based on flat files (relational model) and graph
representations (ER model). But first we give an overview of XML query
languages in Section 12.5.