XML Hierarchical (Tree) Data Model
We now introduce the data model used in XML. The basic object in XML is the XML document. Two main structuring concepts are used to construct an XML document: elements and attributes. It is important to note that the term attribute in XML is not used in the same manner as is customary in database terminology, but rather as it is used in document description languages such as HTML and SGML. Attributes in XML provide additional information that describes elements, as we will see. There are additional concepts in XML, such as entities, identifiers, and references, but first we concentrate on describing elements and attributes to show the essence of the XML model.
Figure 12.3 shows an example of an XML element called <Projects>. As in HTML, elements are identified in a document by their start tag and end tag. The tag names are enclosed between angled brackets < ... >, and end tags are further identified by a slash, </ ... >.
<?xml version= “1.0” standalone=“yes”?> <Projects>
<Location>Bellaire</Location> <Dept_no>5</Dept_no> <Worker>
<Ssn>123456789</Ssn> <Last_name>Smith</Last_name> <Hours>32.5</Hours>
<Ssn>453453453</Ssn> <First_name>Joyce</First_name> <Hours>20.0</Hours>
<Location>Sugarland</Location> <Dept_no>5</Dept_no> <Worker>
Figure 12.3 A complex XML element called <Projects>
Complex elements are constructed from other elements hierarchically, whereas simple elements contain data values. A major difference between XML and HTML is that XML tag names are defined to describe the meaning of the data elements in the document, rather than to describe how the text is to be displayed. This makes it possible to process the data elements in the XML document automatically by computer programs. Also, the XML tag (element) names can be defined in another document, known as the schema document, to give a semantic meaning to the tag names that can be exchanged among multiple users. In HTML, all tag names are predefined and fixed; that is why they are not extendible.
It is straightforward to see the correspondence between the XML textual representation shown in Figure 12.3 and the tree structure shown in Figure 12.1. In the tree representation, internal nodes represent complex elements, whereas leaf nodes rep-resent simple elements. That is why the XML model is called a tree model or a hierarchical model. In Figure 12.3, the simple elements are the ones with the tag names <Name>, <Number>, <Location>, <Dept_no>, <Ssn>, <Last_name>,
<First_name>, and <Hours>. The complex elements are the ones with the tag names <Projects>, <Project>, and <Worker>. In general, there is no limit on the levels of nesting of elements.
It is possible to characterize three main types of XML documents:
Data-centric XML documents. These documents have many small data items that follow a specific structure and hence may be extracted from a structured database. They are formatted as XML documents in order to exchange them over or display them on the Web. These usually follow a predefined schema that defines the tag names.
Document-centric XML documents. These are documents with large amounts of text, such as news articles or books. There are few or no struc-tured data elements in these documents.
Hybrid XML documents. These documents may have parts that contain structured data and other parts that are predominantly textual or unstruc-tured. They may or may not have a predefined schema.
XML documents that do not follow a predefined schema of element names and cor-responding tree structure are known as schemaless XML documents. It is important to note that datacentric XML documents can be considered either as semistructured data or as structured data as defined in Section 12.1. If an XML document conforms to a predefined XML schema or DTD (see Section 12.3), then the document can be considered as structured data. On the other hand, XML allows documents that do not conform to any schema; these would be considered as semistructured data and are schemaless XML documents. When the value of the standalone attribute in an XML document is yes, as in the first line in Figure 12.3, the document is standalone and schemaless.
XML attributes are generally used in a manner similar to how they are used in HTML (see Figure 12.2), namely, to describe properties and characteristics of the elements (tags) within which they appear. It is also possible to use XML attributes to hold the values of simple data elements; however, this is generally not recommended. An exception to this rule is in cases that need to reference another element in another part of the XML document. To do this, it is common to use attribute values in one element as the references. This resembles the concept of foreign keys in relational databases, and is a way to get around the strict hierarchical model that the XML tree model implies. We discuss XML attributes further in Section 12.3 when we discuss XML schema and DTD.