XML Hierarchical (Tree) Data Model
We now introduce the data model used in XML. The basic object in XML is
the XML document. Two main structuring concepts are used to construct an XML
document: elements and attributes. It is important to note
that the term attribute in XML is not
used in the same manner as is customary in database terminology, but rather as
it is used in document description languages such as HTML and SGML. Attributes in XML provide additional
information that describes elements, as we will see. There are additional
concepts in XML, such as entities, identifiers, and references, but first we
concentrate on describing elements and attributes to show the essence of the
XML model.
Figure 12.3 shows an example of an XML element called <Projects>. As in HTML, elements are identified in a document by their start
tag and end tag. Tag names are enclosed in angle brackets, < ... >,
and end tags are further identified by a slash, </ ... >.
<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Worker>
      <Ssn>123456789</Ssn>
      <Last_name>Smith</Last_name>
      <Hours>32.5</Hours>
    </Worker>
    <Worker>
      <Ssn>453453453</Ssn>
      <First_name>Joyce</First_name>
      <Hours>20.0</Hours>
    </Worker>
  </Project>
  <Project>
    <Name>ProductY</Name>
    <Number>2</Number>
    <Location>Sugarland</Location>
    <Dept_no>5</Dept_no>
    <Worker>
      <Ssn>123456789</Ssn>
      <Hours>7.5</Hours>
    </Worker>
    <Worker>
      <Ssn>453453453</Ssn>
      <Hours>20.0</Hours>
    </Worker>
    <Worker>
      <Ssn>333445555</Ssn>
      <Hours>10.0</Hours>
    </Worker>
  </Project>
  ...
</Projects>
Figure 12.3 A complex XML element called <Projects>
Complex elements are constructed from other elements hierarchically, whereas simple elements contain data values. A
major difference between XML and HTML is
that XML tag names are defined to describe the meaning of the data elements in
the document, rather than to describe how the text is to be displayed. This
makes it possible to process the data elements in the XML document
automatically by computer programs. Also, the XML tag (element) names can be
defined in another document, known as the schema
document, to give a semantic meaning to the tag names that can be exchanged
among multiple users. In HTML, by contrast, all tag names are predefined and fixed, which is
why the language is not extensible.
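Because the tag names describe what the data means, a program can pick out values by name. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a copy of the Figure 12.3 data without the elided "..." line; the function name total_hours_per_project is hypothetical and chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Copy of the two projects shown in Figure 12.3 (the elided part is left out).
PROJECTS_XML = """<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Location>Bellaire</Location>
    <Dept_no>5</Dept_no>
    <Worker>
      <Ssn>123456789</Ssn>
      <Last_name>Smith</Last_name>
      <Hours>32.5</Hours>
    </Worker>
    <Worker>
      <Ssn>453453453</Ssn>
      <First_name>Joyce</First_name>
      <Hours>20.0</Hours>
    </Worker>
  </Project>
  <Project>
    <Name>ProductY</Name>
    <Number>2</Number>
    <Location>Sugarland</Location>
    <Dept_no>5</Dept_no>
    <Worker><Ssn>123456789</Ssn><Hours>7.5</Hours></Worker>
    <Worker><Ssn>453453453</Ssn><Hours>20.0</Hours></Worker>
    <Worker><Ssn>333445555</Ssn><Hours>10.0</Hours></Worker>
  </Project>
</Projects>
"""


def total_hours_per_project(xml_text):
    """Hypothetical helper: sum the <Hours> of every <Worker> in each <Project>."""
    root = ET.fromstring(xml_text)  # root is the <Projects> element
    totals = {}
    for project in root.findall("Project"):
        name = project.findtext("Name")
        hours = sum(float(w.findtext("Hours")) for w in project.findall("Worker"))
        totals[name] = hours
    return totals


print(total_hours_per_project(PROJECTS_XML))
# {'ProductX': 52.5, 'ProductY': 37.5}
```

Nothing in the sketch depends on how the data would be displayed; it relies only on the element names and their nesting.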
It is straightforward to see the correspondence between the XML textual
representation shown in Figure 12.3 and the tree structure shown in Figure
12.1. In the tree representation, internal nodes represent complex elements,
whereas leaf nodes represent simple elements. That is why the XML model is
called a tree model or a hierarchical model. In Figure 12.3, the
simple elements are the ones with the tag names <Name>, <Number>, <Location>,
<Dept_no>, <Ssn>, <Last_name>, <First_name>, and <Hours>. The complex elements
are the ones with the tag names <Projects>, <Project>, and <Worker>. In general,
there is no limit on the levels of nesting of elements.
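The distinction between internal nodes and leaf nodes can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree module and an abbreviated sample in the style of Figure 12.3; the helper names classify_tags and depth are hypothetical. An element with at least one child element is treated as complex, and an element with none as simple.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the style of Figure 12.3.
SAMPLE = """<Projects>
  <Project>
    <Name>ProductX</Name>
    <Number>1</Number>
    <Worker>
      <Ssn>123456789</Ssn>
      <Hours>32.5</Hours>
    </Worker>
  </Project>
</Projects>"""


def classify_tags(root):
    """Split tag names into complex (internal nodes) and simple (leaf nodes)."""
    complex_tags, simple_tags = set(), set()
    for elem in root.iter():       # visits root and all descendants
        if len(elem) > 0:          # len() counts child elements
            complex_tags.add(elem.tag)
        else:
            simple_tags.add(elem.tag)
    return complex_tags, simple_tags


def depth(elem):
    """Number of levels in the subtree rooted at elem (a leaf has depth 1)."""
    return 1 + max((depth(child) for child in elem), default=0)


root = ET.fromstring(SAMPLE)
complex_tags, simple_tags = classify_tags(root)
print(sorted(complex_tags))  # ['Project', 'Projects', 'Worker']
print(sorted(simple_tags))   # ['Hours', 'Name', 'Number', 'Ssn']
print(depth(root))           # 4
```

The recursive depth function places no bound on nesting beyond what the interpreter's recursion limit allows.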
It is possible to characterize three main types of XML documents:
Data-centric XML documents. These documents have many small data items that follow a specific structure and hence may be extracted from a structured database. They are formatted as XML documents in order to exchange them over or display them on the Web. These usually follow a predefined schema that defines the tag names.
Document-centric XML documents. These are documents with large amounts of text, such as news articles or books. There are few or no structured data elements in these documents.
Hybrid XML documents. These documents may have parts that contain structured data and other parts that are predominantly textual or unstructured. They may or may not have a predefined schema.
XML documents that do not follow a predefined schema of element names
and corresponding tree structure are known as schemaless XML documents. It is important to note that
data-centric XML documents can be considered either as semistructured data or
as structured data as defined in Section 12.1. If an XML document conforms to
a predefined XML schema or DTD (see Section 12.3), then the document can be
considered as structured data. On the
other hand, XML allows documents that do not conform to any schema; these would
be considered as semistructured data and
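are schemaless XML documents. When
the value of the standalone attribute in the XML declaration is yes, as in
the first line in Figure 12.3, the document is standalone: it does not depend on
any external markup declarations. The document in Figure 12.3 also contains no
DTD of its own, so it is schemaless.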
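A document ties itself to a DTD through a DOCTYPE declaration, so one rough test for a schemaless document is to check whether such a declaration is present. The sketch below assumes Python's standard xml.dom.minidom module; the function name declares_dtd and the small internal DTD are hypothetical and written only for this illustration. Note that this parser does not validate, so the check reports only whether a DTD is declared, not whether the document conforms to it.

```python
from xml.dom import minidom

# No DOCTYPE declaration: a schemaless document in the style of Figure 12.3.
SCHEMALESS = """<?xml version="1.0" standalone="yes"?>
<Projects>
  <Project><Name>ProductX</Name></Project>
</Projects>"""

# Hypothetical internal DTD, made up for this example.
WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE Projects [
  <!ELEMENT Projects (Project*)>
  <!ELEMENT Project (Name)>
  <!ELEMENT Name (#PCDATA)>
]>
<Projects>
  <Project><Name>ProductX</Name></Project>
</Projects>"""


def declares_dtd(xml_text):
    """Return True if the document carries a DOCTYPE declaration."""
    doc = minidom.parseString(xml_text)
    return doc.doctype is not None  # doctype is None when no DOCTYPE is present


print(declares_dtd(SCHEMALESS))  # False
print(declares_dtd(WITH_DTD))    # True
```

Checking conformance to the declared DTD would require a validating parser, which is outside what this sketch covers.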
XML attributes are generally used in a manner similar to how they are
used in HTML (see Figure 12.2), namely, to describe properties and
characteristics of the elements (tags) within which they appear. It is also
possible to use XML attributes to hold the values of simple data elements;
however, this is generally not recommended. An exception to this rule is in
cases that need to reference another
element in another part of the XML document. To do this, it is common to use
attribute values in one element as the references. This resembles the concept
of foreign keys in relational databases, and is a way to get around the strict
hierarchical model that the XML tree model implies. We discuss XML attributes
further in Section 12.3 when we discuss XML schema and DTD.
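The foreign-key analogy can be made concrete with a small sketch. It assumes Python's standard xml.etree.ElementTree module and a made-up document layout: the element names <Company>, <Employees>, and <Employee>, the attribute names id and emp_ref, and the function resolve_workers are all hypothetical and do not come from Figure 12.3. Each <Worker> refers to an <Employee> stored elsewhere in the tree through an attribute value, and the program follows that reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: employees are stored once and referenced by attribute.
DOC = """<Company>
  <Employees>
    <Employee id="e1"><Last_name>Smith</Last_name></Employee>
    <Employee id="e2"><First_name>Joyce</First_name></Employee>
  </Employees>
  <Projects>
    <Project>
      <Name>ProductX</Name>
      <Worker emp_ref="e1"><Hours>32.5</Hours></Worker>
      <Worker emp_ref="e2"><Hours>20.0</Hours></Worker>
    </Project>
  </Projects>
</Company>"""


def resolve_workers(xml_text):
    """Follow each Worker's emp_ref attribute to the Employee it points to."""
    root = ET.fromstring(xml_text)
    # Index employees by their id attribute, much like a primary-key lookup.
    by_id = {e.get("id"): e for e in root.iter("Employee")}
    rows = []
    for worker in root.iter("Worker"):
        employee = by_id[worker.get("emp_ref")]  # the "foreign key" lookup
        name = employee.findtext("Last_name") or employee.findtext("First_name")
        rows.append((name, float(worker.findtext("Hours"))))
    return rows


print(resolve_workers(DOC))
# [('Smith', 32.5), ('Joyce', 20.0)]
```

With this layout an employee's details appear once, no matter how many projects refer to that employee, which a strictly nested tree could only achieve by repeating the data.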