Rules of XML Structure
We have explored the structure of XML documents, but a document must also
comply with a number of rules before it can be parsed and processed correctly.
Some of these rules enforce the hierarchical, structured nature of XML, whereas
others impose restrictions that simplify the task of XML processing for
applications.
XML Elements Must Have a Closing Tag
Even though other markup
languages such as HTML allow their markup tags to remain “open” or contain only
a beginning element tag, XML requires all tags to be closed. They can be closed
by matching a beginning element tag with a closing tag, or they can be closed
by the use of empty elements. In either case, no tag may be left unclosed.
Listing 2.11 shows XML that is incorrect because its tags are left unclosed.
LISTING 2.11 Incorrect XML Due to Unclosed Tags
<markup>This is not valid XML
<markup>Since there is no closing tag
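The closing-tag rule can be checked mechanically. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser, which this section does not itself use; the helper name parse_error is hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_error(xml_text):
    """Return the parser's error message, or None if xml_text is well formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None

unclosed = "<markup>This is not valid XML <markup>Since there is no closing tag"
closed = "<markup>This is well formed</markup>"  # matching closing tag
empty = "<markup/>"                              # empty element, closed in one tag

print(parse_error(unclosed))  # an error message; the wording depends on the parser
print(parse_error(closed))    # None
print(parse_error(empty))     # None
```

Both ways of closing a tag pass, while the unclosed text is rejected before any element is returned.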
Tags Are Case Sensitive
In XML, capitalization matters. XML element and attribute names are case
sensitive, so names that differ only in capitalization are interpreted as
different elements or attributes. This differs from HTML, where tags are not
case sensitive and arbitrary capitalization is allowed. In XML, the elements
<shirt> and <Shirt> are as different as <egg> and <house>. Listing 2.12 shows an
example of mismatched capitalization between an opening tag and its closing tag.
LISTING 2.12 Incorrect XML Due to Capitalization Mismatch
tags are very
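To see case sensitivity in practice, here is a small sketch that again assumes Python's standard xml.etree.ElementTree parser. The closet element and the is_well_formed helper are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# <shirt> and <Shirt> are distinct element names, so both can sit side by side.
doc = ET.fromstring("<closet><shirt/><Shirt/></closet>")
tags = [child.tag for child in doc]
print(tags)

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# A closing tag must match its opening tag exactly, including capitalization.
mismatched = is_well_formed("<shirt>cotton</Shirt>")
matched = is_well_formed("<shirt>cotton</shirt>")
print(mismatched, matched)
```

Because the two names are unrelated as far as the parser is concerned, </Shirt> cannot close <shirt>.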
XML Elements Must Have Proper Nesting
Unlike languages such as HTML, XML requires that elements be nested in proper
hierarchical order. Tags must be closed in the reverse of the order in which
they were opened. A useful analogy is to think of XML tags as envelopes: an
envelope must never be closed while an envelope contained within it is still
open. Listing 2.13 shows an incorrect nesting order of XML elements.
LISTING 2.13 Incorrect XML Due to Improper Element Nesting
are improperly nested</oxygen></nitrogen>
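The envelope analogy maps directly onto a stack: each opening tag is pushed, and each closing tag must match the tag on top. The sketch below is a deliberately simplified illustration, not a real XML parser. Its regular expression ignores comments, CDATA sections, and processing instructions, and the function name first_nesting_error is hypothetical.

```python
import re

# Simplified tag matcher: optional "/", a name, optional attributes, optional "/".
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")

def first_nesting_error(xml_text):
    """Return a description of the first nesting error, or None if there is none."""
    open_tags = []  # stack of element names that are still open
    for match in TAG.finditer(xml_text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # empty element: opened and closed at once
        if not closing:
            open_tags.append(name)
        elif not open_tags or open_tags[-1] != name:
            return f"</{name}> does not close the innermost open element"
        else:
            open_tags.pop()
    return f"<{open_tags[-1]}> is never closed" if open_tags else None

improper = "<oxygen><nitrogen>gas</oxygen></nitrogen>"
proper = "<oxygen><nitrogen>gas</nitrogen></oxygen>"
print(first_nesting_error(improper))
print(first_nesting_error(proper))
```

In the improper case, </oxygen> arrives while <nitrogen> is still on top of the stack, which is exactly the "outer envelope closed first" situation.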
XML Documents Must Contain a Single Root Element
XML documents must contain a single root element: no fewer, and certainly no
more. All other elements in the XML document are nested within this root
element. Once the root element is defined, any number of child elements can
branch off it as long as they follow the other rules mentioned in this section.
The root element is the most important one in the document because it contains
all the other elements and reflects the document type as declared in the
Document Type Declaration. The root element can appear only once, and a
document cannot contain multiple, different root elements. Listing 2.14
illustrates the improper use of root elements.
LISTING 2.14 Incorrect XML Due to Multiple Root Elements
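The single-root rule is easy to confirm with a parser. This sketch assumes Python's standard xml.etree.ElementTree; the book and library element names and the root_tag helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def root_tag(xml_text):
    """Return the root element's name, or None if the text is not well formed."""
    try:
        return ET.fromstring(xml_text).tag
    except ET.ParseError:
        return None

two_roots = "<book/><book/>"                        # root element repeated
one_root = "<library><book/><book/></library>"      # same content, one root
no_root = ""                                        # no root element at all

print(root_tag(two_roots))  # None
print(root_tag(one_root))   # library
print(root_tag(no_root))    # None
```

Wrapping the repeated elements in one enclosing element is the usual way to repair a document with several top-level elements.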
Attribute Values Must Be Quoted
When attributes are used within XML elements, their values must be surrounded
by quotes. XML accepts either single or double quotes for attribute values,
although double quotes are the more common convention. If you need to use a
quote character within an attribute value, you can use the entity reference
&quot; or &apos; to insert it. Listing 2.15 illustrates the improper use of
non-quoted attributes.
LISTING 2.15 Incorrect XML Due to Improper Quoting of Attributes
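The quoting rule, including the use of &quot; inside a value, can be sketched as follows. The example assumes Python's standard xml.etree.ElementTree parser; the shirt element, its attributes, and the attribute_or_none helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def attribute_or_none(xml_text, name):
    """Return the named attribute of the root element, or None on a parse error."""
    try:
        return ET.fromstring(xml_text).get(name)
    except ET.ParseError:
        return None

unquoted = attribute_or_none('<shirt size=large/>', "size")
double_quoted = attribute_or_none('<shirt size="large"/>', "size")
single_quoted = attribute_or_none("<shirt size='large'/>", "size")
# &quot; inserts a literal double quote inside a double-quoted value.
escaped = attribute_or_none('<shirt label="the &quot;classic&quot; fit"/>', "label")

print(unquoted)       # None
print(double_quoted)  # large
print(single_quoted)  # large
print(escaped)        # the "classic" fit
```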
Attributes May Only Appear Once in the Same Start Tag
Even though attributes may be optional, when they are present, a given
attribute name can appear only once in a start tag. This simple restriction
prevents ambiguity: if the same name appeared twice with conflicting values, the
processor would have no way to decide which value applies. Listing 2.16 shows
the improper use of a repeated attribute name within a single element.
LISTING 2.16 Incorrect XML Due to Multiple Attribute Names in Start Tag
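A parser reports a repeated attribute name as a well-formedness error. The sketch below assumes Python's standard xml.etree.ElementTree, which is built on the expat parser, and uses expat's ErrorString function to turn the error code into a short reason. The shirt element and the failure_reason helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import ErrorString, errors

def failure_reason(xml_text):
    """Return expat's short reason for a parse failure, or None on success."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ErrorString(exc.code)
    return None

repeated = '<shirt size="large" size="small"/>'
distinct = '<shirt size="large" color="blue"/>'
# Names are case sensitive, so size and Size count as two different attributes.
different_case = '<shirt size="large" Size="small"/>'

print(failure_reason(repeated))
print(failure_reason(distinct))        # None
print(failure_reason(different_case))  # None
```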
Attribute Values Cannot Contain References to External Entities
Although external entities may be allowed for general markup text, attribute
values cannot contain references to external entities. However, attribute
values can make use of internally defined entities and the predefined
entities, such as &lt; and &quot;.
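The difference between internal and external entities in attribute values can be sketched with a small internal DTD subset. This example assumes Python's standard xml.etree.ElementTree parser. The entity names, the file name external.txt, and the title_of helper are all hypothetical, and the external file is never read.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD: one internally defined entity and one external entity.
DTD = """<!DOCTYPE note [
  <!ENTITY internal "plain text">
  <!ENTITY external SYSTEM "external.txt">
]>
"""

def title_of(body):
    """Parse DTD + body and return the title attribute, or None on a parse error."""
    try:
        return ET.fromstring(DTD + body).get("title")
    except ET.ParseError:
        return None

# Internal and predefined entities are expanded inside the attribute value.
internal_ok = title_of('<note title="&internal; and &lt;"/>')
# A reference to the external entity makes the document not well formed.
external_bad = title_of('<note title="&external;"/>')

print(internal_ok)
print(external_bad)  # None
```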
Entities Except amp, lt, gt, apos, and quot Must Be Declared Before They Are Used
Entities cannot be used before they are properly declared. Referring to an
undeclared entity results in an XML document that is not well formed. However,
a small set of entities can be assumed to be defined by every XML processor.
These are limited to the entities &amp;, &lt;, &gt;, &apos;, and &quot;.
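The three cases described here (predefined, undeclared, and declared entities) can be sketched as follows, assuming Python's standard xml.etree.ElementTree parser. The entity name copyright, its replacement text, and the text_or_none helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def text_or_none(xml_text):
    """Return the root element's text, or None if the document is rejected."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None

# The five predefined entities need no declaration.
predefined = text_or_none("<note>&amp; &lt; &gt; &apos; &quot;</note>")

# Any other entity must be declared first; this one is not.
undeclared = text_or_none("<note>&copyright;</note>")

# Declaring it in the internal DTD subset makes the same reference legal.
declared = text_or_none(
    '<!DOCTYPE note [<!ENTITY copyright "(c)">]><note>&copyright;</note>'
)

print(predefined)
print(undeclared)  # None
print(declared)    # (c)
```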
Other Rules of XML Structure
Other rules exist for well-formed XML. For example, binary (unparsed) entities
cannot be referenced in the general content of an XML document. Rather, these
entities can be used only in an attribute declared as ENTITY or ENTITIES. Also,
text and parameter entities are not allowed to be directly or indirectly
recursive, and the replacement text for all parameter entities referenced
inside a markup declaration must consist of complete markup declarations.
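The ban on recursive entities can be illustrated with a short sketch, again assuming Python's standard xml.etree.ElementTree parser. The entity names x and y and the parses helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Direct recursion: x refers to itself.
direct = '<!DOCTYPE a [<!ENTITY x "&x;">]><a>&x;</a>'
# Indirect recursion: x refers to y, which refers back to x.
indirect = '<!DOCTYPE a [<!ENTITY x "&y;"><!ENTITY y "&x;">]><a>&x;</a>'
# A plain chain is fine: x refers to y, and y ends in ordinary text.
chained = '<!DOCTYPE a [<!ENTITY x "&y;"><!ENTITY y "done">]><a>&x;</a>'

print(parses(direct))    # False
print(parses(indirect))  # False
print(parses(chained))   # True
```

One entity may refer to another. What the rule forbids is a chain of references that leads back to an entity that is already being expanded.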