XML Content Models
Because elements, attributes, and content are the most important parts of an XML document, deciding how those elements and attributes may be created, modified, or removed is extremely important. Should an XML document's creator allow additional, unforeseen elements to be added to the document arbitrarily, or should the creator restrict elements to only those allowed by the DTD or XML Schema? These questions are the main concepts behind XML content models. A content model provides a framework that determines whether, and how far, the extensibility features of XML can be taken advantage of. At the very least, the model indicates the document creator's intent regarding the document's extensibility, because users can always extend a document through an internal DTD subset if they are so inclined. By doing so, however, users are “overriding” the content model the document creator intended.
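As an illustration of the internal-subset route, here is a minimal sketch of a user extending the shirt document without editing the creator's DTD. It assumes an external shirt.dtd like the one referenced in Listing 2.10; the season attribute and its values are hypothetical. Because the internal subset is read before the external subset, and the first declaration of a given attribute takes precedence, this is also how a user can override an attribute the creator declared.

```xml
<?xml version="1.0"?>
<!-- Sketch: extending the shirt document through an internal DTD subset.
     "shirt.dtd" is the chapter's external DTD; the season attribute is hypothetical. -->
<!DOCTYPE shirt SYSTEM "shirt.dtd" [
  <!-- The internal subset is processed before shirt.dtd, so this declaration
       adds a season attribute to shirt (and wins if shirt.dtd also declares one) -->
  <!ATTLIST shirt season (spring | summer | fall | winter) #IMPLIED>
]>
<shirt season="summer">
  <!-- ... the rest of the shirt document, unchanged ... -->
</shirt>
```

XML 1.0 does not allow the same element type to be declared twice, so this technique can add attributes and new element types but cannot simply redeclare an existing element's content model.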
An “open” content model enables users to add elements and attributes to a document without their having to be explicitly declared in a DTD or schema. In an open content model, users can take full advantage of XML's extensibility without having to change a DTD. As expected, using a DTD precludes an open content model: you cannot have one when using a DTD, except when a user chooses to override the DTD with an internal DTD subset. Newer schema languages, such as XML Schema, do provide this mechanism. Even so, an open content model isn't completely freeform. You cannot add or remove content in a way that breaks the existing content model: all required elements must be present, although additional elements may also be present without making the document invalid. In other words, content must follow the rules of the schema before the extensibility features can be taken advantage of; if these rules are not followed, XML validation fails. In addition, you can add undeclared XML elements in an open content model as long as they are defined in a different namespace. By definition, well-formed XML documents that have no validity constraints follow an open content model.
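In XML Schema, the usual way to open a content model is a wildcard. The sketch below, trimmed to two of the shirt elements, requires model and brand in that order and then accepts any number of extra elements from other namespaces through xs:any with namespace="##other". The namespace URIs are placeholders chosen for illustration, not part of the original example; processContents="lax" means the extra content is validated only if the processor happens to find declarations for it.

```xml
<?xml version="1.0"?>
<!-- Sketch: an open content model for a trimmed-down shirt element.
     The urn:example:shirt namespace is an illustrative placeholder. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:shirt"
           xmlns="urn:example:shirt"
           elementFormDefault="qualified">
  <xs:element name="shirt">
    <xs:complexType>
      <xs:sequence>
        <!-- Required elements: these must still appear, in this order -->
        <xs:element name="model" type="xs:string"/>
        <xs:element name="brand" type="xs:string"/>
        <!-- Open slot: any number of elements from any namespace other
             than the target namespace -->
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Attributes from other namespaces are accepted as well -->
      <xs:anyAttribute namespace="##other" processContents="lax"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this schema, an instance that omits model still fails validation, while one that adds an element from, say, a hypothetical urn:example:inventory namespace after brand is accepted, which matches the rule that required content must be in place before extensibility applies.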
On the other hand, a “closed” content model restricts elements and attributes to only those specified in the DTD or schema. By definition, a DTD describes a closed content model, because it specifies exclusively what may appear in the content of each element. In a closed model, the XML document creator maintains strict control over which elements and attributes may appear in a compliant document, as well as the order in which markup may appear. Closed models are helpful when you're enforcing strict document exchange, and they provide a means of guaranteeing that incoming data complies with data requirements.
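A closed model is what a DTD naturally expresses. The fragment below is a guess at the opening declarations of the shirt.dtd that Listing 2.10 references; the actual file isn't shown, so the cardinalities (on_sale?, fabric+) and the attribute types are assumptions.

```xml
<!-- Sketch of a closed content model for the shirt example
     (hypothetical shirt.dtd; declarations inferred from Listing 2.10) -->
<!-- Children must appear exactly in this order; nothing else is allowed -->
<!ELEMENT shirt (model, brand, price, on_sale?, fabric+, options, description)>
<!ELEMENT model (#PCDATA)>
<!ELEMENT brand (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!-- currency is the only attribute price may carry -->
<!ATTLIST price currency CDATA #REQUIRED>
<!ELEMENT on_sale EMPTY>
<!ELEMENT fabric (#PCDATA)>
<!ATTLIST fabric content CDATA #REQUIRED>
<!-- ... declarations for options, description, and their children omitted ... -->
```

Against these declarations, an undeclared child such as a hypothetical sku element, an undeclared attribute on price, or brand placed before model would each cause a validating parser to reject the document.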
A more focused content model is the “mixed” content model, which allows individual elements to contain an arbitrary mixture of text and additional elements. Mixed elements are useful when freeform fields, possibly containing XML or other markup, are to be included. This allows most of the document to remain closed while portions of it are marked as extensible. Mixed models represent a good compromise that allows for strictness while providing a limited means of extensibility.
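In a DTD, mixed content is declared with the (#PCDATA | ...)* form. This sketch makes the description element from the shirt example mixed while the rest of the DTD can stay closed; allowing only b as a child element is an assumption based on the listing.

```xml
<!-- Sketch: only description is mixed; text may be interleaved freely
     with any number of <b> elements, in any order -->
<!ELEMENT description (#PCDATA | b)*>
<!ELEMENT b (#PCDATA)>
```

XML 1.0 lets a mixed-content declaration name which child elements may appear, but not constrain their order or how many times they occur. In XML Schema, the rough equivalent is a complexType with mixed="true".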
Handling Whitespace in XML
Whitespace is the term used for spaces, tabs, line feeds, and carriage returns in documents. Handling these seemingly “invisible” characters matters for many reasons, not least because it is hard to tell whether a given piece of whitespace should be ignored or passed to the application “as is”. Listing 2.10 illustrates our shirt example with various whitespace issues.
LISTING 2.10 Shirt Example with Whitespace
<?xml version="1.0"?>
<!DOCTYPE shirt SYSTEM "shirt.dtd">
<shirt>
<model>Zippy Tee</model> <brand>Tommy Hilbunger</brand>
<price currency="USD">14.99</price> <on_sale/>
<fabric content="60%">cotton</fabric>
<fabric content="40%">polyester</fabric>
<options>
<colorOptions>
<color>red</color>
<color>white</color>
</colorOptions>
<sizeOptions>
<size>Medium</size>
<size>Large</size>
</sizeOptions>
</options>
<description>
This is
a <b>funky</b> Tee
shirt similar
to the
Floppy Tee shirt
</description> </shirt>
Are these various whitespace issues significant? The whitespace between the opening <shirt> tag and the <model> element may not be significant, but the whitespace within the <description> element might be. How are we to know?
It turns out that the only way an XML processor can determine whether whitespace is significant is by knowing the content model of the XML document. Basically, whitespace in a mixed content model is significant, because the processor cannot know whether the application will use it in processing, whereas in an open or closed model it is not. However, the rule for XML processors is that they must pass all characters that are not markup through to the application intact. Validating processors also inform applications about the significance of the various whitespace characters. In addition, a special attribute called xml:space, with the value preserve or default, can be used to explicitly indicate whether the whitespace contained within an element is significant. For example, xml:space='preserve' indicates that all whitespace contained in the element is significant. Of course, in a valid document the xml:space attribute must be declared in the DTD as an enumerated type whose values are one or both of default and preserve.
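The sketch below shows a declaration of xml:space that satisfies that rule, using a stripped-down internal subset rather than the real shirt.dtd. Making preserve the default is an assumption for illustration. Note that xml:space only signals intent: the parser still passes every character through, and it is up to the application to keep the spacing.

```xml
<?xml version="1.0"?>
<!-- Stripped-down, hypothetical DTD: only what is needed to show xml:space -->
<!DOCTYPE shirt [
  <!ELEMENT shirt (description)>
  <!ELEMENT description (#PCDATA)>
  <!-- Enumerated type limited to the two values XML 1.0 permits;
       "preserve" applies whenever the attribute is omitted -->
  <!ATTLIST description xml:space (default | preserve) "preserve">
]>
<shirt>
  <description>This   is a   funky
    Tee shirt</description>
</shirt>
```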
Also, XML processors simplify cross-platform portability by normalizing all end-of-line sequences (a carriage return followed by a line feed, or a lone carriage return) to a single line feed character, &#xA;.