XML Content Models
Because elements, attributes, and content are the most important parts of an XML document, the restrictions on how those elements and attributes can be created, modified, or removed are of extreme importance. Should an XML document creator allow additional, unforeseen elements to be added to the document in an arbitrary fashion, or should the creator restrict elements to only those allowed by the DTD or XML Schema? These questions are the main concepts behind XML content models. A content model provides the framework that determines how far, if at all, the extensibility features of XML can be used. At the very least, the model indicates how extensible the document creator intends the document to be, because users can still extend a document by using an internal DTD subset if they are so inclined. By doing so, however, the users are “overriding” the content model as intended by the document creator.
An “open” content model enables a user to add elements and attributes to a document without having to declare them explicitly in a DTD or schema. In an open content model, users can take full advantage of the extensibility of XML without having to make changes to a DTD. The use of a DTD precludes an open content model, unless a user chooses to override the DTD by using an internal DTD subset. Newer schema formats, such as XML Schema, do provide this mechanism. Even so, an open content model isn’t completely freeform. For example, you cannot add or remove content in a way that breaks the existing content model. All required elements must be present, but it is not invalid for additional elements to be present as well. This means that content must follow the rules of the schema before extensibility features can be taken advantage of; if these rules are not followed, XML validation will fail. In addition, you can add undeclared XML elements in an open content model as long as they are defined in a different namespace. By definition, well-formed XML documents that have no validity constraints are open content models.
On the other hand, a “closed” content model restricts elements and attributes to only those that are specified in the DTD or schema. By definition, a DTD is a closed content model because it describes what may exclusively appear in the content of the element. In a closed model, the XML document creator maintains strict control over exactly which elements and attributes may appear in a compliant document, as well as the order in which they appear. Closed models are helpful when you’re enforcing strict document exchange, and they provide a means to guarantee that incoming data complies with data requirements.
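The difference between the open and closed models can be sketched with a toy checker. This is only an illustration of the rule described above, not a real DTD or XML Schema validator: the function name check_children, the declared element list, and the extra loyalty_points element are all hypothetical, and element order is deliberately ignored.

```python
import xml.etree.ElementTree as ET


def check_children(xml_text, declared, open_model):
    """Return a list of problems for the root element's children.

    Toy rule (an assumption for illustration): every declared child is
    required; undeclared children are an error only in a closed model.
    """
    children = [child.tag for child in ET.fromstring(xml_text)]
    problems = [
        f"missing required element: {name}"
        for name in declared
        if name not in children
    ]
    if not open_model:
        problems += [
            f"undeclared element: {name}"
            for name in children
            if name not in declared
        ]
    return problems


declared = ["model", "brand", "price"]
extended = (
    "<shirt><model>Zippy Tee</model><brand>X</brand>"
    "<price>14.99</price><loyalty_points>5</loyalty_points></shirt>"
)
incomplete = "<shirt><model>Zippy Tee</model></shirt>"

open_result = check_children(extended, declared, open_model=True)
closed_result = check_children(extended, declared, open_model=False)
missing_result = check_children(incomplete, declared, open_model=True)
```

Under these assumptions the extended document passes the open check and fails the closed one, while the incomplete document fails even in the open model because required elements are missing.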
A more focused content model is a “mixed” content model, which enables individual elements to allow an arbitrary mixture of text and additional elements. These mixed elements are useful when freeform fields, possibly containing XML or other markup data, are to be included. This allows the majority of the document to remain closed while portions of the document are noted as extensible. Mixed models represent a good compromise that can allow for strictness while providing a limited means for extensibility.
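As a small sketch of what mixed content looks like to an application, the following uses Python's standard xml.etree.ElementTree module on a shortened description element. The sample markup is an assumption modeled on the shirt example; the point is only that text and child elements are interleaved and the application receives both.

```python
import xml.etree.ElementTree as ET

# A mixed-content element: character data interleaved with a child element.
description = ET.fromstring(
    "<description>This is a <b>funky</b> Tee shirt</description>"
)

bold = description.find("b")
leading_text = description.text   # text before the first child element
bold_text = bold.text             # text inside <b>
trailing_text = bold.tail         # text that follows </b> inside the parent

# itertext() walks text and tails in document order.
flattened = "".join(description.itertext())
```

ElementTree stores the text that follows a child element in that child's tail attribute, which is why the trailing words are read from bold.tail rather than from the description element itself.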
Handling Whitespace in XML
Whitespace is the term used for character spaces, tabs, linefeeds, and carriage returns in documents. Issues around the handling of these seemingly “invisible” characters are important for many reasons. It is hard to tell whether whitespace should be ignored or passed “as is” to documents. Listing 2.10 illustrates our shirt example with various whitespace issues.
LISTING 2.10 Shirt Example with Whitespace
<!DOCTYPE shirt SYSTEM "shirt.dtd">
<model>Zippy Tee</model> <brand>Tommy
<price currency="USD">14.99</price> <on_sale/>
This is a <b>funky</b> Tee shirt similar
Floppy Tee shirt </description> </shirt>
Is all of this whitespace significant? The whitespace between the initial <shirt> element and the <model> element may not be significant, but the whitespace within the <description> element might be. How are we to know?
It turns out that the only way an XML processor can determine whether whitespace is significant is by knowing the content model of the XML document. Basically, in a mixed content model, whitespace is significant because it cannot be known in advance whether the application will use the whitespace in processing, but in an open or closed model, it is not. However, the rule for XML processors is that they must pass all characters that are not markup intact to the application. Validating processors also inform applications about which whitespace characters are significant. In addition, a special attribute called xml:space, with the value preserve or default, can be used to indicate explicitly whether the whitespace contained within the element is significant. For example, xml:space='preserve' indicates that all whitespace contained in the element is significant. Of course, in a document that is to be validated, the xml:space attribute must be declared in the DTD as an enumerated type with only those two values.
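The following sketch shows both halves of that rule with Python's standard xml.etree.ElementTree parser, which is non-validating: the parser hands every whitespace character to the application, and the application then decides what to discard. The helper drop_ignorable_whitespace is hypothetical, and its heuristic (treat whitespace-only text as insignificant unless xml:space='preserve' is in effect) is a simplifying assumption, since a non-validating parser does not know the content model.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix under its namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def drop_ignorable_whitespace(elem, preserve=False):
    """Remove whitespace-only text unless xml:space='preserve' applies.

    The xml:space setting is inherited by descendants until overridden.
    """
    setting = elem.get(XML_SPACE)
    if setting == "preserve":
        preserve = True
    elif setting == "default":
        preserve = False
    if not preserve and elem.text is not None and not elem.text.strip():
        elem.text = None
    for child in elem:
        drop_ignorable_whitespace(child, preserve)
        # A child's tail is part of this element's content, so this
        # element's setting governs it, not the child's.
        if not preserve and child.tail is not None and not child.tail.strip():
            child.tail = None


doc = (
    "<shirt>\n"
    "  <model>Zippy Tee</model>\n"
    "  <description xml:space='preserve'>  This is a <b>funky</b>   </description>\n"
    "</shirt>"
)
root = ET.fromstring(doc)
raw_text = root.text  # the parser passed the newline and indentation intact

drop_ignorable_whitespace(root)
description = root.find("description")
```

After the call, the indentation between the shirt and model elements is gone, while every space inside the description element survives because xml:space='preserve' is declared on it.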
Also, XML processors simplify cross-platform portability issues by normalizing all end-of-line sequences (a carriage return followed by a linefeed, or a carriage return on its own) to the single linefeed character “&#xA;”.
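A quick way to observe this normalization is to feed a parser text that mixes line-ending styles. The sketch below uses Python's standard xml.etree.ElementTree module; the element name note and the sample strings are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Literal Windows (\r\n) and bare carriage-return (\r) line endings in the source text.
mixed_endings = ET.fromstring("<note>line1\r\nline2\rline3</note>")
normalized_text = mixed_endings.text

# A carriage return written as a character reference is not a literal
# line break in the source, so it is not normalized.
escaped = ET.fromstring("<note>a&#xD;b</note>")
escaped_text = escaped.text
```

Here normalized_text contains only linefeeds, whatever platform produced the file, whereas the carriage return supplied through the &#xD; character reference reaches the application unchanged.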