Document Type Definitions (DTDs)
Document Type Definitions (DTDs) are one of two main types of documents you can use to specify XML document structure. Section 14.6 presents W3C XML Schema documents, which provide an improved method of specifying XML document structure.
Creating a Document Type Definition
Figure 14.4 presented a simple business letter marked up with XML. Recall that line 5 of letter.xml references a DTD—letter.dtd (Fig. 14.9). This DTD specifies the business letter’s element types and attributes, and their relationships to one another.
A DTD describes the structure of an XML document and enables an XML parser to verify whether an XML document is valid (i.e., whether its elements contain the proper attributes and appear in the proper sequence). DTDs allow users to check document structure and to exchange data in a standardized format. A DTD expresses the set of rules for document structure using an EBNF (Extended Backus-Naur Form) grammar. DTDs are not themselves XML documents. [Note: EBNF grammars are commonly used to define programming languages. To learn more about EBNF grammars, visit en.wikipedia.org/wiki/EBNF or www.garshol.priv.no/download/text/bnf.html.]
1 <!-- Fig. 14.9: letter.dtd -->
2 <!-- DTD document for letter.xml -->
4 <!ELEMENT letter ( contact+, salutation, paragraph+,
5 closing, signature )>
7 <!ELEMENT contact ( name, address1, address2, city, state,
8 zip, phone, flag )>
9 <!ATTLIST contact type CDATA #IMPLIED>
11 <!ELEMENT name ( #PCDATA )>
12 <!ELEMENT address1 ( #PCDATA )>
13 <!ELEMENT address2 ( #PCDATA )>
14 <!ELEMENT city ( #PCDATA )>
15 <!ELEMENT state ( #PCDATA )>
16 <!ELEMENT zip ( #PCDATA )>
17 <!ELEMENT phone ( #PCDATA )>
18 <!ELEMENT flag EMPTY>
19 <!ATTLIST flag gender (M | F) "M">
21 <!ELEMENT salutation ( #PCDATA )>
22 <!ELEMENT closing ( #PCDATA )>
23 <!ELEMENT paragraph ( #PCDATA )>
24 <!ELEMENT signature ( #PCDATA )>
Fig. 14.9 | Document Type Definition (DTD) for a business letter.
Defining Elements in a DTD
The ELEMENT element type declaration in lines 4–5 defines the rules for element letter. In this case, letter contains one or more contact elements, one salutation element, one or more paragraph elements, one closing element and one signature element, in that sequence. The plus sign (+) occurrence indicator specifies that the DTD requires one or more occurrences of an element. Other occurrence indicators include the asterisk (*), which indicates an optional element that can occur zero or more times, and the question mark (?), which indicates an optional element that can occur at most once (i.e., zero or one occurrence). If an element does not have an occurrence indicator, the DTD requires exactly one occurrence.
The contact element type declaration (lines 7–8) specifies that a contact element contains child elements name, address1, address2, city, state, zip, phone and flag—in that order. The DTD requires exactly one occurrence of each of these elements.
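One way to read a content model is as a regular expression over child element names. The following Python sketch illustrates that reading for the letter and contact declarations in Fig. 14.9. The CONTENT_MODELS table and the children_match helper are hypothetical names introduced here, and the regex encoding is only an illustration; it is not how a validating parser is implemented, and it checks child order only, not attributes or text.

```python
import re
import xml.etree.ElementTree as ET

# Content models from Fig. 14.9, rewritten by hand as regular expressions over
# space-terminated child element names (an illustrative encoding, not a DTD API).
CONTENT_MODELS = {
    "letter": r"(contact )+salutation (paragraph )+closing signature ",
    "contact": r"name address1 address2 city state zip phone flag ",
}

def children_match(element):
    """Return True if the element's child sequence satisfies its content model."""
    child_names = "".join(child.tag + " " for child in element)
    return re.fullmatch(CONTENT_MODELS[element.tag], child_names) is not None

ok = ET.fromstring(
    "<letter><contact/><contact/><salutation/>"
    "<paragraph/><closing/><signature/></letter>"
)
no_paragraph = ET.fromstring(
    "<letter><contact/><salutation/><closing/><signature/></letter>"
)

print(children_match(ok))            # True: two contacts satisfy contact+
print(children_match(no_paragraph))  # False: paragraph+ needs at least one
```

The + in the DTD maps directly onto the + in the regular expression, and an element with no occurrence indicator appears exactly once in the pattern.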
Defining Attributes in a DTD
Line 9 uses the ATTLIST attribute-list declaration to define an attribute named type for the contact element. Keyword #IMPLIED specifies that the attribute is optional and has no default value: if the parser finds a contact element without a type attribute, it reports no value for the attribute and the application decides how to proceed. Either way the document will still be valid (if the rest of the document is valid); a missing type attribute will not invalidate the document. Other keywords that can be used in place of #IMPLIED in an ATTLIST declaration include #REQUIRED and #FIXED. Keyword #REQUIRED specifies that the attribute must be present in the element, and keyword #FIXED specifies that the attribute (if present) must have the given fixed value. For example,
<!ATTLIST address zip CDATA #FIXED "01757">
indicates that attribute zip (if present in element address) must have the value 01757 for the document to be valid. If the attribute is not present, then the parser, by default, uses the fixed value that the ATTLIST declaration specifies.
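The difference between #IMPLIED and #FIXED can be observed with a non-validating parser. The sketch below uses Python's xml.etree.ElementTree, whose underlying expat parser reads attribute declarations in an internal DTD subset and supplies declared default values; it does not validate the document. The address element and the inline DOCTYPE are assumptions made for this illustration, following the zip example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose internal DTD subset declares one #IMPLIED
# attribute and one #FIXED attribute for the address element.
DOC = """<!DOCTYPE address [
  <!ELEMENT address EMPTY>
  <!ATTLIST address type CDATA #IMPLIED>
  <!ATTLIST address zip CDATA #FIXED "01757">
]>
<address/>"""

address = ET.fromstring(DOC)

print(address.get("zip"))   # 01757, supplied from the #FIXED declaration
print(address.get("type"))  # None, because #IMPLIED provides no default
```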
Character Data vs. Parsed Character Data
Keyword CDATA (line 9) specifies that attribute type contains character data (i.e., a string). A parser will pass such data to an application without modification.
Keyword #PCDATA (line 11) specifies that an element (e.g., name) may contain parsed character data (i.e., data that is processed by an XML parser). Parsed character data cannot contain the literal markup characters less than (<) or ampersand (&), and greater than (>) is normally escaped as well. The document author should replace any such character in a #PCDATA element with its corresponding character entity reference. For example, the character entity reference &lt; should be used in place of the less-than symbol (<), and the character entity reference &gt; should be used in place of the greater-than symbol (>). A document author who wishes to use a literal ampersand should use the entity reference &amp; instead; parsed character data can contain ampersands (&) only for inserting entities. See Appendix A, XHTML Special Characters, for a list of other character entity references.
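As a small illustration of this replacement, the Python sketch below escapes a string before placing it in a paragraph element and then parses it back. The sample text is made up for the example; escape comes from the standard xml.sax.saxutils module and replaces &, < and > with their entity references.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "AT&T <fiber> plan"  # sample text containing markup characters
escaped = escape(raw)      # replace &, < and > with entity references
print(escaped)             # AT&amp;T &lt;fiber&gt; plan

# The parser turns the entity references back into the original characters.
paragraph = ET.fromstring("<paragraph>" + escaped + "</paragraph>")
print(paragraph.text == raw)  # True
```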
Defining Empty Elements in a DTD
Line 18 defines an empty element named flag. Keyword EMPTY specifies that the element does not contain any data between its start and end tags. Empty elements commonly describe data via attributes. For example, flag’s data appears in its gender attribute (line 19). Line 19 specifies that the gender attribute’s value must be one of the enumerated values (M or F) enclosed in parentheses and delimited by a vertical bar (|) meaning “or.” Note that line 19 also indicates that gender has a default value of M.
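The default value of an enumerated attribute can be seen in a short Python sketch. It assumes a simplified, hypothetical contact element that holds only flag children, so that the declarations fit in an internal DTD subset; the flag declarations themselves are copied from lines 18–19 of Fig. 14.9. The expat parser behind xml.etree.ElementTree supplies the declared default but does not check that a given value is one of the enumerated choices.

```python
import xml.etree.ElementTree as ET

# Simplified contact (an assumption for this sketch) with two empty flag
# elements: one relies on the default, one sets gender explicitly.
DOC = """<!DOCTYPE contact [
  <!ELEMENT contact (flag+)>
  <!ELEMENT flag EMPTY>
  <!ATTLIST flag gender (M | F) "M">
]>
<contact><flag/><flag gender="F"/></contact>"""

flags = ET.fromstring(DOC).findall("flag")

print([flag.get("gender") for flag in flags])  # ['M', 'F']

# An empty element carries no text and no child elements.
print(all(flag.text is None and len(flag) == 0 for flag in flags))  # True
```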
Well-Formed Documents vs. Valid Documents
In Section 14.3, we demonstrated how to use the Microsoft XML Validator to validate an XML document against its specified DTD. The validation revealed that the XML document letter.xml (Fig. 14.4) is well-formed and valid: it conforms to letter.dtd (Fig. 14.9). Recall that a well-formed document is syntactically correct (i.e., each start tag has a corresponding end tag, the document contains only one root element, etc.), and a valid document contains the proper elements with the proper attributes in the proper sequence. An XML document cannot be valid unless it is well-formed.
When a document fails to conform to a DTD or a schema, the Microsoft XML Validator displays an error message. For example, the DTD in Fig. 14.9 indicates that a contact element must contain the child element name. A document that omits this child element is still well-formed, but is not valid. In such a scenario, Microsoft XML Validator displays the error message shown in Fig. 14.10.
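The distinction can be sketched in Python without a validating parser. In the example below, is_well_formed relies on xml.etree.ElementTree raising ParseError for syntactically incorrect input, while contact_is_valid is a hypothetical stand-in for DTD validation that only checks the order of contact's children against the content model in Fig. 14.9, hand-encoded as a regular expression. Both function names and the sample documents are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content model for contact from Fig. 14.9, hand-encoded as a regex over
# space-terminated child names (illustrative only; attributes are not checked).
CONTACT_MODEL = r"name address1 address2 city state zip phone flag "

def is_well_formed(text):
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def contact_is_valid(text):
    """Return True if a well-formed contact has the required child sequence."""
    contact = ET.fromstring(text)
    names = "".join(child.tag + " " for child in contact)
    return re.fullmatch(CONTACT_MODEL, names) is not None

# Well-formed, but the required name child is missing.
missing_name = ("<contact><address1/><address2/><city/><state/>"
                "<zip/><phone/><flag/></contact>")
# Not well-formed: the name start tag has no matching end tag.
unclosed = "<contact><name>Jane Doe</contact>"

print(is_well_formed(missing_name))    # True
print(contact_is_valid(missing_name))  # False
print(is_well_formed(unclosed))        # False
```

Validity is only meaningful for the first document; the second never reaches that check because it fails to parse.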