In this section and throughout this chapter, we create our own XML markup. XML allows you to describe data precisely in a well-structured format.
XML Markup for an Article
In Fig. 14.2, we present an XML document that marks up a simple article. The line numbers shown are for reference only and are not part of the XML document.
This document begins with an XML declaration (line 1), which identifies the document as an XML document. The version attribute specifies the XML version to which the document conforms. Version 1.0 is the XML standard in widespread use. Although the W3C released an XML 1.1 specification in February 2004, that version is not widely supported. The W3C may continue to release new versions as XML evolves to meet the requirements of different fields.
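A program that processes XML can read the version attribute back out of the declaration. The sketch below assumes Python's standard-library expat binding (xml.parsers.expat); read_xml_version is a hypothetical helper written for this illustration. It registers an XmlDeclHandler, which expat calls only when the document has an XML declaration:

```python
import xml.parsers.expat


def read_xml_version(text):
    """Return the version from the XML declaration, or None if there is none.

    read_xml_version is a hypothetical helper written for this example.
    """
    found = {}

    def on_xml_declaration(version, encoding, standalone):
        # expat calls this handler only if the document has an XML declaration.
        found["version"] = version

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_declaration
    parser.Parse(text, True)  # True: this is the final (and only) chunk of input
    return found.get("version")


article = '<?xml version = "1.0"?>\n<article><title>Simple XML</title></article>'
print(read_xml_version(article))       # 1.0
print(read_xml_version("<article/>"))  # None
```

Note that the spaces around the equals sign in version = "1.0" are legal, which is why the listing in Fig. 14.2 can use that style.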
1 <?xml version = "1.0"?>
2
3 <!-- Fig. 14.2: article.xml -->
4 <!-- Article structured with XML -->
5 <article>
6    <title>Simple XML</title>
7    <date>July 4, 2007</date>
8    <author>
11   </author>
12   <summary>XML is pretty easy.</summary>
13   <content>This chapter presents examples that use XML.</content>
14 </article>
Fig. 14.2 | XML used to mark up an article.
As in most markup languages, blank lines (line 2), whitespace and indentation help improve readability. Blank lines and indentation between elements are normally treated as insignificant by applications that process XML. XML comments (lines 3–4), which begin with <!-- and end with -->, can be placed almost anywhere in an XML document (but not inside a tag or before the XML declaration) and can span multiple lines. Each begin marker (<!--) must have exactly one matching end marker (-->).
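To see how a parser treats comments, here is a minimal sketch assuming Python's standard-library xml.etree.ElementTree (other conforming parsers behave similarly). ElementTree's default parser discards comments, so only the element structure remains, and a comment whose begin marker has no matching end marker makes the document malformed. The helper is_well_formed is hypothetical:

```python
import xml.etree.ElementTree as ET

ARTICLE = """<?xml version = "1.0"?>

<!-- Fig. 14.2: article.xml -->
<!-- A comment can
     span several lines -->
<article>
   <title>Simple XML</title>
</article>"""

root = ET.fromstring(ARTICLE)
# The default parser drops comments, leaving just the element structure.
print(root.tag, [child.tag for child in root])  # article ['title']


def is_well_formed(text):
    """Return True if ElementTree can parse text, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A begin marker (<!--) without a matching end marker (-->) is an error:
# everything after it is swallowed as comment text and the input ends too soon.
print(is_well_formed("<article><!-- never closed <title/></article>"))  # False
```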
In Fig. 14.2, article (lines 5–14) is the root element, the single element that contains all the others. The lines that precede the root element (lines 1–4) are the XML prolog. In the prolog, the XML declaration must come first, before any comments or other markup; not even whitespace may precede it.
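The ordering rule for the prolog is enforced by parsers. A minimal sketch, again assuming Python's xml.etree.ElementTree and a hypothetical is_well_formed helper, builds the same prolog in three orders and checks which ones parse:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if ElementTree can parse text, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


declaration = '<?xml version = "1.0"?>'
comment = "<!-- Article structured with XML -->"
root_element = "<article/>"

# Declaration first, then comments: a valid prolog.
print(is_well_formed(declaration + "\n" + comment + "\n" + root_element))  # True
# A comment, or even a single space, ahead of the declaration is an error.
print(is_well_formed(comment + "\n" + declaration + "\n" + root_element))  # False
print(is_well_formed(" " + declaration + "\n" + root_element))             # False
```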
The elements we use in the example do not come from any specific markup language. Instead, we chose the element names and markup structure that best describe our particular data. You can invent elements to mark up your data. For example, element title (line 6) contains text that describes the article’s title (e.g., Simple XML). Similarly, date (line 7), author (lines 8–11), firstName (line 9), lastName (line 10), summary (line 12) and content (line 13) describe the date, the author, the author’s first name, the author’s last name, a summary and the content of the document, respectively. XML element names are case sensitive (title and Title are different elements), can be of any length and may contain letters, digits, underscores, hyphens and periods. However, they must begin with either a letter or an underscore, and they should not begin with “xml” in any combination of uppercase and lowercase letters (e.g., XML, Xml, xMl), because such names are reserved for use in the XML standards.
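The naming rules above can be expressed as a simple check. This is a simplified, ASCII-only sketch in Python: check_element_name and its "ok"/"invalid"/"reserved" results are hypothetical, and the real XML 1.0 Name production also allows many non-ASCII characters and the colon (reserved for namespaces), which this sketch ignores. Because the text says names *should not* begin with “xml”, that case is reported separately as a warning rather than as invalid:

```python
import re

# Simplified, ASCII-only version of the rules described above:
# start with a letter or underscore, then letters, digits, _, . or -.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def check_element_name(name):
    """Return 'ok', 'invalid' or 'reserved' for a candidate element name."""
    if not NAME_PATTERN.fullmatch(name):
        return "invalid"
    if name.lower().startswith("xml"):
        return "reserved"  # any capitalization of "xml" is reserved
    return "ok"


for candidate in ["firstName", "_id", "date-2", "2ndAuthor", "XmlData"]:
    print(candidate, check_element_name(candidate))
# firstName ok
# _id ok
# date-2 ok
# 2ndAuthor invalid
# XmlData reserved
```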
XML elements are nested to form hierarchies, with the root element at the top of the hierarchy. This allows document authors to create parent/child relationships between data. For example, elements title, date, author, summary and content are nested within article. Elements firstName and lastName are nested within author. We discuss the hierarchy of Fig. 14.2 later in this chapter (Fig. 14.25).
Any element that contains other elements (e.g., article or author) is a container element. Container elements also are called parent elements. Elements nested directly inside a container element are child elements (or children) of that container element. Children of the same container element are siblings of one another.
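These relationships map directly onto a parsed element tree. The sketch below assumes Python's xml.etree.ElementTree and uses the structure of Fig. 14.2, with the author’s name text left out (firstName and lastName are written as empty elements here). It lists the root’s children, the siblings inside author, and looks up a parent:

```python
import xml.etree.ElementTree as ET

# Structure of article.xml (Fig. 14.2); the author's name text is omitted.
ARTICLE = """<article>
   <title>Simple XML</title>
   <date>July 4, 2007</date>
   <author>
      <firstName/>
      <lastName/>
   </author>
   <summary>XML is pretty easy.</summary>
   <content>This chapter presents examples that use XML.</content>
</article>"""

root = ET.fromstring(ARTICLE)

# Children of the root (container) element, in document order.
print([child.tag for child in root])
# ['title', 'date', 'author', 'summary', 'content']

# author is itself a container; its children are siblings of one another.
author = root.find("author")
print([child.tag for child in author])  # ['firstName', 'lastName']

# ElementTree elements do not store a parent pointer, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}
print(parent_of[author.find("lastName")].tag)  # author
```

Building parent_of is a design choice forced by ElementTree, which keeps only downward links; other tree APIs (such as the DOM) store a parent reference on each node.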
Viewing an XML Document in Internet Explorer and Firefox
The XML document in Fig. 14.2 is simply a text file named article.xml. The document contains no formatting information for the article, because XML is a technology for describing the structure of data. Formatting and displaying data from an XML document are application-specific issues. For example, when the user loads article.xml in Internet Explorer, MSXML (Microsoft XML Core Services) parses and displays the document’s data. Firefox has a similar capability. Each browser has a built-in style sheet to format the data. Note that the resulting format of the data (Fig. 14.3) is similar to the format of the listing in Fig. 14.2. In Section 14.8, we show how to create style sheets to transform your XML data into various formats suitable for display.
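Because presentation is left to the application, two programs can display the same XML in completely different ways. A minimal sketch assuming Python's xml.etree.ElementTree and html modules; the two rendering functions are hypothetical and work on a trimmed copy of article.xml:

```python
import html
import xml.etree.ElementTree as ET

# Trimmed copy of article.xml: the markup describes structure only.
ARTICLE = """<article>
   <title>Simple XML</title>
   <date>July 4, 2007</date>
   <summary>XML is pretty easy.</summary>
</article>"""

article = ET.fromstring(ARTICLE)


def as_plain_text(element):
    """One possible presentation: a one-line text headline."""
    return f"{element.findtext('title')} ({element.findtext('date')})"


def as_html(element):
    """Another presentation: an HTML fragment, escaping the text first."""
    title = html.escape(element.findtext("title"))
    summary = html.escape(element.findtext("summary"))
    return f"<h1>{title}</h1><p>{summary}</p>"


print(as_plain_text(article))  # Simple XML (July 4, 2007)
print(as_html(article))        # <h1>Simple XML</h1><p>XML is pretty easy.</p>
```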
Fig. 14.3 | article.xml displayed by Internet Explorer 7 and Firefox 2.
Note the minus sign (–) and plus sign (+) in the screen shots of Fig. 14.3. Although these symbols are not part of the XML document, both browsers place them next to every container element. A minus sign indicates that the browser is displaying the container element’s child elements. Clicking the minus sign next to an element collapses that element (i.e., causes the browser to hide the container element’s children and replace the minus sign with a plus sign). Conversely, clicking the plus sign next to an element expands that element (i.e., causes the browser to display the container element’s children and replace the plus sign with a minus sign). This behavior is similar to viewing the directory structure on your system in Windows Explorer or another similar directory viewer. In fact, a directory structure often is modeled as a series of tree structures, in which the root of a tree represents a disk drive (e.g., C:), and nodes in the tree represent directories. Parsers often store XML data as tree structures to facilitate efficient manipulation, as discussed in Section 14.9.
[Note: In Windows XP Service Pack 2 and Windows Vista, by default Internet Explorer displays all the XML elements in expanded view, and clicking the minus sign (Fig. 14.3(a)) does not do anything. To enable collapsing and expanding, right-click the Information Bar that appears just below the Address field and select Allow Blocked Content.... Then click Yes in the pop-up window that appears.]
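The expand/collapse display amounts to a recursive walk over the element tree. The following is an illustrative sketch only, not how either browser is implemented: it assumes Python's xml.etree.ElementTree, and render_tree is a hypothetical function that marks each container with - when expanded or + when collapsed, hiding a collapsed container’s children. For simplicity it collapses elements by tag name, so every element with a listed tag is collapsed:

```python
import xml.etree.ElementTree as ET


def render_tree(element, collapsed=frozenset(), depth=0):
    """Return display lines for an element tree (hypothetical helper).

    Containers (elements with child elements) get '-' when expanded and
    '+' when their tag is in `collapsed`, in which case their children are
    hidden. Leaf elements are shown with their text.
    """
    indent = "  " * depth
    if len(element) == 0:  # no child elements: not a container
        text = (element.text or "").strip()
        return [f"{indent}  <{element.tag}>{text}</{element.tag}>"]
    if element.tag in collapsed:
        return [f"{indent}+ <{element.tag}>"]
    lines = [f"{indent}- <{element.tag}>"]
    for child in element:
        lines.extend(render_tree(child, collapsed, depth + 1))
    return lines


doc = ET.fromstring(
    "<article><title>Simple XML</title>"
    "<author><firstName/><lastName/></author></article>"
)
print("\n".join(render_tree(doc)))                        # author expanded
print("\n".join(render_tree(doc, collapsed={"author"})))  # author collapsed
```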
XML Markup for a Business Letter
Now that you’ve seen a simple XML document, let’s examine a more complex XML document that marks up a business letter (Fig. 14.4). Again, we begin the document with the XML declaration (line 1) that states the XML version to which the document conforms.
1 <?xml version = "1.0"?>
3 <!-- Fig. 14.4: letter.xml -->
4 <!-- Business letter marked up as XML -->
5 <!DOCTYPE letter SYSTEM "letter.dtd">
7 <letter>
8    <contact type = "sender">
9       <name>Jane Doe</name>
10      <address1>Box 12345</address1>
11      <address2>15 Any Ave.</address2>
16      <flag gender = "F" />
17   </contact>
19   <contact type = "receiver">
20      <name>John Doe</name>
21      <address1>123 Main St.</address1>
22      <address2></address2>
27      <flag gender = "M" />
28   </contact>
30   <salutation>Dear Sir:</salutation>
32   <paragraph>It is our privilege to inform you about our new database
33      managed with XML. This new system allows you to reduce the
34      load on your inventory list server by having the client machine
35      perform the work of sorting and filtering the data.
36   </paragraph>
38   <paragraph>Please visit our website for availability and pricing.
39   </paragraph>
42   <signature>Ms. Jane Doe</signature>
43 </letter>
Fig. 14.4 | Business letter marked up as XML.
Line 5 specifies that this XML document references a DTD. Recall from Section 14.2 that DTDs define the structure of the data for an XML document. For example, a DTD specifies the elements and parent/child relationships between elements permitted in an XML document.
The DOCTYPE reference (line 5) contains three items: the name of the root element that the DTD specifies (letter); the keyword SYSTEM, which denotes an external DTD (a DTD declared in a separate file, as opposed to a DTD declared locally in the same file); and the DTD’s name and location (i.e., letter.dtd in the current directory; this could also be a fully qualified URL). DTD document filenames typically end with the .dtd extension. We discuss DTDs and letter.dtd in detail in Section 14.5.
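A parser reports these DOCTYPE items to the application. The sketch below assumes Python's xml.parsers.expat and its StartDoctypeDeclHandler; read_doctype is a hypothetical helper. It only reads the declaration and does not validate anything. Because no external-entity handler is installed, expat does not try to load letter.dtd, so the file need not exist:

```python
import xml.parsers.expat


def read_doctype(text):
    """Return the root name, system identifier and internal-subset flag (hypothetical helper)."""
    result = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once when expat reaches the <!DOCTYPE ...> declaration.
        result["name"] = name
        result["system_id"] = system_id
        result["internal"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(text, True)
    return result


letter_prolog = (
    '<?xml version = "1.0"?>\n'
    '<!DOCTYPE letter SYSTEM "letter.dtd">\n'
    "<letter/>"
)
print(read_doctype(letter_prolog))
# {'name': 'letter', 'system_id': 'letter.dtd', 'internal': False}
```

Python's standard library does not perform DTD validation; for that, use a validating tool such as the ones described next.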
Several tools (many of which are free) validate documents against DTDs (discussed in Section 14.5) and schemas (discussed in Section 14.6). Microsoft’s XML Validator is available free of charge from the Download sample link at
This validator can validate XML documents against both DTDs and schemas. To install it, run the downloaded executable file xml_validator.exe and follow the steps to complete the installation. Once the installation succeeds, open the validate_js.htm file (located in your XML Validator installation directory) in Internet Explorer to validate your XML documents. We installed the XML Validator at C:\XMLValidator (Fig. 14.5). The output (Fig. 14.6) shows the results of validating the document using Microsoft’s XML Validator. You can click a node to expand it and see its contents. Visit www.w3.org/XML/Schema for a list of additional validation tools.
Root element letter (lines 7–43 of Fig. 14.4) contains the child elements contact, contact, salutation, paragraph, paragraph, closing and signature. Data can be placed between an element’s tags or in attributes, which are name/value pairs that appear within the angle brackets of an element’s start tag. Elements can have any number of attributes (separated by spaces) in their start tags, but a given attribute name may appear only once per start tag, and each value must be enclosed in single or double quotes. The first contact element (lines 8–17) has an attribute named type with attribute value "sender", which indicates that this contact element identifies the letter’s sender. The second contact element (lines 19–28) has attribute type with value "receiver", which indicates that this contact element identifies the letter’s recipient. Like element names, attribute names are case sensitive, can be any length, may contain letters, digits, underscores, hyphens and periods, and must begin with either a letter or an underscore character. A contact element stores various items of information about a contact, such as the contact’s name (represented by element name), address (represented by elements address1, address2, city, state and zip), phone number (represented by element phone) and gender (represented by attribute gender of element flag). Element salutation (line 30) marks up the letter’s salutation. Lines 32–39 mark up the letter’s body using two paragraph elements. Elements closing (line 41) and signature (line 42) mark up the closing sentence and the author’s “signature,” respectively.
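Reading data from attributes and from element content uses different calls. The sketch below assumes Python's xml.etree.ElementTree and works on a trimmed fragment of letter.xml containing only values that appear in Fig. 14.4. It reads the type attribute of each contact, the text of its name child and the gender attribute of its empty flag element, then shows that attribute lookup is case sensitive:

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of letter.xml (Fig. 14.4).
LETTER = """<letter>
   <contact type = "sender">
      <name>Jane Doe</name>
      <flag gender = "F" />
   </contact>
   <contact type = "receiver">
      <name>John Doe</name>
      <flag gender = "M" />
   </contact>
   <salutation>Dear Sir:</salutation>
</letter>"""

root = ET.fromstring(LETTER)
rows = []
for contact in root.findall("contact"):
    role = contact.get("type")                   # attribute of the start tag
    name = contact.findtext("name")              # text content of a child element
    gender = contact.find("flag").get("gender")  # attribute of an empty element
    rows.append((role, name, gender))
print(rows)  # [('sender', 'Jane Doe', 'F'), ('receiver', 'John Doe', 'M')]

# Attribute names are case sensitive: "Type" is not the same attribute as "type".
print(root.find("contact").get("Type"))  # None
```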
Line 16 introduces the empty element flag. An empty element has no content (no text and no child elements), although it may carry data in attributes. Empty element flag has one attribute, gender, which indicates the gender of the contact represented by the parent contact element. Document authors can close an empty element either by placing a slash immediately before the right angle bracket, as shown in line 16, or by explicitly writing an end tag, as in line 22.
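The two ways of writing an empty element are equivalent to a parser. A minimal sketch assuming Python's xml.etree.ElementTree: both spellings parse to an element with no text and no children, and when writing XML back out, ElementTree's short_empty_elements flag chooses which spelling to emit:

```python
import xml.etree.ElementTree as ET

self_closing = ET.fromstring('<flag gender = "F" />')
explicit_end = ET.fromstring("<address2></address2>")

# Both spellings yield an element with no text and no children.
for element in (self_closing, explicit_end):
    print(element.tag, element.text, len(element), element.attrib)
# flag None 0 {'gender': 'F'}
# address2 None 0 {}

# On output, ElementTree picks one spelling; the flag selects which.
print(ET.tostring(explicit_end, encoding="unicode"))  # <address2 />
print(ET.tostring(explicit_end, encoding="unicode", short_empty_elements=False))
# <address2></address2>
```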
Note that the address2 element in line 22 is empty because there is no second part to this contact’s address. However, we must include this element to conform to the structural rules specified in the XML document’s DTD, letter.dtd (which we present in Section 14.5). This DTD specifies that each contact element must have an address2 child element (even if it is empty). In Section 14.5, you will learn how DTDs indicate required and optional elements.
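To illustrate why the empty element still matters, here is a hand-written check for just this one rule. It is a sketch assuming Python's xml.etree.ElementTree; contacts_missing is hypothetical and is not DTD validation (it checks one requirement, not the full content model of letter.dtd). An empty address2 element satisfies the rule, while an omitted one does not:

```python
import xml.etree.ElementTree as ET


def contacts_missing(root, required_child):
    """Return the type attribute of every contact element lacking required_child.

    Hypothetical stand-in for a single rule of letter.dtd; not DTD validation.
    """
    # Compare with None: an empty element has no children, so its truth
    # value is False and a plain truth test would treat it as missing.
    return [contact.get("type")
            for contact in root.findall("contact")
            if contact.find(required_child) is None]


omitted = ET.fromstring(
    "<letter>"
    '<contact type="sender"><address2>15 Any Ave.</address2></contact>'
    '<contact type="receiver"></contact>'
    "</letter>"
)
included = ET.fromstring(
    "<letter>"
    '<contact type="sender"><address2>15 Any Ave.</address2></contact>'
    '<contact type="receiver"><address2></address2></contact>'
    "</letter>"
)
print(contacts_missing(omitted, "address2"))   # ['receiver']
print(contacts_missing(included, "address2"))  # []
```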