XML permits document authors to create markup (i.e., a text-based notation for describing data) for virtually any type of information. This enables document authors to create entirely new markup languages for describing any type of data, such as mathematical formulas, software-configuration instructions, chemical molecular structures, music, news, recipes and financial reports. XML describes data in a way that both human beings and computers can understand.
Figure 14.1 is a simple XML document that describes information for a baseball player. We focus on lines 5–9 to introduce basic XML syntax. You will learn about the other elements of this document in Section 14.3.
1 <?xml version = "1.0"?>
3 <!-- Fig. 14.1: player.xml -->
4 <!-- Baseball player structured with XML -->
Fig. 14.1 | XML that describes a baseball player’s information.
XML documents contain text that represents content (i.e., data), such as John (line 6 of Fig. 14.1), and elements that specify the document’s structure, such as firstName (line 6 of Fig. 14.1). XML documents delimit elements with start tags and end tags. A start tag consists of the element name in angle brackets (e.g., <player> and <firstName> in lines 5 and 6, respectively). An end tag consists of the element name preceded by a forward slash (/) in angle brackets (e.g., </firstName> and </player> in lines 6 and 9, respectively). An element’s start and end tags enclose text that represents a piece of data (e.g., the player’s firstName, John, in line 6, which is enclosed by the <firstName> start tag and </firstName> end tag). Every XML document must have exactly one root element that contains all the other elements. In Fig. 14.1, the root element is player (lines 5–9).
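As a minimal sketch of this structure, the following Python snippet parses a small document shaped like the one the text describes and lists the root element and its children. The use of Python's xml.etree.ElementTree module is our choice for illustration. Only the firstName value John comes from the text; the lastName and battingAverage values are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Stand-in for player.xml. Only "John" is taken from the text;
# the other two values are hypothetical placeholders.
PLAYER_XML = """<?xml version="1.0"?>
<player>
   <firstName>John</firstName>
   <lastName>Doe</lastName>
   <battingAverage>0.375</battingAverage>
</player>"""

# fromstring returns the document's single root element.
root = ET.fromstring(PLAYER_XML)
print("root element:", root.tag)

# Each child is an element: a start tag, some text content, an end tag.
for child in root:
    print(f"<{child.tag}> encloses the text {child.text!r}")
```

Iterating over the root visits only its direct children, which mirrors the idea that the root element contains all the other elements.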
XML-based markup languages, called XML vocabularies, provide a means for describing particular types of data in standardized, structured ways. Some XML vocabularies include XHTML (Extensible HyperText Markup Language), MathML (for mathematics), VoiceXML™ (for speech), CML (Chemical Markup Language, for chemistry), XBRL (Extensible Business Reporting Language, for financial data exchange) and others that we discuss in Section 14.7.
Massive amounts of data are currently stored on the Internet in many formats (e.g., databases, web pages, text files). Much of this data, especially that which is passed between systems, will soon take the form of XML. Organizations see XML as the future of data encoding. Information technology groups are planning ways to integrate XML into their systems. Industry groups are developing custom XML vocabularies for most major industries that will allow business applications to communicate in common languages. For example, many web services allow web-based applications to exchange data seamlessly through standard protocols based on XML. We discuss web services in Chapter 28.
The next generation of the web is being built on an XML foundation, enabling you to develop more sophisticated web-based applications. XML allows you to assign meaning to what would otherwise be random pieces of data. As a result, programs can “understand” the data they manipulate. For example, a web browser might view a street address in a simple web page as a string of characters without any real meaning. In an XML document, however, this data can be clearly identified (i.e., marked up) as an address. A program that uses the document can recognize this data as an address and provide links to a map of that location, driving directions from that location or other location-specific information. Likewise, an application can recognize names of people, dates, ISBN numbers and any other type of XML-encoded data. The application can then present users with other related information, providing a richer, more meaningful user experience.
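The address scenario can be sketched roughly as follows. Everything here is assumed for illustration: the contact, address, street and city element names, the sample data, the map_links helper and the maps.example.com URL are all hypothetical. The point is only that once the data is marked up as an address, a program can find it by element name and act on it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Hypothetical document in which an address is explicitly marked up.
CONTACT_XML = """<contact>
   <name>Jane Doe</name>
   <address>
      <street>123 Main Street</street>
      <city>Springfield</city>
   </address>
</contact>"""

def map_links(xml_text, base_url="https://maps.example.com/search?q="):
    """Build one map-search link per <address> element (hypothetical URL)."""
    root = ET.fromstring(xml_text)
    links = []
    for address in root.iter("address"):
        # Join the text of the address's child elements into one query.
        parts = [part.text.strip() for part in address if part.text]
        links.append(base_url + quote_plus(", ".join(parts)))
    return links

links = map_links(CONTACT_XML)
print(links)
```

A program reading the same characters from an unstructured page would have to guess which text is an address; here the element name settles it.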
Viewing and Modifying XML Documents
XML documents are highly portable. An XML document is a text file that usually ends with the .xml filename extension, so viewing or modifying it does not require special software. Many software tools exist, however, and new ones are frequently released that make it more convenient to develop XML-based applications. Any text editor that supports ASCII/Unicode characters can open XML documents for viewing and editing. Also, most web browsers can display XML documents in a formatted manner that shows the XML’s structure. Section 14.3 demonstrates this in Internet Explorer and Firefox. An important characteristic of XML is that it is both human and machine readable.
Processing XML Documents
Processing an XML document requires software called an XML parser (or XML processor). A parser makes the document’s data available to applications. While reading an XML document’s contents, a parser checks that the document follows the syntax rules specified by the W3C’s XML Recommendation (www.w3.org/XML). XML syntax requires a single root element, a start tag and end tag for each element, and properly nested tags (i.e., the end tag for a nested element must appear before the end tag of the enclosing element). Furthermore, XML is case sensitive, so the proper capitalization must be used in elements. A document that conforms to this syntax is a well-formed XML document and is syntactically correct. We present fundamental XML syntax in Section 14.3. If an XML parser can process an XML document successfully, that XML document is well-formed. Parsers can provide access to XML-encoded data in well-formed documents only.
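The syntax rules above can be tried out with any parser. The sketch below uses Python's xml.etree.ElementTree, which is our choice for illustration and not a parser named in this chapter. The is_well_formed helper and the sample strings are hypothetical; the helper simply reports whether the parser accepted the input.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# One sample per rule mentioned in the text.
samples = {
    "well-formed": "<player><firstName>John</firstName></player>",
    "two root elements": "<player></player><player></player>",
    "missing end tag": "<player><firstName>John</player>",
    "improper nesting": "<player><firstName>John</player></firstName>",
    "case mismatch": "<player><firstName>John</firstname></player>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted. The parser rejects the others and exposes none of their data, which is what "access to XML-encoded data in well-formed documents only" means in practice.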
Often, XML parsers are built into software or available for download over the Internet. Some popular parsers include Microsoft XML Core Services (MSXML), which is included with Internet Explorer; the Apache Software Foundation’s Xerces (xml.apache.org); and the open-source Expat XML Parser (expat.sourceforge.net).
Validating XML Documents
An XML document can reference a Document Type Definition (DTD) or a schema that defines the proper structure of the XML document. When an XML document references a DTD or a schema, some parsers (called validating parsers) can read the DTD/schema and check that the XML document follows the structure defined by the DTD/schema. If the XML document conforms to the DTD/schema (i.e., the document has the appropriate structure), the XML document is valid. For example, if in Fig. 14.1 we were referencing a DTD that specified that a player element must have firstName, lastName and battingAverage elements, then omitting the lastName element (line 7 in Fig. 14.1) would invalidate the XML document player.xml. However, the XML document would still be well-formed, because it follows proper XML syntax (i.e., it has one root element, each element has a start tag and an end tag, and the elements are nested properly). By definition, a valid XML document is well-formed. Parsers that cannot check for document conformity against DTDs/schemas are nonvalidating parsers. They determine only whether an XML document is well-formed, not whether it is valid.
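The difference between well-formed and valid can be illustrated with a small sketch. Note the substitution: the parsers in Python's standard library are nonvalidating, so the code below does not read a DTD or schema. Instead, a hand-written check (the hypothetical check_player function) stands in for the rule described above, namely that a player element must have firstName, lastName and battingAverage children. A real DTD would also constrain order and counts, which this check ignores. The sample values other than John are hypothetical.

```python
import xml.etree.ElementTree as ET

# Stand-in for the DTD rule described in the text (not a real DTD).
REQUIRED_CHILDREN = ("firstName", "lastName", "battingAverage")

def check_player(xml_text):
    """Return (well_formed, valid) for a candidate player document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed, so it cannot be valid either.
        return (False, False)
    child_tags = [child.tag for child in root]
    valid = root.tag == "player" and all(
        tag in child_tags for tag in REQUIRED_CHILDREN
    )
    return (True, valid)

complete = ("<player><firstName>John</firstName><lastName>Doe</lastName>"
            "<battingAverage>0.375</battingAverage></player>")
no_last_name = ("<player><firstName>John</firstName>"
                "<battingAverage>0.375</battingAverage></player>")
broken = "<player><firstName>John</player>"

for label, text in [("complete", complete),
                    ("no lastName", no_last_name),
                    ("broken", broken)]:
    print(label, check_player(text))
```

The document without lastName is reported as well-formed but not valid, and the function never reports a document as valid without it also being well-formed.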
We discuss validation, DTDs and schemas, as well as the key differences between these two types of structural specifications, in Sections 14.5–14.6. For now, note that schemas are XML documents themselves, whereas DTDs are not. As you will learn in Section 14.6, this difference presents several advantages in using schemas over DTDs.
Formatting and Manipulating XML Documents
Most XML documents contain only data, not formatting instructions, so applications that process XML documents must decide how to manipulate or display the data. For example, a PDA (personal digital assistant) may render an XML document differently than a wireless phone or a desktop computer. You can use Extensible Stylesheet Language (XSL) to specify rendering instructions for different platforms. We discuss XSL in Section 14.8.
XML-processing programs can also search, sort and manipulate XML data using XSL. Some other XML-related technologies are XPath (XML Path Language—a language for accessing parts of an XML document), XSL-FO (XSL Formatting Objects—an XML vocabulary used to describe document formatting) and XSLT (XSL Transformations—a language for transforming XML documents into other documents). We present XSLT and XPath in Section 14.8.
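To give a feel for path-based access before Section 14.8, here is a rough sketch using the limited XPath subset that Python's xml.etree.ElementTree supports. It is not XSL or XSLT, and the sorting is done in ordinary Python code. The team document, its element names and all of its values other than John are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document holding several player elements.
TEAM_XML = """<team>
   <player>
      <firstName>John</firstName>
      <battingAverage>0.301</battingAverage>
   </player>
   <player>
      <firstName>Maria</firstName>
      <battingAverage>0.375</battingAverage>
   </player>
</team>"""

root = ET.fromstring(TEAM_XML)

# Search: a path expression selecting every firstName under a player.
first_names = [e.text for e in root.findall("./player/firstName")]

# Search with a predicate: the batting average of the player named Maria.
maria_average = root.findtext("./player[firstName='Maria']/battingAverage")

# Sort: order the player elements by batting average, highest first.
by_average = sorted(
    root.findall("player"),
    key=lambda p: float(p.findtext("battingAverage")),
    reverse=True,
)
ranking = [p.findtext("firstName") for p in by_average]

print(first_names, maria_average, ranking)
```

Full XPath and XSLT processors offer far more than this subset, but the idea is the same: a path expression names a part of the document, and the program works with whatever that expression selects.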