W3C XML Schema Documents
In this section, we introduce schemas for specifying XML document structure and validating XML documents. Many developers in the XML community believe that DTDs are not flexible enough to meet today’s programming needs. For example, DTDs lack a way of indicating what specific type of data (e.g., numeric, text) an element can contain. DTDs are also not themselves XML documents, which forces developers to learn multiple grammars and to create multiple types of parsers. These and other limitations have led to the development of schemas.
Unlike DTDs, schemas do not use EBNF grammar. Instead, schemas use XML syntax and are actually XML documents that programs can manipulate. Like DTDs, schemas are used by validating parsers to validate documents.
In this section, we focus on the W3C’s XML Schema vocabulary (note the capital “S” in “Schema”). We use the term XML Schema in the rest of the chapter whenever we refer to W3C’s XML Schema vocabulary. For the latest information on XML Schema, visit www.w3.org/XML/Schema. For tutorials on XML Schema concepts beyond what we present here, visit www.w3schools.com/schema/default.asp.
Recall that a DTD describes an XML document’s structure, not the content of its elements. For example,
<quantity>5</quantity>
contains character data. If the document that contains element quantity references a DTD, an XML parser can validate the document to confirm that this element indeed does contain PCDATA content. However, the parser cannot validate that the content is numeric; DTDs do not provide this capability. So, unfortunately, the parser also considers
<quantity>hello</quantity>
to be valid. An application that uses the XML document containing this markup should test that the data in element quantity is numeric and take appropriate action if it is not.
XML Schema enables schema authors to specify that element quantity’s data must be numeric or, even more specifically, an integer. A parser validating the XML document against this schema can determine that 5 conforms and hello does not. An XML document that conforms to a schema document is schema valid, and one that does not conform is schema invalid. Schemas are themselves XML documents and therefore must also be valid, conforming to the rules that XML Schema defines for schema documents.
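A minimal sketch of how a schema could state the integer requirement for quantity. The file name quantity.xsd and the choice of a single global quantity element with no target namespace are assumptions for illustration; this is not one of the chapter's figures.

```xml
<?xml version = "1.0"?>
<!-- quantity.xsd: hypothetical schema, for illustration only -->
<schema xmlns = "http://www.w3.org/2001/XMLSchema">
   <!-- quantity must hold an integer; a DTD could only say #PCDATA -->
   <element name = "quantity" type = "integer"/>
</schema>
```

Validated against this sketch, <quantity>5</quantity> is schema valid, while <quantity>hello</quantity> is schema invalid because hello is not an integer.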
Validating Against an XML Schema Document
Figure 14.11 shows a schema-valid XML document named book.xml, and Fig. 14.12 shows the pertinent XML Schema document (book.xsd) that defines the structure for book.xml. By convention, schemas use the .xsd extension. We used an online XSD schema validator provided at www.xmlforasp.net/SchemaValidator.aspx to ensure that the XML document in Fig. 14.11 conforms to the schema in Fig. 14.12. To validate the schema document itself (i.e., book.xsd) and produce the output shown in Fig. 14.12, we used an online XSV (XML Schema Validator) provided by the W3C at www.w3.org/2001/03/webdata/xsv. These tools are free and enforce the W3C’s specifications regarding XML Schemas and schema validation.
1 <?xml version = "1.0"?>
2
3 <!-- Fig. 14.11: book.xml -->
4 <!-- Book list marked up as XML -->
5 <deitel:books xmlns:deitel = "http://www.deitel.com/booklist">
6    <book>
7       <title>Visual Basic 2005 How to Program, 3/e</title>
8    </book>
9    <book>
10      <title>Visual C# 2005 How to Program, 2/e</title>
11   </book>
12   <book>
13      <title>Java How to Program, 7/e</title>
14   </book>
15   <book>
16      <title>C++ How to Program, 6/e</title>
17   </book>
18   <book>
19      <title>Internet and World Wide Web How to Program, 4/e</title>
20   </book>
21 </deitel:books>
Fig. 14.11 | Schema-valid XML document describing a list of books.
1 <?xml version = "1.0"?>
2
3 <!-- Fig. 14.12: book.xsd -->
4 <!-- Simple W3C XML Schema document -->
5 <schema xmlns = "http://www.w3.org/2001/XMLSchema"
6    xmlns:deitel = "http://www.deitel.com/booklist"
7    targetNamespace = "http://www.deitel.com/booklist">
8
9    <element name = "books" type = "deitel:BooksType"/>
10
11   <complexType name = "BooksType">
12      <sequence>
13         <element name = "book" type = "deitel:SingleBookType"
14            minOccurs = "1" maxOccurs = "unbounded"/>
15      </sequence>
16   </complexType>
17
18   <complexType name = "SingleBookType">
19      <sequence>
20         <element name = "title" type = "string"/>
21      </sequence>
22   </complexType>
23 </schema>
Fig. 14.12 | XML Schema document for book.xml.
Figure 14.11 contains markup describing several Deitel books. The books element (line 5) has the namespace prefix deitel, indicating that the books element is a part of the http://www.deitel.com/booklist namespace.
Creating an XML Schema Document
Figure 14.12 presents the XML Schema document that specifies the structure of book.xml (Fig. 14.11). This document defines an XML-based language (i.e., a vocabulary) for writing XML documents about collections of books. The schema defines the elements, attributes and parent/child relationships that such a document can (or must) include. The schema also specifies the type of data that these elements and attributes may contain.
Root element schema (Fig. 14.12, lines 5–23) contains elements that define the structure of an XML document such as book.xml. Line 5 specifies the standard W3C XML Schema namespace URI, http://www.w3.org/2001/XMLSchema, as the default namespace. This namespace contains predefined elements (e.g., root element schema) that make up the XML Schema vocabulary, the language used to write an XML Schema document.
Line 6 binds the URI http://www.deitel.com/booklist to namespace prefix deitel. As we discuss momentarily, the schema uses this namespace to differentiate names created by us from names that are part of the XML Schema namespace. Line 7 also specifies http://www.deitel.com/booklist as the targetNamespace of the schema. This attribute identifies the namespace of the XML vocabulary that this schema defines. Note that the targetNamespace of book.xsd is the same as the namespace referenced in line 5 of book.xml (Fig. 14.11). This is what “connects” the XML document with the schema that defines its structure. When an XML schema validator examines book.xml and book.xsd, it will recognize that book.xml uses elements and attributes from the http://www.deitel.com/booklist namespace. The validator also will recognize that this namespace is the namespace defined in book.xsd (i.e., the schema’s targetNamespace). Thus the validator knows where to look for the structural rules for the elements and attributes used in book.xml.
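The online validators used in this section take the instance document and the schema as separate inputs. An instance document can also hint at where its schema lives by using the schemaLocation attribute from the XML Schema instance namespace (http://www.w3.org/2001/XMLSchema-instance). The sketch below assumes book.xsd sits in the same directory as the instance; the attribute's value pairs a namespace URI with a schema location, and validators may treat it only as a hint.

```xml
<?xml version = "1.0"?>
<!-- book.xml variant with a schema-location hint (assumes book.xsd is
     in the same directory as this file) -->
<deitel:books xmlns:deitel = "http://www.deitel.com/booklist"
   xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation = "http://www.deitel.com/booklist book.xsd">
   <book>
      <title>Java How to Program, 7/e</title>
   </book>
</deitel:books>
```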
Defining an Element in XML Schema
In XML Schema, the element tag (line 9) defines an element to be included in an XML document that conforms to the schema. In other words, element specifies the actual elements that can be used to mark up data. Line 9 defines the books element, which we use as the root element in book.xml (Fig. 14.11). Attributes name and type specify the element’s name and type, respectively. An element’s type indicates the data that the element may contain. Possible types include XML Schema-defined types (e.g., string, double) and user-defined types (e.g., BooksType, which is defined in lines 11–16). Figure 14.13 lists several of XML Schema’s many built-in types. For a complete list of built-in types, see Section 3 of the specification found at www.w3.org/TR/xmlschema-2.
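The type attribute works the same way for any built-in type. The sketch below is hypothetical: the namespace http://www.example.com/inventory and the element names are invented for illustration. It declares four global elements, each using one of XML Schema's built-in types.

```xml
<?xml version = "1.0"?>
<!-- inventory.xsd: hypothetical schema, not one of the chapter's figures -->
<schema xmlns = "http://www.w3.org/2001/XMLSchema"
   targetNamespace = "http://www.example.com/inventory">

   <element name = "productName" type = "string"/>  <!-- any text -->
   <element name = "price" type = "double"/>        <!-- e.g., 19.99 -->
   <element name = "inStock" type = "boolean"/>     <!-- true, false, 1 or 0 -->
   <element name = "shipDate" type = "date"/>       <!-- YYYY-MM-DD, e.g., 2007-03-15 -->
</schema>
```

Under this sketch, <inv:price xmlns:inv = "http://www.example.com/inventory">19.99</inv:price> would be schema valid, whereas a price element containing cheap would not.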
Fig. 14.13 | Some XML Schema types.
In this example, books is defined as an element of type deitel:BooksType (line 9).
BooksType is a user-defined type (lines 11–16) in the http://www.deitel.com/booklist namespace and therefore must have the namespace prefix deitel. It is not an existing XML Schema type.
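To see why the prefix matters, compare a deliberately broken variant of line 9 with the original. Because the schema's default namespace is the XML Schema namespace, an unprefixed type name is resolved in that namespace, where no type named BooksType exists, so a schema validator should report an unresolved type reference.

```xml
<!-- Incorrect: BooksType resolves to the XML Schema namespace,
     which defines no such type -->
<element name = "books" type = "BooksType"/>

<!-- Correct (line 9 of Fig. 14.12): prefix deitel maps to the
     schema's targetNamespace, where BooksType is defined -->
<element name = "books" type = "deitel:BooksType"/>
```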
Two categories of type exist in XML Schema—simple types and complex types. Simple and complex types differ only in that simple types cannot contain attributes or child elements and complex types can.
A user-defined type that contains attributes or child elements must be defined as a complex type. Lines 11–16 use element complexType to define BooksType as a complex type that has a child element named book. The sequence element (lines 12–15) allows you to specify the sequential order in which child elements must appear. The element declaration in lines 13–14, nested within the sequence element, indicates that a BooksType element (e.g., books) can contain child elements named book of type deitel:SingleBookType (defined in lines 18–22). Attribute minOccurs (line 14), with value 1, specifies that elements of type BooksType must contain a minimum of one book element. Attribute maxOccurs (line 14), with value unbounded, specifies that elements of type BooksType may have any number of book child elements.
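The occurrence constraints in lines 13–14 can be checked against two small instance documents, shown together here for comparison. Under book.xsd, the first is schema invalid because it has no book child; the second conforms, and maxOccurs = "unbounded" would let it list any number of further book elements.

```xml
<!-- Instance 1, schema invalid: BooksType requires at least one book
     (minOccurs = "1") -->
<deitel:books xmlns:deitel = "http://www.deitel.com/booklist"/>

<!-- Instance 2, schema valid: one book satisfies minOccurs, and
     maxOccurs = "unbounded" allows more -->
<deitel:books xmlns:deitel = "http://www.deitel.com/booklist">
   <book>
      <title>C++ How to Program, 6/e</title>
   </book>
</deitel:books>
```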
Lines 18–22 define the complex type SingleBookType. An element of this type contains a child element named title. Line 20 defines element title to be of simple type string. Recall that elements of a simple type cannot contain attributes or child elements. The schema end tag (</schema>, line 23) declares the end of the XML Schema document.
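Because title has the simple type string, it may hold only text. The fragments below are assumed to appear inside the deitel:books element of Fig. 14.11; the lang attribute and em child element are hypothetical names chosen for illustration. Under book.xsd, only the first book conforms.

```xml
<!-- Conforms: title holds only character data -->
<book>
   <title>Java How to Program, 7/e</title>
</book>

<!-- Schema invalid: an element of simple type cannot carry attributes -->
<book>
   <title lang = "en">Java How to Program, 7/e</title>
</book>

<!-- Schema invalid: an element of simple type cannot contain child elements -->
<book>
   <title><em>Java</em> How to Program, 7/e</title>
</book>
```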
A Closer Look at Types in XML Schema
Every element in XML Schema has a type. Types include the built-in types provided by XML Schema (Fig. 14.13) or user-defined types (e.g., SingleBookType in Fig. 14.12).
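Every simple type we define in this section is a restriction of an XML Schema-defined type or of a user-defined type. (XML Schema also allows simple types to be derived by list or union, which we do not use here.) Restrictions limit the possible values that an element can hold.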
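A sketch of both kinds of restriction, using hypothetical type names and a hypothetical namespace. Type percent restricts the built-in type integer; type quarterPercent restricts percent, a user-defined type, so a value must satisfy the facets of both.

```xml
<?xml version = "1.0"?>
<!-- sizes.xsd: hypothetical schema illustrating restriction -->
<schema xmlns = "http://www.w3.org/2001/XMLSchema"
   xmlns:ex = "http://www.example.com/sizes"
   targetNamespace = "http://www.example.com/sizes">

   <!-- restriction of a built-in type: integers from 1 through 100 -->
   <simpleType name = "percent">
      <restriction base = "integer">
         <minInclusive value = "1"/>
         <maxInclusive value = "100"/>
      </restriction>
   </simpleType>

   <!-- restriction of a user-defined type: only four of those values -->
   <simpleType name = "quarterPercent">
      <restriction base = "ex:percent">
         <enumeration value = "25"/>
         <enumeration value = "50"/>
         <enumeration value = "75"/>
         <enumeration value = "100"/>
      </restriction>
   </simpleType>

   <element name = "discount" type = "ex:quarterPercent"/>
</schema>
```

With this sketch, a discount of 50 conforms, 30 does not (it is within 1 to 100 but not enumerated), and 150 fails the inherited maxInclusive facet.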
Complex types are divided into two groups: those with simple content and those with complex content. Both can contain attributes, but only complex content can contain child elements. Complex types with simple content must extend or restrict some other existing type. Complex types with complex content do not have this limitation. We demonstrate complex types with each kind of content in the next example.
The schema document in Fig. 14.14 creates both simple types and complex types. The XML document in Fig. 14.15 (laptop.xml) follows the structure defined in Fig. 14.14 to describe parts of a laptop computer. A document such as laptop.xml that conforms to a schema is known as an XML instance document—the document is an instance (i.e., example) of the schema.
In Fig. 14.14, line 5 declares the default namespace to be the standard XML Schema namespace, so any elements without a prefix are assumed to be in the XML Schema namespace. Line 6 binds the namespace prefix computer to the namespace http://www.deitel.com/computer. Line 7 identifies this namespace as the targetNamespace, that is, the namespace being defined by the current XML Schema document.
To design the XML elements for describing laptop computers, we first create a simple type in lines 9–13 using the simpleType element. We name this simpleType gigahertz because it will be used to describe the clock speed of the processor in gigahertz. Simple types are restrictions of a type typically called a base type. For this simpleType, line 10 declares the base type as decimal, and we restrict the value to be at least 2.1 by using the minInclusive element in line 11.
1 <?xml version = "1.0"?>
2 <!-- Fig. 14.14: computer.xsd -->
3 <!-- W3C XML Schema document -->
4
5 <schema xmlns = "http://www.w3.org/2001/XMLSchema"
6    xmlns:computer = "http://www.deitel.com/computer"
7    targetNamespace = "http://www.deitel.com/computer">
8
9    <simpleType name = "gigahertz">
10      <restriction base = "decimal">
11         <minInclusive value = "2.1"/>
12      </restriction>
13   </simpleType>
14
15   <complexType name = "CPU">
16      <simpleContent>
17         <extension base = "string">
18            <attribute name = "model" type = "string"/>
19         </extension>
20      </simpleContent>
21   </complexType>
22
23   <complexType name = "portable">
24      <all>
25         <element name = "processor" type = "computer:CPU"/>
26         <element name = "monitor" type = "int"/>
27         <element name = "CPUSpeed" type = "computer:gigahertz"/>
28         <element name = "RAM" type = "int"/>
29      </all>
30      <attribute name = "manufacturer" type = "string"/>
31   </complexType>
32
33   <element name = "laptop" type = "computer:portable"/>
34 </schema>
Fig. 14.14 | XML Schema document defining simple and complex types.
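The fragments below show which values the gigahertz type accepts. They are assumed to appear as CPUSpeed children of a laptop element, as in Fig. 14.15; only the first two satisfy the type.

```xml
<!-- Conforms: 2.4 is a decimal and is at least 2.1 -->
<CPUSpeed>2.4</CPUSpeed>

<!-- Conforms: minInclusive includes the bound itself -->
<CPUSpeed>2.1</CPUSpeed>

<!-- Schema invalid: 1.8 is a decimal but is below 2.1 -->
<CPUSpeed>1.8</CPUSpeed>

<!-- Schema invalid: fast is not a decimal value -->
<CPUSpeed>fast</CPUSpeed>
```

Had line 11 used minExclusive instead of minInclusive, the value 2.1 itself would be rejected.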
Next, we declare a complexType named CPU that has simpleContent (lines 16–20). Remember that a complex type with simple content can have attributes but not child elements. Also recall that complex types with simple content must extend or restrict some XML Schema type or user-defined type. The extension element with attribute base (line 17) sets the base type to string. In this complexType, we extend the base type string with an attribute. The attribute element (line 18) gives the complexType an attribute of type string named model. Thus an element of type CPU can contain only text (because the base type is string), no child elements, and may carry a model attribute that is also of type string.
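The fragments below, assumed to appear as processor children of a laptop element, illustrate the CPU type. The brand element is a hypothetical name used only to show what simple content forbids.

```xml
<!-- Conforms: text content plus the optional model attribute -->
<processor model = "Centrino">Intel</processor>

<!-- Conforms: an attribute declared without a use attribute is optional -->
<processor>Intel</processor>

<!-- Schema invalid: simple content cannot include child elements -->
<processor model = "Centrino"><brand>Intel</brand></processor>
```

To make model mandatory, line 18 could add use = "required" to the attribute element; the second fragment would then be schema invalid.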
Last, we define type portable, which is a complexType with complex content (lines 23–31). Such types are allowed to have child elements and attributes. The element all (lines 24–29) encloses elements that must each be included once in the corresponding XML instance document. These elements can be included in any order. This complex type holds four elements: processor, monitor, CPUSpeed and RAM. They are given types CPU, int, gigahertz and int, respectively. When using types CPU and gigahertz, we must include the namespace prefix computer, because these user-defined types are part of the computer namespace (http://www.deitel.com/computer), the namespace defined in the current document (line 7). Also, portable contains an attribute defined in line 30. The attribute element indicates that elements of type portable may contain an attribute of type string named manufacturer.
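Because portable uses an all group, its child elements may appear in any order, but each must appear exactly once. The two instance documents below are shown together for comparison; their monitor, CPUSpeed and RAM values are hypothetical.

```xml
<!-- Instance 1, conforms: all four children appear once, in a different
     order than the schema lists them -->
<computer:laptop xmlns:computer = "http://www.deitel.com/computer"
   manufacturer = "IBM">
   <RAM>256</RAM>
   <processor model = "Centrino">Intel</processor>
   <CPUSpeed>2.4</CPUSpeed>
   <monitor>17</monitor>
</computer:laptop>

<!-- Instance 2, schema invalid: RAM is missing, but each element in
     the all group must appear once -->
<computer:laptop xmlns:computer = "http://www.deitel.com/computer"
   manufacturer = "IBM">
   <processor model = "Centrino">Intel</processor>
   <monitor>17</monitor>
   <CPUSpeed>2.4</CPUSpeed>
</computer:laptop>
```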
Line 33 declares the actual element that uses, through type portable, the three types defined in the schema. The element is called laptop and is of type portable. We must use the namespace prefix computer in front of portable.
We have now created an element named laptop that contains child elements processor, monitor, CPUSpeed and RAM, and an attribute manufacturer. Figure 14.15 uses the laptop element defined in the computer.xsd schema. Once again, we used an online XSD schema validator (www.xmlforasp.net/SchemaValidator.aspx) to ensure that this XML instance document adheres to the schema’s structural rules.
In Fig. 14.15, line 5 declares namespace prefix computer. The laptop element requires this prefix because it is part of the http://www.deitel.com/computer namespace. Line 6 sets the laptop’s manufacturer attribute, and lines 8–11 use the elements defined in the schema to describe the laptop’s characteristics.
1 <?xml version = "1.0"?>
3 <!-- Fig. 14.15: laptop.xml -->
4 <!-- Laptop components marked up as XML -->
5 <computer:laptop xmlns:computer = "http://www.deitel.com/computer"
6 manufacturer = "IBM">
8 <processor model = "Centrino">Intel</processor>
Fig. 14.15 | XML document using the laptop element defined in computer.xsd.
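Notice that only the laptop element carries the computer prefix. Because computer.xsd does not set the elementFormDefault attribute on its schema element, that attribute takes its default value, unqualified, so the locally declared elements processor, monitor, CPUSpeed and RAM belong to no namespace. The sketch below is therefore schema invalid: its processor child is namespace-qualified and does not match the local declaration in line 25 of Fig. 14.14. The monitor, CPUSpeed and RAM values are hypothetical.

```xml
<?xml version = "1.0"?>
<!-- Schema invalid: computer:processor is in the computer namespace,
     but computer.xsd declares processor locally, with no namespace -->
<computer:laptop xmlns:computer = "http://www.deitel.com/computer"
   manufacturer = "IBM">
   <computer:processor model = "Centrino">Intel</computer:processor>
   <monitor>17</monitor>
   <CPUSpeed>2.4</CPUSpeed>
   <RAM>256</RAM>
</computer:laptop>
```

Removing the computer: prefix from processor, as in line 8 of Fig. 14.15, makes this document conform. The same rule explains why book and title carry no prefix in Fig. 14.11.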
This section introduced W3C XML Schema documents for defining the structure of XML documents, and we validated XML instance documents against schemas using an online XSD schema validator. Section 14.7 discusses several XML vocabularies and demonstrates the MathML vocabulary. Section 14.10 demonstrates the RSS vocabulary.