Chapter: XML and Web Services : Essentials of XML : Validating XML with the Document Type Definition (DTD)

DTD Entities

Entities in DTDs are storage units; they can also be thought of as placeholders. An entity is a special piece of markup that holds content for insertion into the XML document. Usually this is information that is bulky or repetitive. Entities make such information easier to handle because the author declares it once in the DTD and then marks each place in the XML document where it should be inserted. This is much better than having to retype the same information over and over.

 

An entity’s content could be well-formed XML, normal text, binary data, a database record, and so on. The main purpose of an entity is to hold content, and there is virtually no limit on the type of content an entity can hold.

 

The general syntax of an entity is as follows:

 

<!ENTITY  entityname  [SYSTEM  |  PUBLIC]  entitycontent>

•     ENTITY is the keyword that specifies that this declaration defines an entity.

•     entityname is the name by which the entity will be referred to in the XML document.

•     entitycontent is the actual content of the entity, that is, the data for which the entity is serving as a placeholder.

•     SYSTEM and PUBLIC are optional keywords. Either one can be added to the definition of an entity to indicate that the entity refers to external content.

Entities may point to either internal or external data. Internal entities represent data that is contained completely within the DTD. External entities point to content in another location via a URL. External data could be anything from normal parsed text in another file, to a graphics or audio file, to an Excel spreadsheet. The type of data to which an external entity can refer is virtually unlimited.

 

An entity is referenced in an XML document by inserting the name of the entity prefixed by & and suffixed by ;. When referenced in this manner, the content of the entity will be placed into the XML document when the document is parsed and validated. Let’s take a look at an example of how this works (see Listing 3.14).

 

LISTING 3.14    Using Internal Entities

<?xml version="1.0"?>
<!DOCTYPE library [
  <!ENTITY cpy "Copyright 2000">
  <!ELEMENT library (book+)>
  <!ELEMENT book (title,author,copyright)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT copyright (#PCDATA)>
]>
<library>
  <book>
    <title>How to Win Friends</title>
    <author>Joe Charisma</author>
    <copyright>&cpy;</copyright>
  </book>
  <book>
    <title>Make Money Fast</title>
    <author>Jimmy QuickBuck</author>
    <copyright>&cpy;</copyright>
  </book>
</library>

Listing 3.14 uses an internal DTD. In the DTD, an entity called cpy is declared that contains the content “Copyright 2000”. In the copyright element of the XML document, this entity is referenced by using &cpy;. When this document is parsed, &cpy; will be replaced with “Copyright 2000” in each instance in which it is used. Using the entity &cpy; saves the XML document author from having to type in “Copyright 2000” over and over. This is a fairly simple example, but imagine if the entity contained a string of data that was several hundred characters long. It is much more convenient (and easier on the fingers) to be able to reference a three- or four-character entity in an XML document than to type in all that content.
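
To see the replacement happen, the document can be run through a parser. The following is a minimal sketch in Python, which is not part of the original chapter; it assumes the standard library's xml.etree.ElementTree module and uses a copy of Listing 3.14 cut down to one book. The parser here is non-validating, so it expands the entity but does not check the element declarations.

```python
import xml.etree.ElementTree as ET

# Listing 3.14 reduced to a single book. Real XML needs straight quotes.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ENTITY cpy "Copyright 2000">
  <!ELEMENT library (book+)>
  <!ELEMENT book (title,author,copyright)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT copyright (#PCDATA)>
]>
<library>
  <book>
    <title>How to Win Friends</title>
    <author>Joe Charisma</author>
    <copyright>&cpy;</copyright>
  </book>
</library>
"""

# The parser reads the internal DTD and expands &cpy; while building the tree.
root = ET.fromstring(DOCUMENT)
copyright_text = root.find("book/copyright").text
print(copyright_text)  # Copyright 2000
```

The application never sees the reference itself, only the replacement text.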

Predefined Entities

 

There are five predefined entities, as shown in Table 3.4. These entities do not have to be declared in the DTD. When an XML parser encounters these entities (unless they are contained in a CDATA section), they will automatically be replaced with the content they represent.

 

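TABLE 3.4      Predefined Entities

Entity      Content
&amp;       &  (ampersand)
&lt;        <  (less-than sign)
&gt;        >  (greater-than sign)
&quot;      "  (double quotation mark)
&apos;      '  (apostrophe)
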
The XML fragment in Listing 3.15 demonstrates the use of a predefined entity.

 

LISTING 3.15    Using Predefined Entities

 

 

<icecream>
  <flavor>Cherry Garcia</flavor>
  <vendor>Ben &amp; Jerry’s</vendor>
</icecream>

In this listing, the ampersand in “Ben & Jerry’s” is written as the predefined entity reference for an ampersand (&amp;).
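
The round trip can be checked with a short Python sketch (an illustration added here, not the book's code). It assumes the standard library modules xml.etree.ElementTree and xml.sax.saxutils and a fragment modeled on Listing 3.15, with a plain apostrophe in the vendor name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A fragment modeled on Listing 3.15; no DTD is needed for predefined entities.
FRAGMENT = """<icecream>
  <flavor>Cherry Garcia</flavor>
  <vendor>Ben &amp; Jerry's</vendor>
</icecream>"""

# Reading: the parser replaces &amp; with a literal ampersand.
vendor = ET.fromstring(FRAGMENT).find("vendor").text
print(vendor)

# Writing: escape() converts &, < and > back into entity references.
escaped = escape(vendor)
print(escaped)
```

Escaping on output matters because a bare ampersand in element content is not well-formed XML.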

 

External Entities

 

External entities are used to reference external content. As stated previously, external entities get their content by referencing it via a URL placed in the entitycontent portion of the entity declaration. Either the SYSTEM keyword or the PUBLIC keyword is used here to let the XML parser know that the content is external.


XML is incredibly flexible. External entities can contain references to almost any type of data—even other XML documents. One well-formed XML document can contain another well-formed XML document through the use of an external entity reference. Taking this a step further, it can be easily extrapolated that a single XML document can be made up of references to many small XML documents. When the document is parsed, the XML parser will gather all the small XML documents, merging them into a whole.

The end-user application will only see one document and never know the difference. One useful way to apply the principle of combining XML documents through the use of external entities would be in an employee-tracking application, like the one shown in Listing 3.16.

 

LISTING 3.16    Using External Entities

<?xml version="1.0"?>
<!DOCTYPE employees [
  <!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
  <!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
  <!ELEMENT employees (clerk+)>
  <!ELEMENT clerk (#PCDATA)>
]>
<employees>
  <clerk>&bob;</clerk>
  <clerk>&nancy;</clerk>
</employees>

 

In this listing, two external entity references are used to refer to XML documents outside the current document that contain the employee data on “bob” (bob.xml) and “nancy” (nancy.xml). The SYSTEM keyword is used here to let the XML parser know that this is external content. In order to insert the external content into the XML document, the entities &bob; and &nancy; are used. It is useful to be able to keep the employee information in a separate file and “import” it using an entity reference, because the same information can then be referenced by other XML documents, such as an employee directory and a payroll application. Defining logical units of data and separating them into multiple documents, as in this example, makes the data more extensible and reduces the need to reproduce redundant data from document to document.
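
How a parser pulls in external content can be sketched in Python with the standard library's xml.parsers.expat module. This is an illustration under stated assumptions, not the book's code: the two files are replaced by an in-memory dictionary with made-up contents ("Bob" and "Nancy"), and the function name parse_with_external_entities is hypothetical. A real application would fetch each system identifier itself and should be cautious about which locations it trusts.

```python
from xml.parsers import expat

# Hypothetical stand-ins for the files behind the two URLs in Listing 3.16.
EXTERNAL_FILES = {
    "http://srvr/emps/bob.xml": "Bob",
    "http://srvr/emps/nancy.xml": "Nancy",
}

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE employees [
  <!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
  <!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
  <!ELEMENT employees (clerk+)>
  <!ELEMENT clerk (#PCDATA)>
]>
<employees><clerk>&bob;</clerk><clerk>&nancy;</clerk></employees>
"""


def parse_with_external_entities(document, files):
    """Return the text of each clerk element with external entities merged in."""
    clerks = []
    parser = expat.ParserCreate()

    def on_start_element(name, attributes):
        if name == "clerk":
            clerks.append("")

    def on_character_data(data):
        if clerks:
            clerks[-1] += data

    def on_external_entity(context, base, system_id, public_id):
        # Parse the external content with a sub-parser that shares our handlers.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(files[system_id], True)
        return 1  # tell expat the entity was handled

    parser.StartElementHandler = on_start_element
    parser.CharacterDataHandler = on_character_data
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return clerks


clerk_names = parse_with_external_entities(DOCUMENT, EXTERNAL_FILES)
print(clerk_names)
```

The handlers receive the merged content as if it had been typed directly into the main document, which is the behavior described above.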

Non-Text External Entities and Notations

 

Some external entities will contain non-text data, such as an image file. We do not want the XML parser to attempt to parse these types of files. In order to stop the XML parser, we use the NDATA keyword. Take a look at the following declaration:

 

<!ENTITY  myimage  SYSTEM  "myimage.gif"  NDATA  gif>

 

The NDATA keyword tells the parser that the entity is unparsed. The parser does not read the file; it simply passes the entity's location and notation name on to the application.

 

The final part of the declaration, gif, is a reference to a notation. A notation is a special declaration that identifies the format of non-text external data so that the XML application will know how to handle the data. Any time an external reference to non-text data is used, a notation identifying the data must be included and referenced. Notations are declared in the body of the DTD and have the following syntax:

 

<!NOTATION  notationname  [SYSTEM  |  PUBLIC  ]  dataformat>

•     NOTATION is the keyword that specifies that this declaration defines a notation.

 

•     notationname is the name by which the notation will be referred to in the XML document.

•     SYSTEM is a keyword that is added to the definition of the notation to indicate that the format of external data is being defined. You could also use the keyword PUBLIC here instead of SYSTEM. However, using PUBLIC requires you to provide a public identifier for the data format definition, optionally followed by a URL.

•     dataformat is a reference to a MIME type, ISO standard, or some other location that can provide a definition of the data being referenced.

Listing 3.17 is an example of using notation declarations for non-text external entities.

LISTING 3.17        Using External Non-Text Entities

 

<!NOTATION gif SYSTEM "image/gif">
<!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
<!ELEMENT employee (name, sex, title, years)>
<!ATTLIST employee pic ENTITY #IMPLIED>

<employee pic="employeephoto">

 

 

</employee>

 

In this example, an ENTITY type of attribute, pic, is defined for the element employee. In the XML document, the pic attribute is given the value employeephoto, which is an external entity that serves as a placeholder for the GIF file MichaelQ.gif. In order to help the application process and display the GIF file, the external entity (using the NDATA keyword) references the notation gif, which points to the MIME type for GIF files.
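
The chain from attribute to entity to notation can be followed in code. The sketch below is an added illustration that assumes Python's standard xml.parsers.expat module; to keep the document complete and small, it declares employee as EMPTY instead of using the content model from Listing 3.17. The parser reports the declarations but never opens the GIF file.

```python
from xml.parsers import expat

# Listing 3.17 with one simplification (an assumption): employee is EMPTY.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE employee [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
  <!ELEMENT employee EMPTY>
  <!ATTLIST employee pic ENTITY #IMPLIED>
]>
<employee pic="employeephoto"/>
"""

notations = {}          # notation name -> data format
unparsed_entities = {}  # entity name -> (file location, notation name)
pic_values = []         # values of the pic attribute seen in the document


def on_notation(name, base, system_id, public_id):
    notations[name] = system_id


def on_unparsed_entity(name, base, system_id, public_id, notation_name):
    unparsed_entities[name] = (system_id, notation_name)


def on_start_element(name, attributes):
    if "pic" in attributes:
        pic_values.append(attributes["pic"])


parser = expat.ParserCreate()
parser.NotationDeclHandler = on_notation
parser.UnparsedEntityDeclHandler = on_unparsed_entity
parser.StartElementHandler = on_start_element
parser.Parse(DOCUMENT, True)

# Follow the chain: attribute value -> entity -> file location and notation.
file_location, notation_name = unparsed_entities[pic_values[0]]
data_format = notations[notation_name]
print(file_location, data_format)
```

With the location and the data format in hand, it is up to the application to load and display the image.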

 

Parameter Entities

 

The final type of entity we will look at is the parameter entity, which is very similar to the internal entity. The main difference between an internal entity and a parameter entity is that a parameter entity may only be referenced inside the DTD. Parameter entities are in effect entities specifically for DTDs.

 

Parameter entities can be useful when you have to use a lot of repetitive or lengthy text in a DTD. Use the following syntax for parameter entities:

 

<!ENTITY  %  entityname  entitycontent>

The syntax for a parameter entity is almost identical to the syntax for a normal, internal entity. However, notice that in the syntax, after the ENTITY keyword, there is a space, a percent sign, and another space before entityname. This alerts the XML parser that this is a parameter entity and will be used only in the DTD. These types of entities, when referenced, should begin with % and end with ;. Listing 3.18 shows an example of this.

 

LISTING 3.18         Using Parameter Entities

 

<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>

 

In this listing, pc is used as a parameter entity to reference (#PCDATA). All elements in the DTD that hold parsed character data use the entity reference %pc;. This saves the DTD author from having to type (#PCDATA) over and over. This particular example is somewhat trivial, but you can see how it extends to a situation where you have a long character string that you do not want to have to retype.
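
The expansion of %pc; can be observed with a parser that reads the DTD. The following Python sketch is an added illustration using the standard xml.parsers.expat module. It makes several assumptions that are not in the original: Listing 3.18 is treated as the content of an external DTD named person.dtd, a person root element is added, and the DTD text is supplied from a string instead of a file. An external DTD is used because XML allows a parameter entity reference inside a declaration only in the external DTD subset, not in an internal one.

```python
from xml.parsers import expat

# Listing 3.18 as a hypothetical external DTD; the person element is an
# assumption added so that the DTD has a root element to describe.
EXTERNAL_DTD = """<!ENTITY % pc "(#PCDATA)">
<!ELEMENT person (name, age, weight)>
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>
"""

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person><name>Pat</name><age>30</age><weight>150</weight></person>
"""


def read_element_declarations(document, dtd_text):
    """Return a mapping of element name to expat content-model type."""
    declarations = {}
    parser = expat.ParserCreate()
    # Ask expat to process parameter entities and the external DTD subset.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_element_declaration(name, model):
        declarations[name] = model[0]  # first item is the content-model type

    def on_external_entity(context, base, system_id, public_id):
        # Supply the DTD text from memory instead of opening person.dtd.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(dtd_text, True)
        return 1

    parser.ElementDeclHandler = on_element_declaration
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return declarations


declarations = read_element_declarations(DOCUMENT, EXTERNAL_DTD)
for element_name in ("name", "age", "weight"):
    # XML_CTYPE_MIXED is the type expat reports for (#PCDATA) content.
    print(element_name, declarations[element_name] == expat.model.XML_CTYPE_MIXED)
```

Each of the three declarations is reported as if (#PCDATA) had been typed out in full, which is the point of the shortcut.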

 

We are almost finished. Having covered the use of element, attribute, and entity declarations in DTDs, we have just a few more loose ends to tie up. In the next section, we will look at the use of the IGNORE and INCLUDE directives. Then we will discuss the use of comments in DTDs. In the final part of the chapter, we will look at the future of DTDs, some possible shortcomings of DTDs, and a possible alternative for DTD validation. Before moving on though, let’s pay one more quick visit to the Zippy Human Resources department in our mini case study.

Zippy Human Resources: XML for Employee Records, Part III

This is the final part of the mini case study on the use of XML in the Human Resources department at Zippy Delivery Service. In Part II, the Human Resources department decided to change the structure of their DTD by moving the employees’ personal data into attributes. This created a separation between personal data and contact data (which remained stored in elements).

 

At this point, the Human Resources department felt pretty satisfied with their work. Now, however, there are just a couple more minor areas where they feel the DTD (Employees2.dtd) could be improved. They’ve decided that they need to add several entities in order to speed the entry process for new records and to cut down on having to retype redundant information. First, they’ve added an entity for “Delivery Person”. This makes sense to them because all but a few of the employees of Zippy Delivery Service are delivery people, and this will save them from having to type it over and over. The second entity they’ve decided to add is a parameter entity to give them a shortcut for entering #PCDATA type elements.

 

Here’s the updated DTD (you can download Employees3.dtd from the Sams Web site):

 


<!ENTITY dp "Delivery Person">
<!ENTITY % pc "#PCDATA">
<!ELEMENT employees (employee+)>
<!ELEMENT employee (name, position, address1, address2?, city, state, zip, phone?, email?)>
<!ATTLIST employee serial ID #REQUIRED>
<!ELEMENT name (%pc;)>
<!ATTLIST name
  age CDATA #REQUIRED
  sex CDATA #REQUIRED
  race CDATA #IMPLIED
  m_status CDATA #REQUIRED>
<!ELEMENT position (%pc;)>
<!ELEMENT address1 (%pc;)>
<!ELEMENT address2 (%pc;)>
<!ELEMENT city (%pc;)>
<!ELEMENT state (%pc;)>
<!ELEMENT zip (%pc;)>
<!ELEMENT phone (%pc;)>
<!ELEMENT email (%pc;)>

 

In the new DTD, the entity dp is declared first. This entity is used to insert the value “Delivery Person” into the XML document when it is referenced. Next, the entity pc is declared. This is a parameter entity that holds the value “#PCDATA” for insertion into the DTD when referenced.

 

The XML document Employees2.xml has been updated to reflect the addition of the dp entity (the whole XML document is not listed because only a few lines actually changed; data not shown here should be assumed to be the same as in Parts I and II of this case study). Here’s the code for Employees3.xml (which you can download from the Sams Web site):

 

<?xml version="1.0"?>
<!DOCTYPE employees SYSTEM "employees3.dtd">
<employees>
  <employee serial="emp1">
    <name age="37" sex="Male" race="African American" m_status="Married">
      Bob Jones
    </name>
    <position>Dispatcher</position>
    <!-- address and contact elements omitted; same as in Parts I and II -->
  </employee>
  <employee serial="emp2">
    <name age="19" sex="Female" race="Caucasian" m_status="Single">
      Mary Parks
    </name>
    <position>&dp;</position>
    <!-- address and contact elements omitted; same as in Parts I and II -->
  </employee>
  <employee serial="emp3">
    <name age="23" sex="Male" race="African American" m_status="Single">
      Jimmy Griffin
    </name>
    <position>&dp;</position>
    <!-- address and contact elements omitted; same as in Parts I and II -->
  </employee>
</employees>

For the first employee, Bob Jones, the dp entity was not used for his position value because he is the company’s dispatcher. However, for Mary Parks and Jimmy Griffin, the entity reference &dp; was inserted as the value for their position elements because they are both delivery people. This entity reference would also be used for any new employees added to the XML document that are delivery people.

 

The DTD for Zippy Delivery Service’s Human Resources department is now complete. The DTD contains all the information required. It accounts for information that might not be applicable. The employees’ personal and contact information has been logically separated between attributes and elements. Also, entities have been added to serve as timesaving devices for future additions to the XML document. The Zippy Human Resources department has built a DTD that will serve to validate their XML employee records effectively and efficiently.



