DTD
Entities
Entities in DTDs are storage units, and they can also be thought of as
placeholders. An entity is a special markup construct that holds content for
insertion into the XML document, usually information that is bulky or
repetitive. Entities make this kind of information easier to handle: the DTD
author declares the content once, and a reference marks each place where it
should be inserted in the XML document. This is much better than having to
retype the same information over and over.
An entity’s content could be
well-formed XML, normal text, binary data, a database record, and so on. The
main purpose of an entity is to hold content, and there is virtually no limit
on the type of content an entity can hold.
The general syntax of an entity is as follows:
<!ENTITY entityname [SYSTEM | PUBLIC] entitycontent>
• ENTITY is the keyword that specifies that this definition will be for an entity.
• entityname is the name by which the entity will be referred to in the XML document.
• entitycontent is the actual content of the entity, that is, the data for which the entity
is serving as a placeholder.
• SYSTEM and PUBLIC are optional keywords. Either one can be added to the definition
of an entity to indicate that the entity refers to external content.
Entities may either point to
internal data or external data. Internal entities represent data that is
contained completely within the DTD. External entities point to content in
another
location via a URL. External
data could be anything from normal parsed text in another file, to a graphics
or audio file, to an Excel spreadsheet. The type of data to which an external
entity can refer is virtually unlimited.
An entity is referenced in an
XML document by inserting the name of the entity prefixed by & and suffixed by ;. When referenced in this
manner, the content of the entity will be placed into the XML document when the
document is parsed and validated. Let’s take a look at an example of how this
works (see Listing 3.14).
LISTING 3.14 Using Internal Entities
<?xml version="1.0"?>
<!DOCTYPE library [
  <!ENTITY cpy "Copyright 2000">
  <!ELEMENT library (book+)>
  <!ELEMENT book (title,author,copyright)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT copyright (#PCDATA)>
]>
<library>
  <book>
    <title>How to Win Friends</title>
    <author>Joe Charisma</author>
    <copyright>&cpy;</copyright>
  </book>
  <book>
    <title>Make Money Fast</title>
    <author>Jimmy QuickBuck</author>
    <copyright>&cpy;</copyright>
  </book>
</library>
Listing 3.14 uses an internal DTD. In the DTD, an entity called cpy is declared
that contains the content “Copyright 2000”. In the copyright element of the XML
document, this entity is referenced by using &cpy;. When this document is
parsed, &cpy; will be replaced with “Copyright 2000” in each instance in which
it is used. Using the entity &cpy; saves the XML document author from having to
type in “Copyright 2000” over and over. This is a fairly simple example, but
imagine if the entity contained a string of data that was several hundred
characters long. It is much more convenient (and easier on the fingers) to
reference a three- or four-character entity in an XML document than to type in
all that content.
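The substitution described above can be checked with a short script. What follows is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which expands internal entities declared in the internal DTD subset but does not validate the document against the DTD. The element declarations from Listing 3.14 are left out for brevity, and the function name copyright_texts is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Listing 3.14, trimmed to the entity declaration needed for this sketch.
LIBRARY_XML = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ENTITY cpy "Copyright 2000">
]>
<library>
  <book>
    <title>How to Win Friends</title>
    <author>Joe Charisma</author>
    <copyright>&cpy;</copyright>
  </book>
  <book>
    <title>Make Money Fast</title>
    <author>Jimmy QuickBuck</author>
    <copyright>&cpy;</copyright>
  </book>
</library>"""


def copyright_texts(xml_text):
    """Return the text of every <copyright> element after parsing."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("copyright")]


if __name__ == "__main__":
    # Each &cpy; reference has already been replaced by the parser.
    for text in copyright_texts(LIBRARY_XML):
        print(text)
```

The application code never sees &cpy; itself; by the time the element text is read, the parser has already substituted the entity's content.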
Predefined Entities
There are five predefined
entities, as shown in Table 3.4. These entities do not have to be declared in
the DTD. When an XML parser encounters these entities (unless they are
contained in a CDATA section), they will automatically be replaced with the content
they represent.
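TABLE 3.4 Predefined Entities
Entity     Content
&amp;      & (ampersand)
&lt;       < (less-than sign)
&gt;       > (greater-than sign)
&apos;     ' (apostrophe)
&quot;     " (quotation mark)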
The XML fragment in Listing 3.15 demonstrates the use of a predefined entity.
LISTING 3.15 Using Predefined Entities
<icecream>
  <flavor>Cherry Garcia</flavor>
  <vendor>Ben &amp; Jerry's</vendor>
</icecream>
In this listing, the ampersand in “Ben & Jerry’s” is written as the predefined
entity reference for an ampersand (&amp;). The parser replaces the reference
with a literal ampersand when the document is read.
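As a quick check of how predefined entities behave, here is a small sketch, assuming Python's standard xml.etree.ElementTree parser and the escape helper from xml.sax.saxutils. The variable names are chosen only for this illustration, and the CDATA example is an added variation rather than part of the original listing.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: the parser replaces predefined entity references automatically.
vendor = ET.fromstring("<vendor>Ben &amp; Jerry&apos;s</vendor>")
vendor_text = vendor.text

# Inside a CDATA section the same characters are left exactly as written.
literal = ET.fromstring("<vendor><![CDATA[Ben &amp; Jerry]]></vendor>")
literal_text = literal.text

# Writing: escape() turns &, < and > into their entity references.
escaped = escape("Ben & Jerry's <ice cream>")

if __name__ == "__main__":
    print(vendor_text)
    print(literal_text)
    print(escaped)
```

Escaping on output and relying on the parser on input keeps literal ampersands and angle brackets from being mistaken for markup.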
External Entities
External entities are used to reference external content. As stated
previously, external entities get their content by referencing it via a URL
placed in the entitycontent portion of the entity declaration. Either the
SYSTEM keyword or the PUBLIC keyword is used here to let the XML parser know
that the content is external.
XML is incredibly flexible.
External entities can contain references to almost any type of data—even other
XML documents. One well-formed XML document can contain another well-formed XML
document through the use of an external entity reference. Taking this a step
further, it can be easily extrapolated that a single XML document can be made
up of references to many small XML documents. When the document is parsed, the
XML parser will gather all the small XML documents, merging them into a whole.
The end-user application will
only see one document and never know the difference. One useful way to apply
the principle of combining XML documents through the use of external entities
would be in an employee-tracking application, like the one shown in Listing
3.16.
LISTING 3.16 Using External Entities
<?xml version="1.0"?>
<!DOCTYPE employees [
  <!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
  <!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
  <!ELEMENT employees (clerk+)>
  <!ELEMENT clerk (#PCDATA)>
]>
<employees>
  <clerk>&bob;</clerk>
  <clerk>&nancy;</clerk>
</employees>
In this listing, two external entity references are used to refer to XML
documents outside the current document that contain the employee data on “bob”
(bob.xml) and “nancy” (nancy.xml). The SYSTEM keyword is used here to let the
XML parser know that this is external content. To insert the external content
into the XML document, the entity references &bob; and &nancy; are used. It is
useful to keep the employee information in a separate file and “import” it
using an entity reference, because the same information can then be referenced
by other XML documents, such as an employee directory and a payroll
application. Defining logical units of data and separating them into multiple
documents, as in this example, makes the data easier to reuse and reduces the
need to duplicate data from document to document.
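The merging of external content can be sketched with Python's low-level xml.parsers.expat module, which hands each external entity reference to a callback instead of fetching it itself. This is only an illustration under stated assumptions: the dictionary EXTERNAL_FILES stands in for the files on the server and its contents are invented, nothing is fetched over the network, the element declarations are omitted, and the function name collect_clerks is hypothetical.

```python
import xml.parsers.expat as expat

# Invented stand-ins for bob.xml and nancy.xml; a real application would
# read the content from the system identifier instead.
EXTERNAL_FILES = {
    "http://srvr/emps/bob.xml": "Bob",
    "http://srvr/emps/nancy.xml": "Nancy",
}

EMPLOYEES_XML = """<?xml version="1.0"?>
<!DOCTYPE employees [
  <!ENTITY bob SYSTEM "http://srvr/emps/bob.xml">
  <!ENTITY nancy SYSTEM "http://srvr/emps/nancy.xml">
]>
<employees><clerk>&bob;</clerk><clerk>&nancy;</clerk></employees>"""


def collect_clerks(document, files):
    """Parse the document and return the text found in each <clerk>."""
    clerks = []
    state = {"in_clerk": False}
    parser = expat.ParserCreate()

    def start_element(name, attrs):
        if name == "clerk":
            state["in_clerk"] = True
            clerks.append("")

    def end_element(name):
        if name == "clerk":
            state["in_clerk"] = False

    def character_data(data):
        if state["in_clerk"]:
            clerks[-1] += data

    def external_entity_ref(context, base, system_id, public_id):
        # Parse the entity's content with a child parser that shares
        # the handlers of the main parser.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        return 1  # a nonzero value tells expat to continue

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(document, True)
    return clerks


if __name__ == "__main__":
    print(collect_clerks(EMPLOYEES_XML, EXTERNAL_FILES))
```

Many parsers leave external entities unresolved unless the application opts in, largely for security reasons, so whether the content is merged depends on how the parser is configured.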
Non-Text External Entities and Notations
Some external entities will contain non-text data, such as an image file. We
do not want the XML parser to attempt to parse these types of files. To stop
the XML parser from doing so, we use the NDATA keyword. Take a look at the
following declaration:
<!ENTITY myimage SYSTEM "myimage.gif" NDATA gif>
The NDATA keyword alerts the parser that the entity is unparsed: the parser
does not read the entity's content and instead passes the reference through to
the application.
The final part of the declaration, gif, is a reference to a notation. A
notation is a special declaration that identifies the format of non-text
external data so that the XML application will know how to handle the data.
Any time an external reference to non-text data is used, a notation identifying
the data must be included and referenced. Notations are declared in the body of
the DTD and have the following syntax:
<!NOTATION notationname [SYSTEM | PUBLIC] dataformat>
• NOTATION is the keyword that specifies that this definition will be for a notation.
• notationname is the name by which the notation will be referred to in the DTD.
• SYSTEM is a keyword that is added to the definition of the notation to
indicate that the format of external data is being defined. You could also use
the keyword PUBLIC here instead of SYSTEM. However, using PUBLIC requires you
to provide a public identifier for the data format, which may be followed by a
URL to the data format definition.
• dataformat is a reference to a MIME type, ISO standard, or some other location
that can provide a definition of the data being referenced.
Listing 3.17 is an example of using notation declarations for non-text
external entities.
LISTING 3.17 Using External Non-Text Entities
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
<!ELEMENT employee (name, sex, title, years)>
<!ATTLIST employee
  pic ENTITY #IMPLIED>
…
<employee pic="employeephoto">
…
</employee>
In this example, an ENTITY type of attribute, pic, is defined for the element
employee. In the XML document, the pic attribute is given the value
employeephoto, which is an external entity that serves as a placeholder for the
GIF file MichaelQ.gif. To help the application process and display the GIF
file, the external entity (using the NDATA keyword) references the notation
gif, which points to the MIME type for GIF files.
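How an application might follow the chain from attribute to entity to notation can be sketched as follows. This assumes Python's xml.parsers.expat module, which reports notation and entity declarations through callbacks. The document here is a simplified stand-in for Listing 3.17 (the employee element is declared EMPTY so the example is complete), and the function name resolve_pictures is hypothetical.

```python
import xml.parsers.expat as expat

# Simplified, self-contained variant of Listing 3.17.
EMPLOYEE_XML = """<?xml version="1.0"?>
<!DOCTYPE employee [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY employeephoto SYSTEM "images/employees/MichaelQ.gif" NDATA gif>
  <!ELEMENT employee EMPTY>
  <!ATTLIST employee pic ENTITY #IMPLIED>
]>
<employee pic="employeephoto"/>"""


def resolve_pictures(xml_text):
    """Return (file, data format) pairs for every pic attribute."""
    notations = {}   # notation name -> declared data format
    unparsed = {}    # entity name -> (system id, notation name)
    pictures = []

    def notation_decl(name, base, system_id, public_id):
        notations[name] = system_id

    def entity_decl(name, is_parameter, value, base,
                    system_id, public_id, notation_name):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation_name is not None:
            unparsed[name] = (system_id, notation_name)

    def start_element(name, attrs):
        entity_name = attrs.get("pic")
        if entity_name in unparsed:
            system_id, notation_name = unparsed[entity_name]
            pictures.append((system_id, notations[notation_name]))

    parser = expat.ParserCreate()
    parser.NotationDeclHandler = notation_decl
    parser.EntityDeclHandler = entity_decl
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return pictures


if __name__ == "__main__":
    print(resolve_pictures(EMPLOYEE_XML))
```

The parser never opens the GIF file. It only reports the identifiers, and the application decides what to do with a resource of type image/gif.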
Parameter Entities
The final type of entity we will look at is the parameter entity, which is
very similar to the internal entity. The main difference between an internal
entity and a parameter entity is that a parameter entity may only be referenced
inside the DTD. Parameter entities are in effect entities specifically for
DTDs.
Parameter entities can be useful when you have to use a lot of repetitive or
lengthy text in a DTD. Use the following syntax for parameter entities:
<!ENTITY % entityname entitycontent>
LISTING 3.18 Using Parameter Entities
<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>
In this listing, pc is used as a parameter entity to reference (#PCDATA). All
elements in the DTD that hold parsed character data use the entity reference
%pc;. This saves the DTD author from having to type (#PCDATA) over and over.
Note that a parameter entity reference inside a declaration, as used here, is
only allowed in an external DTD, not in an internal DTD subset. This
particular example is somewhat trivial, but you can see how it extends to a
situation where you have a long character string that you do not want to
retype.
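To see what the DTD in Listing 3.18 amounts to once the references are replaced, the expansion can be imitated with plain text substitution. This is a toy sketch, not a DTD parser: it assumes double-quoted entity values and simple names, and a real XML processor also pads the inserted text with a space on each side. The function name expand_parameter_entities is made up for this illustration.

```python
import re

DTD_TEXT = """
<!ENTITY % pc "(#PCDATA)">
<!ELEMENT name %pc;>
<!ELEMENT age %pc;>
<!ELEMENT weight %pc;>
"""

# Matches declarations of the form <!ENTITY % name "value">.
PARAMETER_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')
# Matches references of the form %name;.
PARAMETER_REF = re.compile(r"%([A-Za-z_][\w.-]*);")


def expand_parameter_entities(dtd_text):
    """Return the DTD's declarations with %name; references replaced."""
    entities = {}
    for name, value in PARAMETER_DECL.findall(dtd_text):
        # If a name is declared twice, the first declaration wins.
        entities.setdefault(name, value)
    body = PARAMETER_DECL.sub("", dtd_text)
    expanded = PARAMETER_REF.sub(lambda m: entities[m.group(1)], body)
    return [line.strip() for line in expanded.splitlines() if line.strip()]


if __name__ == "__main__":
    for declaration in expand_parameter_entities(DTD_TEXT):
        print(declaration)
```

Each element declaration ends up with the content model (#PCDATA), which is what the DTD author would otherwise have typed three times.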
We are almost finished. Having covered the use of element, attribute, and
entity declarations in DTDs, we have just a few more loose ends to tie up. In
the next section, we will look at the use of the IGNORE and INCLUDE directives.
Then we will discuss the use of comments in DTDs. In the final part of the
chapter, we will look at the future of DTDs, some possible shortcomings of
DTDs, and a possible alternative for DTD validation. Before moving on though,
let’s pay one more quick visit to the Zippy Human Resources department in our
mini case study.
Zippy Human Resources: XML for Employee Records, Part III
This is
the final part of the mini case study on the use of XML in the Human Resources
department at Zippy Delivery Service. In Part II, the Human Resources
department decided to change the structure of their DTD by moving the
employees’ personal data into attributes. This created a separation between
personal data and contact data (which remained stored in elements).
At this
point, the Human Resources department felt pretty satisfied with their work.
Now, however, there are just a couple more minor areas where they feel the DTD
(Employees2.dtd) could
be improved. They’ve decided that they need to add several entities in order to
speed the entry process for new records and to cut down on having to retype
redundant information. First, they’ve added an entity for “Delivery Person”.
This makes sense to them because all but a few of the employees of Zippy
Delivery Service are delivery people, and this will save them from having to
type it over and over. The second entity they’ve decided to add is a parameter
entity to give them a shortcut for entering #PCDATA type elements.
Here’s
the updated DTD (you can download Employees3.dtd from the Sams Web site):
<!ENTITY dp "Delivery Person">
<!ENTITY % pc "#PCDATA">
<!ELEMENT employees (employee+)>
<!ELEMENT employee (name, position, address1, address2?, city, state, zip, phone?, email?)>
<!ATTLIST employee serial ID #REQUIRED>
<!ELEMENT name (%pc;)>
<!ATTLIST name
  age CDATA #REQUIRED
  sex CDATA #REQUIRED
  race CDATA #IMPLIED
  m_status CDATA #REQUIRED>
<!ELEMENT position (%pc;)>
<!ELEMENT address1 (%pc;)>
<!ELEMENT address2 (%pc;)>
<!ELEMENT city (%pc;)>
<!ELEMENT state (%pc;)>
<!ELEMENT zip (%pc;)>
<!ELEMENT phone (%pc;)>
<!ELEMENT email (%pc;)>
In the
new DTD, the entity dp
is declared first. This entity is used to insert the value “Delivery Person”
into the XML document when it is referenced. Next, the entity pc is
declared. This is a parameter entity that holds the value “#PCDATA” for
insertion into the DTD when referenced.
The XML
document Employees2.xml has been updated to reflect the addition of
the dp entity
(the whole XML document is not listed because only a few lines actually
changed; data not shown here should be assumed to be the same as in Parts I and
II of this case study). Here’s the code for Employees3.xml (which you can download from the Sams Web
site):
<?xml version="1.0"?>
<!DOCTYPE employees SYSTEM "Employees3.dtd">
<employees>
  <employee serial="emp1">
    <name age="37" sex="Male" race="African American" m_status="Married">Bob Jones</name>
    <position>Dispatcher</position>
    …
  </employee>
  <employee serial="emp2">
    <name age="19" sex="Female" race="Caucasian" m_status="Single">Mary Parks</name>
    <position>&dp;</position>
    …
  </employee>
  <employee serial="emp3">
    <name age="23" sex="Male" race="African American" m_status="Single">Jimmy Griffin</name>
    <position>&dp;</position>
    …
  </employee>
</employees>
For the
first employee, Bob Jones, the dp
entity was not used for his position value because he is the company’s dispatcher.
However, for Mary Parks and Jimmy Griffin, the entity reference &dp; was
inserted as the value for their position elements because they are both delivery people.
This entity reference would
also be used for any new employees added to the XML document that are delivery
people.
The DTD for Zippy Delivery Service’s Human Resources department is now
complete. The DTD contains all the information required, and it accounts for
information that might not be applicable. The employees’ personal and contact
information has been logically separated between attributes and elements. Also,
entities have been added to serve as timesaving devices for future additions
to the XML document. The Zippy Human Resources department has built a DTD that
will serve to validate their XML employee records effectively and efficiently.