An Overview of XHTML
Someday the Web will be standardized. All Web pages will be completely cross-platform compatible and will load faster. Also, work will get done more quickly (especially for us Web developers). However, the standardization of the Web is still over the horizon, and movement toward that goal is painfully slow. To help prepare for the future, the W3C published XHTML 1.0 as an official recommendation on January 26, 2000. XHTML is a step toward the goal of standardizing markup for the Web. It is also a step toward making the Web “XML compatible.” XHTML is a reformulation of HTML as an XML application. As a result, HTML is made XML compatible and open to interaction with future XML technologies.
XHTML 1.0: The Transition
XHTML 1.0 was introduced to serve as a bridge (or transition) from older technologies (such as the splintered and incompatible variations of HTML) to newer technologies (such as XML). XHTML 1.0 creates markup that is compatible with older Web browsers and will remain compatible as support is picked up for emerging technologies. XHTML 1.0 is very similar to HTML 4. Basically, it has simply taken HTML 4 and reformulated it as an XML application.
Making HTML XML Compliant
The main goals of XHTML are to make documents XML compliant and to address the incompatibilities of HTML in the major Web browsers. Once this compliance is achieved, support will be ensured for XML technologies such as XSL, and pages can be parsed and edited with standard XML tools. Also, because XHTML 1.0 is so close to HTML 4, existing Web pages can usually be updated to XHTML 1.0 compliance with only minor changes. Developers and Webmasters of sites consisting of hundreds or thousands of pages should not break into a cold sweat at the thought of upgrading to XHTML 1.0. It is really quite easy. Before going into the three variations (DTDs) of XHTML 1.0, let’s take a look at Listings 11.10 and 11.11. These listings give you a quick before-and-after picture of how a document would be upgraded from HTML to XHTML 1.0 compliance. The code for Listing 11.10, beforexhtml.HTML, can be downloaded from the Sams Web site.
LISTING 11.10 Document Before XHTML 1.0 Compliance
<HTML>
<HEAD>
<TITLE>Sample HTML Page: Pre-XHTML 1.0 Conversion</TITLE>
</HEAD>
<BODY>
<H1>My Favorite Musical Groups</H1>
<P>
<UL>
<LI>Dave Mathews Band
<LI>Beck
<LI>Offspring
</UL>
<H4>Pretty eclectic tastes, ay?
</BODY>
</HTML>
Listing 11.10 is a pretty typical HTML document. You can see that the tags are capitalized and the <P>, <LI>, and <H4> tags are not closed with ending tags. Even though the document is not well formed, a Web browser will render it with no problems. Listing 11.10, although okay for HTML, is not acceptable in XHTML 1.0. Listing 11.11 shows how this page would be changed to be XHTML 1.0 compliant. The code for this listing, afterxhtml.HTML, can be downloaded from the Sams Web site.
LISTING 11.11 Document After XHTML 1.0 Compliance
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Sample HTML Page: Post-XHTML 1.0 Conversion</title>
</head>
<body>
<h1>My Favorite Musical Groups</h1>
<p />
<ul>
<li>Dave Mathews Band</li>
<li>Beck</li>
<li>Offspring</li>
</ul>
<h4>Pretty eclectic tastes, ay?</h4>
</body>
</html>
The first thing you will notice in this listing is that a Document Type Declaration has been added. The Document Type Declaration contains a public reference to the Transitional DTD for XHTML 1.0 (this will be covered in detail in the next section). Additionally, all the HTML tags have been set to lowercase according to XHTML rules. HTML is notably lax about capitalization and will accept both lowercase and uppercase tags (or even mixtures of both). XML is case sensitive, and XHTML defines all of its element and attribute names in lowercase, so only lowercase tag names are accepted. Also, all the opening tags have been closed. The <p> tags are treated as empty tags and have had closing “/” symbols added. A closing tag could have been added (<p></p>); however, it is easier to simply treat them like empty tags. This is the same rule that applies to empty tags in XML. The only difference is that in XHTML, in order for these tags to display properly in a Web browser, a space is inserted before the “/” symbol.
Listing 11.11 could be saved with an HTML, XHTML, or XML file extension and still be displayed (when viewed with Internet Explorer 5.5). With the XML extension, however, the file may be checked for well-formedness and validated against a DTD. This provides a clear picture of the use of XHTML as a bridge from HTML to XML. Although Listing 11.10 and Listing 11.11 will be rendered by the browser equally well with an HTML extension, only Listing 11.11 will be rendered with an XML extension. This is because only Listing 11.11 is well-formed XML. XHTML is used to turn the HTML into XML that can be parsed and validated.
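If you want to see this difference without opening a browser, any XML parser will do. The following is a minimal sketch in Python, assuming abbreviated stand-ins for the two listings rather than the downloadable files; the helper name is_well_formed is our own. It checks well-formedness only and does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the standard-library XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Abbreviated stand-ins for the two listings, not the downloadable files.
html_style = "<BODY><H1>My Favorite Musical Groups</H1><UL><LI>Beck<LI>Offspring</UL></BODY>"
xhtml_style = "<body><h1>My Favorite Musical Groups</h1><ul><li>Beck</li><li>Offspring</li></ul></body>"

print(is_well_formed(html_style))   # False: the <LI> elements are never closed
print(is_well_formed(xhtml_style))  # True
```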
There is no difference between how Listing 11.10 and 11.11 will be displayed in today’s Web browsers. Figure 11.5 demonstrates how both Listing 11.10 and Listing 11.11 would be rendered in Internet Explorer 5.5.
The two renderings are exactly the same! The big difference is that Listing 11.11 is now a well-formed XHTML document. Listing 11.11 is compatible with XML technology and may be fully integrated with future XML technology applications. What’s more, this was all relatively painless to do! Certainly there will be varying degrees of work that needs to be done on existing pages to make them compatible, but you can see that you won’t ever have to scrap your whole Web site to achieve compatibility. More than likely you will only have to make minor changes.
Specific syntax rules apply to XHTML in order to make a document well formed. We will cover those rules in a moment, but first let’s take a look at the three variants, or DTDs, that have been created for XHTML 1.0.
Variants of XHTML
In order to conform to XHTML, a document must be validated against one of three DTDs that have been defined for XHTML. These DTDs are reformulations of the DTDs defined for HTML 4: Strict, Transitional, and Frameset.
A strictly conforming XHTML document that references the Strict DTD will have the following Document Type Declaration:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Adherence to the Strict DTD means that the XHTML document will have the following characteristics:
There will be a strict separation of presentation from structure. Style sheets are used for formatting, and the XHTML markup is very clean and uncluttered. There are no optional vendor-specific HTML extensions.
The Document Type Declaration must be present and placed before the <html> element in the document.
The root element of the document will be <html>.
The <html> element will have the xmlns attribute in order to designate the XHTML namespace.
The document, of course, is valid according to the rules defined in the Strict DTD.
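Several of these characteristics can be checked mechanically. The following Python sketch, which uses the standard library's minidom parser, looks only at the Document Type Declaration, the root element, and the namespace. The function name check_strict_preamble and the sample page are our own, and the sketch does not validate the document against the Strict DTD itself.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
STRICT_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN"


def check_strict_preamble(markup: str) -> dict:
    """Report which of the structural Strict requirements the document meets."""
    document = minidom.parseString(markup)
    doctype = document.doctype
    root = document.documentElement
    return {
        "has_doctype": doctype is not None,
        "references_strict_dtd": doctype is not None
        and doctype.publicId == STRICT_PUBLIC_ID,
        "root_is_html": root.tagName == "html",
        "has_xhtml_namespace": root.namespaceURI == XHTML_NS,
    }


# A hypothetical minimal page used only to exercise the checks.
strict_page = """<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Strict XHTML DTD Reference</title></head>
<body><h1>Strict XHTML DTD Reference</h1></body>
</html>"""

for requirement, met in check_strict_preamble(strict_page).items():
    print(requirement, met)
```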
Listing 11.12 gives a very simple example of an XHTML page that conforms to the Strict DTD. The code for this listing, strictdtd.HTML, can be downloaded from the Sams Web site.
LISTING 11.12 Strict DTD Reference
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Strict XHTML DTD Reference</title>
</head>
<body>
<h1>Strict XHTML DTD Reference</h1>
<table>
<tr>
<td>
This is a plain, vanilla page.
</td>
</tr>
<tr>
<td>
There are no special formatting elements included.
</td>
</tr>
<tr>
<td>
If any formatting is needed a CSS style sheet could be referenced.
</td>
</tr>
</table>
</body>
</html>
In this listing, no special formatting HTML elements are included. Additionally, this is, according to XML rules, a well-formed document. Therefore, this page would be valid according to the Strict DTD. Any special formatting needed could be added by referencing a CSS style sheet.
The most important requirement for the Strict DTD is the separation of presentation and structure. How many of your existing Web pages meet this requirement? How difficult would it be to get your Web pages to meet this requirement? In most of your existing Web pages, you will have an almost terminal mixture of presentation and structure in your HTML. In order to comply with the Strict DTD, you would probably have to make fairly extensive changes to your existing Web pages. In order to address this potential problem, the Transitional DTD was created to be much more lenient in its rules. It is much simpler to make an HTML page “Transitional compliant” than it is to make a page “Strict compliant.” The Transitional DTD is covered next.
The Transitional DTD for XHTML has more loosely defined requirements than the Strict DTD. As such, it is much easier to use with current Web browsers than the Strict DTD. To be more specific, you have to make far fewer changes to your existing Web pages.
As long as the Transitional DTD is referenced from the Document Type Declaration, the HTML is well formed, and it follows the basic XHTML syntax rules (more on the syntax rules in a moment), there should not be any problems.
A Document Type Declaration containing a reference to the Transitional DTD will appear as follows:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
This DTD is also useful if you use, or have Web site visitors who use, Web browsers that do not support CSS style sheets. If you must support a lot of the formatting HTML elements, such as <font>, <b>, <u>, and so on, due to the necessity of supporting Web browsers that do not support CSS, then the Transitional DTD is your best bet for becoming XHTML compliant. Listing 11.13 shows an XHTML page that uses a lot of formatting elements but is still valid because it references the Transitional DTD. The code for this listing, transdtd.HTML, can be downloaded from the Sams Web site.
LISTING 11.13 Transitional DTD Reference
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional XHTML DTD Reference</title>
</head>
<body>
<h1><font face="arial" color="#ff0000">
<b>Transitional</b> XHTML DTD Reference</font></h1>
<table>
<tr>
<td>
<font size="13px" face="arial">This page has
<em>quite a bit</em> of <u>formatting</u> added!</font></td>
</tr>
<tr>
<td>
<font size="13px" face="arial"><em>Many</em>
formatting elements are <strong>included</strong>.</font>
</td>
</tr>
<tr>
<td>
<font size="13px" face="sans serif"> This type of formatting works for
<big>older browsers</big> that do not <small>support CSS</small>! </font>
</td>
</tr>
</table>
</body>
</html>
Listing 11.13 includes many formatting elements. This is okay because these elements are supported by the Transitional DTD for the purposes of backward compatibility. This document is well formed, all elements are in lowercase, and attribute values are quoted. This document is XHTML compliant, according to the Transitional DTD, and it will still work with older Web browsers.
The third type of DTD that we will take a look at is the Frameset DTD.
The XHTML Frameset DTD is designed specifically to work with HTML frame pages. Frame pages are pages in which the browser window has been broken up into several semi-independent navigable windows. Each frame, or window, has its own content that is maintained in a file separate from the content in the other windows. Normally, one frame contains navigation links and the other frame serves as the target for the links, loading whatever content a link points to when clicked. A frame page might be useful if you want to be able to load content from another Web site in one frame while keeping your navigation links available in another window. In addition to the files that make up the content for each of the frames, one main frame page “binds” the other frames together. From this main page, you will reference the Frameset DTD. The Frameset DTD contains rules that apply specifically to the special setup of a frame page. In order to reference the XHTML Frameset DTD, use the following Document Type Declaration:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Any time you are splitting the Web browser page into two or more frames, you should reference this DTD in order to be XHTML compliant.
Now that you have seen the DTDs that are used with XHTML 1.0, you probably have a dozen or so questions dancing around in your head about the exact differences between XHTML and HTML 4.
Syntax and Definitions
This section explains the syntax requirements for an XHTML document and the differences between XHTML and HTML 4.
XHTML Must Be Well Formed
As mentioned previously, according to XML syntax rules, all elements that are opened in an XHTML document must be closed. This is a departure from HTML, where many elements, such as <p> or <li>, are not closed.
For example, the following would be okay for HTML but not XHTML:
<p>The paragraph element is not closed
In order to be okay for XHTML, the preceding example would have to look like this:
<p>The paragraph element is now closed</p>
For the opening paragraph element, <p>, a closing paragraph element, </p>, has been added.
There are also many empty elements in HTML, the most notable being the <img> element. You will also see a lot of <br> and <hr> elements in HTML. In XHTML, empty elements are handled just as they are in XML: a slash character (/) is added before the closing “>” symbol. The only difference in XHTML is that, to remain compatible with today’s Web browsers, a space should be added before the “/” symbol in the element. If this is not done, some browsers will not render the element properly. Therefore, the HTML elements <img>, <br>, and <hr> become <img />, <br />, and <hr /> in XHTML. This rule should be applied to any empty elements, not just the ones listed here.
Elements must be properly nested. In HTML, elements should be properly nested, but Web browsers are pretty forgiving if they are not. Oftentimes, when looking at an HTML page, you will see something like this:
<p>These elements are not <b>properly nested!</p></b>
Even though the <p> elements and the <b> elements are overlapping and not properly nested, most Web browsers will still properly render the page. In XHTML, this overlapping must be corrected as follows:
<p>These elements are <b>properly nested!</b></p>
Here, you can see that the nesting has been corrected. The <b> elements are properly contained within the <p> elements.
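To see how an XML parser reacts to each of these well-formedness problems, you can feed it small fragments and ask where it gives up. The following Python sketch is one way to do that, assuming the standard library parser; the helper name well_formedness_error and the sample fragments are our own, adapted from the examples above.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return (line, column) of the first parse error, or None if there is none."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return error.position
    return None


samples = {
    "unclosed": "<div><p>The paragraph element is not closed</div>",
    "overlapping": "<p>These elements are not <b>properly nested!</p></b>",
    "empty without slash": "<p>Line one<br>Line two</p>",
    "corrected": "<p>Line one<br />These elements are <b>properly nested!</b></p>",
}

# Only the corrected fragment should report None.
for label, markup in samples.items():
    print(label, "->", well_formedness_error(markup))
```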
All Elements and Attributes Must Be Lowercase
This is another departure from HTML. In HTML, elements can be uppercase, lowercase, or even a mixture of cases. Therefore, the elements <br>, <BR>, and <Br> would be rendered identically in HTML. However, in XHTML, only <br> would be correct.
The same rule goes for attribute names. In HTML, there are no case rules for attribute names. In XHTML, attribute names must be lowercase.
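Because XML is case sensitive, a parser will happily accept <BR /> as a well-formed element; it simply is not the br element that XHTML defines. A check for stray uppercase names therefore has to be done separately. The following Python sketch shows one possible approach; the function name non_lowercase_names is our own, and it assumes a fragment without namespace prefixes.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(markup):
    """List element and attribute names that contain uppercase characters."""
    offenders = []
    for element in ET.fromstring(markup).iter():
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for name in element.attrib:
            if name != name.lower():
                offenders.append(name)
    return offenders


print(non_lowercase_names('<p><BR /><img SRC="a.gif" alt="" /></p>'))  # ['BR', 'SRC']
print(non_lowercase_names('<p><br /><img src="a.gif" alt="" /></p>'))  # []
```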
Attribute Values Must Always Appear in Quotes
All attribute values must appear in quotes. This applies to numeric values as well as string values. In HTML, quoting is optional for many values.
For example, HTML would allow an unquoted numeric value, such as a table width written as a bare number.
In XHTML, however, the same value must be enclosed in quotes.
If there is a single quote or a double quote in your attribute value, you must use the other kind of quote to delimit the value. For example, if you have an attribute called lastname, with the value O'Malley, then the attribute would be written lastname="O'Malley". In this case, double quotes are used to delimit the value because a single quote is contained in the value.
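The quoting rules are easy to try out. In the following Python sketch, quoteattr from the standard library picks a delimiter that does not clash with the value, and the XML parser shows what happens to an unquoted value. The person element is hypothetical and used only to carry the lastname attribute from the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# quoteattr picks a delimiter that does not clash with the value.
print(quoteattr("O'Malley"))  # "O'Malley"
print(quoteattr('6" nails'))  # '6" nails'

# An unquoted value is rejected by an XML parser ...
try:
    ET.fromstring("<td width=100></td>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)  # False

# ... while a correctly quoted value parses and keeps its embedded quote.
person = ET.fromstring("<person lastname=" + quoteattr("O'Malley") + " />")
print(person.get("lastname"))  # O'Malley
```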
Attributes May Not Be Minimized
It is common to have attributes in HTML such as checked or nowrap that are minimized. In XHTML, minimization of attributes is not allowed. Attribute/value pairs must be written out in full.
In HTML, an attribute such as checked could be minimized, appearing as a bare name with no value.
In XHTML, in order to be compliant with the Transitional DTD, this would be rewritten as the full pair checked="checked".
You simply take the minimized value in HTML and turn it into an attribute name/value pair in XHTML.
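If you have many pages to convert, this rewrite can be automated. The following Python sketch is one possible approach, not a complete converter: it relies on the standard library's HTMLParser, which lowercases names and reports a minimized attribute with a value of None, and it handles start tags only. The class name StartTagRewriter is our own.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import quoteattr


class StartTagRewriter(HTMLParser):
    """Collect XHTML-style versions of the start tags fed to the parser."""

    def __init__(self):
        super().__init__()
        self.rewritten = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]  # HTMLParser has already lowercased the tag name
        for name, value in attrs:
            if value is None:  # minimized attribute, such as nowrap
                value = name
            parts.append(f"{name}={quoteattr(value)}")
        self.rewritten.append("<" + " ".join(parts) + ">")


rewriter = StartTagRewriter()
rewriter.feed("<TD NOWRAP WIDTH=100>")
rewriter.close()
print(rewriter.rewritten)  # ['<td nowrap="nowrap" width="100">']
```

Empty elements such as <input> would additionally need the closing " /", which this sketch leaves out.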
Script and Style Elements Must Be Enclosed in CDATA Sections
In order to avoid the contents of script and style elements being parsed as markup by the XML parser, you should enclose the contents in CDATA sections. Listing 11.14 gives an example of this.
LISTING 11.14 Style Element in XHTML
<style type="text/css"><![CDATA[
Insert all of the page's style settings here
]]></style>
The XML parser does not interpret the contents of a CDATA section as markup. The contents are passed through as plain character data for the Web browser to interpret and render.
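The problem is easiest to see with a script that contains a “<” character. The following Python sketch uses a hypothetical one-line script (the names a, b, and swap are placeholders) and shows that the parser rejects the bare version but returns the CDATA-wrapped version untouched.

```python
import xml.etree.ElementTree as ET

# Hypothetical script body; the "<" is what trips up an XML parser.
bare = "<script>if (a < b) { swap(); }</script>"
wrapped = "<script><![CDATA[if (a < b) { swap(); }]]></script>"

try:
    ET.fromstring(bare)
    bare_accepted = True
except ET.ParseError:
    bare_accepted = False
print(bare_accepted)  # False

# Inside the CDATA section the same text is plain character data.
print(ET.fromstring(wrapped).text)  # if (a < b) { swap(); }
```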
Element Identifier References Are to the id Attribute
In HTML, the name attribute and the id attribute are both used to identify specific elements. The id attribute in XHTML is an XML ID type of attribute and therefore uniquely identifies the element throughout the document. In XML, references to the identifier for an element will be to the id attribute.
HTML 4 defines the name attribute for the elements a, applet, form, frame, iframe, img, and map. In XHTML, the name attribute has been deprecated, or marked as outdated by newer constructs, and will be completely removed in future releases. Until support is actually dropped for the name attribute and all Web browsers begin using the id attribute instead, both should be used. Listing 11.15 demonstrates this.
LISTING 11.15 Using the id and name Attributes
<frame id="frame1" name="frame1">
Frame content goes here
</frame>
Here, the id attribute and the name attribute both have the same value: frame1. The id attribute is included to provide an XHTML-valid identifier for this frame element. The name attribute is also included to ensure that existing Web browsers uniquely recognize the element.
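On a larger site, you may want to find every element that still carries only a name attribute. The following Python sketch is one way to do that, assuming well-formed input; the function name id_name_problems, the frameset fragment, and its src values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Elements for which HTML 4 defines a name attribute (from the text above).
NAMED_ELEMENTS = {"a", "applet", "form", "frame", "iframe", "img", "map"}


def id_name_problems(markup):
    """Return (tag, name) pairs whose id is missing or differs from name."""
    problems = []
    for element in ET.fromstring(markup).iter():
        if element.tag in NAMED_ELEMENTS and "name" in element.attrib:
            if element.get("id") != element.get("name"):
                problems.append((element.tag, element.get("name")))
    return problems


frameset = (
    '<frameset cols="25%,75%">'
    '<frame id="frame1" name="frame1" src="nav.html" />'
    '<frame name="frame2" src="main.html" />'
    "</frameset>"
)
print(id_name_problems(frameset))  # [('frame', 'frame2')]
```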
You should be fairly comfortable with XHTML 1.0 by now. This would be a good time to start our mini case study and see how a small Internet retailer would use XHTML 1.0 on their Web site.