The XML Path Language (XPath) is a standard for creating expressions that can be used to find specific pieces of information within an XML document. XPath expressions are used by both XSLT (for which XPath provides the core functionality) and XPointer to locate a set of nodes. To understand how XPath works, it helps to imagine an XML doc-ument as a tree of nodes consisting of both elements and attributes. An XPath expression can then be considered a sort of roadmap that indicates the branches of the tree to follow and what limbs hold the information desired. The complete documentation for the XPath recommendation can be found at http://www.w3c.org/TR/xpath.
XPath expressions have the ability to locate nodes based on the nodes’ type, name, or value or by the relationship of the nodes to other nodes within the XML document. In addition to being able to find nodes based on these criteria, an XPath expression can also return any of the following:
A node set
A Boolean value
A string value
A numeric value
XML documents are, in essence, a hierarchical tree of nodes. Curiously, there is a simi-larity between URLs and XPath expressions. Why? Quite simply, URLs represent a navi-gation path of a hierarchical file system, and XPath expressions represent a navigation path for a hierarchical tree of nodes.
Operators and Special Characters
XPath expressions are composed using a set of operators and special characters, each with its own meaning. Table 5.1 lists the various operators and special characters used within the XML Path Language.
TABLE 5.1 Operators and Special Characters for the XML Path Language
Table 5.1 only provides a list of operators and special characters that can be used within an XPath expression. However, the table does not indicate what the order of precedence is. The priority for evaluating XPath expressions is as follows:
The XML Path Language provides a declarative notation, termed a pattern, used to select the desired set of nodes from XML documents. Each pattern describes a set of matching nodes to select from a hierarchical XML document. Each pattern describes a “navigation” path to the desired set of nodes similar to the Uniform Resource Identifier (URI) syntax. However, instead of navigating a file system, the XML Path Language navigates a hierarchical tree of nodes within an XML document.
Each “query” of an XML document occurs from a particular starting node that defines the context for the query. The context for the query has a very large impact on the results. For instance, the pattern that locates a node from the root of an XML document will most likely be a very different pattern when looking for the same node from some-where else in the hierarchy.
As mentioned earlier in this chapter, one possible result from performing an XPath query is a node set, or a collection of nodes matching a specified search criteria. To receive these results, a “location path” is needed to locate the result nodes. These location paths select the resulting node set relative to the current context. A location path is, itself, made up of one or more location steps. Each step is further comprised of three pieces:
A node test
Therefore, the basic syntax for an XPath expression would be something like this:
Using this basic syntax and the XML document in Listing 5.1, we could locate all the <c> nodes by using the following XPath expression:
Alternatively, we could issue the following abbreviated version of the preceding expression:
All XPath expressions are dependant on the current context. The context is the current location within the tree of nodes. Therefore, if we’re currently on the second <b> element within the XML document in Listing 5.1, we can select all the <c> elements contained within that <b> element by using the following XPath expression:
This is what’s known as a “relative” XPath expression.
The axis portion of the location step identifies the hierarchical relationship for the desired nodes from the current context. An axis for a location step could be any of the items listed within Table 5.2.
TABLE 5.2 XPath Axes for a Location Step
All the axes in Table 5.2 depend on the context of the current node. This raises the ques-tion, How do you know what the current context node is? The easiest way to explain this is through example, so let’s use the XML document shown in Listing 5.1 as the basis for the explanation of how the current context node is defined.
LISTING 5.1 Sample1.xml Contains a Simple XML Document
<c d=”Attrib 1”>Text 1</c> <c d=”Attrib 2”>Text 2</c> <c d=”Attrib 3”>Text 3</c>
<c d=”Attrib 4”>Text 4</c> <c d=”Attrib 5”>Text 5</c>
<c d=”Attrib 6”>Text 6</c> <c d=”Attrib 7”>Text 7</c> <c d=”Attrib 8”>Text 8</c> <c d=”Attrib 9”>Text 9</c>
<c d=”Attrib 10”>Text 10</c> <c d=”Attrib 11”>Text 11</c> <c d=”Attrib 12”>Text 12</c>
Using this sample XML document as a reference, and the following XPath query, we can examine how the current context node is determined:
The preceding XPath query consists of three location steps, the first of which is a. The second location step in the XPath query is b, which selects the first <b> element within the <a> element. The final location step is child::*, which selects all (signified by *) child elements of the first <b> element contained within the <a> element. It is important to understand that each location step has a different context node. For the first location step, the current context node is the root of the XML document. It should be noted that the node <a> is not the root of the XML document; it’s the first element within the hierarchy, but the root of an XML document is denoted by “/” as the first character within an XPath query. The context for the second location step is the node <a>. The third location step has the first <b> node as its context.
Now that you have a better understanding of how the context for an XPath query axis is defined, we can look at the resulting node sets for the axes described in Table 5.2. Using the XML document in Listing 5.1, Table 5.3 lists some XPath queries with the various axes and the resulting node sets.
TABLE 5.3 XPath Queries and the Resulting Node Sets
From the contents of Table 5.3, you can see what may be some strange results. However, it’s important to remember that a resulting node set contains the entire hierarchy for the nodes contained within the set. Keeping that in mind, the results for the XPath queries in Table 5.3 begin to make more sense.
The node test portion of a location step indicates the type of node desired for the results. Every axis has a principal node type: If an axis is an element, the principal node type is element; otherwise, it is the type of node the axis can contain. For instance, if the axis is attribute, the principal node type is attribute.
A node test may also contain a node name, or QName. In this case, a node with the speci-fied name is sought, and if found, it’s returned in the node set. However, the nodes selected in this manner must be the principal node type sought and have an expanded name equal to the QName specified. This means that if the node belongs to a namespace, the namespace must also be included in the node test for the node to be selected. For instance, ancestor::div and ancestor::test:div will produce two entirely different node sets. In this first case, only nodes that have no namespace specified and have a name of div will be selected. In the second case, only those div nodes belonging to the test namespace will be selected.
In addition to specifying an actual node name, other node tests are available to select the desired nodes. Here’s a list of these node tests:
As you can see, a small number of node tests are available for use within a location step. The comment() node test selects comment nodes from an XML document. The node() node test selects a node of any type, whereas the text() node test selects those nodes that are text nodes. Special consideration should be given to the processing-instruc-tion() node test, because this node test will accept a literal string parameter to specify the name of a desired processing instruction.
The predicate portion of a location step filters a node set on the specified axis to create a new node set. Each node in the preliminary node set is evaluated against the predicate to see whether it matches the filter criteria. If it does, the node ends up in the filtered node set. Otherwise, it doesn’t.
A predicate may consist of a filter condition that is applied to an axis that either directs the condition in a forward or reverse direction. A forward axis predicate contains the cur-rent context node and nodes that follow the context node. A reverse axis predicate con-tains the current context node and nodes that precede the context node.
A predicate within a location step may contain an expression that, when evaluated, results in a Boolean (or logical) value that can be either True or False. For instance, if the result of the expression is a number, such as in the expression /a/b[position()=2], then the predicate [position()=2] is evaluated for each node in the axis to see whether it is the second node, and if so, it returns True for the predicate. In fact, the expressions for a predicate can get rather complex because you are not limited to one test condition within a predicate—you may use the Boolean operators and and or. Using these two operators, you can create very powerful filter conditions to find the desired node set. Predicates may also consist of a variety of functions.
XPath predicates may, and most probably will, contain a Boolean comparison, as listed in Table 5.4.
TABLE 5.4 Boolean Operators and Their Respective Descriptions
XPath functions are used to evaluate XPath expressions and can be divided into one of four main groups:
Each of these main groups contains a set of functions that deal with specific operations needed with respect to the items covered. Table 5.5 lists each XPath function available as well as the arguments accepted, the return type, and a brief description.
TABLE 5.5 XPath Functions as Recommended by the W3C
As you can see in Table 5.5, a large number of functions are available that perform a myriad of operations. These functions can be used within a location-step predicate to help filter out undesired nodes. They also help in providing functionality that without which would make the XPath language quite limiting.
You have seen the basic construction of each piece of an XPath query, but in truth, it helps to see the XPath expressions and the results for them. Therefore, to help with this and to provide as many examples as possible, we will use the code in Listing 5.2, which provides a good baseline sample XML document we can use for the XPath examples
LISTING 5.2 Sample2.xml Provides the XML Document Against Which the Sample XPath Expressions Will Be Evaluated
<PurchaseOrder Tax=”5.76” Total=”75.77”>
<DeliveryDate>08/12/2001</DeliveryDate> <Name>Dillon Larsen</Name>
<Street>123 Jones Rd.</Street> <City>Houston</City> <State>TX</State> <Zip>77381</Zip>
<PaymentMethod>Credit Card</PaymentMethod> <BillingDate>08/09/2001</BillingDate> <Name>Madi Larsen</Name>
<Street>123 Jones Rd.</Street> <City>Houston</City> <State>TX</State> <Zip>77381</Zip>
<Order SubTotal=”70.01” ItemsSold=”17”>
<Product Name=”Baby Swiss” Id=”702890” Price=”2.89”
<Product Name=”Hard Salami” Id=”302340” Price=”2.34”
<Product Name=”Turkey” Id=”905800” Price=”5.80”
<Product Name=”Caesar Salad” Id=”991687” Price=”2.38”
<Product Name=”Chicken Strips” Id=”133382” Price=”2.50”
<Product Name=”Bread” Id=”298678” Price=”1.08”
<Product Name=”Rolls” Id=”002399” Price=”2.24”
<Product Name=”Cereal” Id=”066510” Price=”2.18”
<Product Name=”Jalapenos” Id=”101005” Price=”1.97”
<Product Name=”Tuna” Id=”000118” Price=”0.92”
<Product Name=”Mayonnaise” Id=”126860” Price=”1.98”
<Product Name=”Top Sirloin” Id=”290502” Price=”9.97”
<Product Name=”Soup” Id=”001254” Price=”1.33”
<Product Name=”Granola Bar” Id=”026460” Price=”2.14”
<Product Name=”Chocolate Milk” Id=”024620” Price=”1.58”
<Product Name=”Spaghetti” Id=”000265” Price=”1.98”
<Product Name=”Laundry Detergent” Id=”148202” Price=”8.82”
➥ Quantity=”1”/> </Order>
As you can see, Listing 5.2 looks very similar to Listing 4.1. This is basically the same sample XML document we used in Chapter 4, “Creating XML Schemas,” but it has been slightly modified to perform as a better example for XPath queries. Using this sample XML document, Table 5.6 contains sample XPath expressions and their respective results.
TABLE 5.6 Sample XPath Queries and Their Results
As you can see from the examples in Table 5.6, XPath expressions can get rather long and complex. For this reason, an abbreviated syntax has also been introduced. Table 5.7 lists the XPath expressions and their respective abbreviations.
TABLE 5.7 XPath Expressions and Their Abbreviations
Using the abbreviations in Table 5.7, the XPath expressions in Table 5.6 can be rewritten as shown in Table 5.8.
TABLE 5.8 Abbreviated XPath Expressions from Table 5.6
Full Expression Abbreviated Expression
TABLE 5.8 continued
Full Expression Abbreviated Expression
➥ [self::ShippingInformation ➥ BillingInformation
➥ or self::BillingInformation]
You can see from the examples in Table 5.8 that not every expression has an abbreviated equivalent. For instance, the XPath expression /PurchaseOrder/Order/Product/fol-lowing-sibling::Product[position()>3] has no abbreviated equivalent.