XML Languages
There have been several proposals for XML query languages, and two query
language standards have emerged. The first is XPath, which provides language constructs for specifying path
expressions to identify certain nodes (elements) or attributes within an XML
document that match specific patterns. The second is XQuery, which is a more general query language. XQuery uses XPath
expressions but has additional constructs. We give an overview of each of these
languages in this section. Then we discuss some additional languages related to
XML in Section 12.5.3.
1. XPath: Specifying Path Expressions in XML
An XPath expression generally returns a sequence of items that satisfy a certain
pattern as specified by the expression. These items are either values (from
leaf nodes) or elements or attributes. The most common type of XPath expression returns a collection of element or attribute nodes that
satisfy certain patterns specified in the expression. The names in the XPath expression are node names in the XML document tree that are either tag
(element) names or attribute names, possibly with additional qualifier conditions to further
restrict the nodes that satisfy the pattern. Two main separators are used when specifying a path: single slash (/) and
double slash (//). A single slash before a tag specifies that the tag must
appear as a direct child of the previous (parent) tag, whereas a double slash
specifies that the tag can appear as a descendant of the previous tag at any level. Let us look at some
examples of XPath as shown in Figure 12.6.
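The difference between the two separators can be tried out with a small script. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The sample document and the variable names are assumptions made up for illustration; they are not the COMPANY document from the figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature COMPANY document, made up for illustration only.
COMPANY_XML = """
<company>
  <department><departmentName>Research</departmentName></department>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
  </employee>
  <project>
    <projectName>ProductX</projectName>
    <projectWorker><ssn>123456789</ssn><hours>32.5</hours></projectWorker>
  </project>
</company>
"""

root = ET.fromstring(COMPANY_XML)  # root is the <company> element

# Like /company/department: department must be a direct child of company.
direct_departments = root.findall("department")

# Like /company/projectWorker: no match, because projectWorker is nested
# inside project rather than being a direct child of company.
direct_workers = root.findall("projectWorker")

# Like //projectWorker: matches a projectWorker at any depth.
any_level_workers = root.findall(".//projectWorker")

print(len(direct_departments), len(direct_workers), len(any_level_workers))
```

In ElementTree, paths are evaluated relative to the element they are called on, so the leading /company step of the textbook expressions corresponds to starting at root.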
The first XPath expression in Figure 12.6 returns the company root node and all its descendant nodes, which means that it returns the
whole XML document. We should note that it is customary to include the file
name in the XPath query. This allows us to specify any local file name or even any path
name that specifies a file on the Web. For example, if the COMPANY XML document is stored at the location
www.company.com/info.XML
then the first XPath expression in Figure 12.6 can be written as
doc("www.company.com/info.XML")/company
This prefix would also be included in the other examples of XPath expressions.
The second example in Figure 12.6 returns all department nodes
(elements) and their descendant subtrees. Note that the nodes (elements) in an
XML document are ordered, so the XPath result
that returns multiple nodes will do so in the same order in which the nodes are
ordered in the document tree.
The third XPath expression in Figure 12.6 illustrates the use of //, which is convenient to use if we do not know the full path name we are searching for, but do know the name of some tags of interest within the XML document. This is particularly useful for schemaless XML documents or for documents with many nested levels of nodes.
The expression returns all employeeName nodes
that are direct children of an employee node, such that the employee node has another child element employeeSalary whose
value is greater than 70000. This illustrates the use of
qualifier conditions, which restrict the nodes selected by the XPath expression to those that satisfy the condition. XPath has a number of comparison operations for use in qualifier conditions,
including standard arithmetic, string, and set comparison operations.
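The effect of a qualifier condition such as the salary test can be sketched in Python. ElementTree's XPath subset has no numeric comparison operators, so this sketch applies the condition in ordinary Python code instead of inside the path; a full XPath engine would evaluate it directly. The sample data and the helper high_earner_names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for illustration only.
COMPANY_XML = """
<company>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSalary>30000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>James</firstName><lastName>Borg</lastName></employeeName>
    <employeeSalary>85000</employeeSalary>
  </employee>
</company>
"""

root = ET.fromstring(COMPANY_XML)


def high_earner_names(company, threshold):
    """Return employeeName elements of employees whose salary exceeds threshold."""
    names = []
    # ".//employee" plays the role of //employee in the path expression.
    for employee in company.findall(".//employee"):
        salary = employee.findtext("employeeSalary")
        # The qualifier: keep the employee only if the salary test holds.
        if salary is not None and float(salary) > threshold:
            # employeeName must be a direct child of the selected employee.
            names.extend(employee.findall("employeeName"))
    return names


selected = high_earner_names(root, 70000)
print([name.findtext("lastName") for name in selected])
```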
The fourth XPath expression in Figure 12.6 should return the same result as the
previous one, except that we specified the full path name in this example. The
fifth expression in Figure 12.6 returns all projectWorker nodes
and their descendant nodes that are children under a path /company/project and have a child node hours with a value greater than 20.0 hours.
When we need to include attributes in an XPath
expression, the attribute name is prefixed by the @ symbol to distinguish it
from element (tag) names. It is also possible to use the wildcard symbol *, which stands for any element, as in the
following example, which retrieves all elements that are child elements of the
root, regardless of their element type. When wildcards are used, the result can
be a sequence of different types of items.
/company/*
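As a rough illustration of the wildcard and of attribute references, here is a minimal sketch with Python's xml.etree.ElementTree. The status attribute and the sample document are invented for this example and are not part of the COMPANY schema used in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the "status" attribute is invented for illustration.
COMPANY_XML = """
<company>
  <department><departmentName>Research</departmentName></department>
  <employee><ssn>123456789</ssn></employee>
  <project status="active"><projectName>ProductX</projectName></project>
  <project status="closed"><projectName>ProductY</projectName></project>
</company>
"""

root = ET.fromstring(COMPANY_XML)

# Like /company/*: every child element of the root, whatever its tag.
child_tags = [child.tag for child in root.findall("*")]

# Like /company/project[@status = "active"]: an attribute used in a qualifier.
active_projects = root.findall("project[@status='active']")
active_names = [p.findtext("projectName") for p in active_projects]

# Like /company/project/@status: the attribute values themselves.
statuses = [p.get("status") for p in root.findall("project")]

print(child_tags)
print(active_names)
print(statuses)
```

The first result mixes department, employee, and project elements, which is what the text means by a sequence of different types of items.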
The examples above illustrate simple XPath
expressions, where we can only move down in the tree structure from a given
node. A more general model for path expressions has been proposed. In this
model, it is possible to move in multiple directions from the current node in
the path expression. These are known as the axes of an XPath expression. Our examples above used only three of these axes:
child of the current node (/),
descendant or self at any level of the current node (//), and attribute of the
current node (@). Other axes include parent, ancestor (at any level), previous
sibling (any node at same level to the left in the tree), and next sibling (any
node at the same level to the right in the tree). These axes allow for more
complex path expressions.
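To give a feel for moving in directions other than down, the sketch below uses Python's xml.etree.ElementTree. Its XPath subset supports the parent step (..) but has no sibling axes, so the siblings helper here is a hypothetical stand-in that emulates them by position in the parent's child list. The sample document is also made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for illustration only.
COMPANY_XML = """
<company>
  <department><departmentName>Research</departmentName></department>
  <employee><ssn>123456789</ssn></employee>
  <project>
    <projectName>ProductX</projectName>
    <projectWorker><ssn>123456789</ssn><hours>32.5</hours></projectWorker>
  </project>
</company>
"""

root = ET.fromstring(COMPANY_XML)

# Parent axis: ".." steps up from each <hours> element to its parent.
parents_of_hours = [elem.tag for elem in root.findall(".//hours/..")]


def siblings(parent, child):
    """Split the children of parent into those before and those after child."""
    children = list(parent)
    position = children.index(child)
    return children[:position], children[position + 1:]


employee = root.find("employee")
before, after = siblings(root, employee)
previous_tags = [elem.tag for elem in before]  # previous-sibling direction
next_tags = [elem.tag for elem in after]       # next-sibling direction

print(parents_of_hours, previous_tags, next_tags)
```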
The main restriction of XPath path expressions is that the
path that specifies the pattern also specifies the items to be retrieved.
Hence, it is difficult to specify certain conditions on the pattern while
separately specifying which result items should be retrieved. The XQuery language separates these two concerns, and provides more powerful
constructs for specifying queries.
2. XQuery: Specifying Queries in XML
XPath allows us to write expressions that select items from a tree-structured
XML document. XQuery permits the specification of more general queries on one or more XML
documents. The typical form of a query in XQuery is known
as a FLWR expression, which stands for the four main clauses of XQuery and has
the following form:
FOR <variable bindings to individual nodes (elements)>
LET <variable bindings to collections of nodes (elements)>
WHERE <qualifier conditions>
RETURN <query result specification>
There can be zero or more instances of the FOR clause, as well as of the LET clause
in a single XQuery. The WHERE clause is optional, but can appear at most once, and the RETURN clause must appear exactly once. Let us illustrate these clauses with
the following simple example of an XQuery.
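(: XQuery keywords are case-sensitive and are written in lowercase. :)
let $d := doc("www.company.com/info.xml")
for $x in $d/company/project[projectNumber = 5]/projectWorker,
    $y in $d/company/employee
where $x/hours gt 20.0 and $y/ssn = $x/ssn
(: Braces mark expressions to evaluate inside the constructed element. :)
return <res> { $y/employeeName/firstName, $y/employeeName/lastName, $x/hours } </res>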
Variables are prefixed with the $
sign. In the above example, $d, $x, and $y are variables.
The LET clause assigns a variable to a particular expression for the rest of
the query. In this example, $d is assigned to the document file
name. It is possible to have a query that refers to multiple documents by
assigning multiple variables in this way.
The FOR clause assigns a variable to range over each of the individual items in a sequence. In our example, the sequences are specified by path expressions. The $x variable ranges over elements that satisfy the path expression $d/company/project[projectNumber = 5]/projectWorker. The $y variable ranges over elements that satisfy the path expression $d/company/employee. Hence, $x ranges over projectWorker elements, whereas $y ranges over employee elements.
The WHERE clause specifies additional conditions on the selection of items. In
this example, the first condition selects only those projectWorker elements that satisfy the condition (hours gt 20.0). The
second condition specifies a join condition that combines an employee with a projectWorker only if they have the same ssn value.
Finally, the RETURN clause specifies which elements
or attributes should be retrieved from the items that satisfy the query
conditions. In this example, it will return a sequence of elements, each
containing <firstName, lastName, hours> for
employees who work more than 20 hours per week on project number 5.
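The way the four clauses cooperate can be mimicked with nested loops. The following Python sketch is only an analogy under stated assumptions: it uses xml.etree.ElementTree rather than an XQuery processor, the sample document is invented, and the comments map each step to the clause it imitates.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for illustration only.
COMPANY_XML = """
<company>
  <employee>
    <ssn>111</ssn>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
  </employee>
  <employee>
    <ssn>222</ssn>
    <employeeName><firstName>Joyce</firstName><lastName>English</lastName></employeeName>
  </employee>
  <employee>
    <ssn>333</ssn>
    <employeeName><firstName>James</firstName><lastName>Borg</lastName></employeeName>
  </employee>
  <project>
    <projectNumber>5</projectNumber>
    <projectWorker><ssn>111</ssn><hours>32.5</hours></projectWorker>
    <projectWorker><ssn>222</ssn><hours>10.0</hours></projectWorker>
  </project>
  <project>
    <projectNumber>7</projectNumber>
    <projectWorker><ssn>333</ssn><hours>40.0</hours></projectWorker>
  </project>
</company>
"""

# LET: bind a name to the document for the rest of the query.
d = ET.fromstring(COMPANY_XML)

results = []
# FOR $x: range over projectWorker elements of project number 5.
for x in d.findall("project[projectNumber='5']/projectWorker"):
    # FOR $y: range over every employee element.
    for y in d.findall("employee"):
        # WHERE: the hours filter plus the join condition on ssn.
        works_enough = float(x.findtext("hours")) > 20.0
        same_person = y.findtext("ssn") == x.findtext("ssn")
        if works_enough and same_person:
            # RETURN: pick out the items that make up one result.
            results.append((
                y.findtext("employeeName/firstName"),
                y.findtext("employeeName/lastName"),
                x.findtext("hours"),
            ))

print(results)
```

The worker on project 7 is excluded by the path expression, and the 10.0-hour worker on project 5 is excluded by the hours condition, so only one result remains.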
Figure 12.7 includes some additional examples of queries in XQuery that can be specified on XML instance documents that follow the XML
schema document in Figure 12.5. The first query retrieves the first and last
names of employees who earn more than $70,000. The variable $x is bound to each employeeName element that is a child of an employee element, but only for employee elements that satisfy the
qualifier that their employeeSalary value is greater than $70,000.
The result retrieves the firstName and lastName child elements of the selected employeeName
elements. The second query is an alternative way of retrieving the same
elements retrieved by the first query.
The third query illustrates how a join operation can be performed by
using more than one variable. Here, the $x variable
is bound to each projectWorker element that is a child of project number 5, whereas the $y variable is bound to each employee
element. The join condition matches ssn values
in order to retrieve the employee names. Notice that this is an alternative way
of specifying the same query in our earlier example, but without the LET clause.
XQuery has very powerful constructs to specify complex queries. In particular,
it can
specify universal and existential quantifiers in
the conditions of a query, aggregate functions, ordering of query results,
selection based on position in a sequence, and even conditional branching.
Hence, in some ways, it qualifies as a full-fledged programming language.
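To make these capabilities concrete, the sketch below shows rough Python counterparts of each one over a small, invented project element. It is an analogy only; the variable names are hypothetical and none of this is XQuery syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, made up for illustration only.
PROJECT_XML = """
<project>
  <projectNumber>5</projectNumber>
  <projectWorker><ssn>111</ssn><hours>32.5</hours></projectWorker>
  <projectWorker><ssn>222</ssn><hours>7.5</hours></projectWorker>
  <projectWorker><ssn>333</ssn><hours>20.0</hours></projectWorker>
</project>
"""

project = ET.fromstring(PROJECT_XML)
workers = project.findall("projectWorker")
hours = [float(w.findtext("hours")) for w in workers]

# Existential quantifier: does some worker log more than 20 hours?
some_over_20 = any(h > 20.0 for h in hours)

# Universal quantifier: does every worker log more than 20 hours?
all_over_20 = all(h > 20.0 for h in hours)

# Aggregate function: total hours on the project.
total_hours = sum(hours)

# Ordering of results: worker ssn values sorted by ascending hours.
by_hours = [
    w.findtext("ssn")
    for w in sorted(workers, key=lambda w: float(w.findtext("hours")))
]

# Selection by position: the first projectWorker (positions start at 1).
first_worker_ssn = project.find("projectWorker[1]").findtext("ssn")

# Conditional branching on a computed value.
workload = "heavy" if total_hours > 40 else "light"

print(some_over_20, all_over_20, total_hours, by_hours, first_worker_ssn, workload)
```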
This concludes our brief introduction to XQuery. The
interested reader is referred to www.w3.org, which contains documents
describing the latest standards related to XML and XQuery. The next section briefly discusses some additional languages and
protocols related to XML.
3. Other Languages and Protocols Related to XML
There are several other languages and protocols related to XML
technology. The long-term goal of these and other languages and protocols is to
provide the technology for realization of the Semantic Web, where all
information in the Web can be intelligently located and processed.
The Extensible Stylesheet Language (XSL) can be used to define how a document should be rendered for display by a Web browser.
The Extensible Stylesheet Language for Transformations (XSLT) can be used to transform one structure into a different structure. Hence, it can convert documents from one form to another.
The Web Services Description Language (WSDL) allows for the description of Web Services in XML. This makes the Web Service available to users and programs over the Web.
The Simple Object Access Protocol (SOAP) is a platform-independent and programming language-independent protocol for messaging and remote procedure calls.
The Resource Description Framework (RDF) provides languages and tools for exchanging and processing of metadata (schema) descriptions and specifications over the Web.