Architecture of the Semantic Web
The W3C has described a seven-layer architecture for the Semantic Web, from the bottom to the top:
· Unicode and URIs
· XML, XML Schema, and XML Namespaces
· RDF, RDF Schema, and Topic Maps
· Ontologies
· Logic
· Proof
· Trust
As you can see, these layers
move from the very concrete to the abstract, even ethereal. Most people have
seen a URI (layer 1), but few (maybe none) have thought through what trust
might mean, let alone how it might be specified so that it can be processed by
machines. The bottom three layers are well specified enough that some issues
can be raised with them—particularly at the interfaces of the layers. The top
four layers are (like our elephant) very much under construction. This section
describes the seven layers. Needless to say, there is a lot more detail in the
concrete layers than in the blue-sky thinking at the top.
Unicode and URIs
At the lowest layer of the
Semantic Web are specifications for the characters used by documents on the
Semantic Web (Unicode) and for identifying documents (URIs). Because URIs are
made from characters, we’ll look at characters first.
Unicode
In the beginning, there was
ASCII, and that was good. When the World Wide Web was starting out, that is.
The original HTML specifications used the SGML Reference Concrete Syntax, which
boiled down to using ASCII (the American Standard Code for Information
Interchange), and the browser manufacturers followed along. They used the SGML
entity facility to map characters that were not included in the ASCII character
set, such as curly quotes, bullets, and so forth, using (whether or not they
knew it) the famous ISO entity sets with which so many consultants were able to
pad their deliverables.
ASCII was good because it was a very successful standard (or vice versa). It was totally transparent and portable—as long as you used the Western European alphabet and didn’t need any accented characters. If you needed accents (in French, for example) or Cyrillic, Greek, or Hebrew characters, then the SGML entity system became more than a little verbose. (Imagine typing the entity reference &beta; every time you wanted to write β throughout an entire document!) However, if you used a nonalphabetic writing system, ASCII was totally unhelpful.
Enter Asian character sets,
which are often nonalphabetic and run to many thousands of characters. After
all, WWW doesn’t stand for Western-European
Wide Web—it stands for World Wide
Web. What to do?
The answer was to build a bigger code space. ASCII provides for only 128 characters (256 even with the 8-bit extensions)—barely enough for English, far too few for even one Asian character set, and absurdly small when Chinese, Japanese, Hangul, Thai, their many historical variations, as well as all the other Asian character sets (including Khmer and Vietnamese) are taken into account.
Unicode built the bigger code space. (One reason for XML’s worldwide success is that it uses Unicode in the form of UTF-8.) Unicode is, essentially, a giant code table. It maps characters, which are abstract entities (such as “LATIN CAPITAL LETTER A,” a Japanese Hiragana syllable, or a Chinese ideogram), to code points, which are unique numeric values (“U+0041” for Latin “A”); encodings, in turn, serialize code points as bytes. Rendering engines map byte serializations to visual representations of characters (such as capital “A” in the font Garamond), called glyphs. Unicode does not standardize glyphs.
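To make the character/code point/encoding distinction concrete, here is a minimal Python sketch (the specific characters chosen are only illustrative):

import unicodedata

ch = "A"
print(unicodedata.name(ch))   # LATIN CAPITAL LETTER A   (the abstract character)
print(f"U+{ord(ch):04X}")     # U+0041                   (the code point)
print(ch.encode("utf-8"))     # b'A'                     (one possible byte serialization)

ch = "中"
print(unicodedata.name(ch))   # CJK UNIFIED IDEOGRAPH-4E2D
print(f"U+{ord(ch):04X}")     # U+4E2D
print(ch.encode("utf-8"))     # b'\xe4\xb8\xad'          (three bytes in UTF-8)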
There are three Unicode encodings: UTF-32, with one 32-bit code unit per code point; UTF-16, with one or two 16-bit code units per code point; and UTF-8, with one to four 8-bit code units per code point. UTF-8 is the default encoding for XML documents. It aims to preserve the characteristics of ASCII so that file systems, parsers, and other software that rely on ASCII remain backward compatible with Unicode.
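As a rough illustration of how the three encodings trade code-unit size against code-unit count, the following Python sketch encodes a few sample characters (chosen only for illustration) in each form:

# Each character's code units per encoding: UTF-8 bytes, UTF-16 units, UTF-32 units.
for ch in ("A", "é", "中", "𝄞"):                 # U+0041, U+00E9, U+4E2D, U+1D11E
    print(ch,
          len(ch.encode("utf-8")),               # 1, 2, 3, 4 bytes
          len(ch.encode("utf-16-le")) // 2,      # 1, 1, 1, 2 sixteen-bit units
          len(ch.encode("utf-32-le")) // 4)      # always one thirty-two-bit unit

print("A".encode("utf-8") == b"A")               # True: ASCII bytes survive unchanged in UTF-8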
As already mentioned, Unicode
is required by XML. Other languages and specifications that require Unicode are
Java, ECMAScript (JavaScript), LDAP, and CORBA 3.0. Companies that have adopted
Unicode include Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, Sybase, Unisys,
and other industry leaders. It is supported in all modern browsers, numerous
products, and most operating systems. Therefore, if you’re a developer writing
on a platform that doesn’t handle Unicode or an information owner making the
assumption that all characters are ASCII, take a hard look at your
assumptions. Importantly, Unicode “guarantees” that no characters will be
removed or reinterpreted in ways that are incompatible with the existing
standard.
Up until now, we’ve been
using the word Unicode for the
standard that solves our code space problem. In fact, there are two standards:
· Unicode
· ISO 10646
Unicode is a semicommercial
effort; ISO 10646 is an international standard. Fortunately, in 1991 both
efforts decided that the world didn’t need two competing solutions for the code
space problem and agreed to keep their code tables in sync with each other and
coordinate future extensions. They have kept their agreement. All characters
are at the same positions in the code tables and have the same names in Unicode
and ISO 10646.
Are there differences between
Unicode and ISO 10646? Yes. ISO 10646 is focused mainly on its code tables.
Unicode also gives information relevant to implementers, particularly for implementers of high-end composition systems. It provides rendering algorithms
for scripts (such as Arabic), mixing bidirectional text (such as Latin
left-to-right text and Hebrew right-to-left text), sorting and string
comparisons, and so forth. However, ISO 10646 has more complete coverage of
Chinese, Japanese, and Korean sample glyphs.
Are there differences between
Unicode and ISO 10646 that affect the Semantic Web directly? Again, yes, and
again the answer turns on code space.
First, there are some
differences between the XML view of the world at W3C and the Unicode view of
the world. For example, W3C does not consider some Unicode characters suitable
for XML. W3C feels it is better to use the HTML <BR> tag or some other markup equivalent than to
use Unicode line and paragraph separators. Other issues arise when Unicode
characters are specified to handle functions that are handled, or better handled, by markup. Examples here include list item marker characters (better handled with a style sheet), bidirectional text (specified in HTML 4.0), object replacement (better handled with an HTML src attribute or equivalent), and others.
Potential users of Unicode also have issues with it, especially Chinese/Japanese/Korean/Vietnamese (CJKV) users. Character set issues in general, and East Asian character sets in particular, are extremely intricate and culturally bound. (There is one, possibly apocryphal, story of country representatives feuding over who would be “first” in the code table.)
Some CJKV issues are process issues. Due to the large number of Asian character sets, and the huge numbers of ideographs within each set, Unicode still does not cover many characters. Unicode 3.0 has almost 28,000 ideographic characters. However, by some estimates, there are 160,000 ideographs yet to be standardized. Even if this estimate is high, given that the Unicode process, like any standards process, is by nature slow, some CJKV users are bound to remain skeptical of Unicode for some time.
Some of these skeptics will,
of course, be classical scholars who study works that use characters or
character sets that are no longer widely used. However, here we should consider
that these works may have great cultural significance. How would English
writers and speakers feel if the Unicode process had not yet standardized on
the characters used to represent the works of Shakespeare or the New Testament
in Greek?
URIs
Unicode is the first part of
level 1 of the Semantic Web; the URI is the second part. Now, URIs are made out
of characters, so the issues of internationalizing character sets, as we just
discussed, potentially impact URI interoperability. Furthermore, URIs pose
interesting technical and philosophical issues in their own right (known as
the identification problem). So, let’s look at URIs and
their impact on the Semantic Web in detail.
On the second day, Tim
Berners-Lee created the Uniform Resource Locator (URL). It, too, was good.
Having in mind the overarching goal of creating a system whereby scientists
could communicate their results, he sought to create an addressing system that
could be scribbled on a cocktail napkin in a bar; in this he succeeded. URLs
are definitely human readable, as opposed to the multiline incantations and
arcane sequences of digits that preceded the URL.
Let’s process a URL, just to
get the terminology clear. Here is a sample (fake) URL:
http://jane.books.com/hardbacks/chinese
If you typed this into your
browser (and the URL weren’t fake) here is what would happen: The browser
would divide the URL into scheme, domain name, and path name, and then resolve
the result.
First, the browser would
split the URL on the scheme (before the :// part). That scheme tells the browser how and
to whom to delegate locating a resource. (This “how and to whom” is called a transfer protocol; HTTP is an example.) Next, the
browser looks for a “top-level domain” in the URL (here, .com) and, reading backwards, passes the domain name jane.books.com to a Domain Name System (DNS) resolver, which interprets the string and connects the browser to the jane.books.com server. Finally, Jane’s server resolves the
remaining path name into a Web page, and returns that document to the browser,
which displays it.
So far, so good. The URL certainly works. But today, we have transitioned from the notion of a URL that locates to a URI that identifies, and in that transition some problems have arisen:
· The internationalization problem (a scalability problem)
· The privatization problem (a scalability problem)
· The terminology problem (a semantic problem)
· The identity problem (a semantic problem)
Let’s turn to the
internationalization problem first. Again, character set issues raise their
ugly heads. On the one hand, URIs are to be 7-bit ASCII (as specified in RFC
2396). On the other hand, we have URIs embedded in XML documents that not only
are UTF-8 Unicode but may contain XML general entities such as &amp;. What to do? The answer: Perform a mini data conversion effort on-the-fly in the browser, in which the general entities in the URL sequence are converted to Unicode characters, and the resulting sequence is converted to UTF-8. In that sequence, each character is used as-is if it corresponds to a 7-bit ASCII character. Otherwise, each byte of its UTF-8 form is escaped into hex and written as %hh, where each h is a 7-bit ASCII character representing a hexadecimal digit.
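A sketch of that escaping rule in Python, using an invented Chinese path purely for illustration:

from urllib.parse import quote, unquote

path = "/精装书/中文"            # hypothetical path segments
escaped = quote(path)            # each non-ASCII character becomes %hh escapes of its UTF-8 bytes
print(escaped)                   # /%E7%B2%BE%E8%A3%85%E4%B9%A6/%E4%B8%AD%E6%96%87
print(unquote(escaped) == path)  # True: the round trip is lossless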
In this process, the original
use-case behind URLs—human readability—has been left far behind. A netizen in
the world’s largest emerging market and oldest continuous commercial civilization (that is, a Chinese person) who wished to scribble a URI on a cocktail napkin would see nothing but a morass of percent signs (even assuming that all his characters were in UTF-8)! Furthermore, if he wanted to use his URI for
marketing and print it on a billboard (like Western firms do), which version of
his URL would he use? The one in the characters his customers could read, or
the one that they saw in the browser’s address box after conversion to UTF-8?
Our Chinese netizen can’t use his own character set for his own URI! This is
the internationalization problem for URLs in a nutshell.
Enter the second problem: the
privatization problem. Because there is a market for internationalized domain names, at least one registry service (a service that ensures the uniqueness and hence the value of domain names as assets) has opened an internationalized domain service intended to “catalyze” the work of the Internet Engineering Task Force (IETF) in writing the specifications for internationalized domain names.
Well and good, but what will happen if the catalysis fails? Will the registries
decide to deny their customers services, or will they fragment the World Wide
Web by introducing domain names that are not universally resolvable?
These problems would seem
simple enough to resolve, at least in theory (just like Unicode—make a bigger
name space). Enter problem three: Not all the people who are charged with
resolving these problems use the same terminology. One key nonstandardized
piece of terminology, at least in the world of URIs, is the term URI reference.
Recall from earlier the
sample URI
http://jane.books.com/hardbacks/chinese
which addresses an entire
document. What if we only want to address a fragment of that document? We would
use a URI reference, which would look like this:
http://jane.books.com/hardbacks/chinese#id42
Here, id42, in an XML document, would
be the ID of an XML element, or fragment
identifier. However, although the W3C
world uses the term URI reference in
RDF for the combination of a URI and
a fragment identifier, this term is not recognized in the IETF world where URIs
are defined. The impact? Think back to internationalization for a moment—URIs,
being in the scope of IETF efforts, are 7-bit ASCII, but fragment identifiers,
being outside IETF, are not! Our Chinese entrepreneur, still trying to write a
legal URI reference on his dampening cocktail napkin, would end up with percent
signs everywhere up to the hash mark, and then nice, readable (to him)
characters after the hash mark.
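In code, the split between the URI proper (governed by the IETF rules) and the fragment identifier (interpreted against the retrieved document) can be sketched like this in Python:

from urllib.parse import urldefrag

uri, fragment = urldefrag("http://jane.books.com/hardbacks/chinese#id42")
print(uri)         # http://jane.books.com/hardbacks/chinese  (the URI, 7-bit ASCII territory)
print(fragment)    # id42                                     (the fragment identifier)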
Finally, we have the identity
problem. Between the original definition of URLs in RFC 1630 and their
standardization in RFC 2396, URLs became a subset of URIs; what was location
became a subset of identification. But what is it, exactly, that URIs identify?
Now, the change to URIs was
motivated by the perfectly reasonable desire to solve the huge infrastructure
problem that locations on the Web, like it or not, change. If the path name
portion of the URL represents a real path to a real document on a real machine,
the URL will break if the document is renamed or moved, if the machine gets a
new domain name, or if the machine is down. The concept of a URI introduced a
level of indirection, in that the identifier may remain stable, even if the
path name of the document (its location) changes. Of course, it turns out that
most of this indirection can be managed at the file-system level by the server,
and so URI, which was supposed to turn into an umbrella concept for other UR*s,
such as URC (Uniform Resource Citation) and URN (Uniform Resource Name), really
turned into a fancier name for a URL.
However, there is a crucial
semantic difference between a URL and a URI. Because the conceptual association
between the URI and a physical file has been severed, the URI became free to
identify anything, including resources that are not available on the Net, such
as physical books, the person Jane, and abstract ideas. RDF uses URI references
in just this way (see Chapter 23).
However, let’s put ourselves
in the place of a browser and server once more. When we-the-browser are given a
URI reference, how do we know, from the URI alone
(which is all we do know
about) whether the URI is a document to display or an identification of a
nonretrievable resource such as the person Jane or the concept of love? Using
the HTTP protocol, how do we-the-server distinguish between a URI that
identifies the person Jane and an Error 404? In Zen terms, how do we
distinguish the pointing finger from the moon?
The answer is that now we can’t. As of this writing, there is no standard way to distinguish between URIs that have failed to retrieve resources and URIs that are not meant to retrieve resources at all. Today, the Semantic Web suffers from one of the most basic confusions possible: between the identity of a thing and the thing itself. There are at least two alternative solutions: URIs could change to include such a semantic, or the semantic could be handled at a higher layer above URIs.
XML Specifications
XML, XML
namespaces, and XML Schema are covered elsewhere in this book. In this section,
we will focus on one XML specification, XML
Topic Maps (XTM), that addresses the issues that bubble up to the middle
layers of the Semantic Web architecture from the Unicode and URI layers. As
you’ll recall, two of those issues are internationalization and identity.
First, we’ll briefly describe
XTM topic maps. Then we’ll compare and contrast the topic approach to the
problem with the RDF approach.
Topic maps are a
bibliographic solution. They were originally designed to provide an interchange
syntax for finding aids such as indexes, thesauruses, glossaries, and
taxonomies. Like most solutions that involve bibliographic issues (see the
discussions of DDC and AECMA 1000D), topic maps evolved into a solution for
modeling relations between information resources. Topic maps were evolved
rapidly by the members of a small team working closely together.
Topic maps create
associations among topics, which are electronic proxies for subjects (subjects being subject matter—stuff
people talk about). Subjects can be addressable (a Web page) or nonaddressable
(the person Jane or the concept of love). Users are encouraged to develop and use Published Subject Indicators (PSIs) for their nonaddressable subjects. Unlike RDF, the topic map data model has a number of “pre-reified” constructs optimized for modeling; these constructs include topic base names, through which a topic can be given a label within a scope.
Topic maps address the internationalization problem in a way that RDF labels and comments do not. In topic maps, base names that can be scoped by human languages are built in to the data model. Therefore, a topic map information overlay can be adjusted to give topics names that are appropriate to the user, whether the user’s preference is for English, French, or Chinese. Because RDF labels and comments use the xml:lang attribute, which is not part of the RDF data model, RDF solutions must use ad-hoc solutions for a function that topic maps build in.
Topic maps address the
identity problem that bubbles up from URIs. RDF has no built-in way to
distinguish between resources that fail to be retrieved (such as a document on
a downed server) and resources that can never be retrieved (such as the concept
love). Topic maps, by explicitly distinguishing in their XML markup between addressable (that is, retrievable) and nonaddressable subjects, solve the identity problem.
Ontology
Ontology (like semantics) is
another one of those words that, having been appropriated by the software
community, has gained new meanings. In philosophy, ontology is the study of
being; a formal account of what exists. For the Web, ontology combines taxonomy
with inference rules. A taxonomy is a system that classifies things that exist, often in the form of a tree. For example, the taxonomy of the animal kingdom descends in order through kingdom, phylum, class, order, family, genus, and species. Because animals are classified on the basis of their physical characteristics, we can make inferences: If all fish have gills, and a trout is a fish, then trout have gills.
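The trout/gills inference can be sketched as a toy Python program; the taxonomy and the single asserted property here are invented purely for illustration:

parent = {"trout": "fish", "fish": "animal"}   # a tiny taxonomy
facts = {("fish", "has_gills")}                # a property asserted at one level

def is_a(kind, ancestor):
    while kind is not None:
        if kind == ancestor:
            return True
        kind = parent.get(kind)
    return False

def infer(kind, prop):
    # a kind inherits any property asserted for one of its ancestors
    return any(is_a(kind, k) for (k, p) in facts if p == prop)

print(infer("trout", "has_gills"))   # True: a trout is a fish, and fish have gills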
Ontologists also have a
notion of an ontological commitment—that being the ontology we choose to make
the basis of our inferences. For example, assume we must choose between a
universe that is Euclidean, where the geometry of space is flat, and a universe
that is Riemannian, where it is curved. This ontological commitment will affect
our inferencing rules: If the universe is flat, parallel lines will never meet.
If it is curved, parallel lines will meet.
Pragmatically, ontologies
exist just so communities of interest can use them to share commitments. A
large part of the library community, for example, has made an ontological
commitment to the Dewey Decimal Classification (DDC).
However, the boundaries of
the twin notions “taxonomy” and “inferencing rules” are just a bit fuzzy. After
all, a DTD could be seen as a classification system for element types in markup
languages, and a DTD enables inferencing rules for instances of those types in
its content models. Even the URI can be viewed as the root of a taxonomy of URLs, URCs, URNs, and so forth.
Fortunately, the W3C has
formed an ontology working group, so we may hope for a more precise approach.
This will be necessary for the Semantic Web, because without ontologies to
allow communities of interest to agree formally on the meaning of XML
namespaces, there is no possibility of creating meaning that machines can
understand, because the predicates of RDF statements are implemented with
namespaces.
There
are many existing ontologies the Semantic Web could leverage. Two of the most
important are DDC and MeSH (Medical Subject Headings). There are also many
meta-ontologies: ontology languages to create ontologies. Three of the most
important are Cyc, Conceptual Graphs, and OIL. Let’s look at these ontologies
in detail. Next, we’ll look at the meta-ontologies.
DDC
DDC is the numerical
classification scheme that one sees on the spines of paper books in libraries.
Of course, because DDC classifies the content of books, it could just as well
be used online—for example, as an ontology in the Semantic Web.
Devised by Melvil Dewey in
1873, DDC uses a taxonomy of Arabic numerals separated by dots to create
classifications based on subject matter. For example, 500 represents natural
sciences and mathematics in general, 530 represents physics, and 531 classical
mechanics. (The addressing system of the HyTime-compliant specification AECMA
1000D uses the same numbering principles, but for a taxonomy of part-whole
relationships for aircraft assemblies and subassemblies.)
An interesting artifact of the numbering system is that short numbers are higher in the taxonomic hierarchy than long ones, because the more specific the subject matter, the more digits are required to describe it. DDC thus implies inferencing rules, as well as a taxonomy, and is therefore a first-class ontology.
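That prefix-based inference can be sketched as follows (a simplification of real DDC practice; the labels come from the example above):

ddc = {"500": "Natural sciences and mathematics",
       "530": "Physics",
       "531": "Classical mechanics"}

def broader_than(a, b):
    # shorter numbers (trailing zeros dropped) are prefixes of the narrower ones
    return len(a) <= len(b) and b.startswith(a.rstrip("0"))

print(broader_than("500", "531"))   # True
print(broader_than("530", "531"))   # True
print(broader_than("530", "540"))   # False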
MeSH
MeSH is an ontology, too, but
organized quite differently from DDC. It is a controlled vocabulary thesaurus,
where (as in the writer’s tool Roget’s)
terms are associated with links such as synonym, antonym, homonym, and so
forth. (Chapter 23 shows how such associations could be represented in RDF with
the terms as nodes connected with labeled arcs: True and false would be
two words connected by an arc labeled “antonym,” for example.)
MeSH terms can be organized
both alphabetically and in a conceptual hierarchy. In the former, ankle would follow anatomy; in the latter, ankle
would be a narrower subject under the broader subject anatomy. Therefore, there are multiple points of entry into the
MeSH ontology; this is a general characteristic of information overlays such as
RDF and topic maps. Like DDC, MeSH’s hierarchy implies inferencing rules, and
MeSH is therefore a first-class ontology.
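The two entry points can be sketched with a toy thesaurus in Python (the single broader/narrower pair is taken from the example above; real MeSH is far richer):

broader = {"ankle": "anatomy"}                        # hierarchical entry point
terms = sorted(set(broader) | set(broader.values()))
print(terms)                                          # ['anatomy', 'ankle']  (alphabetical entry point)
print(broader["ankle"])                               # 'anatomy'             (narrower term -> broader term)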
Conceptual Graphs
Conceptual Graphs (CG) is a
language in which ontologies can be created. It uses graph structures to
express meaning in a way that humans can read and machines can process. CGs are
similar to RDF when considered as syntax, in that they can be represented both
graphically and textually. Here is an example of CG syntax in text form:
[Go]-
  (Agnt)->[Person: Joe]
  (Dest)->[City: Manhattan]
  (Inst)->[Train].
The words in square brackets ([Go]) are concepts; the words in parentheses ((Dest)) are relations. Arcs that connect relations to concepts are shown as arrows (->). The preceding CG statement translates to “Joe is going to Manhattan by train.” The essential concept—the verb—is [Go]; the individual [Person: Joe] is the (Agnt), the agent that performs the act of [Go]ing to the individual [City: Manhattan] in an (Inst)rumental relation with a [Train].
CG enables the definition of
taxonomies in the form of relations between concepts. CG also supports
inferencing rules in the form of First Order Logic (FOL), which we will briefly
cover when we examine the next layer of the Semantic Web.
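To underline the similarity to RDF noted above, the same CG can be viewed as a small set of concept/relation/concept arcs; this Python sketch is just one possible in-memory representation:

go = ("Go", None)                          # the central concept (the verb)
arcs = [(go, "Agnt", ("Person", "Joe")),
        (go, "Dest", ("City", "Manhattan")),
        (go, "Inst", ("Train", None))]

for source, relation, (ctype, referent) in arcs:
    print(f"{source[0]} --{relation}--> {ctype}: {referent}")
# prints "Go --Agnt--> Person: Joe", and so on: a triple-like structure, much as in RDF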
Cyc
Cyc (pronounced psych), like CG, is an ontology and an
ontology definition language, but its approach is completely different. Cyc
intends to realize the artificial intelligence (AI) dream of enabling machines
to use common-sense reasoning. However, instead of working from the top down
(the classical AI approach), with the hierarchy’s root and the axioms for
inferencing rules, Cyc works from the bottom up. The Cyc project collects and
formalizes common-sense rules (over one million so far) and allows common-sense
conclusions to emerge from the interactions between the rules.
For example, a common sense
rule is, “If it rains, take an umbrella.” Or so it would seem. In fact, this
rule can be heavily qualified by contextual rules, which also need to be
entered. (For example, the rule presumes that the agent who may or may not take
the umbrella is sane, not dead, not quadriplegic, not on the planet Venus, and
so on.)
Fortunately, after 17 years of
development, enough rules have been collected to allow Cyc to be productized
and a significant portion of the rules base made public. It may be that
Semantic Web developers will consider that this particular wheel need not be
reinvented; if so, they will find Murray Altheim’s XML version of Cyc a useful
tool.
OIL
The Ontology Inference Layer (OIL) is our last ontology definition language. Rather than create an
FOL-enabled syntax from the top down (like CG) or a vast inferencing system
from the bottom up (like Cyc), OIL is a layer on top of the W3C RDF
specification (see Chapter 23).
OIL (not yet a W3C effort, although cited there) inherits concepts from three communities. First, the AI community supplies OIL with the notion of “frames with slots,” or, in object-oriented (OO) terms, with classes that have attributes. Second, the knowledge representation community brings the notions of concepts and roles in description logic, which OIL maps to the notions of classes and attributes, respectively. Description logic (unlike the OO notation) has well-understood mathematical properties that enable inferencing rules. Finally, the markup technology community brings XML syntax and the RDF modeling primitives (instanceOf and subClassOf). OIL extends RDF to create a full-fledged modeling language.
Here is an example of OIL
syntax:
class-def Book
slot-def Price
  domain Product
class-def JanesBook
  subclass-of Book
  slot-constraint PublishedBy
    has-value "Janes Publisher"
This is a typical OO class hierarchy. Because Jane’s book is a subclass of book, it inherits the properties of book (such as price). Whether an OO hierarchy is necessary or sufficient to create an ontology seems an open question: neither CG nor Cyc is considered an OO technology.
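The inheritance the OIL fragment expresses can be sketched in ordinary OO code (the class and attribute names here are an analogy invented for illustration, not OIL output):

class Book:
    price: float = 0.0                    # the Price slot, inherited by subclasses

class JanesBook(Book):
    published_by = "Janes Publisher"      # the has-value slot constraint

print(JanesBook.price)                    # 0.0 - inherited from Book
print(JanesBook.published_by)             # Janes Publisher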
Logic
If RDF provides for a statement using a simple subject/predicate/object model, we can think of First Order Logic (FOL) in the Logic layer as enhancing RDF statements with richer syntax and more semantic power, within a world or universe of discourse specified by the ontological commitments made in the Ontology layer. (If our ontological commitment is to the DDC, we will reason from different premises than we would if our commitment were to MeSH, Cyc, or DAML.)
FOL statements are richer syntactically than RDF statements because they can use constructs that are like natural language conjunctions (Boolean operators, such as “and”) and demonstratives (variables, such as “this”). FOL statements are more powerful because they can explicitly assert what is true of part or all of a world (via quantifiers, such as “there exists”). Other forms of logic may coexist with FOL on the Logic layer of the Semantic Web, but FOL provides a baseline functionality.
FOL typically uses a formal notation of its own (called Peano notation), but in this section, for accessibility, we’ll just use English-like phrases in italic. For the existential quantifier ∃, for example, we will write the words there exists.
Briefly, here is an informal synthesis of the key concepts in FOL. We will elaborate the statement “Roses are red” in our examples.
The following concepts are
defined by the user:
Constants. Individuals in the world
(resources, such as “rose”).
Functions. Mapping resource to resource
(properties, such as “color-of(rose)
= red”).
Relations. Mapping resources to truth
values (true and false).
These concepts are supplied
by FOL itself:
Variable symbols. x and y.
Boolean values. True and false.
Conjunction. Rose is red, and violet
is blue.
Disjunction. Rose is red, or rose
is yellow.
Negation. Rose is not blue.
Inference. If rose is red, then violet
is blue.
Equivalence. Rose is red if and only if sugar is sweet.
There are two kinds of quantifiers:
Universal. For all x.
Existential. There is an x.
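Over a finite universe of discourse, the two quantifiers behave like Python’s all() and any(); here is a small sketch (the world and its facts are invented for illustration):

world = [{"kind": "rose",   "color": "red"},
         {"kind": "rose",   "color": "red"},
         {"kind": "violet", "color": "blue"}]

roses = [x for x in world if x["kind"] == "rose"]
print(all(x["color"] == "red" for x in roses))      # universal:   for all x, x is red -> True
print(any(x["color"] == "yellow" for x in roses))   # existential: there is a yellow x -> False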
FOL also has assembly rules for building up sentences from terms and atoms:
A term (denoting a
resource) is a constant symbol, a variable symbol, or a function of n terms.
An atom (which has a
truth value) is either a relation of n
terms or two atoms connected with and or or.
A sentence is an atom, or, if S is a sentence and x is a variable, then a sentence can be formed by preceding S with an existential quantifier: There exists an x such that S. (Note that the existential quantifier implicitly applies only within our world.)
A well-formed formula (WFF) is a sentence with all variables “bound” by universal or existential quantifiers. For example, in “There is a rose such that the rose is red,” the variable “rose” is bound by an existential quantifier, but “red” is not a variable at all and needs no quantifier.
Notice that these three rules are recursive: Terms are defined in terms of, well, terms, atoms in terms of atoms, and sentences in terms of sentences. We’ll use this characteristic to break the following example into its components.
Now, why on earth would these
rules matter to the Semantic Web? Let’s take the sentence with which we began
this chapter:
Buy me a
hard-cover copy of Jane’s book, in Chinese, if available for less than $39.95.
This is a pretty complicated
sentence, and a machine might well wish to break it down into its component
parts. Those parts turn out to be easily expressible in FOL.
First, let’s replace “Jane’s
book” with the variable book and make
that variable explicit where the rules of English (unlike the rules of logic)
allow it to be implicit:
Buy me a hard-cover copy of book, in Chinese, if book is available for less than $39.95.
Now, because the definition
of “sentence” is recursive, let’s break this sentence down into sentences. The
English is deceptive—it looks like we are saying “There is a hard-cover copy of
book”, but, in fact, because we can’t buy the book if it doesn’t exist, we’ll
translate the phrase to a relation, because a relation has a truth value. We’ll
also assign each sentence to a constant:
A := book (hard-cover)
B := book (Chinese)
X := book (available)
Y := book (price < $39.95)
Now, because sentences are
atoms, and atoms can be connected with Booleans, we can construct the following
inference:
If ((A and B) and (X and Y)) then…
This means that we will
purchase Jane’s book only if it is available, in Chinese, in a hard-cover
version, and at the right price. Notice that all the sentences can be easily
represented by RDF statements (as shown in Chapter 23), but the FOL variables
abstract away from the RDF syntax, which shows the layered design of the
Semantic Web in action to a good effect.
But how to express the
imperative “Buy me the book”? First, we need to make a little more explicit
what is implicit. How would we represent the purchase? With the following
sentence:
P := book (purchased)
Our sentence now reads like
so:
If ((A and B) and (X and Y)) then P.
As it turns out, we have now
represented a transaction in FOL. But as you know, an entire transaction has a
truth value, because it takes place in the time between the “if” and “then”:
The last copy of Jane’s book might be sold, in which case we would want to roll
back the credit card purchase statement in P. Can we represent this in FOL? For
convenience, we would put our transaction sentence into a variable:
T := if ((A and B) and (X and Y)) then P.
However, all we really need
to do is assert:
T
Why? A sentence is an atom,
and atoms have truth values. Therefore, if T is true, our Semantic Web–aware
system should commit the transaction. Otherwise, it should not.
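The whole transaction can be sketched with ordinary booleans in Python; the truth values assigned here are invented simply to show the rule firing (or not):

A = True     # book(hard-cover)
B = True     # book(Chinese)
X = True     # book(available)
Y = False    # book(price < $39.95) - too expensive in this run

T = (A and B) and (X and Y)       # the transaction sentence has a truth value
if T:
    print("commit the purchase")  # P := book(purchased)
else:
    print("do not buy - roll back")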
Proof
The
Logic layer provides a language for describing the truth or falsity of
statements we might make in a universe of discourse. But suppose we want to
question the conclusions of the Logic layer? We would get the Proof layer to
expose the steps in the reasoning that led the Logic layer to make the
inference it did. For example, the truth of the FOL relation
Book (Chinese)
might be proved by exhibiting
the value “Mandarin” in the book title’s xml:lang attribute.
The vision is that once an
XML-based interchange syntax for proofs is developed, Semantic Web users
(whether machine or human) will begin to exchange proofs as well as to mix and match
them—and in a process akin to evolutionary programming, good proofs will drive
out bad ones.
If the Logic layer of the
Semantic Web enters uncharted territory, the Proof layer enters the whitespace
on the map. There are no W3C specifications in process for it. The closest
implementation experiences to the Proof layer seem to fall into two
disciplines:
Formal methods for proving programs correct
Automated theorem proving
Neither technology has gained
broad acceptance, although both have been used with success on extremely
large-scale projects. Automated theorem proving is used in hardware
verification by chip manufacturers, for example.
Trust
If the Proof layer enters the
whitespace of the Semantic Web’s conceptual map, the Trust layer is deep within
it. Remember, again, that the Logic layer provides a language for describing
the truth or falsity about statements we might make in a universe of discourse.
Suppose, again, that we don’t trust the conclusions of the Logic layer, but we
don’t have the time or the inclination to run a proof. What to do? In the
physical world, we might ask a friend whose judgment we trust whether she
trusts the Logic layer to come to the right conclusion on the facts given to
it. On the Semantic Web, we ask a network of friends the same question. This is
the notion of a “web of trust.”