Historically, RDF grew out of the need to specify a general-purpose mapping between specific bibliographic solutions using particular vocabularies. Here are some examples:
The W3C Platform for Internet Content Selection (PICS)
The Dublin Core (simple bibliographic data for Web pages)
Site maps, subject taxonomies, thesauruses, and library classification systems
The W3C Platform for Privacy Preferences (P3P)
In this section, we look at how to use RDF schema to constrain, validate, document, and extend RDF vocabularies using the RDF typing system.
Validity in RDF Schema
RDF schema meets its requirement for generality by being a vocabulary for vocabularies (just as XML and SGML are languages for defining languages). We might imagine our teacher, still at the blackboard, being asked, “How do I know that this statement makes sense, even if all the words are in the right place?” She might answer, “Because some words can only be used with other words. Jane can sell books, but books can’t sell Jane.”
In RDF-speak, unlike general usage, a vocabulary is not just a list of words. Rather, an RDF vocabulary may be said to define:
Subject validity (which predicates go with which subjects)
Object validity (which predicates go with which objects)
For example, in the “Jane sells books” examples, we have a set of resources: [Jane], [sells], and [books]. If we define a schema against which to check this statement, [Jane] could be the subject in a statement where [sells] is the predicate. This is subject validity. Likewise [books] could be the object in a statement where [sells] is the predicate (and [Jane] could not). That is object validity.
RDF schema can define a vocabulary that accomplishes these validity constraints through what is called its typing system.
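The two kinds of validity can be sketched in a few lines of Python. This is only an illustration, not RDF Schema syntax: the class names Person and Product, the VOCABULARY and TYPES tables, and the is_valid function are all assumptions made up for the example.

```python
# Hypothetical toy vocabulary: which classes may appear as the subject
# and as the object of each predicate. All names are illustrative.
VOCABULARY = {
    "sells": {"subjects": {"Person"}, "objects": {"Product"}},
}

# Hypothetical typing of the resources used in the example.
TYPES = {"Jane": "Person", "books": "Product"}


def is_valid(subject, predicate, obj):
    """Return True if the statement passes subject and object validity."""
    rule = VOCABULARY.get(predicate)
    if rule is None:
        return False  # this sketch treats unknown predicates as invalid
    subject_ok = TYPES.get(subject) in rule["subjects"]  # subject validity
    object_ok = TYPES.get(obj) in rule["objects"]        # object validity
    return subject_ok and object_ok


print(is_valid("Jane", "sells", "books"))  # Jane can sell books
print(is_valid("books", "sells", "Jane"))  # books can't sell Jane
```

The first call passes both checks; the second fails both, because [books] is not typed as something that can sell and [Jane] is not typed as something that can be sold.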
The RDFS Typing System
To “connect the dots” in the RDF typing system, we need to cover two concepts:
rdf:type enables class/instance statements to be made. When a resource has a type, the resource is the subject in a statement where the predicate is [rdf:type] and the object is the type. Figure 23.17 shows an RDF statement that says that [Rover] has the type [dog]. Rover can also be said to be an instance of the class dog, and every statement that holds for all members of the class of dogs also holds for Rover. However, instances are unique: There is only one Rover, this Rover.
rdfs:subClassOf enables subset/superset statements to be made. The class is the superset; the subclass is the subset. When a resource is a subclass, the resource is the subject in a statement where the predicate is rdfs:subClassOf and the object is the RDF class that is its superset.
Figure 23.18 shows that the class [dog] is a subclass of the class [animal] and that [animal] is a subclass of [living being]. You can also see that, in RDF, a class may be a subclass of more than one class; the class of dogs may also be a subset of the class of [companion]—a “companion animal” being (often) a pet.
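As a rough sketch, the statements pictured in these figures can be written down as (subject, predicate, object) triples. The plain-string resource names and the objects_of helper below are assumptions for illustration; real RDF would identify each resource by a URI.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("Rover", RDF_TYPE, "dog"),                    # Rover is an instance of dog
    ("dog", RDFS_SUBCLASS_OF, "animal"),           # dog is a subset of animal
    ("dog", RDFS_SUBCLASS_OF, "companion"),        # a class may have several superclasses
    ("animal", RDFS_SUBCLASS_OF, "living being"),
]


def objects_of(subject, predicate):
    """Return the sorted objects of all statements with this subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects_of("Rover", RDF_TYPE))        # direct type of Rover
print(objects_of("dog", RDFS_SUBCLASS_OF))  # direct superclasses of dog
```

Note that in both kinds of statement the more specific thing (the instance or the subclass) sits in the subject position.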
Class relations are said to be transitive. If class [dog] is a subclass of the broader class [animal], and [animal] is a subclass of [living being], then [dog] is also implicitly a subclass of [living being]. Figure 23.19 shows this relationship: There are not only arcs A and B between [living being] and [animal], and between [animal] and [dog], but also arc C, drawn explicitly between [living being] and [dog]. However, although such implicit arcs are present in the RDF graph, we generally simplify the pictorial representation of class relations to keep the graph less cluttered by leaving them out. A human can trace the class relations upward or downward, if necessary.
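A program can trace the same relations mechanically. The following minimal sketch, which assumes the direct subclass arcs are held in a hypothetical SUBCLASS_OF dictionary, collects the implicit arcs (such as arc C) by walking upward from a class.

```python
# Direct (explicitly stated) subclass arcs: class -> set of superclasses.
# The dictionary layout is an assumption made for this sketch.
SUBCLASS_OF = {
    "dog": {"animal", "companion"},
    "animal": {"living being"},
}


def all_superclasses(cls):
    """Collect every direct and implicit superclass of cls."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, ()):
            if parent not in found:  # skip classes already visited
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(all_superclasses("dog")))
```

For [dog], the result includes [living being] even though no direct arc between the two was stated.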
We will look at more subtleties of classes, subclasses, and typing later when we look in more detail at the RDF hierarchy.
We now have what we need for an overview of the RDF class hierarchy. All the predicates in the RDFS typing system are either rdfs:subClassOf or rdf:type, as is shown in Figure 23.20. (The single exception to this rule is a use of rdfs:subPropertyOf, discussed later.) The 16 RDF schema resources are divided into categories.
Also, each schema resource is represented by a node.
We will discuss these categories and their resources, in order, in the remainder of this section (although validation concepts are divided into two parts).
We’ll start with validation (even though it is nearer the bottom of the RDF class hierarchy than the top) because that’s the operation many information owners will want to perform on their data, just as they want a database schema to control the quality of their RDBMS and they want an XML DTD or XML schema to provide some level of quality assurance for their data. Recall that there are two forms of schema validation in RDF: object validity and subject validity. rdfs:domain handles subject validity; rdfs:range handles object validity.
rdfs:domain is a type of rdfs:ConstraintProperty. It constrains the classes of subjects (resources) for which the property is a valid predicate. If a property has no domain, it can be the predicate of any subject. A property may have more than one domain. If a property has more than one rdfs:domain constraint, it may be the predicate of subjects that are instances of any one of the specified classes.
The range and domain of the rdfs:domain concept are specified only in a comment, so there is no pictorial representation of rdfs:domain.
rdfs:range is a type of rdfs:ConstraintProperty. It constrains the classes of objects (resources) for which the property is a valid predicate. A property doesn’t have to have a range; if it has none, the property can be used with any object. However, when imposed, the constraints of rdfs:range are stronger than those imposed by rdfs:domain. First, a property can have only one range. Second, the domain (subject) of an rdfs:range predicate must be an rdf:Property, and its range (object) must be an rdfs:Class.
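The following sketch shows how an application might apply these two constraints to a statement. It is a simplification under stated assumptions: the DOMAIN, RANGE, and TYPE_OF tables and the class names are hypothetical, each resource has a single type, and subclass relations are ignored.

```python
# Hypothetical schema: a property may have several domain classes
# but, as described above, at most one range class.
DOMAIN = {"sells": {"Person", "Shop"}}
RANGE = {"sells": "Product"}

# Hypothetical typing of the instance data.
TYPE_OF = {"Jane": "Person", "books": "Product"}


def check(subject, predicate, obj):
    """Return a list naming each constraint the statement violates."""
    problems = []
    domains = DOMAIN.get(predicate)
    # No domain means any subject is acceptable.
    if domains and TYPE_OF.get(subject) not in domains:
        problems.append("domain")
    range_cls = RANGE.get(predicate)
    # No range means any object is acceptable.
    if range_cls is not None and TYPE_OF.get(obj) != range_cls:
        problems.append("range")
    return problems


print(check("Jane", "sells", "books"))  # no violations
print(check("books", "sells", "Jane"))  # violates both constraints
```

The function only reports violations; what to do about them is left to the caller.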
Now that you understand the key concern of information owners—validation—let’s move to the top of the RDF class hierarchy.
rdfs:Resource is the root of the RDF class hierarchy (refer back to Figure 23.8). All things described by RDF expressions—all nodes and labels in the RDF graph—are instances of rdfs:Resource. rdfs:Resource is also a class. In fact, rdfs:Resource is a type of rdfs:Class, and rdfs:Class is a subclass of rdfs:Resource. (Remember, the RDF graph permits cycles!)
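Because the graph permits cycles, any code that walks it needs to remember where it has been. The sketch below encodes only the two statements just described and follows arcs outward from a starting node; the CORE list and the reachable function are assumptions made for illustration.

```python
# The two statements described above, as (subject, predicate, object) triples.
CORE = [
    ("rdfs:Resource", "rdf:type", "rdfs:Class"),
    ("rdfs:Class", "rdfs:subClassOf", "rdfs:Resource"),
]


def reachable(start):
    """Return every node reachable from start by following arcs forward."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, _p, o in CORE:
            if s == node and o not in seen:  # the 'seen' set stops the loop
                seen.add(o)
                frontier.append(o)
    return seen


print(sorted(reachable("rdfs:Resource")))
```

Starting from rdfs:Resource, the walk arrives back at rdfs:Resource itself, which is exactly the cycle; without the seen set, the loop would never end.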
rdf:Property represents the subset of RDF resources that are properties (see Table 23.1). rdf:Property is a type of rdfs:Class and a subclass of rdfs:Resource. rdf:Property has the “rdf” namespace prefix, rather than the “rdfs” prefix, because the RDF model has implicit properties, even if it lacks a schema.
As you have seen, rdf:type indicates that a resource is an instance of a specified class. That class must be an instance of rdfs:Class or a subclass of rdfs:Class. This statement is true for the resource that is known as rdfs:Class, which is a type of itself. (The RDF graph, again, permits loops.)
Like rdf:Property, rdf:type has the “rdf” namespace prefix, rather than the “rdfs” prefix, because the RDF model has implicit types, even if it lacks a schema.
The class hierarchy in RDF is set up with rdfs:Class, rdfs:subClassOf, and rdfs:subPropertyOf (and rdf:type, which we’ve already looked at). Let’s look at these three elements.
As you have seen, rdfs:Class is both a subclass of rdfs:Resource and a type of itself. However, RDF classes are both like and unlike classes as OO programmers may think of them. RDF classes are like OO classes in that, through transitivity, they can specify broad-to-specific categories such as “living being to animal to dog.” RDF classes are unlike OO classes, first, because they have no methods: they don’t do anything. (Markup never does.) Second, RDF classes could be called extrinsic rather than intrinsic. Instead of defining a class in terms of features intrinsic to its instances, an RDF schema will define predicates in terms of the classes of subject or object to which they may be applied, extrinsically. (This allows testing for subject and object validity.)
Theoretically, URIs representing HTML documents, dogs, books, databases, and abstract concepts could all be members of the same class: the class of things that can be represented by RDF.
rdfs:subClassOf is a type of rdf:Property. It specifies a subset/superset relation between classes, a relation that is transitive, as you have seen. Only instances of rdfs:Class may have an rdfs:subClassOf property, and its value must itself be an instance of rdfs:Class. Importantly, a class can never be declared to be a subclass of itself or any of its own subclasses. (The RDF Schema specification cannot express this constraint formally, though it is expressed in prose.) Therefore, although the RDF graph may contain cycles, the class/subclass inheritance hierarchy that is a subgraph of the RDF graph remains acyclic, and its nodes are only instances of rdfs:Class. Finally, RDF (unlike many object-oriented programming languages) permits multiple inheritance; that is, a class may be a subclass of several classes, so the hierarchy is in general a directed acyclic graph rather than a strict tree. (It could hardly be otherwise, because the Semantic Web must permit arbitrary combinations of RDF statements taken from multiple systems, each of which may have its own inheritance hierarchy.)
rdfs:subPropertyOf is a type of rdf:Property. It enables properties to be specialized, a process similar to inheritance, except for properties instead of classes. Like the subClassOf predicate, subPropertyOf is transitive and forms an acyclic hierarchy. Multiple specializations are also permitted.
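Because the rule against a class being its own subclass is stated only in prose, an application that cares about it has to check for itself. One way to do so, sketched here with a hypothetical has_subclass_cycle function over a dictionary of direct subclass arcs, is a depth-first search that looks for a class reappearing on its own ancestor path.

```python
def has_subclass_cycle(subclass_of):
    """Return True if any class is, directly or transitively, its own subclass.

    subclass_of maps each class name to the set of its direct superclasses.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(cls):
        state[cls] = IN_PROGRESS
        for parent in subclass_of.get(cls, ()):
            parent_state = state.get(parent, UNVISITED)
            if parent_state == IN_PROGRESS:
                return True  # parent is already on the current path: a cycle
            if parent_state == UNVISITED and visit(parent):
                return True
        state[cls] = DONE
        return False

    return any(
        state.get(cls, UNVISITED) == UNVISITED and visit(cls)
        for cls in list(subclass_of)
    )


# Multiple inheritance is fine; only cycles are rejected.
legal = {"dog": {"animal", "companion"}, "animal": {"living being"}}
illegal = {"dog": {"animal"}, "animal": {"dog"}}
print(has_subclass_cycle(legal))
print(has_subclass_cycle(illegal))
```

The first hierarchy is accepted even though [dog] has two superclasses; the second is flagged because [dog] and [animal] are each declared a subclass of the other.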
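In practice, specialization means that a statement made with a specific property also implies the same statement with each more general property. The sketch below illustrates that inference; the property names hasMother, hasParent, and hasRelative, the resource names, and the helper functions are all made up for the example.

```python
# Hypothetical property hierarchy: property -> set of direct superproperties.
SUBPROPERTY_OF = {
    "hasMother": {"hasParent"},
    "hasParent": {"hasRelative"},
}


def super_properties(prop):
    """Collect every direct and implicit superproperty of prop."""
    found, stack = set(), [prop]
    while stack:
        for parent in SUBPROPERTY_OF.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def infer(triples):
    """Return the triples plus those implied by the property hierarchy."""
    inferred = set(triples)
    for s, p, o in triples:
        for q in super_properties(p):
            inferred.add((s, q, o))  # same subject and object, broader predicate
    return inferred


for triple in sorted(infer({("Ann", "hasMother", "Beth")})):
    print(triple)
```

From the single stated triple, the sketch derives two more: one through the direct superproperty and one through transitivity.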
Documentation allows human-readable text to be attached to a resource, either as a label or a comment. Because the content of the documentation elements is only data, not statements, it does not affect the RDF graph in any way and therefore does not enable machine understanding of the resource.
rdfs:label provides for a human-readable representation of a URI, perhaps for display. The domain (subject) of a label predicate must be an rdfs:Resource. The range (object) must be an rdfs:Literal.
rdfs:comment permits human-readable documentation to be associated with a resource. The domain (subject) of a comment predicate must be an rdfs:Resource. The range (object) must be an rdfs:Literal.
rdfs:seeAlso is a cross-reference that gives more information about a resource. The nature of the information provided is not defined. The domain (subject) and range (object) of an rdfs:seeAlso predicate must both be rdfs:Resource elements.
rdfs:isDefinedBy is a subproperty of rdfs:seeAlso. Its value is meant to be the URI of the RDF Schema that defines the subject resource. The domain (subject) and range (object) of an rdfs:isDefinedBy predicate must both be rdfs:Resource elements.
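A typical use of these documentation properties is choosing what to show a human reader. The following sketch assumes a hypothetical list of documentation statements (the example.org URIs are invented) and falls back to the bare URI when no label is present.

```python
# Hypothetical documentation statements; the example.org URIs are made up.
DOCS = [
    ("http://example.org/vocab#Dog", "rdfs:label", "Dog"),
    ("http://example.org/vocab#Dog", "rdfs:comment", "The class of all dogs."),
    ("http://example.org/vocab#Dog", "rdfs:isDefinedBy", "http://example.org/vocab"),
]


def display_name(uri):
    """Return the rdfs:label for uri if one exists, otherwise the URI itself."""
    for s, p, o in DOCS:
        if s == uri and p == "rdfs:label":
            return o
    return uri


print(display_name("http://example.org/vocab#Dog"))  # labeled resource
print(display_name("http://example.org/vocab#Cat"))  # no label: URI is shown
```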
We now turn to the issue of constraints in general (that is, beyond the constraints on domain and range, discussed earlier). At this point, there’s one caveat: Because markup doesn’t do anything, RDFS doesn’t say what an application must do if a constraint is violated. That is up to the application.
rdfs:ConstraintProperty is a subclass of both rdfs:ConstraintResource and rdf:Property. Both rdfs:domain and rdfs:range are instances of it.
rdfs:ConstraintResource is a type of rdfs:Class and a subclass of rdfs:Resource. It is present in the model so that other constraint properties besides domain and range may be subclassed from it.
This type of validation is called “non-model” because it covers checks that the RDF engine, not the data model, would have to perform: for example, checking that a literal really is a literal, or deriving the auto-generated counter for container children from the actual number of children.
rdfs:Literal is a type of rdfs:Class. An rdfs:Literal can contain atomic values such as textual strings. The xml:lang attribute can be used to express the fact that a literal is in a human language, but this information does not become a statement in the graph.
rdfs:ContainerMembershipProperty is a type of rdfs:Class and a subclass of rdf:Property. Its members are the properties rdf:_1, rdf:_2, rdf:_3, and so on (the order in which the children of a container appear in the container, under the ord component of the data model).
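As a small illustration of how the numbered membership properties relate to the children of a container, the sketch below derives them from a list of members. The function names, the container name, and the use of the rdf: prefix in plain strings are assumptions for the example; an actual RDF engine would do this internally.

```python
import re

# Matches membership property names such as rdf:_1, rdf:_2, rdf:_10.
MEMBERSHIP_PATTERN = re.compile(r"^rdf:_[1-9][0-9]*$")


def membership_triples(container, members):
    """Number the members from 1 in the order they appear in the container."""
    return [
        (container, f"rdf:_{index}", member)
        for index, member in enumerate(members, start=1)
    ]


def is_membership_property(name):
    """Return True if name has the form rdf:_N for a positive integer N."""
    return MEMBERSHIP_PATTERN.match(name) is not None


for triple in membership_triples("kennel", ["Rover", "Fido", "Rex"]):
    print(triple)
```

Here the counter comes from the actual number of children rather than being stated in the data, which is the kind of derivation the non-model validation described above refers to.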