Chapter: Fundamentals of Database Systems : Object, Object-Relational, and XML: Concepts, Models, Languages, and Standards : Object and Object-Relational Databases

The Object Query Language OQL

1. Simple OQL Queries, Database Entry Points, and Iterator Variables 2. Query Results and Path Expressions 3. Other Features of OQL

The Object Query Language OQL

The object query language OQL is the query language proposed for the ODMG object model. It is designed to work closely with the programming languages for which an ODMG binding is defined, such as C++, Smalltalk, and Java. Hence, an OQL query embedded into one of these programming languages can return objects that match the type system of that language. Additionally, the implementations of class operations in an ODMG schema can have their code written in these programming languages. The OQL syntax for queries is similar to the syntax of the relational standard query language SQL, with additional features for ODMG concepts, such as object identity, complex objects, operations, inheritance, polymorphism, and relationships.

In Section 11.5.1 we will discuss the syntax of simple OQL queries and the concept of using named objects or extents as database entry points. Then, in Section 11.5.2 we will discuss the structure of query results and the use of path expressions to traverse relationships among objects. Other OQL features for handling object identity, inheritance, polymorphism, and other object-oriented concepts are discussed in Section 11.5.3. The examples to illustrate OQL queries are based on the UNIVERSITY database schema given in Figure 11.10.

1. Simple OQL Queries, Database Entry Points, and Iterator Variables

The basic OQL syntax is a select ... from ... where ... structure, as it is for SQL. For example, the query to retrieve the names of all departments in the college of ‘Engineering’ can be written as follows:

Q0: select D.Dname

from D in DEPARTMENTS

where D.College = ‘Engineering’;

In general, an entry point to the database is needed for each query, which can be any named persistent object. For many queries, the entry point is the name of the extent of a class. Recall that the extent name is considered to be the name of a persistent object whose type is a collection (in most cases, a set) of objects from the class. Looking at the extent names in Figure 11.10, the named object DEPARTMENTS is of type set<DEPARTMENT>; PERSONS is of type set<PERSON>; FACULTY is of type set<FACULTY>; and so on.

The use of an extent name—DEPARTMENTS in Q0—as an entry point refers to a persistent collection of objects. Whenever a collection is referenced in an OQL query, we should define an iterator variable—D in Q0—that ranges over each object in the collection. In many cases, as in Q0, the query will select certain objects from the collection, based on the conditions specified in the where clause. In Q0, only persistent objects D in the collection of DEPARTMENTS that satisfy the condition D.College = ‘Engineering’ are selected for the query result. For each selected object D, the value of D.Dname is retrieved in the query result. Hence, the type of the result for Q0 is bag<string> because the type of each Dname value is string (even though the actual result is a set because Dname is a key attribute). In general, the result of a query would be of type bag for select ... from ... and of type set for select distinct ... from ... , as in SQL (adding the keyword distinct eliminates duplicates).

Using the example in Q0, there are three syntactic options for specifying iterator variables:

D in DEPARTMENTS

DEPARTMENTS D

DEPARTMENTS AS D

We will use the first construct in our examples.

The named objects used as database entry points for OQL queries are not limited to the names of extents. Any named persistent object, whether it refers to an atomic (single) object or to a collection object, can be used as a database entry point.

2. Query Results and Path Expressions

In general, the result of a query can be of any type that can be expressed in the ODMG object model. A query does not have to follow the select ... from ... where ...

structure; in the simplest case, any persistent name on its own is a query, whose result is a reference to that persistent object. For example, the query

Q1: DEPARTMENTS;

returns a reference to the collection of all persistent DEPARTMENT objects, whose type is set<DEPARTMENT>. Similarly, suppose we had given (via the database bind operation, see Figure 11.8) a persistent name CS_DEPARTMENT to a single DEPARTMENT object (the Computer Science department); then, the query

Q1A: CS_DEPARTMENT;

returns a reference to that individual object of type DEPARTMENT. Once an entry point is specified, the concept of a path expression can be used to specify a path to related attributes and objects. A path expression typically starts at a persistent object name, or at the iterator variable that ranges over individual objects in a collection. This name will be followed by zero or more relationship names or attribute names connected using the dot notation. For example, referring to the UNIVERSITY data-base in Figure 11.10, the following are examples of path expressions, which are also valid queries in OQL:

Q2: CS_DEPARTMENT.Chair;

Q2A: CS_DEPARTMENT.Chair.Rank;

Q2B: CS_DEPARTMENT.Has_faculty;

The first expression Q2 returns an object of type FACULTY, because that is the type of the attribute Chair of the DEPARTMENT class. This will be a reference to the FACULTY object that is related to the DEPARTMENT object whose persistent name is CS_DEPARTMENT via the attribute Chair; that is, a reference to the FACULTY object who is chairperson of the Computer Science department. The second expression Q2A is similar, except that it returns the Rank of this FACULTY object (the Computer Science chair) rather than the object reference; hence, the type returned by Q2A is string, which is the data type for the Rank attribute of the FACULTY class.

Path expressions Q2 and Q2A return single values, because the attributes Chair (of DEPARTMENT) and Rank (of FACULTY) are both single-valued and they are applied to a single object. The third expression, Q2B, is different; it returns an object of type set<FACULTY> even when applied to a single object, because that is the type of the relationship Has_faculty of the DEPARTMENT class. The collection returned will include references to all FACULTY objects that are related to the DEPARTMENT object whose persistent name is CS_DEPARTMENT via the relationship Has_faculty; that is, references to all FACULTY objects who are working in the Computer Science department. Now, to return the ranks of Computer Science faculty, we cannot write

Q3 : CS_DEPARTMENT.Has_faculty.Rank;

because it is not clear whether the object returned would be of type set<string> or bag<string> (the latter being more likely, since multiple faculty may share the same rank). Because of this type of ambiguity problem, OQL does not allow expressions such as Q3 . Rather, one must use an iterator variable over any collections, as in Q3A or Q3B below:

Q3A: select F.Rank

from F in CS_DEPARTMENT.Has_faculty;

Q3B: select distinct F.Rank

from F in CS_DEPARTMENT.Has_faculty;

Here, Q3A returns bag<string> (duplicate rank values appear in the result), whereas Q3B returns set<string> (duplicates are eliminated via the distinct keyword). Both Q3A and Q3B illustrate how an iterator variable can be defined in the from clause to range over a restricted collection specified in the query. The variable F in Q3A and Q3B ranges over the elements of the collection CS_DEPARTMENT.Has_faculty, which is of type set<FACULTY>, and includes only those faculty who are members of the Computer Science department.

In general, an OQL query can return a result with a complex structure specified in the query itself by utilizing the struct keyword. Consider the following examples:

Q4: CS_DEPARTMENT.Chair.Advises;

Q4A: select struct ( name: struct (last_name: S.name.Lname, first_name: S.name.Fname),

degrees:( select struct (deg: D.Degree, yr: D.Year,

college: D.College) from D in S.Degrees ))

from S in CS_DEPARTMENT.Chair.Advises;

Here, Q4 is straightforward, returning an object of type set<GRAD_STUDENT> as its result; this is the collection of graduate students who are advised by the chair of the Computer Science department. Now, suppose that a query is needed to retrieve the last and first names of these graduate students, plus the list of previous degrees of each. This can be written as in Q4A, where the variable S ranges over the collec-tion of graduate students advised by the chairperson, and the variable D ranges over the degrees of each such student S. The type of the result of Q4A is a collection of (first-level) structs where each struct has two components: name and degrees.

The name component is a further struct made up of last_name and first_name, each being a single string. The degrees component is defined by an embedded query and is itself a collection of further (second level) structs, each with three string compo-nents: deg, yr, and college.

Note that OQL is orthogonal with respect to specifying path expressions. That is, attributes, relationships, and operation names (methods) can be used interchange-ably within the path expressions, as long as the type system of OQL is not compro-mised. For example, one can write the following queries to retrieve the grade point average of all senior students majoring in Computer Science, with the result ordered by GPA, and within that by last and first name:

Q5A: select struct ( last_name: S.name.Lname, first_name: S.name.Fname,

gpa: S.gpa )

from S in CS_DEPARTMENT.Has_majors

where S.Class = ‘senior’

order by gpa desc, last_name asc, first_name asc;

Q5B: select struct ( last_name: S.name.Lname, first_name: S.name.Fname,

gpa: S.gpa )

from S in STUDENTS

where S.Majors_in.Dname = ‘Computer Science’ and

S.Class = ‘senior’

order by gpa desc, last_name asc, first_name asc;

Q5A used the named entry point CS_DEPARTMENT to directly locate the reference to the Computer Science department and then locate the students via the relation-ship Has_majors, whereas Q5B searches the STUDENTS extent to locate all students majoring in that department. Notice how attribute names, relationship names, and operation (method) names are all used interchangeably (in an orthogonal manner) in the path expressions: gpa is an operation; Majors_in and Has_majors are relation-ships; and Class, Name, Dname, Lname, and Fname are attributes. The implementa-tion of the gpa operation computes the grade point average and returns its value as a float type for each selected STUDENT.

The order by clause is similar to the corresponding SQL construct, and specifies in which order the query result is to be displayed. Hence, the collection returned by a query with an order by clause is of type list.

3. Other Features of OQL

Specifying Views as Named Queries. The view mechanism in OQL uses the concept of a named query. The define keyword is used to specify an identifier of the named query, which must be a unique name among all named objects, class names, method names, and function names in the schema. If the identifier has the same name as an existing named query, then the new definition replaces the previous def-inition. Once defined, a query definition is persistent until it is redefined or deleted. A view can also have parameters (arguments) in its definition.

For example, the following view V1 defines a named query Has_minors to retrieve the set of objects for students minoring in a given department:

V1: define Has_minors(Dept_name) as

select S

from S in STUDENTS

where S.Minors_in.Dname = Dept_name;

Because the ODL schema in Figure 11.10 only provided a unidirectional Minors_in attribute for a STUDENT, we can use the above view to represent its inverse without having to explicitly define a relationship. This type of view can be used to represent inverse relationships that are not expected to be used frequently. The user can now utilize the above view to write queries such as

Has_minors(‘Computer Science’);

which would return a bag of students minoring in the Computer Science depart-ment. Note that in Figure 11.10, we defined Has_majors as an explicit relationship, presumably because it is expected to be used more often.

Extracting Single Elements from Singleton Collections. An OQL query will, in general, return a collection as its result, such as a bag, set (if distinct is specified), or list (if the order by clause is used). If the user requires that a query only return a sin-gle element, there is an element operator in OQL that is guaranteed to return a sin-gle element E from a singleton collection C that contains only one element. If C contains more than one element or if C is empty, then the element operator raises an exception. For example, Q6 returns the single object reference to the Computer Science department:

Q6: element ( select D

from D in DEPARTMENTS

where D.Dname = ‘Computer Science’ );

Since a department name is unique across all departments, the result should be one department. The type of the result is D:DEPARTMENT.

Collection Operators (Aggregate Functions, Quantifiers). Because many query expressions specify collections as their result, a number of operators have been defined that are applied to such collections. These include aggregate operators as well as membership and quantification (universal and existential) over a collection.

The aggregate operators (min, max, count, sum, avg) operate over a collection. The operator count returns an integer type. The remaining aggregate operators (min, max, sum, avg) return the same type as the type of the operand collection. Two examples follow. The query Q7 returns the number of students minoring in Computer Science and Q8 returns the average GPA of all seniors majoring in Computer Science.

Q7: count ( S in Has_minors(‘Computer Science’));

Q8: avg ( select S.Gpa

from S in STUDENTS

where S.Majors_in.Dname = ‘Computer Science’ and S.Class = ‘Senior’);

Notice that aggregate operations can be applied to any collection of the appropriate type and can be used in any part of a query. For example, the query to retrieve all department names that have more than 100 majors can be written as in Q9:

Q9: select D.Dname

from D in DEPARTMENTS

where count (D.Has_majors) > 100;

The membership and quantification expressions return a Boolean type—that is, true or false. Let V be a variable, C a collection expression, B an expression of type Boolean (that is, a Boolean condition), and E an element of the type of elements in collection C. Then:

(E in C) returns true if element E is a member of collection C.

(for all V in C : B) returns true if all the elements of collection C satisfy B. (exists V in C : B) returns true if there is at least one element in C satisfying B.

To illustrate the membership condition, suppose we want to retrieve the names of all students who completed the course called ‘Database Systems I’. This can be writ-ten as in Q10, where the nested query returns the collection of course names that each STUDENT S has completed, and the membership condition returns true if ‘Database Systems I’ is in the collection for a particular STUDENT S:

Q10: select S.name.Lname, S.name.Fname from S in STUDENTS

where ‘Database Systems I’ in

select C.Section.Of_course.Cname from C in S.Completed_sections);

Q10 also illustrates a simpler way to specify the select clause of queries that return a collection of structs; the type returned by Q10 is bag<struct(string, string)>.

One can also write queries that return true/false results. As an example, let us assume that there is a named object called JEREMY of type STUDENT. Then, query Q11 answers the following question: Is Jeremy a Computer Science minor? Similarly,

Q12 answers the question Are all Computer Science graduate students advised by Computer Science faculty? Both Q11 and Q12 return true or false, which are inter-preted as yes or no answers to the above questions:

Q11: JEREMY in Has_minors(‘Computer Science’);

Q12: for all G in

select S

fromS in GRAD_STUDENTS

where S.Majors_in.Dname = ‘Computer Science’ )

: G.Advisor in CS_DEPARTMENT.Has_faculty;

Note that query Q12 also illustrates how attribute, relationship, and operation inheritance applies to queries. Although S is an iterator that ranges over the extent GRAD_STUDENTS, we can write S.Majors_in because the Majors_in relationship is inherited by GRAD_STUDENT from STUDENT via extends (see Figure 11.10). Finally, to illustrate the exists quantifier, query Q13 answers the following question: Does any graduate Computer Science major have a 4.0 GPA? Here, again, the operation gpa is inherited by GRAD_STUDENT from STUDENT via extends.

Q13: exists G in

select S

from S in GRAD_STUDENTS

where S.Majors_in.Dname = ‘Computer Science’ )

: G.Gpa = 4;

Ordered (Indexed) Collection Expressions. As we discussed in Section 11.3.3, collections that are lists and arrays have additional operations, such as retrieving the ith, first, and last elements. Additionally, operations exist for extracting a subcollec-tion and concatenating two lists. Hence, query expressions that involve lists or arrays can invoke these operations. We will illustrate a few of these operations using sample queries. Q14 retrieves the last name of the faculty member who earns the highest salary:

Q14: first ( select struct(facname: F.name.Lname, salary: F.Salary)

from F in FACULTY

order by salary desc );

Q14 illustrates the use of the first operator on a list collection that contains the salaries of faculty members sorted in descending order by salary. Thus, the first ele-ment in this sorted list contains the faculty member with the highest salary. This query assumes that only one faculty member earns the maximum salary. The next query, Q15, retrieves the top three Computer Science majors based on GPA.

Q15: ( select struct( last_name: S.name.Lname, first_name: S.name.Fname,

gpa: S.Gpa )

from S in CS_DEPARTMENT.Has_majors

order by gpa desc ) [0:2];

The select-from-order-by query returns a list of Computer Science students ordered by GPA in descending order. The first element of an ordered collection has an index position of 0, so the expression [0:2] returns a list containing the first, second, and third elements of the select ... from ... order by ... result.

The Grouping Operator. The group by clause in OQL, although similar to the corresponding clause in SQL, provides explicit reference to the collection of objects within each group or partition. First we give an example, and then we describe the general form of these queries.

Q16 retrieves the number of majors in each department. In this query, the students are grouped into the same partition (group) if they have the same major; that is, the

same value for S.Majors_in.Dname:

Q16: ( select struct( dept_name, number_of_majors: count (partition) )

from S in STUDENTS

group by dept_name: S.Majors_in.Dname;

The result of the grouping specification is of type set<struct(dept_name: string, parti-tion: bag<struct(S:STUDENT>)>), which contains a struct for each group (partition) that has two components: the grouping attribute value (dept_name) and the bag of the STUDENT objects in the group (partition). The select clause returns the grouping attribute (name of the department), and a count of the number of elements in each partition (that is, the number of students in each department), where partition is the keyword used to refer to each partition. The result type of the select clause is set<struct(dept_name: string, number_of_majors: integer)>. In general, the syntax for the group by clause is

group by F₁: E₁, F₂: E₂, ..., F_k: E_k

where F₁: E₁, F₂: E₂, ..., F_k: E_k is a list of partitioning (grouping) attributes and each partitioning attribute specification F_i: E_i defines an attribute (field) name F_i and an expression E_i. The result of applying the grouping (specified in the group by clause) is a set of structures:

set<struct(F₁: T₁, F₂: T₂, ..., F_k: T_k, partition: bag<B>)>

where T_i is the type returned by the expression E_i, partition is a distinguished field name (a keyword), and B is a structure whose fields are the iterator variables (S in Q16) declared in the from clause having the appropriate type.

Just as in SQL, a having clause can be used to filter the partitioned sets (that is, select only some of the groups based on group conditions). In Q17, the previous query is modified to illustrate the having clause (and also shows the simplified syntax for the select clause). Q17 retrieves for each department having more than 100 majors, the average GPA of its majors. The having clause in Q17 selects only those partitions (groups) that have more than 100 elements (that is, departments with more than 100 students).

Q17: select dept_name, avg_gpa: avg ( select P.gpa from P in partition) from S in STUDENTS

group by dept_name: S.Majors_in.Dname

having count (partition) > 100;

Note that the select clause of Q17 returns the average GPA of the students in the partition. The expression

select P.Gpa from P in partition

returns a bag of student GPAs for that partition. The from clause declares an iterator variable P over the partition collection, which is of type bag<struct(S: STUDENT)>. Then the path expression P.gpa is used to access the GPA of each student in the partition.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Fundamentals of Database Systems : Object, Object-Relational, and XML: Concepts, Models, Languages, and Standards : Object and Object-Relational Databases : The Object Query Language OQL |