More Complex SQL Retrieval Queries
In Section 4.3, we described some basic types of retrieval queries in
SQL. Because of the generality and expressive power of the language, there are
many additional features that allow users to specify more complex retrievals
from the database. We dis-cuss several of these features in this section.
1. Comparisons Involving NULL and Three-Valued Logic
SQL has various rules for dealing with NULL values.
Recall from Section 3.1.2 that NULL is used to represent a missing
value, but that it usually has one of three different interpretations—value unknown
(exists but is not known), value not
available (exists but is purposely withheld), or value not applicable (the attribute is undefined for this tuple).
Consider the following examples to illustrate each of the meanings of NULL.
Unknown value. A person’s date of birth is not
known, so it is represented by NULL in the database.
Unavailable or withheld value. A person
has a home phone but does not want
it to be listed, so it is withheld and represented as NULL in the database.
Not applicable attribute. An
attribute LastCollegeDegree would be NULL for a person who has no
college degrees because it does not apply to that person.
It is often not possible to determine which of the meanings is intended;
for example, a NULL for the home phone of a person can have any of the three meanings.
Hence, SQL does not distinguish between the different meanings of NULL.
In general, each individual NULL value is
considered to be different from every other NULL value in
the various database records. When a NULL is
involved in a compari-son operation, the result is considered to be UNKNOWN (it may be TRUE or it may be FALSE). Hence, SQL uses a three-valued logic with values TRUE, FALSE, and UNKNOWN
instead of the standard two-valued (Boolean) logic
with values
TRUE or FALSE. It is therefore necessary to
define the results (or truth values) of three-valued logical expressions when the logical connectives AND, OR, and NOT are used. Table 5.1 shows the resulting values.
Table 5.1 Logical Connectives in
Three-Valued Logic
In Tables 5.1(a) and 5.1(b), the rows and columns represent the values
of the results of comparison conditions, which would typically appear in the WHERE clause of an SQL query. Each expression result would have a value of TRUE, FALSE, or UNKNOWN. The result of combining the two values using the AND logical connec-tive is shown by the entries in Table 5.1(a). Table
5.1(b) shows the result of using the OR logical
connective. For example, the result of (FALSE AND UNKNOWN) is FALSE,
whereas the result of (FALSE OR UNKNOWN) is UNKNOWN. Table 5.1(c) shows the result of the NOT logical
operation. Notice that in standard Boolean logic, only TRUE or
FALSE values are permitted; there is no UNKNOWN value.
In select-project-join queries, the general rule is that only those
combinations of tuples that evaluate the logical expression in the WHERE clause of the query to TRUE are selected. Tuple combinations
that evaluate to
FALSE or UNKNOWN are not selected. However, there are exceptions to that rule for certain
operations, such as outer joins, as we shall see in Section 5.1.6.
SQL allows queries that check whether an attribute value is NULL. Rather than using = or <> to compare an attribute value to NULL, SQL uses the comparison operators IS or IS NOT. This is because SQL considers each NULL value as
being distinct from every other NULL value, so equality comparison is not appropriate. It follows that when
a join condition is specified, tuples with NULL values
for the join attributes are not included in the result (unless it is an OUTER JOIN; see Section 5.1.6). Query 18 illustrates this.
Query 18. Retrieve the names of all employees who do not have supervisors.
Q18: SELECT Fname, Lname
FROM EMPLOYEE
WHERE Super_ssn IS NULL;
2. Nested Queries, Tuples, and Set/Multiset Comparisons
Some queries require that existing values in the database be fetched and
then used in a comparison condition. Such queries can be conveniently
formulated by using nested queries,
which are complete select-from-where blocks within the WHERE clause of another query. That other query is called the outer query. Query 4 is for-mulated in Q4 without a nested query, but it can be rephrased to use nested queries
as shown in Q4A. Q4A introduces the comparison operator IN, which
compares a value v with a set (or
multiset) of values V and evaluates
to TRUE if v is one of the elements in
V.
The first nested query selects the project numbers of projects that have
an employee with last name ‘Smith’ involved as manager, while the second nested
query selects the project numbers of projects that have an employee with last
name ‘Smith’ involved as worker. In the outer query, we use the OR logical connective to retrieve a PROJECT tuple if
the PNUMBER value of that tuple is in the result of either nested query.
Q4A:SELECT DISTINCT Pnumber
FROM PROJECT
WHERE Pnumber IN
( SELECT Pnumber
FROM PROJECT,
DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber
AND
Mgr_ssn=Ssn AND Lname=‘Smith’ )
OR
Pnumber IN
( SELECT Pno
FROM WORKS_ON,
EMPLOYEE
WHERE Essn=Ssn
AND Lname=‘Smith’ );
If a nested query returns a single attribute and a single tuple, the query result will be a single (scalar)
value. In such cases, it is permissible to use = instead of IN for the comparison operator. In general, the nested query will return a
table (relation), which is a set or
multiset of tuples.
SQL allows the use of tuples
of values in comparisons by placing them within parentheses. To illustrate
this, consider the following query:
SELECT DISTINCT Essn
FROM WORKS_ON
WHERE (Pno, Hours) IN ( SELECT Pno, Hours
FROM WORKS_ON
WHERE Essn=‘123456789’ );
This query will select the Essns of all
employees who work the same (project, hours) combination on some project that
employee ‘John Smith’ (whose Ssn = ‘123456789’) works on. In this
example, the IN operator compares the subtuple of values in parentheses (Pno, Hours) within each tuple in WORKS_ON with the set of type-compatible
tuples produced by the nested query.
In addition to the IN operator, a number of other
comparison operators can be used to compare a single value v (typically an attribute name) to a set or multiset v (typ-ically a nested query). The = ANY (or = SOME) operator returns TRUE if the value v is equal to some value in the set V
and is hence equivalent to IN. The two keywords ANY and
SOME have the same effect. Other operators that can be
combined with
ANY (or SOME) include >, >=, <,
<=, and <>. The keyword ALL can also
be com-bined with each of these operators. For example, the comparison
condition (v > ALL V ) returns TRUE if the value v is greater than all the
values in the set (or multiset) V. An example is the following query,
which returns the names of employees whose salary is greater than the salary of
all the employees in department 5:
SELECT Lname,
Fname
FROM EMPLOYEE
WHERE Salary
> ALL ( SELECT Salary
FROM EMPLOYEE
WHERE Dno=5 );
Notice that this query can also be specified using the MAX aggregate function (see Section 5.1.7).
In general, we can have several levels of nested queries. We can once
again be faced with possible ambiguity among attribute names if attributes of
the same name exist—one in a relation in the FROM clause
of the outer query, and another in a
rela-tion in the FROM clause of the nested query.
The rule is that a reference to an unqualified
attribute refers to the relation declared in the innermost nested query. For example, in the SELECT clause and WHERE clause of the first nested query
of Q4A, a reference to any unqualified attribute of the PROJECT relation refers to the PROJECT relation specified in the FROM clause of the nested query. To refer to an attribute
of the PROJECT relation specified in the outer query, we specify and refer to an alias (tuple variable) for that
relation. These rules are similar to scope rules for program variables in most
programming languages that allow nested procedures and functions. To illustrate
the potential ambiguity of attribute names in nested queries, consider Query
16.
Query 16. Retrieve the name of each employee who has a dependent with the same first name and is the same sex as the employee.
Q16: SELECT E.Fname,
E.Lname
FROM EMPLOYEE
AS E
WHERE E.Ssn
IN ( SELECT Essn
FROM DEPENDENT
AS D
WHERE E.Fname=D.Dependent_name
AND E.Sex=D.Sex
);
In the nested query of Q16, we must qualify E.Sex because it refers to the Sex
attribute of EMPLOYEE from the outer query, and DEPENDENT also has
an attribute called Sex. If there were any unqualified
references to Sex in the nested query, they would refer to the Sex attribute of DEPENDENT. However, we would not have to qualify the attributes Fname and Ssn of EMPLOYEE if they appeared in the nested query because the DEPENDENT relation does not have attributes called Fname and Ssn, so there is no ambiguity.
It is generally advisable to create tuple variables (aliases) for all the tables referenced in an SQL query to avoid potential
errors and ambiguities, as illustrated in
Q16.
3. Correlated Nested
Queries
Whenever a condition in the WHERE clause
of a nested query references some attribute of a relation declared in the
outer query, the two queries are said to be correlated. We can understand a correlated query better by
considering that the nested query is evaluated once for each tuple (or
combination of tuples) in the outer query. For example, we can think of Q16 as follows: For each EMPLOYEE tuple, evaluate the nested query, which retrieves the Essn values for all DEPENDENT tuples with the same sex and
name as that EMPLOYEE tuple; if the Ssn value of the EMPLOYEE tuple is in the result of the
nested query, then select that EMPLOYEE tuple.
In general, a query written with nested select-from-where blocks and
using the = or IN
comparison operators can always be expressed as a single block query. For exam-ple, Q16 may be written as in Q16A:
Q16A: SELECT E.Fname,
E.Lname
FROM EMPLOYEE
AS E, DEPENDENT AS D
WHERE E.Ssn=D.Essn
AND E.Sex=D.Sex
AND E.Fname=D.Dependent_name;
4. The EXISTS and UNIQUE
Functions in SQL
The EXISTS function in SQL is used to check whether the result of a correlated
nested query is empty (contains no
tuples) or not. The result of EXISTS is a Boolean value TRUE if the nested query result contains at least one tuple, or FALSE if the nested query result contains no tuples. We illustrate the use of
EXISTS—and NOT EXISTS—with some examples. First, we formulate Query 16 in an alternative form that uses EXISTS as in Q16B:
Q16B: SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E
WHERE EXISTS ( SELECT *
FROM DEPENDENT AS D
WHERE E.Ssn=D.Essn AND E.Sex=D.Sex
AND E.Fname=D.Dependent_name);
EXISTS and
NOT EXISTS are typically used in conjunction
with a correlated nested query. In Q16B, the nested query references the Ssn, Fname, and Sex attributes of the EMPLOYEE relation from the outer query. We
can think of
Q16B as follows: For each EMPLOYEE tuple,
evaluate the nested query, which retrieves all DEPENDENT tuples with the same Essn, Sex, and Dependent_name as the EMPLOYEE tuple; if at least one tuple EXISTS in the
result of the nested query, then select that EMPLOYEE tuple.
In general, EXISTS(Q) returns TRUE if there is at least one tuple
in the result of the nested query Q, and it
returns FALSE otherwise. On the other hand, NOT EXISTS(Q) returns TRUE if there are no tuples in the
result of nested query Q, and it returns FALSE otherwise. Next, we illustrate the use of NOT EXISTS.
Query 6. Retrieve the names of employees who have no dependents.
Q6: SELECT Fname,
Lname
FROM EMPLOYEE
WHERE NOT
EXISTS ( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn
);
In Q6, the correlated nested query retrieves all DEPENDENT tuples related to a par-ticular EMPLOYEE tuple.
If none exist, the EMPLOYEE tuple is selected because the WHERE-clause
condition will evaluate to TRUE in this case. We can explain Q6 as
follows: For each
EMPLOYEE tuple, the correlated nested query selects all DEPENDENT tuples whose Essn value matches the EMPLOYEE Ssn; if the result is empty, no dependents are related to the employee, so
we select that EMPLOYEE tuple and retrieve its Fname and Lname.
Query 7. List the names of managers who have at least one dependent.
Q7: SELECT Fname, Lname
FROM EMPLOYEE
WHERE EXISTS ( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn )
AND
EXISTS ( SELECT *
FROM DEPARTMENT
WHERE Ssn=Mgr_ssn );
One way to write this query is shown in Q7, where
we specify two nested correlated queries; the first selects all DEPENDENT tuples related to an EMPLOYEE, and the sec-ond selects all DEPARTMENT tuples managed by the EMPLOYEE. If at least one of the first
and at least one of the second exists, we select the EMPLOYEE tuple. Can you rewrite this query using only a single nested query or
no nested queries?
The query Q3: Retrieve the name of each
employee who works on all the
projects con-trolled by department number 5 can be written using EXISTS and NOT
EXISTS in SQL systems. We show two
ways of specifying this query Q3 in SQL as Q3A and Q3B. This is an example of certain types of queries that require universal quantification, as we will
discuss in Section 6.6.7. One way to write this query is to use the construct (S2 EXCEPT S1) as explained next, and checking
whether the result is empty. This option is shown as Q3A.
Q3A: SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS ( ( SELECT Pnumber
FROM PROJECT
WHERE Dnum=5)
EXCEPT (
SELECT Pno
FROM WORKS_ON
WHERE Ssn=Essn) );
In Q3A, the first subquery (which is not correlated with the outer query)
selects all projects controlled by department 5, and the second subquery (which
is correlated) selects all projects that the particular employee being
considered works on. If the set difference of the first subquery result MINUS (EXCEPT) the second subquery result is empty, it means that the employee works
on all the projects and is therefore selected.
The second option is shown as Q3B. Notice
that we need two-level nesting in Q3B and that
this formulation is quite a bit more complex than Q3A, which uses NOT EXISTS and EXCEPT.
Q3B: SELECT Lname,
Fname
FROM EMPLOYEE
WHERE NOT EXISTS ( SELECT *
FROM WORKS_ON B
WHERE (
B.Pno IN ( SELECT Pnumber
FROM PROJECT
WHERE Dnum=5 )
AND
NOT EXISTS ( SELECT *
FROM WORKS_ON C
WHERE C.Essn=Ssn
AND C.Pno=B.Pno
)));
In Q3B, the outer nested query selects any WORKS_ON (B)
tuples whose Pno is of a project controlled by department 5, if there is not a WORKS_ON (C) tuple with the same Pno and the same Ssn as that of the EMPLOYEE tuple under consideration in the outer query. If no such tuple exists,
we select the EMPLOYEE tuple. The form of Q3B matches the following rephrasing
of Query 3: Select each employee such that there
does not exist a project controlled by department 5 that the employee does not
work on. It corresponds to the way we will write this query in tuple relation
calculus (see Section 6.6.7).
There is another SQL function, UNIQUE(Q), which returns TRUE if there are no duplicate tuples
in the result of query Q; otherwise, it returns FALSE. This can be used to test whether the result of a nested query is a set
or a multiset.
5. Explicit Sets and
Renaming of Attributes in SQL
We have seen several queries with a nested query in the WHERE clause. It is also pos-sible to use an explicit set of values in the WHERE clause,
rather than a nested query. Such a set is enclosed in parentheses in SQL.
Query 17. Retrieve the Social Security numbers of all employees who work on project numbers 1, 2, or 3.
Q17: SELECT DISTINCT Essn
FROM WORKS_ON
WHERE Pno IN (1, 2, 3);
In SQL, it is possible to rename any attribute that appears in the
result of a query by adding the qualifier AS followed
by the desired new name. Hence, the AS
construct can be used to alias both attribute and relation names, and it can be
used in both the SELECT
and FROM clauses. For example, Q8A shows how query Q8 from Section 4.3.2 can be slightly changed to retrieve the last name of each employee
and his or her supervisor, while renaming the resulting attribute names as Employee_name and Supervisor_name. The new names will appear as column headers in the query result.
Q8A: SELECT E.Lname AS Employee_name,
S.Lname AS Supervisor_name
FROM EMPLOYEE AS E,
EMPLOYEE AS S
WHERE E.Super_ssn=S.Ssn;
6. Joined Tables in SQL
and Outer Joins
The concept of a joined table
(or joined relation) was
incorporated into SQL to permit users to specify a table resulting from a join
operation in the FROM clause of a query. This
construct may be easier to comprehend than mixing together all the select and
join conditions in the WHERE clause. For example, consider
query Q1, which retrieves the name and address of every employee who works for
the ‘Research’ department. It may be easier to specify the join of the EMPLOYEE and DEPARTMENT
relations first, and then to select the desired tuples
and attributes.
This can be written in SQL as in Q1A:
Q1A: SELECT Fname, Lname,
Address
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname=‘Research’;
The FROM clause in Q1A contains a single joined table.
The attributes of such a table are all the attributes of the first table, EMPLOYEE, followed by all the attributes of the second table, DEPARTMENT. The concept of a joined table also allows the user to specify
different types of join, such as NATURAL JOIN and
various types of OUTER JOIN. In a
NATURAL JOIN on two relations R and
S, no join condition is specified; an implicit EQUIJOIN condition for each pair of
attributes with the same name from R
and S is created. Each such pair of
attributes is included only once in
the resulting relation (see Section 6.3.2 and 6.4.4 for more details on the
various types of join operations in relational algebra).
If the names of the join attributes are not the same in the base
relations, it is possi-ble to rename the attributes so that they match, and
then to apply NATURAL
JOIN. In this case, the AS
construct can be used to rename a relation and all its attributes in the FROM clause. This is illustrated in Q1B, where
the DEPARTMENT relation is renamed as DEPT and its attributes are renamed
as Dname, Dno (to match the name of the desired join attribute Dno in the EMPLOYEE table), Mssn, and Msdate. The implied join condition for this NATURAL JOIN is EMPLOYEE.Dno=DEPT.Dno, because this is the only pair of attributes with the same name after
renaming:
Q1B: SELECT Fname, Lname,
Address
FROM (EMPLOYEE NATURAL
JOIN
(DEPARTMENT AS DEPT (Dname, Dno, Mssn, Msdate)))
WHERE Dname=‘Research’;
The default type of join in a joined table is called an inner join, where a tuple is included
in the result only if a matching tuple exists in the other relation. For
exam-ple, in query Q8A, only employees who have a supervisor are included in the
result; an EMPLOYEE tuple whose value for Super_ssn is NULL is excluded. If the user requires that all employees be included, an OUTER JOIN must be used explicitly (see Section 6.4.4 for the definition of OUTER JOIN). In SQL, this is handled by explicitly specifying the keyword OUTER JOIN in a joined table, as illustrated in Q8B:
Q8B: SELECT E.Lname AS Employee_name,
S.Lname AS Supervisor_name
FROM (EMPLOYEE AS E LEFT
OUTER JOIN EMPLOYEE AS S
ON E.Super_ssn=S.Ssn);
There are a variety of outer join operations, which we shall discuss in
more detail in Section 6.4.4. In SQL, the options available for specifying
joined tables include INNER JOIN (only pairs of tuples that match
the join condition are retrieved, same as JOIN), LEFT
OUTER JOIN (every tuple in the left table
must appear in the result; if it does not have a matching tuple, it is padded
with NULL values for the attributes of the right table), RIGHT OUTER JOIN (every tuple in the right table must appear in the result; if it does
not have a matching tuple, it is padded with NULL values
for the attributes of the left table), and FULL OUTER JOIN. In the
latter three options, the keyword OUTER may be
omitted. If the join attributes have the same name, one can also specify the
natural join variation of outer joins by using the keyword NATURAL before the operation (for example, NATURAL LEFT OUTER JOIN). The keyword CROSS JOIN is used to specify the CARTESIAN PRODUCT operation (see Section 6.2.2), although this should be
used only with the utmost care because it generates all possible tuple
combinations.
It is also possible to nest
join specifications; that is, one of the tables in a join may itself be a
joined table. This allows the specification of the join of three or more tables
as a single joined table, which is called a multiway join. For example, Q2A is a
different way of specifying query Q2 from
Section 4.3.1 using the concept of a joined table:
Q2A: SELECT Pnumber, Dnum,
Lname, Address, Bdate
FROM ((PROJECT JOIN DEPARTMENT ON Dnum=Dnumber)
JOIN EMPLOYEE ON Mgr_ssn=Ssn)
WHERE Plocation=‘Stafford’;
Not all SQL implementations have implemented the new syntax of joined
tables. In some systems, a different syntax was used to specify outer joins by
using the com-parison operators +=, =+, and +=+ for left, right, and full outer
join, respectively, when specifying the join condition. For example, this
syntax is available in Oracle. To specify the left outer join in Q8B using this syntax, we could write the query Q8C as follows:
Q8C: SELECT E.Lname, S.Lname
FROM EMPLOYEE E,
EMPLOYEE S
WHERE E.Super_ssn +=
S.Ssn;
7. Aggregate Functions
in SQL
In Section 6.4.2, we will introduce the concept of an aggregate function
as a relational algebra operation. Aggregate
functions are used to summarize information from multiple tuples into a
single-tuple summary. Grouping is
used to create sub-groups of tuples before summarization. Grouping and aggregation
are required in many database applications, and we will introduce their use in
SQL through exam-ples. A number of built-in aggregate functions exist: COUNT, SUM, MAX, MIN, and AVG. The
COUNT function returns the number of
tuples or values as specified in a query. The functions SUM, MAX, MIN, and AVG can be applied to a set or multiset of numeric values and return,
respectively, the sum, maximum value, minimum value, and average (mean) of
those values. These functions can be used in the SELECT clause
or in a HAVING clause (which we introduce later). The functions MAX and MIN
can also be used with attributes that have
nonnumeric domains if the domain values
have a total ordering among one
another. We illustrate the use of these functions with sample queries.
Query 19. Find the sum of the salaries of all employees, the maximum salary, the minimum salary, and the average salary.
Q19: SELECT SUM (Salary),
MAX (Salary), MIN (Salary),
AVG (Salary)
FROM EMPLOYEE;
If we want to get the preceding function values for employees of a
specific depart-ment—say, the ‘Research’ department—we can write Query 20,
where the EMPLOYEE
tuples are restricted by the WHERE clause to those employees who work for the
‘Research’ department.
Query 20. Find the sum of the salaries of all employees of the ‘Research’ department, as well as the maximum salary, the minimum salary, and the
aver-age salary in this department.
Q20: SELECT SUM (Salary),
MAX (Salary), MIN (Salary),
AVG (Salary)
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname=‘Research’;
Queries 21 and 22. Retrieve the total number of employees in the company (Q21) and the number of employees in the ‘Research’ department (Q22).
Q21: SELECT COUNT (*)
FROM EMPLOYEE;
Q22: SELECT COUNT (*)
FROM EMPLOYEE, DEPARTMENT
WHERE DNO=DNUMBER AND DNAME=‘Research’;
Here the asterisk (*) refers to the rows
(tuples), so COUNT (*) returns the number of rows in the result of the query. We may also
use the COUNT function to count values in a column rather than tuples, as in the next
example.
Query 23. Count the number of distinct salary values in the database.
Q23: SELECT COUNT (DISTINCT Salary)
FROM EMPLOYEE;
If we write COUNT(SALARY) instead
of COUNT(DISTINCT
SALARY) in Q23, then duplicate values will not be eliminated. However, any tuples with
NULL for SALARY
will not be counted. In general, NULL values
are discarded when aggregate
func-tions are applied to a particular column (attribute).
The preceding examples summarize a
whole relation (Q19, Q21, Q23) or a selected subset of tuples (Q20, Q22), and hence all produce single tuples or single values. They illustrate
how functions are applied to retrieve a summary value or summary tuple from the
database. These functions can also be used in selection conditions involving nested
queries. We can specify a correlated nested query with an aggregate function,
and then use the nested query in the WHERE clause
of an outer query. For example, to retrieve the names of all employees who have
two or more dependents (Query 5), we can write the following:
Q5: SELECT Lname, Fname
FROM EMPLOYEE
WHERE (
SELECT COUNT (*)
FROM DEPENDENT
WHERE Ssn=Essn
) >= 2;
The correlated nested query counts the number of dependents that each
employee has; if this is greater than or equal to two, the employee tuple is
selected.
8. Grouping: The GROUP
BY and HAVING Clauses
In many cases we want to apply the aggregate functions to subgroups of tuples in a relation, where the subgroups are based
on some attribute values. For example, we
may want to find the average salary of employees in each department or the number of employees who work on each project. In these cases we need
to partition the relation into
nonoverlapping subsets (or groups)
of tuples. Each group (partition) will consist of the tuples that have the same
value of some attribute(s), called the grouping
attribute(s). We can then apply the function to each such group
inde-pendently to produce summary information about each group. SQL has a GROUP BY
clause for this purpose. The GROUP BY clause specifies the grouping attributes, which
should also appear in the SELECT clause, so that the value resulting from applying each aggregate
function to a group of tuples appears along with the value of the grouping
attribute(s).
Query 24. For each department, retrieve the department number, the number of employees in the department, and their average salary.
Q24: SELECT Dno, COUNT (*),
AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;
In Q24, the EMPLOYEE tuples are partitioned into groups—each group having the same value for
the grouping attribute Dno. Hence, each group contains the
employees who work in the same department. The COUNT and AVG functions are applied to each such group of tuples. Notice that the SELECT clause includes only the grouping attribute and the aggregate functions
to be applied on each group of tuples. Figure 5.1(a) illustrates how grouping
works on Q24; it also shows the result of Q24.
If NULLs exist in the grouping attribute, then a separate group is created for all tuples with a NULL value in the grouping attribute. For example, if the EMPLOYEE table had some tuples that had NULL for the
grouping attribute Dno, there would be a separate group
for those tuples in the result of Q24.
Query 25. For each project, retrieve the project number, the project name, and the number of employees who work on that project.
Q25: SELECT Pnumber, Pname,
COUNT (*)
FROM PROJECT,
WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber,
Pname;
Q25 shows how we can use a join condition in conjunction with GROUP BY. In this
case, the grouping and functions are applied after the joining of the two relations.
Sometimes we want to retrieve the values of these functions only for groups that sat-isfy certain conditions. For
example, suppose that we want to modify Query 25 so that only projects with more than two employees appear in the
result. SQL provides a HAVING clause, which can appear in
conjunction with a GROUP BY clause, for this purpose. HAVING provides a condition on the summary information regarding the group of
tuples associated with each value of the grouping attributes. Only the groups
that satisfy the condition are retrieved in the result of the query. This is
illus-trated by Query 26.
Query 26. For each project on which
more than two employees work, retrieve the project number, the project name, and the number of employees who
work on the project.
Q26: SELECT Pnumber, Pname,
COUNT (*)
FROM PROJECT,
WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber,
Pname
HAVING COUNT (*) >
2;
Notice that while selection conditions in the WHERE clause limit the tuples to
which functions are applied, the HAVING clause
serves to choose whole groups. Figure
5.1(b) illustrates the use of HAVING and displays the result of Q26.
Query 27. For each project, retrieve the project number, the project name, and the number of employees from department 5 who work on the project.
Q27: SELECT Pnumber, Pname,
COUNT (*)
FROM PROJECT,
WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno
AND Ssn=Essn AND Dno=5
GROUP BY Pnumber,
Pname;
Here we restrict the tuples in the relation (and hence the tuples in
each group) to those that satisfy the condition specified in the WHERE clause—namely, that they work in department number 5. Notice that we
must be extra careful when two dif-ferent conditions apply (one to the
aggregate function in the SELECT clause and another to the
function in the HAVING clause). For example, suppose that we want to count the total number of employees whose salaries
exceed $40,000 in each department, but only for departments where more than
five employees work. Here, the condition (SALARY >
40000) applies only to the COUNT function in the SELECT clause. Suppose that we write the following incorrect query:
SELECT Dname,
COUNT (*)
FROM DEPARTMENT,
EMPLOYEE
WHERE Dnumber=Dno
AND Salary>40000
GROUP BY Dname
HAVING COUNT (*) >
5;
This is incorrect because it will select only departments that have more
than five employees who each earn more
than $40,000. The rule is that the WHERE clause
is executed first, to select individual tuples or joined tuples; the HAVING clause is applied later, to select individual groups of tuples. Hence,
the tuples are already restricted to employees who earn more than $40,000 before the function in the HAVING clause is applied. One way to write this query correctly is to use a
nested
query, as shown in Query 28.
Query 28. For each department that has more than five employees, retrieve the department number and the number of its employees who are making
more than $40,000.
Q28: SELECT Dnumber, COUNT (*)
FROM DEPARTMENT,
EMPLOYEE
WHERE Dnumber=Dno
AND Salary>40000 AND
( SELECT Dno
FROM EMPLOYEE
GROUP BY Dno
HAVING COUNT (*) > 5)
9. Discussion and
Summary of SQL Queries
A retrieval query in SQL can consist of up to six clauses, but only the
first two— SELECT
and FROM—are mandatory. The query can
span several lines, and is ended by a
semicolon. Query terms are separated by spaces, and parentheses can be used to
group relevant parts of a query in the standard way. The clauses are specified
in the following order, with the clauses between square brackets [ ... ] being
optional:
SELECT <attribute and function list> FROM <table
list>
[ WHERE <condition> ]
[ GROUP BY <grouping attribute(s)> ] [ HAVING <group
condition>
]
[ ORDER BY <attribute list> ];
The SELECT clause lists the attributes or functions to be retrieved. The FROM clause specifies all relations (tables) needed in the query, including
joined relations, but not those in nested queries. The WHERE clause specifies the conditions for selecting the tuples from these
relations, including join conditions if needed. GROUP BY specifies
grouping attributes, whereas HAVING specifies a condition on the
groups being selected rather than on the individual tuples. The built-in
aggregate functions COUNT, SUM, MIN,
MAX, and AVG are used
in conjunction with grouping, but they can also
be applied to all the selected tuples in a query without a GROUP BY clause. Finally, ORDER BY specifies an order for
displaying the result of a query.
In order to formulate queries correctly, it is useful to consider the
steps that define the meaning or semantics of each query. A query is
evaluated conceptually by first applying the FROM clause (to identify all tables involved in the query or to material-ize
any joined tables), followed by the WHERE clause
to select and join tuples, and then by GROUP BY and HAVING. Conceptually, ORDER BY is applied at the end to sort
the query result. If none of the last three clauses (GROUP BY, HAVING, and ORDER
BY) are specified, we can think conceptually of a query as being executed as follows:
For each combination of tuples—one
from each of the relations specified in the FROM
clause—evaluate the WHERE clause; if it evaluates to TRUE, place the values of the attributes specified in the SELECT clause from this tuple combination in the result of the query. Of
course, this is not an efficient way to implement the query in a real system,
and each DBMS has special query optimization routines to decide on an execution
plan that is efficient to execute. We discuss query processing and optimization
in Chapter 19.
In general, there are numerous ways to specify the same query in SQL.
This flexibility in specifying queries has advantages and disadvantages. The
main advantage is that users can choose the technique with which they are most
comfortable when specifying a query. For example, many queries may be specified
with join conditions in the WHERE clause, or by using joined
relations in the FROM clause, or with some form of nested queries and the IN comparison operator. Some users may be more comfortable with one
approach, whereas others may be more comfortable with another. From the
programmer’s and the system’s point of view regarding query optimization, it is
generally preferable to write a query with as little nesting and implied
ordering as possible.
The disadvantage of having numerous ways of specifying the same query is
that this may confuse the user, who may not know which technique to use to
specify particular types of queries. Another problem is that it may be more
efficient to execute a query specified in one way than the same query specified
in an alternative way. Ideally, this should not be the case: The DBMS should
process the same query in the same way regardless of how the query is specified.
But this is quite difficult in practice, since each DBMS has different methods
for processing queries specified in different ways. Thus, an additional burden
on the user is to determine which of the alternative specifications is the most
efficient to execute. Ideally, the user should worry only about specifying the
query correctly, whereas the DBMS would deter-mine how to execute the query
efficiently. In practice, however, it helps if the user is aware of which types
of constructs in a query are more expensive to process than others (see Chapter
20).
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.