Implementing Aggregate
Operations and OUTER JOINs
1. Implementing Aggregate Operations
The aggregate operators (MIN, MAX, COUNT, AVERAGE, SUM), when applied to an entire table, can be computed by a table scan
or by using an appropriate index, if available. For example, consider the
following SQL query:
SELECT MAX(Salary)
FROM EMPLOYEE;
If an (ascending) B+-tree index on Salary exists for the EMPLOYEE relation, then the optimizer can decide on
using the Salary index to search for the largest Salary value in the index by following the rightmost pointer in each index node
from the root to the rightmost leaf. That node would include the largest Salary value as its last entry.
In most cases, this would be more efficient than a full table scan of EMPLOYEE, since no actual records need to be retrieved. The MIN function can be handled in a similar manner, except that the leftmost pointer in the index is
followed from the root to leftmost leaf. That node would include the smallest Salary value as its first
entry.
The index could also be used for the AVERAGE and SUM aggregate functions, but only if it is a dense index—that is, if there is an
index entry for every record in the main file. In this case, the associated
computation would be applied to the values in the index. For a nondense index, the actual number of
records associated with each index value must be used for a correct
computation. This can be done if the number
of records associated with each value in
the index is stored in each index entry. For the COUNT aggregate function, the number of values can be also computed from
the index in a similar manner. If a COUNT(*) function is applied to a whole relation, the
number of records currently in each relation are typically stored in the
catalog, and so the result can be retrieved directly from the catalog.
When a GROUP BY clause is used in a query, the aggregate
operator must be applied separately to each group of tuples as partitioned by
the grouping attribute. Hence, the table must first be partitioned into subsets
of tuples, where each partition (group) has the same value for the grouping
attributes. In this case, the computation is more complex. Consider the following
query:
SELECT Dno, AVG(Salary)
FROM EMPLOYEE
GROUP BY Dno;
The usual technique for such queries is to
first use either sorting or hashing on the grouping attributes to
partition the file into the appropriate groups. Then the algorithm computes
the aggregate function for the tuples in each group, which have the same
grouping attribute(s) value. In the sample query, the set of EMPLOYEE tuples for each department number would be grouped together in a
partition and the aver-age salary computed for each group.
Notice that if a clustering index (see Chapter 18) exists on the grouping
attribute(s), then the records are already
partitioned (grouped) into the appropriate subsets. In this case, it is
only necessary to apply the computation to each group.
2. Implementing OUTER JOINs
In Section 6.4, the outer join operation was discussed, with
its three variations: left outer join, right outer join, and full outer join.
We also discussed in Chapter 5 how these operations can be specified in SQL.
The following is an example of a left outer join operation in SQL:
SELECT Lname, Fname, Dname
FROM (EMPLOYEE LEFT
OUTER JOIN DEPARTMENT ON Dno=Dnumber);
The result of this query is a table of employee
names and their associated departments. It is similar to a regular (inner)
join result, with the exception that if an
EMPLOYEE tuple (a tuple in the left relation) does not have an associated department, the employee’s name will still appear in the resulting table, but
the department name would be NULL for such tuples in the query result.
Outer join can be computed by modifying one of
the join algorithms, such as nested-loop join or single-loop join. For example,
to compute a left outer join, we use
the left relation as the outer loop or single-loop because every tuple in the
left relation must appear in the result. If there are matching tuples in the
other relation, the joined tuples are produced and saved in the result.
However, if no matching tuple is found, the tuple is still included in the
result but is padded with NULL value(s). The sort-merge and hash-join
algorithms can also be extended to compute outer joins.
Theoretically, outer join can also be computed
by executing a combination of relational algebra operators. For example, the
left outer join operation shown above is equivalent to the following sequence
of relational operations:
1. Compute the (inner) JOIN of the EMPLOYEE and DEPARTMENT tables.
TEMP1 ← πLname, Fname, Dname
(EMPLOYEE
2. Find the EMPLOYEE tuples that do not appear in the (inner) JOIN result.
TEMP2 ← πLname, Fname
(EMPLOYEE) – πLname, Fname
(TEMP1)
3. Pad each tuple in TEMP2 with a NULL Dname field.
TEMP2 ← TEMP2 × NULL
Apply
the UNION operation to TEMP1, TEMP2 to produce the LEFT
OUTER JOIN result.
RESULT ← TEMP1 ∪ TEMP2
The cost of the outer join as computed above
would be the sum of the costs of the associated steps (inner join, projections,
set difference, and union). However, note that step 3 can be done as the
temporary relation is being constructed in step 2; that is, we can simply pad
each resulting tuple with a NULL. In addition, in step 4, we know that the two
operands of the union are disjoint (no common tuples), so there is no need for
duplicate elimination.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.