Home | | **Database Management Systems** | | **FUNDAMENTALS OF Database Systems** | | **Database Management Systems** | Implementing Aggregate Operations and OUTER JOINs

1. Implementing Aggregate Operations
2. Implementing OUTER JOINs

**Implementing Aggregate
Operations and OUTER JOINs**

**1. Implementing Aggregate Operations**

The aggregate operators (MIN, MAX, COUNT, AVERAGE, SUM), when applied to an entire table, can be computed by a table scan
or by using an appropriate index, if available. For example, consider the
following SQL query:

SELECT MAX(Salary)

FROM EMPLOYEE;

If an (ascending) B^{+}-tree index on Salary exists for the EMPLOYEE relation, then the optimizer can decide on
using the Salary index to search for the largest Salary value in the index by following the *rightmost* pointer in each index node
from the root to the rightmost leaf. That node would include the largest Salary value as its *last* entry.
In most cases, this would be more efficient than a full table scan of EMPLOYEE, since no actual records need to be retrieved. The MIN function can be handled in a similar manner, except that the *leftmost* pointer in the index is
followed from the root to leftmost leaf. That node would include the smallest Salary value as its *first*
entry.

The index could also be used for the AVERAGE and SUM aggregate functions, but only if it is a **dense index**—that is, if there is an
index entry for every record in the main file. In this case, the associated
computation would be applied to the values in the index. For a **nondense index**, the actual number of
records associated with each index value must be used for a correct
computation. This can be done if the *number*
*of records associated with each value *in
the index is stored in each index entry. For the* *COUNT aggregate function, the number of values can be also computed from
the index in a similar manner. If a COUNT(_{*}) function is applied to a whole relation, the
number of records currently in each relation are typically stored in the
catalog, and so the result can be retrieved directly from the catalog.

When a GROUP BY clause is used in a query, the aggregate
operator must be applied separately to each group of tuples as partitioned by
the grouping attribute. Hence, the table must first be partitioned into subsets
of tuples, where each partition (group) has the same value for the grouping
attributes. In this case, the computation is more complex. Consider the following
query:

SELECT Dno, AVG(Salary)

FROM EMPLOYEE

GROUP BY Dno;

The usual technique for such queries is to
first use either **sorting** or **hashing** on the grouping attributes to
partition the file into the appropriate groups. Then the algorithm computes
the aggregate function for the tuples in each group, which have the same
grouping attribute(s) value. In the sample query, the set of EMPLOYEE tuples for each department number would be grouped together in a
partition and the aver-age salary computed for each group.

Notice that if a **clustering index** (see Chapter 18) exists on the grouping
attribute(s), then the records are *already
partitioned* (grouped) into the appropriate subsets. In this case, it is
only necessary to apply the computation to each group.

**2. Implementing OUTER JOINs**

In Section 6.4, the *outer join operation* was discussed, with
its three variations: left outer join, right outer join, and full outer join.
We also discussed in Chapter 5 how these operations can be specified in SQL.
The following is an example of a left outer join operation in SQL:

SELECT Lname, Fname, Dname

FROM (EMPLOYEE LEFT
OUTER JOIN DEPARTMENT ON Dno=Dnumber);

The result of this query is a table of employee
names and their associated departments. It is similar to a regular (inner)
join result, with the exception that if an

EMPLOYEE tuple (a tuple in the *left* relation) *does not have an associated department,* the employee’s name will still appear in the resulting table, but
the department name would be NULL for such tuples in the query result.

Outer join can be computed by modifying one of
the join algorithms, such as nested-loop join or single-loop join. For example,
to compute a *left* outer join, we use
the left relation as the outer loop or single-loop because every tuple in the
left relation must appear in the result. If there are matching tuples in the
other relation, the joined tuples are produced and saved in the result.
However, if no matching tuple is found, the tuple is still included in the
result but is padded with NULL value(s). The sort-merge and hash-join
algorithms can also be extended to compute outer joins.

Theoretically, outer join can also be computed
by executing a combination of relational algebra operators. For example, the
left outer join operation shown above is equivalent to the following sequence
of relational operations:

**1. **Compute the (inner)** **JOIN** **of the** **EMPLOYEE** **and** **DEPARTMENT** **tables.

TEMP1 ← π_{Lname}_{,} _{Fname}_{,} _{Dname}
(EMPLOYEE

**2. **Find the** **EMPLOYEE** **tuples that do not appear in the (inner)** **JOIN** **result.

TEMP2 ← π_{Lname}_{,} _{Fname}
(EMPLOYEE) – π_{Lname}_{,} _{Fname}
(TEMP1)

**3. **Pad each tuple in** **TEMP2** **with a** **NULL Dname** **field.

TEMP2 ← TEMP2 × NULL

**
**Apply
the UNION operation to TEMP1, TEMP2 to produce the LEFT
OUTER JOIN result.

RESULT ← TEMP1 ∪ TEMP2

The cost of the outer join as computed above
would be the sum of the costs of the associated steps (inner join, projections,
set difference, and union). However, note that step 3 can be done as the
temporary relation is being constructed in step 2; that is, we can simply pad
each resulting tuple with a NULL. In addition, in step 4, we know that the two
operands of the union are disjoint (no common tuples), so there is no need for
duplicate elimination.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

**Related Topics **

Copyright © 2018-2020 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.