Algorithms
for PROJECT and Set Operations
A PROJECT operation π<attribute
list>(R)
is straightforward to implement if <attribute list> includes a key of
relation R, because in this case the
result of the operation will have the same number of tuples as R, but with only the values for the
attributes in <attribute list> in each tuple. If <attribute list>
does not include a key of R, duplicate tuples must be eliminated. This can be done by sorting the result
of the operation and then eliminating
duplicate tuples, which appear consecutively after sorting. A sketch of the algorithm
is given in Figure 19.3(b). Hashing can also be used to eliminate duplicates:
as each record is hashed and inserted into a bucket of the hash file in memory,
it is checked against those records already in the bucket; if it is a
duplicate, it is not inserted in the bucket. It is useful to recall here that
in SQL queries, the default is not to eliminate duplicates from the query
result; duplicates are eliminated from the query result only if the keyword DISTINCT is included.
Set operations—UNION, INTERSECTION, SET
DIFFERENCE, and CARTESIAN PRODUCT—are sometimes expensive to implement. In particular, the CARTESIAN PRODUCT operation R × S is quite expensive because its result includes
a record for each combination of records from R and S. Also, each record
in the result includes all attributes of R
and S. If R has n records and j attributes, and S has m records and k attributes, the result relation for R × S will have
n * m records and each
record will have j + k attributes. Hence,
it is important to avoid the CARTESIAN PRODUCT operation and to substitute other operations
such as join during query optimization (see Section 19.7).
The other three set operations—UNION, INTERSECTION, and SET DIFFERENCE14—apply only to type-compatible (or union-compatible) relations, which have the same number of attributes and
the same attribute domains. The customary way to implement these operations is
to use variations of the sort-merge technique: the two relations are sorted
on the same attributes, and, after sorting, a single scan through each relation is sufficient to produce the
result. For example, we can implement the UNION operation, R
∪ S,
by scanning and merging both sorted files concurrently, and whenever the same
tuple exists in both relations, only one is kept in the merged result. For the INTERSECTION operation, R ∩ S, we keep in the merged
result only those tuples that appear in both
sorted relations. Figure 19.3(c) to (e) sketches the implementation of
these operations by sorting and merging. Some of the details are not included
in these algorithms.
Hashing can also be used to implement UNION, INTERSECTION, and SET DIFFER-ENCE. One table is first scanned and then partitioned into an in-memory
hash table with buckets, and the records in the other table are then scanned
one at a time and used to probe the appropriate partition. For example, to
implement R ∪ S,
first hash (partition) the records of R;
then, hash (probe) the records of S,
but do not insert duplicate records in the buckets. To implement R ∩ S,
first partition the records of R to
the hash file. Then, while hashing each record of S, probe to check if an identical record from R is found in the bucket, and if so add
the record to the result file. To implement R
– S, first hash the records of R to the hash file buckets. While
hashing (probing) each record of S,
if an identical record is found in the bucket, remove that record from the
bucket.
In SQL, there are two variations of these set
operations. The operations UNION,
INTERSECTION, and EXCEPT (the SQL keyword for the SET DIFFERENCE operation) apply to traditional sets, where no duplicate records
exist in the result. The operations UNION
ALL, INTERSECTION
ALL, and EXCEPT
ALL apply to multisets (or bags), and duplicates
are fully considered. Variations of the above algorithms can be used for the
multiset operations in SQL. We leave these as an exercise for the reader.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2026 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.