Operations Using Pipelining
A query specified in SQL will typically be
translated into a relational algebra expression that is a sequence of relational operations. If we execute a single
operation at a time, we must generate temporary files on disk to hold the
results of these intermediate operations, creating excessive overhead. Generating
and storing large temporary files on disk is time-consuming and can be
unnecessary in many cases, since these files will immediately be used as input
to the next operation. To reduce the number of temporary files, it is common to
generate query execution code that corresponds to algorithms for combinations
of operations in a query.
For example, rather than being implemented
separately, a JOIN can be combined with two SELECT operations on the input files and a final PROJECT operation on the resulting file; all this is implemented by one
algorithm with two input files and a single output file. Rather than creating
four temporary files, we apply the algorithm directly and get just one result
file. In Section 19.7.2, we discuss how heuristic relational algebra
optimization can group operations together for execution. Combining operations in this way is called pipelining or stream-based processing.
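The combined algorithm described above can be sketched as a single routine that takes two inputs and produces one output. This is only an illustrative sketch under several assumptions that are not in the text: relations are modelled as in-memory lists of dictionaries, the JOIN is an equijoin implemented with a hash table, the PROJECT keeps duplicates, and the function name, parameter names, and sample data are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: a tuple of a relation is modelled as a column-name -> value mapping.
Row = Dict[str, object]


def select_join_project(
    left: Iterable[Row],
    right: Iterable[Row],
    left_pred: Callable[[Row], bool],
    right_pred: Callable[[Row], bool],
    join_attr: str,
    out_attrs: List[str],
) -> List[Row]:
    """Hypothetical combined algorithm: two SELECTs, one JOIN, one PROJECT."""
    # SELECT on the right input is applied while the hash table is built,
    # so the filtered right relation is never stored as a separate file.
    buckets: Dict[object, List[Row]] = {}
    for r in right:
        if right_pred(r):
            buckets.setdefault(r[join_attr], []).append(r)

    result: List[Row] = []
    for l in left:
        if not left_pred(l):  # SELECT on the left input
            continue
        for r in buckets.get(l[join_attr], []):  # JOIN on equality of join_attr
            merged = {**r, **l}
            # PROJECT: keep only the requested attributes (duplicates are kept).
            result.append({a: merged[a] for a in out_attrs})
    return result


# Hypothetical sample data, used only to exercise the sketch.
employees = [
    {"name": "Smith", "salary": 30000, "dno": 5},
    {"name": "Wong", "salary": 40000, "dno": 5},
    {"name": "Zelaya", "salary": 25000, "dno": 4},
    {"name": "Borg", "salary": 55000, "dno": 1},
]
departments = [
    {"dno": 5, "dname": "Research"},
    {"dno": 4, "dname": "Administration"},
    {"dno": 1, "dname": "Headquarters"},
]

rows = select_join_project(
    employees,
    departments,
    left_pred=lambda e: e["salary"] > 28000,
    right_pred=lambda d: d["dname"] != "Headquarters",
    join_attr="dno",
    out_attrs=["name", "dname"],
)
print(rows)
```

The only collection that is materialized besides the result is the in-memory hash table on the filtered right input. No intermediate result is written out between the individual operations.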
It is common to create the query execution code
dynamically to implement multiple operations. The generated code for producing
the query result combines several algorithms that correspond to individual
operations. As the result tuples from one operation are produced, they are
provided as input for subsequent operations. For example, if a join operation
follows two select operations on base relations, the tuples resulting from each
select are provided as input for the join algorithm in a stream or pipeline as
they are produced.
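The stream-based flow described above, with two selects feeding a join, can be sketched with Python generators, where each operation hands a tuple to the next as soon as it is produced. This is a minimal sketch rather than how any particular DBMS generates its execution code. It assumes in-memory relations and a hash join whose build input is held in memory, and the names scan, select, hash_join, and project, along with the sample data, are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: a tuple of a relation is modelled as a column-name -> value mapping.
Row = Dict[str, object]


def scan(table: List[Row], reads: List[int]) -> Iterator[Row]:
    """Read a base relation one tuple at a time, counting reads in reads[0]."""
    for row in table:
        reads[0] += 1
        yield row


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Pass each qualifying tuple downstream as soon as it is found."""
    for row in rows:
        if pred(row):
            yield row


def hash_join(build: Iterable[Row], probe: Iterable[Row], attr: str) -> Iterator[Row]:
    """Equijoin: consume the build stream into memory, then stream the probe side."""
    buckets: Dict[object, List[Row]] = {}
    for b in build:
        buckets.setdefault(b[attr], []).append(b)
    for p in probe:
        for b in buckets.get(p[attr], []):
            yield {**b, **p}


def project(rows: Iterable[Row], attrs: List[str]) -> Iterator[Row]:
    """Keep only the requested attributes of each incoming tuple."""
    for row in rows:
        yield {a: row[a] for a in attrs}


# Hypothetical sample data, used only to exercise the sketch.
employees = [
    {"name": "Smith", "salary": 30000, "dno": 5},
    {"name": "Wong", "salary": 40000, "dno": 5},
    {"name": "Zelaya", "salary": 25000, "dno": 4},
    {"name": "Borg", "salary": 55000, "dno": 1},
]
departments = [
    {"dno": 5, "dname": "Research"},
    {"dno": 4, "dname": "Administration"},
    {"dno": 1, "dname": "Headquarters"},
]

reads = [0]  # number of employee tuples read so far
pipeline = project(
    hash_join(
        build=select(departments, lambda d: d["dname"] != "Headquarters"),
        probe=select(scan(employees, reads), lambda e: e["salary"] > 28000),
        attr="dno",
    ),
    ["name", "dname"],
)

first = next(pipeline)  # pull a single result tuple through the whole pipeline
print(first, "after reading", reads[0], "of", len(employees), "employee tuples")
remaining = list(pipeline)  # drain the rest of the stream
```

Here the first result tuple is available before the select on the employee relation has finished scanning its input, which is the behavior the paragraph describes. The build side of the hash join is still consumed in full before probing starts, so this sketch streams only the probe input.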