To develop reasonably accurate cost functions for JOIN operations, we need to have an estimate for the size (number of tuples) of the file that results after the JOIN operation.

**Examples
of Cost Functions for JOIN**

To develop reasonably accurate cost functions
for JOIN operations, we need to have an estimate for the size (number of
tuples) of the file that results *after*
the JOIN operation. This is usually kept as a ratio of the size (number of
tuples) of the resulting join file to the size of the CARTESIAN PRODUCT file, if both are applied to the same input files, and it is
called the **join selectivity (** ** js)**.
If we denote the number of tuples of a relation

If there is no join condition *c,* then *js* = 1 and the join is the same as the CARTESIAN PRODUCT. If no tuples from the relations satisfy the join condition, then *js* = 0. In

*C*_{J2c}* *=* b _{R} *+ (|

If a hash key exists for one of the two join
attributes—say, *B* of *S*—we get

*C*_{J2d}* *=* b _{R} *+ (|

where *h*
≥ 1 is the average number of block accesses to retrieve a record,
given its hash key value. Usually, *h*
is estimated to be *1* for static and
linear hashing and *2* for extendible
hashing.

**J3—Sort-merge join. **If the files are already sorted on the join
attributes, the

cost function for this method is

*C*_{J3a}* *=* b _{R}
*+

If we must sort the files, the cost of sorting
must be added. We can use the formulas from Section 19.2 to estimate the
sorting cost.

Example of Using the Cost Functions. Suppose that we have the EMPLOYEE file described in the example in the previous section, and assume
that the DEPARTMENT file in Figure 3.5 consists of *r _{D}* = 125 records stored in

Suppose that we have a primary index on Dnumber of DEPARTMENT with *x*_{Dnumber}= 1 level and a secondary index on Mgr_ssn of DEPARTMENT with selection cardinality *s*_{Mgr_ssn}= 1 and levels* x*_{Mgr_ssn}= 2. Assume that the join selectivity for* *OP6* *is* js*_{OP6}* *=*
*(1/|DEPARTMENT|) = 1/125 because Dnumber is a key of DEPARTMENT. Also assume that the blocking factor for the
resulting join file is *bfr*_{ED}=
4 records per block. We can estimate the worst-case costs for the JOIN operation OP6 using the applicable methods J1 and J2 as
follows:

**1. **Using method J1 with** **EMPLOYEE** **as outer loop:

^{C}_{J1} ^{=
b}_{E}^{+ (b}* _{E}* *

2000 +
(2000 _{*} 13) + (((1/125) _{*} 10,000 _{*} 125)/4) =
30,500

**
**2. Using
method J1 with DEPARTMENT as outer loop:

^{C}_{J1} ^{=
b}_{D}^{+ (b}* _{E}* *

13 + (13
_{*} 2000) + (((1/125) _{*} 10,000 _{*} 125/4) = 28,513

**
**3. Using
method J2 with EMPLOYEE as outer loop:

^{C}_{J2c} ^{=} ^{b}_{E}^{+ (r}* _{E}* *

2000 +
(10,000 _{*} 2) + (((1/125) _{*} 10,000 _{*} 125/4) =
24,500

**
**4. Using
method J2 with DEPARTMENT as outer loop:

^{C}_{J2a} ^{=}
^{b}_{D}^{+ (r}* _{D}* *

13 +
(125 _{*} (2 + 80)) + (((1/125) _{*} 10,000 _{*} 125/4)
= 12,763

Case 4 has the lowest cost estimate and will be
chosen. Notice that in case 2 above, if 15 memory buffers (or more) were
available for executing the join instead of just 3, 13 of them could be used to
hold the entire DEPARTMENT relation (outer loop relation) in memory, one
could be used as buffer for the result, and one would be used to hold one block
at a time of the EMPLOYEE file (inner loop file), and the cost

for case 2 could be drastically reduced to just
*b _{E}* +

*C*_{J1}* *=* b _{D}
*+ (

=13 +
( 13/8
_{*} 2000) + (((1/125) _{*} 10,000 _{*} 125/4) =
28,513

=13 + (2 _{*}
2000) + 2500 = 6,513

As an exercise, the reader should perform a
similar analysis for OP7.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Fundamentals of Database Systems : Query Processing and Optimization, and Database Tuning : Algorithms for Query Processing and Optimization : Examples of Cost Functions for JOIN |

**Related Topics **

Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.