Chapter: Fundamentals of Database Systems : Database Design Theory and Normalization : Relational Database Design Algorithms and Further Dependencies

Algorithms for Relational Database Schema Design

1. Dependency-Preserving Decomposition into 3NF Schemas 2. Nonadditive Join Decomposition into BCNF Schemas 3. Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas

Algorithms for Relational Database Schema Design

We now give three algorithms for creating a relational decomposition from a uni-versal relation. Each algorithm has specific properties, as we discuss next.

1. Dependency-Preserving Decomposition into 3NF Schemas

Algorithm 16.4 creates a dependency-preserving decomposition D = {R₁, R₂, ..., R_m} of a universal relation R based on a set of functional dependencies F, such that each R_i in D is in 3NF. It guarantees only the dependency-preserving property; it does not guarantee the nonadditive join property. The first step of Algorithm 16.4 is to find a minimal cover G for F; Algorithm 16.2 can be used for this step. Note that multiple minimal covers may exist for a given set F (as we illustrate later in the example after Algorithm 16.4). In such cases the algorithms can potentially yield multiple alternative designs.

Algorithm 16.4. Relational Synthesis into 3NF with Dependency Preservation

Input: A universal relation R and a set of functional dependencies F on the attributes of R.

Find a minimal cover G for F (use Algorithm 16.2);

For each left-hand-side X of a functional dependency that appears in G, cre-ate a relation schema in D with attributes {X ∪ {A₁} ∪ {A₂} ... ∪ {A_k} }, where X → A₁, X → A₂, ..., X → A_k are the only dependencies in G with X as the left-hand-side (X is the key of this relation);

Place any remaining attributes (that have not been placed in any relation) in a single relation schema to ensure the attribute preservation property.

Example of Algorithm 16.4. Consider the following universal relation:

U(Emp_ssn, Pno, Esal, Ephone, Dno, Pname, Plocation)

Emp_ssn, Esal, Ephone refer to the Social Security number, salary, and phone number of the employee. Pno, Pname, and Plocation refer to the number, name, and location of the project. Dno is department number.

The following dependencies are present:

FD1: Emp_ssn → {Esal, Ephone, Dno}

FD2: Pno → { Pname, Plocation}

FD3: Emp_ssn, Pno → {Esal, Ephone, Dno, Pname, Plocation}

By virtue of FD3, the attribute set {Emp_ssn, Pno} represents a key of the universal relation. Hence F, the set of given FDs includes {Emp_ssn → Esal, Ephone, Dno;

Pno → Pname, Plocation; Emp_ssn, Pno → Esal, Ephone, Dno, Pname, Plocation}.

By applying the minimal cover Algorithm 16.2, in step 3 we see that Pno is a redun-dant attribute in Emp_ssn, Pno → Esal, Ephone, Dno. Moreover, Emp_ssn is redun-dant in Emp_ssn, Pno → Pname, Plocation. Hence the minimal cover consists of FD1 and FD2 only (FD3 being completely redundant) as follows (if we group attributes with the same left-hand side into one FD):

Minimal cover G: {Emp_ssn → Esal, Ephone, Dno; Pno → Pname, Plocation}

By applying Algorithm 16.4 to the above Minimal cover G, we get a 3NF design con-sisting of two relations with keys Emp_ssn and Pno as follows:

R₁ (Emp_ssn, Esal, Ephone, Dno)

R₂ (Pno, Pname, Plocation)

An observant reader would notice easily that these two relations have lost the original information contained in the key of the universal relation U (namely, that there are certain employees working on certain projects in a many-to-many relationship). Thus, while the algorithm does preserve the original dependencies, it makes no guarantee of preserving all of the information. Hence, the resulting design is a lossy design.

Claim 3. Every relation schema created by Algorithm 16.4 is in 3NF. (We will not provide a formal proof here;⁶ the proof depends on G being a minimal set of dependencies.)

It is obvious that all the dependencies in G are preserved by the algorithm because each dependency appears in one of the relations R_i in the decomposition D. Since G is equivalent to F, all the dependencies in F are either preserved directly in the decomposition or are derivable using the inference rules from Section 16.1.1 from those in the resulting relations, thus ensuring the dependency preservation prop-erty. Algorithm 16.4 is called a relational synthesis algorithm, because each relation schema R_i in the decomposition is synthesized (constructed) from the set of functional dependencies in G with the same left-hand-side X.

2. Nonadditive Join Decomposition into BCNF Schemas

The next algorithm decomposes a universal relation schema R = {A₁, A₂, ..., A_n} into a decomposition D = {R₁, R₂, ..., R_m} such that each R_i is in BCNF and the decomposition D has the lossless join property with respect to F. Algorithm 16.5 utilizes Property NJB and Claim 2 (preservation of nonadditivity in successive decompositions) to create a nonadditive join decomposition D = {R₁, R₂, ..., R_m} of a universal relation R based on a set of functional dependencies F, such that each R_i in D is in BCNF.

Algorithm 16.5. Relational Decomposition into BCNF with Nonadditive Join Property

Input: A universal relation R and a set of functional dependencies F on the attributes of R.

Set D := {R} ;

While there is a relation schema Q in D that is not in BCNF do

{

choose a relation schema Q in D that is not in BCNF;

find a functional dependency X → Y in Q that violates BCNF; replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);

} ;

Each time through the loop in Algorithm 16.5, we decompose one relation schema Q that is not in BCNF into two relation schemas. According to Property NJB for binary decompositions and Claim 2, the decomposition D has the nonadditive join property. At the end of the algorithm, all relation schemas in D will be in BCNF. The reader can check that the normalization example in Figures 15.12 and 15.13 basi-cally follows this algorithm. The functional dependencies FD3, FD4, and later FD5 violate BCNF, so the LOTS relation is decomposed appropriately into BCNF rela-tions, and the decomposition then satisfies the nonadditive join property. Similarly, if we apply the algorithm to the TEACH relation schema from Figure 15.14, it is decomposed into TEACH1(Instructor, Student) and TEACH2(Instructor, Course) because the dependency FD2 Instructor → Course violates BCNF.

In step 2 of Algorithm 16.5, it is necessary to determine whether a relation schema Q is in BCNF or not. One method for doing this is to test, for each functional dependency X → Y in Q, whether X⁺ fails to include all the attributes in Q, thereby determining whether or not X is a (super)key in Q. Another technique is based on an observation that whenever a relation schema Q has a BCNF violation, there exists a pair of attributes A and B in Q such that {Q – {A, B} } → A; by computing the closure {Q – {A, B} }⁺ for each pair of attributes {A, B} of Q, and checking whether the closure includes A (or B), we can determine whether Q is in BCNF.

3. Dependency-Preserving and Nonadditive (Lossless) Join Decomposition into 3NF Schemas

So far, in Algorithm 16.4 we showed how to achieve a 3NF design with the potential for loss of information and in Algorithm 16.5 we showed how to achieve BCNF design with the potential loss of certain functional dependencies. By now we know that it is not possible to have all three of the following: (1) guaranteed nonlossy design, (2) guaranteed dependency preservation, and (3) all relations in BCNF. As we have said before, the first condition is a must and cannot be compromised. The second condition is desirable, but not a must, and may have to be relaxed if we insist on achieving BCNF. Now we give an alternative algorithm where we achieve conditions 1 and 2 and only guarantee 3NF. A simple modification to Algorithm 16.4, shown as Algorithm 16.6, yields a decomposition D of R that does the following:

Preserves dependencies

Has the nonadditive join property

Is such that each resulting relation schema in the decomposition is in 3NF

Because the Algorithm 16.6 achieves both the desirable properties, rather than only functional dependency preservation as guaranteed by Algorithm 16.4, it is preferred over Algorithm 16.4.

Algorithm 16.6. Relational Synthesis into 3NF with Dependency Preservation and Nonadditive Join Property

Input: A universal relation R and a set of functional dependencies F on the attributes of R.

Find a minimal cover G for F (use Algorithm 16.2).

If none of the relation schemas in D contains a key of R, then create one more relation schema in D that contains attributes that form a key of R.⁷ (Algorithm 16.2(a) may be used to find a key.)

Eliminate redundant relations from the resulting set of relations in the relational database schema. A relation R is considered redundant if R is a projection of another relation S in the schema; alternately, R is subsumed by S.

Step 3 of Algorithm 16.6 involves identifying a key K of R. Algorithm 16.2(a) can be used to identify a key K of R based on the set of given functional dependencies F. Notice that the set of functional dependencies used to determine a key in Algorithm 16.2(a) could be either F or G, since they are equivalent.

Example 1 of Algorithm 16.6. Let us revisit the example given earlier at the end of Algorithm 16.4. The minimal cover G holds as before. The second step produces relations R₁ and R₂ as before. However, now in step 3, we will generate a relation corresponding to the key {Emp_ssn, Pno}. Hence, the resulting design contains:

R₁ (Emp_ssn , Esal, Ephone, Dno)

R₂ (Pno, Pname, Plocation)

R₃ (Emp_ssn, Pno)

This design achieves both the desirable properties of dependency preservation and nonadditive join.

Example 2 of Algorithm 16.6 (Case X ). Consider the relation schema LOTS1A shown in Figure 15.13(a). Assume that this relation is given as a universal relation with the following functional dependencies:

FD1: Property_id → Lot#, County, Area

FD2: Lot#, County → Area, Property_id

FD3: Area → County

These were called FD1, FD2, and FD5 in Figure 15.13(a). The meanings of the above attributes and the implication of the above functional dependencies were explained in Section 15.4. For ease of reference, let us abbreviate the above attributes with the first letter for each and represent the functional dependencies as the set

F : { P → LCA, LC → AP, A → C }.

If we apply the minimal cover Algorithm 16.2 to F, (in step 2) we first represent the set F as

F : {P → L, P → C, P → A, LC → A, LC → P, A → C}.

In the set F, P → A can be inferred from P → LC and LC → A; hence P → A by tran-sitivity and is therefore redundant. Thus, one possible minimal cover is

Minimal cover GX: {P → LC, LC → AP, A → C }.

In step 2 of Algorithm 16.6 we produce design X (before removing redundant rela-tions) using the above minimal cover as

Design X: R₁ (P, L, C), R₂ (L, C, A, P), and R₃ (A, C).

In step 4 of the algorithm, we find that R₃ is subsumed by R₂ (that is, R₃ is always a projection of R₂ and R₁ is a projection of R₂ as well. Hence both of those relations are redundant. Thus the 3NF schema that achieves both of the desirable properties is (after removing redundant relations)

Design X: R₂ (L, C, A, P).

or, in other words it is identical to the relation LOTS1A (Lot#, County, Area, Property_id) that we had determined to be in 3NF in Section 15.4.2.

Example 2 of Algorithm 16.6 (Case Y ). Starting with LOTS1A as the universal relation and with the same given set of functional dependencies, the second step of the minimal cover Algorithm 16.2 produces, as before

F: {P → C, P → A, P → L, LC → A, LC → P, A → C}.

The FD LC → A may be considered redundant because LC → P and P → A implies LC → A by transitivity. Also, P → C may be considered to be redundant because P → A and A → C implies P → C by transitivity. This gives a different minimal cover as

Minimal cover GY: { P → LA, LC → P, A → C }.

The alternative design Y produced by the algorithm now is

Design Y: S₁ (P, A, L), S₂ (L, C, P), and S₃ (A, C).

Note that this design has three 3NF relations, none of which can be considered as redundant by the condition in step 4. All FDs in the original set F are preserved. The reader will notice that out of the above three relations, relations S₁ and S₃ were pro-duced as the BCNF design by the procedure given in Section 15.5 (implying that S₂ is redundant in the presence of S₁ and S₃). However, we cannot eliminate relation S₂ from the set of three 3NF relations above since it is not a projection of either S₁ or S₃. Design Y therefore remains as one possible final result of applying Algorithm 16.6 to the given universal relation that provides relations in 3NF.

It is important to note that the theory of nonadditive join decompositions is based on the assumption that no NULL values are allowed for the join attributes. The next section discusses some of the problems that NULLs may cause in relational decom-positions and provides a general discussion of the algorithms for relational design by synthesis presented in this section.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Fundamentals of Database Systems : Database Design Theory and Normalization : Relational Database Design Algorithms and Further Dependencies : Algorithms for Relational Database Schema Design |