Chapter: Fundamentals of Database Systems : Database Design Theory and Normalization : Relational Database Design Algorithms and Further Dependencies

Further Topics in Functional Dependencies: Inference Rules, Equivalence, and Minimal Cover

1. Inference Rules for Functional Dependencies 2. Equivalence of Sets of Functional Dependencies 3. Minimal Sets of Functional Dependencies

Further Topics in Functional Dependencies: Inference Rules, Equivalence, and Minimal Cover

We introduced the concept of functional dependencies (FDs) in Section 15.2, illustrated it with some examples, and developed a notation to denote multiple FDs over a single relation. We identified and discussed problematic functional dependencies in Sections 15.3 and 15.4 and showed how they can be eliminated by a proper decomposition of a relation. This process was described as normalization and we showed how to achieve the first through third normal forms (1NF through 3NF) given primary keys in Section 15.3. In Sections 15.4 and 15.5 we provided generalized tests for 2NF, 3NF, and BCNF given any number of candidate keys in a relation and showed how to achieve them. Now we return to the study of functional dependencies and show how new dependencies can be inferred from a given set and dis-cuss the concepts of closure, equivalence, and minimal cover that we will need when we later consider a synthesis approach to design of relations given a set of FDs.

1. Inference Rules for Functional Dependencies

We denote by F the set of functional dependencies that are specified on relation schema R. Typically, the schema designer specifies the functional dependencies that are semantically obvious; usually, however, numerous other functional dependencies hold in all legal relation instances among sets of attributes that can be derived from and satisfy the dependencies in F. Those other dependencies can be inferred or deduced from the FDs in F.

In real life, it is impossible to specify all possible functional dependencies for a given

situation. For example, if each department has one manager, so that

uniquely determines Mgr_ssn (Dept_no → Mgr_ssn), and a manager has a unique phone number called Mgr_phone (Mgr_ssn → Mgr_phone), then these two dependencies together imply that Dept_no → Mgr_phone. This is an inferred FD and need not be explicitly stated in addition to the two given FDs. Therefore, it is useful to define a concept called closure formally that includes all possible dependencies that can be inferred from the given set F.

Definition. Formally, the set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F; it is denoted by F⁺.

For example, suppose that we specify the following set F of obvious functional dependencies on the relation schema in Figure 15.3(a):

F = {Ssn → {Ename, Bdate, Address, Dnumber}, Dnumber → {Dname, Dmgr_ssn} }

Some of the additional functional dependencies that we can infer from F are the fol-lowing:

Ssn → {Dname, Dmgr_ssn}

Ssn → Ssn

Dnumber → Dname

An FD X → Y is inferred from a set of dependencies F specified on R if X → Y holds in every legal relation state r of R; that is, whenever r satisfies all the dependencies in F, X → Y also holds in r. The closure F⁺ of F is the set of all functional dependencies that can be inferred from F. To determine a systematic way to infer dependencies, we must discover a set of inference rules that can be used to infer new dependencies from a given set of dependencies. We consider some of these inference rules next. We use the notation F |=X → Y to denote that the functional dependency X → Y is inferred from the set of functional dependencies F.

In the following discussion, we use an abbreviated notation when discussing functional dependencies. We concatenate attribute variables and drop the commas for convenience. Hence, the FD {X,Y} → Z is abbreviated to XY → Z, and the FD {X, Y, Z} → {U, V} is abbreviated to XYZ → UV. The following six rules IR1 through IR6 are well-known inference rules for functional dependencies:

IR1 (reflexive rule)¹: If X ⊇ Y, then X →Y.

IR2 (augmentation rule)²: {X → Y} |=XZ → YZ.

IR3 (transitive rule): {X → Y, Y → Z} |=X → Z.

IR4 (decomposition, or projective, rule): {X → YZ} |=X → Y.

IR5 (union, or additive, rule): {X → Y, X → Z} |=X → YZ.

IR6 (pseudotransitive rule): {X → Y, WY → Z} |=WX → Z.

The reflexive rule (IR1) states that a set of attributes always determines itself or any of its subsets, which is obvious. Because IR1 generates dependencies that are always true, such dependencies are called trivial. Formally, a functional dependency X → Y is trivial if X ⊇ Y; otherwise, it is nontrivial. The augmentation rule (IR2) says that adding the same set of attributes to both the left- and right-hand sides of a dependency results in another valid dependency. According to IR3, functional dependencies are transitive. The decomposition rule (IR4) says that we can remove attributes from the right-hand side of a dependency; applying this rule repeatedly can decompose the FD X → {A₁, A₂, ..., A_n} into the set of dependencies {X → A₁, X → A₂, ..., X → A_n}. The union rule (IR5) allows us to do the opposite; we can combine a set of dependencies {X → A₁, X → A₂, ..., X → A_n} into the single FD X → {A₁, A₂, ..., A_n}. The pseudotransitive rule (IR6) allows us to replace a set of attributes Y on the left hand side of a dependency with another set X that functionally determines Y, and can be derived from IR2 and IR3 if we augment the first functional dependency X → Y with W (the augmentation rule) and then apply the transitive rule.

One cautionary note regarding the use of these rules. Although X → A and X → B implies X → AB by the union rule stated above, X → A and Y → B does imply that XY → AB. Also, XY → A does not necessarily imply either X → A or Y → A.

Each of the preceding inference rules can be proved from the definition of functional dependency, either by direct proof or by contradiction. A proof by contradiction assumes that the rule does not hold and shows that this is not possible. We now prove that the first three rules IR1 through IR3 are valid. The second proof is by contradiction.

Proof of IR1. Suppose that X ⊇ Y and that two tuples t₁ and t₂ exist in some relation instance r of R such that t₁ [X] = t₂ [X]. Then t₁[Y] = t₂[Y] because X ⊇ Y; hence, X → Y must hold in r.

Proof of IR2 (by contradiction). Assume that X → Y holds in a relation instance r of R but that XZ → YZ does not hold. Then there must exist two tuples t₁ and t₂ in r such that (1) t₁ [X] = t₂ [X], (2) t₁ [Y] = t₂ [Y], (3) t₁ [XZ] = t₂ [XZ], and

t₁ [YZ] ≠ t₂ [YZ]. This is not possible because from (1) and (3) we deduce

t₁ [Z] = t₂ [Z], and from (2) and (5) we deduce (6) t₁ [YZ] = t₂ [YZ], contra-dicting (4).

Proof of IR3. Assume that (1) X → Y and (2) Y → Z both hold in a relation r. Then for any two tuples t₁ and t₂ in r such that t₁ [X] = t₂ [X], we must have (3) t₁ [Y] = t₂ [Y], from assumption (1); hence we must also have (4) t₁ [Z] = t₂ [Z] from (3) and assumption (2); thus X → Z must hold in r.

Using similar proof arguments, we can prove the inference rules IR4 to IR6 and any additional valid inference rules. However, a simpler way to prove that an inference rule for functional dependencies is valid is to prove it by using inference rules that have already been shown to be valid. For example, we can prove IR4 through IR6 by using IR1 through IR3 as follows.

Proof of IR4 (Using IR1 through IR3).

X → YZ (given).

YZ → Y (using IR1 and knowing that YZ ⊇ Y).

X → Y (using IR3 on 1 and 2).

Proof of IR5 (using IR1 through IR3).

X →Y (given).

X → Z (given).

X → XY (using IR2 on 1 by augmenting with X; notice that XX = X).

XY → YZ (using IR2 on 2 by augmenting with Y).

X → YZ (using IR3 on 3 and 4).

Proof of IR6 (using IR1 through IR3).

X → Y (given).

WY → Z (given).

WX → WY (using IR2 on 1 by augmenting with W).

WX → Z (using IR3 on 3 and 2).

It has been shown by Armstrong (1974) that inference rules IR1 through IR3 are sound and complete. By sound, we mean that given a set of functional dependencies F specified on a relation schema R, any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. By complete, we mean that using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in the complete set of all possible dependencies that can be inferred from F. In other words, the set of dependencies F⁺, which we called the closure of F, can be determined from F by using only inference rules IR1 through IR3. Inference rules IR1 through IR3 are known as Armstrong’s inference rules.³

Typically, database designers first specify the set of functional dependencies F that can easily be determined from the semantics of the attributes of R; then IR1, IR2, and IR3 are used to infer additional functional dependencies that will also hold on R. A systematic way to determine these additional functional dependencies is first to determine each set of attributes X that appears as a left-hand side of some func-tional dependency in F and then to determine the set of all attributes that are dependent on X.

Definition. For each such set of attributes X, we determine the set X⁺ of attrib-utes that are functionally determined by X based on F; X⁺ is called the closure of X under F. Algorithm 16.1 can be used to calculate X⁺.

Algorithm 16.1. Determining X⁺, the Closure of X under F

Input: A set F of FDs on a relation schema R, and a set of attributes X, which is a subset of R.

X⁺ := X;

repeat

oldX⁺ := X⁺;

for each functional dependency Y → Z in F do if X⁺ ⊇ Y then X⁺ := X⁺ ∪ Z;

until (X⁺ = oldX⁺);

Algorithm 16.1 starts by setting X⁺ to all the attributes in X. By IR1, we know that all these attributes are functionally dependent on X. Using inference rules IR3 and IR4, we add attributes to X⁺, using each functional dependency in F. We keep going through all the dependencies in F (the repeat loop) until no more attributes are added to X⁺ during a complete cycle (of the for loop) through the dependencies in F. For example, consider the relation schema EMP_PROJ in Figure 15.3(b); from the semantics of the attributes, we specify the following set F of functional dependen-cies that should hold on EMP_PROJ:

F = {Ssn → Ename,

Pnumber → {Pname, Plocation},

{Ssn, Pnumber} → Hours}

Using Algorithm 16.1, we calculate the following closure sets with respect to F:

{Ssn} ⁺ = {Ssn, Ename}

{Pnumber} ⁺ = {Pnumber, Pname, Plocation}

{Ssn, Pnumber} ⁺ = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}

Intuitively, the set of attributes in the right-hand side in each line above represents all those attributes that are functionally dependent on the set of attributes in the left-hand side based on the given set F.

2. Equivalence of Sets of Functional Dependencies

In this section we discuss the equivalence of two sets of functional dependencies. First, we give some preliminary definitions.

Definition. A set of functional dependencies F is said to cover another set of functional dependencies E if every FD in E is also in F⁺; that is, if every dependency in E can be inferred from F; alternatively, we can say that E is covered by F.

Definition. Two sets of functional dependencies E and F are equivalent if E⁺ = F⁺. Therefore, equivalence means that every FD in E can be inferred from F, and every FD in F can be inferred from E; that is, E is equivalent to F if both the conditions—E covers F and F covers E—hold.

We can determine whether F covers E by calculating X⁺ with respect to F for each FD X → Y in E, and then checking whether this X⁺ includes the attributes in Y. If this is the case for every FD in E, then F covers E. We determine whether E and F are equivalent by checking that E covers F and F covers E. It is left to the reader as an exercise to show that the following two sets of FDs are equivalent:

F = {A → C, AC → D, E → AD, E → H} and G = {A → CD, E → AH}.

3. Minimal Sets of Functional Dependencies

Informally, a minimal cover of a set of functional dependencies E is a set of functional dependencies F that satisfies the property that every dependency in E is in the closure F⁺ of F. In addition, this property is lost if any dependency from the set F is removed; F must have no redundancies in it, and the dependencies in F are in a standard form. To satisfy these properties, we can formally define a set of functional dependencies F to be minimal if it satisfies the following conditions:

Every dependency in F has a single attribute for its right-hand side.

We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.

We cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.

We can think of a minimal set of dependencies as being a set of dependencies in a standard or canonical form and with no redundancies. Condition 1 just represents every dependency in a canonical form with a single attribute on the right-hand side. Conditions 2 and 3 ensure that there are no redundancies in the dependencies either by having redundant attributes on the left-hand side of a dependency (Condition 2) or by having a dependency that can be inferred from the remaining FDs in F (Condition 3).

Definition. A minimal cover of a set of functional dependencies E is a minimal set of dependencies (in the standard canonical form and without redundancy) that is equivalent to E. We can always find at least one minimal cover F for any set of dependencies E using Algorithm 16.2.

If several sets of FDs qualify as minimal covers of E by the definition above, it is cus-tomary to use additional criteria for minimality. For example, we can choose the minimal set with the smallest number of dependencies or with the smallest total length (the total length of a set of dependencies is calculated by concatenating the dependencies and treating them as one long character string).

Algorithm 16.2. Finding a Minimal Cover F for a Set of Functional Dependencies E

Input: A set of functional dependencies E.

Set F := E.

Replace each functional dependency X → {A₁, A₂, ..., A_n} in F by the n func-tional dependencies X →A₁, X →A₂, ..., X → A_n.

For each functional dependency X → A in F

for each attribute B that is an element of X

if { {F – {X → A} } ∪ { (X – {B} ) → A} } is equivalent to F then replace X → A with (X – {B} ) → A in F.

4. For each remaining functional dependency X → A in F if {F – {X → A} } is equivalent to F,

then remove X → A from F.

We illustrate the above algorithm with the following:

Let the given set of FDs be E : {B → A, D → A, AB → D}. We have to find the mini-mal cover of E.

All above dependencies are in canonical form (that is, they have only one

attribute on the right-hand side), so we have completed step 1 of Algorithm 16.2 and can proceed to step 2. In step 2 we need to determine if AB → D has any redundant attribute on the left-hand side; that is, can it be replaced by B → D or A → D?

Since B → A, by augmenting with B on both sides (IR2), we have BB → AB, or B → AB (i). However, AB → D as given (ii).

Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Thus AB → D may be replaced by B → D.

We now have a set equivalent to original E, say E : {B → A, D → A, B → D}. No further reduction is possible in step 2 since all FDs have a single attribute on the left-hand side.

In step 3 we look for a redundant FD in E . By using the transitive rule on B → D and D → A, we derive B → A. Hence B → A is redundant in E and can be eliminated.

Therefore, the minimal cover of E is {B → D, D → A}.

In Section 16.3 we will see how relations can be synthesized from a given set of dependencies E by first finding the minimal cover F for E.

Next, we provide a simple algorithm to determine the key of a relation:

Algorithm 16.2(a). Finding a Key K for R Given a set F of Functional Dependencies

Input: A relation R and a set of functional dependencies F on the attributes of R.

Set K := R.

For each attribute A in K

{compute (K – A)⁺ with respect to F;

if (K – A)⁺ contains all the attributes in R, then set K := K – {A} };

In Algoritm 16.2(a), we start by setting K to all the attributes of R; we then remove one attribute at a time and check whether the remaining attributes still form a superkey. Notice, too, that Algorithm 16.2(a) determines only one key out of the possible candidate keys for R; the key returned depends on the order in which attributes are removed from R in step 2.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Fundamentals of Database Systems : Database Design Theory and Normalization : Relational Database Design Algorithms and Further Dependencies : Further Topics in Functional Dependencies: Inference Rules, Equivalence, and Minimal Cover |