Database Design Theory and Normalization
Basics of Functional Dependencies and Normalization for Relational Databases
In Chapters 3 through 6, we presented various aspects of the relational model and the languages associated with it. Each relation schema consists of a number of attributes, and the relational database schema consists of a number of relation schemas. So far, we have assumed that attributes are grouped to form a relation schema by using the common sense of the database designer or by mapping a database schema design from a conceptual data model such as the ER or Enhanced-ER (EER) data model. These models make the designer identify entity types and relationship types and their respective attributes, which leads to a natural and logical grouping of the attributes into relations when the mapping procedures discussed in Chapter 9 are followed. However, we still need some formal way of analyzing why one grouping of attributes into a rela-tion schema may be better than another. While discussing database design in Chapters 7 through 10, we did not develop any measure of appropriateness or goodness to measure the quality of the design, other than the intuition of the designer. In this chapter we discuss some of the theory that has been developed with the goal of evaluating relational schemas for design quality—that is, to measure for-mally why one set of groupings of attributes into relation schemas is better than another.
There are two levels at which we can discuss the goodness of relation schemas. The first is the logical (or conceptual) level—how users interpret the relation schemas and the meaning of their attributes. Having good relation schemas at this level enables users to understand clearly the meaning of the data in the relations, and hence to formulate their queries correctly. The second is the implementation (or physical storage) level—how the tuples in a base relation are stored and updated. This level applies only to schemas of base relations—which will be physically stored as files—whereas at the logical level we are interested in schemas of both base rela-tions and views (virtual relations). The relational database design theory developed in this chapter applies mainly to base relations, although some criteria of appropri-ateness also apply to views, as shown in Section 15.1.
As with many design problems, database design may be performed using two approaches: bottom-up or top-down. A bottom-up design methodology (also called design by synthesis) considers the basic relationships among individual attributes as the starting point and uses those to construct relation schemas. This approach is not very popular in practice because it suffers from the problem of having to collect a large number of binary relationships among attributes as the starting point. For practical situations, it is next to impossible to capture binary relationships among all such pairs of attributes. In contrast, a top-down design methodology (also called design by analysis) starts with a number of groupings of attributes into relations that exist together naturally, for example, on an invoice, a form, or a report. The relations are then analyzed individually and collectively, leading to further decomposition until all desirable properties are met. The theory described in this chapter is applicable to both the top-down and bottom-up design approaches, but is more appropriate when used with the top-down approach.
Relational database design ultimately produces a set of relations. The implicit goals of the design activity are information preservation and minimum redundancy. Information is very hard to quantify—hence we consider information preservation in terms of maintaining all concepts, including attribute types, entity types, and relationship types as well as generalization/specialization relationships, which are described using a model such as the EER model. Thus, the relational design must preserve all of these concepts, which are originally captured in the conceptual design after the conceptual to logical design mapping. Minimizing redundancy implies minimizing redundant storage of the same information and reducing the need for multiple updates to maintain consistency across multiple copies of the same information in response to real-world events that require making an update.
We start this chapter by informally discussing some criteria for good and bad rela-tion schemas in Section 15.1. In Section 15.2, we define the concept of functional dependency, a formal constraint among attributes that is the main tool for formally measuring the appropriateness of attribute groupings into relation schemas. In Section 15.3, we discuss normal forms and the process of normalization using func-tional dependencies. Successive normal forms are defined to meet a set of desirable constraints expressed using functional dependencies. The normalization procedure consists of applying a series of tests to relations to meet these increasingly stringent requirements and decompose the relations when necessary. In Section 15.4, we dis cuss more general definitions of normal forms that can be directly applied to any given design and do not require step-by-step analysis and normalization. Sections
to 15.7 discuss further normal forms up to the fifth normal form. In Section
we introduce the multivalued dependency (MVD), followed by the join dependency (JD) in Section 15.7. Section 15.8 summarizes the chapter.
Chapter 16 continues the development of the theory related to the design of good relational schemas. We discuss desirable properties of relational decomposition— nonadditive join property and functional dependency preservation property. A general algorithm that tests whether or not a decomposition has the nonadditive (or lossless) join property (Algorithm 16.3 is also presented). We then discuss properties of functional dependencies and the concept of a minimal cover of dependencies. We consider the bottom-up approach to database design consisting of a set of algo-rithms to design relations in a desired normal form. These algorithms assume as input a given set of functional dependencies and achieve a relational design in a tar-get normal form while adhering to the above desirable properties. In Chapter 16 we also define additional types of dependencies that further enhance the evaluation of the goodness of relation schemas.
If Chapter 16 is not covered in a course, we recommend a quick introduction to the desirable properties of decomposition and the discussion of Property NJB in Section 16.2.
Copyright © 2018-2020 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.