Data anomalies are inconsistencies in the data stored in a database as a result of an operation such as update, insertion, and/or deletion.

__Data Anomalies__

Data anomalies are inconsistencies in the data stored in a database as a result of an operation such as update, insertion, and/or deletion.

Such inconsistencies may arise when have a particular record stored in multiple locations and not all of the copies are updated.

We can prevent such anomalies by implementing 7 different level of normalization called Normal Forms (NF) We’ll only look at the first three.First Normal Form (1NF)

__1st Normal Form:__

__The Requirements:__

The requirements to satisfy the 1st NF:

Each table has a primary key: minimal set of attributes which can uniquely identify a record

The values in each column of a table are atomic (No multi-value attributes allowed). There are no repeating groups: two columns do not store similar information in the same table

__1st Normal Form__

__Example 1:__

__Un-normalized Students table:__

**Student# AdvID AdvName AdvRoom Class1 Class2**

**123 123A James 555 102-8 104-9**

**124 123B Smith 467 209-0 102-8**

**Normalized Students table:**

**Student# AdvID AdvName AdvRoom Class#**

123 123A James 555 102-8

123 123A James 555 104-9

124 123B Smith 467 209-0

124 123B Smith 467 102-8

**Example 2:**

**Title Author1 Author2 ISBN Subject Pages Publisher**

Database Abraham Henry F. 0072958863 MySQL, 1168 McGraw-

System Silberschatz Korth Computers Hill

Concepts

Operating Abraham Henry F. 0471694665 Computers 944 McGraw-

System Silberschatz Korth Hill

Concepts

__Table 1 problems__

l This table is not very efficient with storage.

l This design does not protect data integrity.

l Third, this table does not scale well.

l In our Table 1, we have two violations of First Normal Form:

l First, we have more than one author field,

l Second, our subject field contains more than one piece of information. With more than one value in a single field, it would be very difficult to search for all books on a given subject

l We now have two rows for a single book. Additionally, we would be violating the Second Normal Form…

l A better solution to our problem would be to separate the data into separate tables- an Author table and a Subject table to store our information, removing that information from the Book table:

l Each table has a primary key, used for joining tables together when querying the data. A primary key value must be unique with in the table (no two books can have the same ISBN number), and a primary key is also an index, which speeds up data retrieval based on the primary key.

l Now to define relationships between the tables

__Relationships__

__Second Normal Form (2NF)__

• a form is in 2NF if and only if it is in 1NF and has no attributes which require only part of the key to uniquely identify them

• To do - remove part-key dependencies:

1. where a key has more than one attribute, check that each non-key attribute depends on the whole key and not part of the key

2. for each subset of the key which determines an attribute or group of attributes create a new form. Move the *dependant* attributes to the new form.

3. Add the part key to new form, making it the primary key.

4. Mark the part key as a foreign key in the original form.

• **Result: 2NF forms**

**If each attribute A in a relation schema R meets one of the following criteria**:

It must be in first normal form.

It is not partially dependent on a candidate key.

Every non-key attribute is fully dependent on each candidate key of the relation. Second Normal Form (or 2NF) deals with redundancy of data in vertical columns.

__Example of Second Normal Form:__

__Here is a list of attributes in a table that is in First Normal Form:__

__Department__

**Project_Name**

**Employee_Name**

**Emp_Hire_Date**

**Project_Manager**

Project_Name and Employee_Name are the candidate key Project_Manager are partially depend on the Employee_Name, Therefore, this table will not satisfy the Second Normal Form

In order to satisfy the Second Normal Form, we need to put the Emp_Hire_Date and Project_Manager to other tables. We can put the Emp_Hire_Date to the Employee table and put the Project_Manager to the Project table.

** So now we have three tables:**

__Department____Project__

Project_Name Project_ID

Employee_Name Project_Name

Project_Manager

__Employee__

Employee_ID

Employee_Name

Employee_Hire_Date

Now, the Department table will only have the candidate key left.

__Third Normal Form__

A relation R is in Third Normal Form (3NF) if and only if it is:

in Second Normal Form.

Every non-key attribute is non-transitively dependent on the primary key.

An attribute C is transitively dependent on attribute A if there exists an attribute B such that A -> B and B -> C, then A -> C.

__Example of Third Normal Form:__

__Here is the Second Normal Form of the table for the invoice table:__

**It violates the Third Normal Form because there will be redundancy for having multiple invoice number for the same customer. In this example, Jones had both invoice 1001 and 1003.**

__To solve the problem, we will have to have another table for the Customers__

**By having Customer table, there will be no transitive relationship between the invoice number and the customer name and address. Also, there will not be redundancy on the customer information**

__There will be more examples for the First, Second, and Third Normal Forms.__

The following is the example of a table that change from each of the normal forms.

__Third Normal Form:__

__Functional Dependency of the Second Normal Form:__

SUPPLIER.s# —> SUPPLIER.status (Transitive dependency)

SUPPLIER.s# —> SUPPLIER.city

SUPPLIER.city —> SUPPLIER.status

__Dependency Preservation:__

A FD X! Y is preserved in a relation R if R contains all the attributes of X and Y.

__Why Do We Preserve The Dependency?__

We would like to check easily that updates to the database do not result in illegal relations being created.

It would be nice if our design allowed us to check updates without having to compute natural joins.

__Definition__

A decomposition D = {R1, R2, ..., Rn} of R is dependency-preserving with respect to F if the union of the projections of F on each Ri in D is equivalent to F; that is

__Property of Dependency-Preservation__

If decomposition is not dependency-preserving, therefore, that dependency is lost in the decomposition.

__Example:__

R(A B C D)

FD1: A -> B

FD2: B -> C

FD3: C -> D

Decomposition:

R1(A B C) R2(C D)

FD1: A -> B

FD2: B -> C

FD3: C -> D

__Boyce-Codd Normal Form:__

• A relation is in Boyce-Codd normal form (BCNF) if for every FD A B either

• B is contained in A (the FD is trivial), or

• A contains a candidate key of the relation,

• In other words: every determinant in a non-trivial dependency is a (super) key.

• The same as 3NF except in 3NF we only worry about non-key Bs

• If there is only one candidate key then 3NF and BCNF are the same

__Stream and BCNF__

• Stream is not in BCNF as the FD {Time} -> {Course} is non-trivial and {Time} does not contain a candidate key

__Conversion to BCNF__

**Stream has been put into BCNF but we have lost the FD {Student, Course} **

__Decomposition Properties__

• Lossless: Data should not be lost or created when splitting relations up

• Dependency preservation: It is desirable that FDs are preserved when splitting relations up

• Normalisation to 3NF is always lossless and dependency preserving

• Normalisation to BCNF is lossless, but may not preserve all dependencies

__Higher Normal Forms:__

**Converting to BCNF**

1. The determinant, Offering#, becomes part of the key and the dependant attribute T_Code, becomes a non key attribute. So the Dependency diagram is now

S_Num, Offering# T_Code, Review Date

2. There are problems with this structure as T_Code is now dependant on only part of the key. This violates the rules for 2NF, so the table needs to be divided with the partial dependency becoming a new table. The dependencies would then be

3. The original table is divided into two new tables. Each is in 3NF and in BCNF.

__Note that even relations in BCNF can have anomalies.__

__Anomalies:__

**INSERT**: We cannot record the city for a supplier_no without also knowing the supplier_name

**DELETE**: If we delete the row for a given supplier_name, we lose the information that the** **supplier_no is associated with a given city.

**UPDATE**: Since supplier_name is a candidate key (unique), there are none.

__Definition of MVD__

A multivalued dependency is a *full constraint* between two sets of attributes in a relation.

In contrast to the *functional independency*, the multivalued dependency requires that certain tuples be present in a relation. Therefore, a multivalued dependency is also referred as a *tuple-generating* dependency. The multivalued dependency also plays a role in 4NF normalization.

__Full constraint__

A constraint which expresses something about *all* attributes in a database. (In contrary to an **embedded constraint**.) That a multivalued dependency is a** ***full constraint*** **follows from its** **definition, where it says something about the attributes *R* − β.

__Tuple-generating dependency__

A dependency which explicitly requires certain tuples to be present in the relation.

__A Formal Definition__

Let *R* be a relation schema and let and . The multivalued dependency α ->> β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and

Ø A multivalued dependency on *R*, *X* ->>*Y,* says that if two tuples of *R* agree on all the attributes of *X*, then their components in *Y* may be swapped, and the result will be two tuples that are also in the relation.

**i.e., for each value of X, the values of Y are independent of the values of R-X-Y**

Tuples Implied by name->->phones

If we have tuples:

__Example__

But we cannot swap area codes or phones by themselves. That is, neither name->->areaCode nor name->->phone holds for this relation.

__Properties of MVD__

__The following also involve functional dependencies:__

__MVD Rules__

• Every FD is an MVD

– If X ‐>Y, then swapping Y ’s between two tuples that agree on X doesn’t change the tuples.

– Therefore, the “new” tuples are surely in the relation, and we know X ‐>‐>Y.

• Definition of keys depend on FDs and not MDs.

n Requires that other tuples of a certain form be present in the relation.

n Also referred to as:

**“Tuple-generating dependency”**

__We can use multivalued dependencies__

n To test relations to determine whether they are legal under a given set of functional and multivalued dependencies

n To specify constraints on the set of legal relations.

__4th Normal Form__

A Boyce Codd normal form relation is in fourth normal form if

Ø there is no multi value dependency in the relation or

Ø there are multi value dependency but the attributes, which are multi value dependent on a specific attribute, are dependent between themselves.

Ø This is best discussed through mathematical notation.

Ø Assume the following relation

(a) **R(a:pk1, b:pk2, c:pk3)**

Ø Recall that a relation is in BCNF if all its determinant are candidate keys, in other words each determinant can be used as a primary key.

Ø Because relation **R** has only one determinant **(a, b, c)**, which is the composite primary key and since the primary is a candidate key therefore R is in BCNF.

__Now R may or may not be in fourth normal form.__

Ø If R contains **no multi value dependency** then R will be in Fourth normal form.

Ø Assume R has the following two-multi value dependencies:

** a --->> b** **and** **a --->> c**

In this case R will be in the fourth normal form if **b** and **c** __dependent on each other__.

Ø However if b and **c are independent of each other** then __R is not____in__ __fourth____normal form__ and the relation has to be projected to following two non-loss projections. These non-loss projections will be in fourth normal form.

**Many-to-many relationships**

Ø Fourth Normal Form applies to situations involving many-to-many relationships.

Ø In relational databases, many-to-many relationships are expressed through cross-reference tables.

**Note about FDs and MVDs**

Every Functional Dependency is a MVD

(if A1A2…An ® B1B2…Bn , then A1A2…An Þ B1B2…Bn )

Ø FDs rule out certain tuples (i.e. if A ® B then two tuples will not have the same value for A and different values for B)

Ø MVDs do not rule out tuples. They guarantee that certain tuples must exist

**4th NF and Anomalies**

n As an example of the anomalies that can occur, realize that it is not possible to add a new class taught by some teacher without adding at least one student who is enrolled in this class.

n To achieve 4NF, represent each independent many-to-many relationship through its own cross-reference table.

__Case 1:__

**Assume the following relation:**

**Employee (Eid:pk1, Language:pk2, Skill:pk3)**

**No multi value dependency, therefore R is in fourth normal form.**

__Case 2:__

Assume the following relation with multi-value dependency:

**Employee (Eid:pk1, Languages:pk2, Skills:pk3)**

**Eid --->> Languages** **Eid --->> Skills**

Languages and Skills are dependent.

This says an employee speak several languages and has several skills. However for each skill a specific language is used when that skill is practiced

**Thus employee 100 when he/she teaches speaks English but when he cooks speaks French. This relation is in fourth normal form and does not suffer from any anomalies.**

__Case 3:__

Assume the following relation with multi-value dependency:

**Employee (Eid:pk1, Languages:pk2, Skills:pk3)**

**Eid --->> Languages** **Eid --->> Skills**

Languages and Skills are __independent__.

This relation is __not__ in fourth normal form and suffers from all three types of anomalies

** Insertion anomaly: **To insert row (200 English Cooking) we have to insert two extra rows (200 Arabic

** Deletion anomaly: **If employee 100 discontinues politic skill we have to delete two rows: (100 Kurdish Politic), and (100 English Politic) otherwise the database will be inconsistent.

__FIFTH NORMAL FORM__

R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R

w 5NF is always achievable.

w a join dependency, * (A, B, …, Z), is implied by the candidate keys, K1, …, Km of R

w the fact that K1, …, Km are candidate keys for R determine the fact that R has the JD * (A, B, …, Z)

__Problems in bringing a relation to 5NF__

w check whether all JDs are implied by the candidate keys

§ Fagin : provides an algorithm for doing this for any relation, given all its JDs and all candidate keys

w discover **all** JDs for a given relation

§ they do not have that intuitive meaning as FDs or MVDs

w A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation

w Implies that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.

__Usage__

Only in rare situations does a __4NF__ table not conform to 5NF. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the possibility of such inconsistencies.

Example 2:

Consider a relation Supply (sname, partName, projName). Add the additional constraint that:

If project j requires part p

and supplier s supplies part p

and supplier s supplies at least one item to project j Then

supplier s also supplies part p to project

Let R be in BCNF and let R have no composite keys. Then R is in 5NF

Note: That only joining all three relations together will get you back to the original relation. Joining any two will create spurious tuples!

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Database Management Systems : Introduction to DBMS : Data Anomalies |

**Related Topics **

Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2024 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.