Reliability and Integrity
Databases amalgamate data
from many sources, and users expect a DBMS to provide access to the data in a
reliable way. When software engineers say that software has reliability, they mean that the
software runs for very long periods of time without failing. Users certainly
expect a DBMS to be reliable, since the data usually are key to business or
organizational needs. Moreover, users entrust their data to a DBMS and rightly
expect it to protect the data from loss or damage. Concerns for reliability and
integrity are general security issues, but they are more apparent with
databases.
A DBMS guards against loss or
damage in several ways, which we study in this section. However, the
controls we consider are not absolute: No control can prevent an authorized
user from inadvertently entering an acceptable but incorrect value.
Database concerns about
reliability and integrity can be viewed from three dimensions:
Database integrity: concern
that the database as a whole is protected against damage, as from the failure
of a disk drive or the corruption of the master database index. These concerns
are addressed by operating system integrity controls and recovery procedures.
Element integrity: concern that the value of a
specific data element is written or changed only by authorized users. Proper
access controls protect a database from corruption by unauthorized users.
Element accuracy: concern
that only correct values are written into the elements of a database. Checks on
the values of elements can help prevent insertion of improper values. Also,
constraint conditions can detect incorrect values.
Protection Features from the Operating System
In Chapter 4 we discussed the protection an operating system
provides for its users. A responsible system administrator backs up the files
of a database periodically along with other user files. The files are protected
during normal execution against outside access by the operating system's
standard access control facilities. Finally, the operating system performs
certain integrity checks for all data as a part of normal read and write
operations for I/O devices. These controls provide basic security for
databases, but the database manager must enhance them.
Two-Phase Update
A serious problem for a
database manager is the failure of the computing system in the middle of
modifying data. If the data item to be modified was a long field, half of the
field might show the new value, while the other half would contain the old.
Even if errors of this type were spotted easily (which they are not), a more subtle
problem occurs when several fields are updated and no single field appears to
be in obvious error. The solution to this problem, proposed first by Lampson
and Sturgis [LAM76] and adopted by most
DBMSs, uses a two-phase update.
Update Technique
During the first phase,
called the intent phase, the DBMS gathers the resources it needs to perform the
update. It may gather data, create dummy records, open files, lock out other
users, and calculate final answers; in short, it does everything to prepare for
the update, but it makes no changes to the database. The first phase is
repeatable an unlimited number of times because it takes no permanent action.
If the system fails during execution of the first phase, no harm is done
because all these steps can be restarted and repeated after the system resumes
processing.
The last event of the first
phase, called committing, involves
the writing of a commit flag to the
database. The commit flag means that the DBMS has passed the point of no
return: After committing, the DBMS begins making permanent changes.
The second phase makes the
permanent changes. During the second phase, no actions from before the commit
can be repeated, but the update activities of phase two can also be repeated as
often as needed. If the system fails during the second phase, the database may
contain incomplete data, but the system can repair these data by performing all
activities of the second phase. After the second phase has been completed, the
database is again complete.
Two-Phase Update Example
Suppose a database contains
an inventory of a company's office supplies. The company's central stockroom
stores paper, pens, paper clips, and the like, and the different departments
requisition items as they need them. The company buys in bulk to obtain the
best prices. Each department has a budget for office supplies, so there is a
charging mechanism by which the cost of supplies is recovered from the
department. Also, the central stockroom monitors quantities of supplies on hand
so as to order new supplies when the stock becomes low.
Suppose the process begins
with a requisition from the accounting department for 50 boxes of paper clips.
Assume that there are 107 boxes in stock and a new order is placed if the
quantity in stock ever falls below 100. Here are the steps followed after the
stockroom receives the requisition.
1. The stockroom checks the database to determine that 50 boxes of
paper clips are on hand. If not, the requisition is rejected and the
transaction is finished.
2. If enough paper clips are in stock, the stockroom deducts 50 from
the inventory figure in the database (107 - 50 = 57).
3. The stockroom charges accounting's supplies budget (also in the
database) for 50 boxes of paper clips.
4. The stockroom checks its remaining quantity on hand (57) to
determine whether the remaining quantity is below the reorder point. Because it
is, a notice to order more paper clips is generated, and the item is flagged as
"on order" in the database.
5. A delivery order is prepared, enabling 50 boxes of paper clips to
be sent to accounting.
All five of these steps must be completed in
the order listed for the database to be accurate and for the transaction to be
processed correctly.
Suppose a failure occurs
while these steps are being processed. If the failure occurs before step 1 is
complete, there is no harm because the entire transaction can be restarted.
However, during steps 2, 3, and 4, changes are made to elements in the
database. If a failure occurs then, the values in the database are inconsistent.
Worse, the transaction cannot be reprocessed because a requisition would be
deducted twice, or a department would be charged twice, or two delivery orders
would be prepared.
When a two-phase commit is
used, shadow values are maintained
for key data points. A shadow data value is computed and stored locally during
the intent phase, and it is copied to the actual database during the commit
phase. The operations on the database would be performed as follows for a
two-phase commit.
Intent:
Check the value of COMMIT-FLAG in the database. If it is set, this
phase cannot be performed. Halt or loop, checking COMMIT-FLAG until it is not
set.
Compare number of boxes of paper clips on hand to number
requisitioned; if more are requisitioned than are on hand, halt.
Compute TCLIPS = ONHAND - REQUISITION.
Obtain BUDGET, the current supplies budget remaining for accounting
department. Compute TBUDGET = BUDGET - COST, where COST is the cost of 50 boxes
of clips.
Check whether TCLIPS is below reorder point; if so, set TREORDER =
TRUE; else set TREORDER = FALSE.
Commit:
Set COMMIT-FLAG in database.
Copy TCLIPS to CLIPS in database.
Copy TBUDGET to BUDGET in database.
Copy TREORDER to REORDER in database.
Prepare notice to deliver paper clips to accounting department.
Indicate transaction completed in log.
Unset COMMIT-FLAG.
With this example, each step
of the intent phase depends only on unmodified values from the database and the
previous results of the intent phase. Each variable beginning with T is a
shadow variable used only in this transaction. The steps of the intent phase
can be repeated an unlimited number of times without affecting the integrity of
the database.
Once the DBMS begins the
commit phase, it writes a commit flag. When this flag is set, the DBMS will not
perform any steps of the intent phase. Intent steps cannot be performed after
committing because database values are modified in the commit phase. Notice,
however, that the steps of the commit phase can be repeated an unlimited number
of times, again with no negative effect on the correctness of the values in the
database.
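The intent and commit steps above can be sketched as a small in-memory model. This is a simplified sketch, not a real DBMS: the `db` dictionary, `COST_PER_BOX`, and `REORDER_POINT` are illustrative assumptions, and the shadow variables correspond to the T-names in the text.

```python
REORDER_POINT = 100
COST_PER_BOX = 3  # hypothetical unit cost, not given in the text

# The "database": paper-clip count, accounting's budget, and the flags.
db = {"COMMIT-FLAG": False, "CLIPS": 107, "BUDGET": 500, "REORDER": False}

def two_phase_update(requisition, cost):
    # --- Intent phase: compute shadow values; change nothing in db ---
    if db["COMMIT-FLAG"]:
        raise RuntimeError("earlier commit incomplete; recover first")
    if requisition > db["CLIPS"]:
        return False                        # not enough stock; reject
    tclips = db["CLIPS"] - requisition      # shadow variables (the T- names)
    tbudget = db["BUDGET"] - cost
    treorder = tclips < REORDER_POINT

    # --- Commit phase: set the flag, then copy shadows into the database ---
    db["COMMIT-FLAG"] = True
    db["CLIPS"] = tclips
    db["BUDGET"] = tbudget
    db["REORDER"] = treorder
    db["COMMIT-FLAG"] = False               # transaction complete
    return True

two_phase_update(50, 50 * COST_PER_BOX)     # the requisition from accounting
```

Because the intent phase only reads `db` and writes local shadow variables, a failure before `COMMIT-FLAG` is set lets the whole call simply be rerun; a failure after it is set lets the four copy steps be rerun, since copying the same shadow values again is harmless.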
The one remaining flaw in
this logic occurs if the system fails after writing the "transaction
complete" message in the log but before clearing the commit flag in the
database. It is a simple matter to work backward through the transaction log to
find completed transactions for which the commit flag is still set and to clear
those flags.
Redundancy/Internal Consistency
Many DBMSs maintain
additional information to detect internal inconsistencies in data. The
additional information ranges from a few check bits to duplicate or shadow
fields, depending on the importance of the data.
Error Detection and Correction Codes
One form of redundancy is
error detection and correction codes, such as parity bits, Hamming codes, and
cyclic redundancy checks. These codes can be applied to single fields, records,
or the entire database. Each time a data item is placed in the database, the
appropriate check codes are computed and stored; each time a data item is
retrieved, a similar check code is computed and compared to the stored value.
If the values are unequal, they signify to the DBMS that an error has occurred
in the database. Some of these codes point out the place of the error; others
show precisely what the correct value should be. The more information provided,
the more space required to store the codes.
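As a minimal illustration of such a check code, a single parity bit can be computed when a value is stored and recomputed when it is retrieved. This is only a sketch (the function names are illustrative); real DBMSs use stronger codes such as Hamming codes or CRCs, which can also locate or correct the error.

```python
def parity_bit(data: bytes) -> int:
    # Even parity: the bit is 1 iff the count of 1-bits in data is odd.
    return sum(bin(b).count("1") for b in data) % 2

def store(value: str):
    data = value.encode()
    return data, parity_bit(data)      # stored together in the database

def retrieve(data: bytes, check: int) -> bytes:
    if parity_bit(data) != check:      # recompute and compare on every read
        raise ValueError("database element corrupted")
    return data

record, check = store("MOCK,E")
retrieve(record, check)                # matches: value is returned

# Any single-bit flip changes the parity and is detected:
corrupted = bytes([record[0] ^ 0x01]) + record[1:]
# retrieve(corrupted, check) would raise ValueError
```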
Shadow Fields
Entire attributes or entire
records can be duplicated in a database. If the data are irreproducible, this
second copy can provide an immediate replacement if an error is detected.
Obviously, redundant fields require substantial storage space.
Recovery
In addition to these error
correction processes, a DBMS can maintain a log of user accesses, particularly
changes. In the event of a failure, the database is reloaded from a backup copy
and all later changes are then applied from the audit log.
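The reload-and-replay idea can be sketched as follows, assuming (as an illustration) that the audit log records each completed change as an (element, new value) pair made after the last backup:

```python
# Last backup copy of the database.
backup = {"CLIPS": 107, "BUDGET": 500}

# Audit log: every change applied since the backup, in order.
audit_log = [("CLIPS", 57), ("BUDGET", 350), ("CLIPS", 45)]

def recover(backup, log):
    db = dict(backup)                   # reload from the backup copy
    for element, new_value in log:      # reapply all later changes, in order
        db[element] = new_value
    return db

db = recover(backup, audit_log)         # db now reflects all logged changes
```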
Concurrency/Consistency
Database systems are often
multiuser systems. Accesses by two users sharing the same database must be constrained
so that neither interferes with the other. Simple locking is done by the DBMS.
If two users attempt to read the same data item, there is no conflict because
both obtain the same value.
If both users try to modify
the same data items, we often assume that there is no conflict because each
knows what to write; the value to be written does not depend on the previous
value of the data item. However, this supposition is not quite accurate.
To see how concurrent
modification can get us into trouble, suppose that the database consists of
seat reservations for a particular airline flight. Agent A, booking a seat for
passenger Mock, submits a query to find which seats are still available. The
agent knows that Mock prefers a right aisle seat, and the agent finds that
seats 5D, 11D, and 14D are open. At the same time, Agent B is trying to book
seats for a family of three traveling together. In response to a query, the
database indicates that 8ABC and 11DEF are the two remaining groups of three
adjacent unassigned seats. Agent A submits the update command
SELECT (SEAT-NO = '11D')
ASSIGN 'MOCK,E' TO PASSENGER-NAME
while Agent B submits the
update sequence
SELECT (SEAT-NO = '11D')
ASSIGN 'EHLERS,P' TO PASSENGER-NAME
as well as commands for seats
11E and 11F. Then two passengers have been booked into the same seat (which
would be uncomfortable, to say the least).
Both agents have acted
properly: Each sought a list of empty seats, chose one seat from the list, and
updated the database to show to whom the seat was assigned. The difficulty in
this situation is the time delay between reading a value from the database and
writing a modification of that value. During the delay time, another user has
accessed the same data.
To resolve this problem, a
DBMS treats the entire query–update cycle as a single atomic operation. The
command from the agent must now resemble "read the current value of
PASSENGER-NAME for seat 11D; if it is 'UNASSIGNED', modify it to 'MOCK,E' (or
'EHLERS,P')." The read–modify cycle must be completed as an uninterrupted
unit, without allowing any other users access to the PASSENGER-NAME field for
seat 11D. The second agent's request to book would not be considered until
after the first agent's had been completed; at that time, the value of
PASSENGER-NAME would no longer be 'UNASSIGNED'.
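One way to sketch this atomic cycle is with an explicit lock held across both the check and the write, so no other agent can touch the field in between. This is an in-memory simplification; `seats` and `book` are hypothetical names, not a real DBMS interface.

```python
import threading

seats = {"11D": "UNASSIGNED", "11E": "UNASSIGNED", "11F": "UNASSIGNED"}
lock = threading.Lock()

def book(seat_no: str, passenger: str) -> bool:
    # The read and the conditional write form one uninterrupted unit:
    # no other agent can read or modify the seat between them.
    with lock:
        if seats[seat_no] != "UNASSIGNED":
            return False                 # another agent booked it first
        seats[seat_no] = passenger
        return True

book("11D", "MOCK,E")                    # Agent A succeeds
book("11D", "EHLERS,P")                  # Agent B is now rejected
```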
A final problem in concurrent
access is the read–write conflict. Suppose one user is updating a value when a second user
wishes to read it. If the read is done while the write is in progress, the
reader may receive data that are only partially updated. Consequently, the DBMS
blocks any read requests until a write has been completed.
Monitors
The monitor is the unit of a DBMS responsible for the structural
integrity of the database. A monitor can check values being entered to ensure
their consistency with the rest of the database or with characteristics of the
particular field. For example, a monitor might reject alphabetic characters for
a numeric field. We discuss several forms of monitors.
Range Comparisons
A range comparison monitor
tests each new value to ensure that the value is within an acceptable range. If
the data value is outside the range, it is rejected and not entered into the
database. For example, a date in day/month/year form might be constrained so
that the day is 1–31, the month is 1–12, and the year is 1900–2099. An even
more sophisticated range check might limit the day portion to 1–30 for months
with 30 days, or it might take into account leap year for
February.
Range comparisons are also
convenient for numeric quantities. For example, a salary field might be limited
to $200,000, or the size of a house might be constrained to be between 500 and
5,000 square feet. Range constraints can also apply to other data having a
predictable form.
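A range comparison monitor of this kind might be sketched as follows. The function names are illustrative; Python's standard `calendar` module supplies the month lengths, including the February leap-year case.

```python
import calendar

def valid_date(day: int, month: int, year: int) -> bool:
    # Basic range check: month 1-12, year 1900-2099.
    if not (1900 <= year <= 2099 and 1 <= month <= 12):
        return False
    # Sophisticated check: limit the day to the month's actual length,
    # which accounts for 30-day months and for leap-year February.
    return 1 <= day <= calendar.monthrange(year, month)[1]

def valid_salary(amount: float) -> bool:
    # Numeric range constraint: the $200,000 limit from the text.
    return 0 <= amount <= 200_000

valid_date(29, 2, 2000)   # leap year: accepted
valid_date(29, 2, 1900)   # not a leap year: rejected
```

A value failing either test would be rejected by the monitor and never written into the database.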
Range comparisons can be used
to ensure the internal consistency of a database. When used in this manner,
comparisons are made between two database elements. For example, a grade level
from K–8 would be acceptable if the record described a student at an elementary
school, whereas only 9–12 would be acceptable for a record of a student in high
school. Similarly, a person could be assigned a job qualification score of
75–100 only if the person had completed college or had had at least ten years of
work experience. Filters or patterns are more general types of data form
checks. These can be used to verify that an automobile plate is two letters
followed by four digits, or the sum of all digits of a credit card number is a
multiple of 9.
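Filters like these are typically expressed as patterns. A minimal sketch, assuming the plate format and digit-sum rule quoted above (the function names are illustrative):

```python
import re

PLATE = re.compile(r"[A-Z]{2}[0-9]{4}")

def valid_plate(plate: str) -> bool:
    # Filter: exactly two letters followed by four digits.
    return PLATE.fullmatch(plate) is not None

def digit_sum_check(number: str) -> bool:
    # The text's example check: the digits sum to a multiple of 9.
    return sum(int(d) for d in number if d.isdigit()) % 9 == 0

valid_plate("AB1234")     # accepted
valid_plate("A12345")     # rejected: only one letter
```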
Checks of these types can
control the data allowed in the database. They can also be used to test
existing values for reasonableness. If you suspect that the data in a database
have been corrupted, a range check of all records could identify those having
suspicious values.
State Constraints
State constraints describe the condition of the entire database.
At no time should the database values violate these constraints. Phrased differently, if these constraints are
not met, some value of the database is in error.
In the section on two-phase
updates, we saw how to use a commit flag, which is set at the start of the
commit phase and cleared at its completion. The commit flag
can be considered a state constraint: except while a transaction is in its
commit phase, the flag must not be set. Earlier in this chapter, we
described a process to reset commit flags left set by a failure during a
commit phase. In this way, the status of the commit flag is an integrity
constraint on the database.
For another example of a
state constraint, consider a database of employees' classifications. At any
time, at most one employee is classified as "president." Furthermore,
each employee has an employee number different from that of every other
employee. If a mechanical or software failure causes portions of the database
file to be duplicated, one of these uniqueness constraints might be violated.
By testing the state of the database, the DBMS could identify records with
duplicate employee numbers or two records classified as "president."
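A DBMS could test these uniqueness constraints with a scan like the following. The record layout is a hypothetical simplification; the duplicated record models the file-duplication failure described above.

```python
from collections import Counter

employees = [
    {"number": 101, "title": "president"},
    {"number": 102, "title": "clerk"},
    {"number": 102, "title": "clerk"},    # duplicated by a failure
]

def state_violations(records):
    violations = []
    # State constraint: at most one employee classified "president".
    if sum(r["title"] == "president" for r in records) > 1:
        violations.append("multiple presidents")
    # State constraint: employee numbers must be unique.
    counts = Counter(r["number"] for r in records)
    violations += [f"duplicate employee number {n}"
                   for n, c in counts.items() if c > 1]
    return violations

state_violations(employees)   # flags the duplicated employee number
```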
Transition Constraints
State constraints describe
the state of a correct database. Transition
constraints describe conditions necessary before changes can be applied to
a database. For example, before a new employee can be added to the database,
there must be a position number in the database with status "vacant."
(That is, an empty slot must exist.) Furthermore, after the employee is added,
exactly one slot must be changed from "vacant" to the number of the
new employee.
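The vacancy check might be sketched as follows; `positions` and `add_employee` are illustrative names for the slot table and the checked update.

```python
# Position number -> "vacant" or the holder's employee number.
positions = {201: "vacant", 202: 101, 203: 102}

def add_employee(emp_no: int) -> int:
    # Transition constraint: a "vacant" position must exist before the add.
    vacant = [slot for slot, holder in positions.items()
              if holder == "vacant"]
    if not vacant:
        raise ValueError("transition constraint violated: no vacant position")
    slot = vacant[0]
    positions[slot] = emp_no   # exactly one slot changes from "vacant"
    return slot

add_employee(103)              # fills the one vacant slot
```

A second add would now be rejected, because no slot remains in the "vacant" state.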
Simple range checks and
filters can be implemented within most database management systems. However,
the more sophisticated state and transition constraints can require special
procedures for testing. Such user-written procedures are invoked by the DBMS
each time an action must be checked.
Summary of Data Reliability
Reliability, correctness, and integrity are
three closely related concepts in databases. Users trust the DBMS to maintain
their data correctly, so integrity issues are very important to database
security.