Operations on Files
Operations on files are usually grouped into retrieval operations and update
operations. The former do not change any data in the file, but only locate
certain records so that their field
values can be examined and processed. The latter change the file by insertion
or deletion of records or by modification of field values. In either case, we
may have to select one or more
records for retrieval, deletion, or modification based on a selection condition (or filtering condition), which specifies
criteria that the desired record or records must satisfy.
Consider an EMPLOYEE file with fields Name, Ssn, Salary, Job_code, and Department. A simple selection condition
may involve an equality comparison on some field value—for example, (Ssn = ‘123456789’) or (Department = ‘Research’). More com-plex
conditions can involve other types of comparison operators, such as > or ≥; an
example is (Salary ≥ 30000). The general case is to have an arbitrary Boolean expres-sion on
the fields of the file as the selection condition.
Search operations on files are generally based on simple selection
conditions. A complex condition must be decomposed by the DBMS (or the
programmer) to extract a simple condition that can be used to locate the
records on disk. Each located record is then checked to determine whether it
satisfies the full selection condition. For example, we may extract the simple
condition (Department = ‘Research’) from the complex condition ((Salary ≥ 30000) AND (Department = ‘Research’)); each record satisfying (Department =
‘Research’) is located and then tested to see if it also satisfies (Salary ≥ 30000).
When several file records satisfy a search condition, the first record—with respect to the physical
sequence of file records—is initially located and designated the current record. Subsequent search operations commence from this record and
locate the next record in the file that satisfies the condition.
Actual operations for locating and accessing file records vary from
system to system. Below, we present a set of representative operations.
Typically, high-level programs, such as DBMS software programs, access records
by using these commands, so we sometimes refer to program variables in the following descriptions:
Open. Prepares the file for reading or
writing. Allocates appropriate buffers (typically
at least two) to hold file blocks from disk, and retrieves the file header.
Sets the file pointer to the beginning of the file.
Reset. Sets the file pointer of an open
file to the beginning of the file.
Find (or Locate). Searches for the first record that satisfies a search condition.
Transfers the block containing that record into a main memory buffer (if it is
not already there). The file pointer points to the record in the buffer and it
becomes the current record.
Sometimes, different verbs are used to indicate whether the located record is
to be retrieved or updated.
Read (or Get). Copies the current record from
the buffer to a program variable in the user program. This command may also
advance the current record pointer to the next record in the file, which may
necessitate reading the next file block from disk.
FindNext. Searches for the next record in
the file that satisfies the search condition.
Transfers the block containing that record into a main memory buffer (if it is
not already there). The record is located in the buffer and becomes the current
record. Various forms of FindNext (for example, Find Next record within a
current parent record, Find Next record of a given type, or Find Next record
where a complex condition is met) are available in legacy DBMSs based on the
hierarchical and network models.
Delete. Deletes the current record and
(eventually) updates the file on disk to
reflect the deletion.
Modify. Modifies some field values for
the current record and (eventually) updates
the file on disk to reflect the modification.
Insert. Inserts a new record in the file
by locating the block where the record is
to be inserted, transferring that block into a main memory buffer (if it is not
already there), writing the record into the buffer, and (eventually) writ-ing
the buffer to disk to reflect the insertion.
Close. Completes the file access by
releasing the buffers and performing any
other needed cleanup operations.
The preceding (except for Open and Close) are called record-at-a-time operations because
each operation applies to a single record. It is possible to streamline the
operations Find, FindNext, and Read into a single operation, Scan, whose
description is as follows:
Scan. If the file has just been opened
or reset, Scan returns the first
record; otherwise it returns the
next record. If a condition is specified with the oper-ation, the returned
record is the first or next record satisfying the condition.
In database systems, additional set-at-a-time
higher-level operations may be applied to a file. Examples of these are as
follows:
FindAll. Locates all the records in the file that satisfy a search condition.
Find (or Locate) n. Searches for the first record that satisfies a search condition and
then continues to locate the next n –
1 records satisfying the same condition. Transfers the blocks containing the n records to the main memory buffer (if
not already there).
FindOrdered. Retrieves all the records in the
file in some specified order.
Reorganize. Starts the reorganization process. As we shall see, some file organizations require periodic
reorganization. An example is to reorder the file records by sorting them on a
specified field.
At this point, it is worthwhile to note the difference between the terms
file organization and access method. A file organization refers to the organization of the data
of a file into records, blocks, and
access structures; this includes the way records and blocks are placed on the
storage medium and interlinked. An access
method, on the other hand, provides a group of operations—such as those
listed earlier—that can be applied to a file. In general, it is possible to
apply several access methods to a file organization. Some access methods,
though, can be applied only to files organized in certain ways. For example, we
cannot apply an indexed access method to a file without an index (see Chapter
18).
Usually, we expect to use some search conditions more than others. Some
files may be static, meaning that
update operations are rarely performed; other, more dynamic files may change frequently, so update operations are
constantly applied to them. A
successful file organization should perform as efficiently as possible the
operations we expect to apply frequently
to the file. For example, consider the EMPLOYEE file, as
shown in Figure 17.5(a), which stores the records for current employees in a company. We expect to insert records (when employees are
hired), delete records (when employees leave the company), and modify records
(for exam-ple, when an employee’s salary or job is changed). Deleting or
modifying a record requires a selection condition to identify a particular
record or set of records. Retrieving one or more records also requires a
selection condition.
If users expect mainly to apply a search condition based on Ssn, the designer must choose a file organization that facilitates locating
a record given its Ssn value. This may involve physically
ordering the records by Ssn value or defining an index on Ssn (see Chapter 18). Suppose that a second application uses the file to
generate
employees’ paychecks and requires that paychecks
are grouped by department. For this application, it is best to order employee
records by department and then by name within each department. The clustering
of records into blocks and the organ-ization of blocks on cylinders would now
be different than before. However, this arrangement conflicts with ordering the
records by Ssn values. If both applications are important, the designer should choose
an organization that allows both operations to be done efficiently.
Unfortunately, in many cases a single organization does not allow all needed
operations on a file to be implemented efficiently. This requires that a
compromise must be chosen that takes into account the expected importance and
mix of retrieval and update operations.
In the following sections and in Chapter 18, we discuss methods for
organizing records of a file on disk. Several general techniques, such as
ordering, hashing, and indexing, are used to create access methods.
Additionally, various general techniques for handling insertions and deletions
work with many file organizations.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.