Files of Unordered Records (Heap Files)
In this simplest and most basic type of organization, records are placed
in the file in the order in which they are inserted, so new records are
inserted at the end of the file. Such an organization is called a heap or pile file. This organization is often used with additional access
paths, such as the secondary indexes discussed in Chapter 18. It is also used
to collect and store data records for future use.
Inserting a new record is very
efficient. The last disk block of the file is copied into a buffer, the new
record is added, and the block is then rewritten
back to disk. The address of the last file block is kept in the file header.
However, searching for a record using any search condition involves a linear search through the file block by
block—an expensive procedure. If only one record satisfies the search
condition, then, on the average, a program will read into memory and search
half the file blocks before it finds the record. For a file of b blocks, this requires searching (b/2) blocks, on average. If no records
or several records satisfy the search condition, the program must read and
search all b blocks in the file.
To delete a record, a program must first find its block, copy the block
into a buffer, delete the record from the buffer, and finally rewrite the block back to the disk.
This leaves unused space in the disk block. Deleting a large number of records
in this way results in wasted storage space. Another technique used for record
deletion is to have an extra byte or bit, called a deletion marker, stored with each record. A record is deleted by
setting the deletion marker to a certain value. A different value for the
marker indicates a valid (not deleted) record. Search programs consider only
valid records in a block when conducting their search. Both of these deletion
techniques require periodic reorganization
of the file to reclaim the unused space of deleted records. During
reorganization, the file blocks are accessed consecutively, and records are
packed by removing deleted records. After such a reorganization, the blocks are
filled to capacity once more. Another possibility is to use the space of
deleted records when inserting new records, although this requires extra
bookkeeping to keep track of empty locations.
We can use either spanned or unspanned organization for an unordered
file, and it may be used with either fixed-length or variable-length records.
Modifying a variable-length record may require deleting the old record and
inserting a modified record because the modified record may not fit in its old
space on disk.
To read all records in order of the values of some field, we create a
sorted copy of the file. Sorting is an expensive operation for a large disk
file, and special techniques for external
sorting are used (see Chapter 19).
For a file of unordered fixed-length
records using unspanned blocks
and contiguous allocation, it is straightforward to access any record by its position in the file. If the file records are numbered 0, 1, 2, ..., r − 1 and the records in each block
are numbered 0, 1, ..., bfr − 1, where bfr is the blocking
factor, then the ith record of the
file is located in block (i/bfr) and
is the (i mod bfr)th record in that block. Such a file is often called a relative or direct file because records can easily be accessed directly by
their relative positions. Accessing a record by its position does not help
locate a record based on a search condition; however, it facilitates the
construction of access paths on the file, such as the indexes discussed in
Chapter 18.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.