Disk Storage, Basic File Structures, and Hashing
Databases are stored physically as files of records, which are typically stored on magnetic disks. This chapter and the next deal with the organization of databases in storage and the tech niques for accessing them efficiently using various algorithms, some of which require auxiliary data structures called indexes. These structures are often referred to as physical database file structures, and are at the physical level of the three-schema architecture described in Chapter 2. We start in Section 17.1 by introducing the concepts of computer storage hierarchies and how they are used in database systems. Section 17.2 is devoted to a description of magnetic disk storage devices and their characteristics, and we also briefly describe magnetic tape storage devices. After discussing different storage technologies, we turn our attention to the methods for physically organizing data on disks. Section 17.3 covers the technique of double buffering, which is used to speed retrieval of multiple disk blocks. In Section we discuss various ways of formatting and storing file records on disk. Section discusses the various types of operations that are typically applied to file records. We present three primary methods for organizing file records on disk: unordered records, in Section 17.6; ordered records, in Section 17.7; and hashed records, in Section 17.8.
Section 17.9 briefly introduces files of mixed records and other primary
methods for organizing records, such as B-trees. These are particularly
relevant for storage of object-oriented databases, which we discussed in
Chapter 11. Section 17.10 describes RAID (Redundant Arrays of Inexpensive (or
Independent) Disks)—a data storage system architecture that is commonly used in
large organizations for better reliability and performance. Finally, in Section
17.11 we describe three developments in the storage systems area: storage area
networks (SAN), network attached storage (NAS), and iSCSI (Internet SCSI—Small
Computer System Interface), the latest technology, which makes storage area
networks more afford-able without the use of the Fiber Channel infrastructure
and hence is getting very wide acceptance in industry. Section 17.12 summarizes
the chapter. In Chapter 18 we discuss techniques for creating auxiliary data
structures, called indexes, which speed up the search for and retrieval of
records. These techniques involve storage of auxiliary data, called index
files, in addition to the file records themselves.
Chapters 17 and 18 may be browsed through or even omitted by readers who
have already studied file organizations and indexing in a separate course. The
material covered here, in particular Sections 17.1 through 17.8, is necessary
for understand-ing Chapters 19 and 20, which deal with query processing and
optimization, and database tuning for improving performance of queries.
The collection of data that makes up a computerized database must be
stored phys-ically on some computer storage
medium. The DBMS software can then retrieve, update, and process this data
as needed. Computer storage media form a storage
hierarchy that includes two main
Primary storage. This category includes storage
media that can be operated on directly
by the computer’s central processing unit
(CPU), such as the com-puter’s main memory and smaller but faster cache
memories. Primary stor-age usually provides fast access to data but is of
limited storage capacity. Although main memory capacities have been growing
rapidly in recent years, they are still more expensive and have less storage
capacity than sec-ondary and tertiary storage devices.
Secondary and tertiary storage. This
category includes magnetic disks, optical
disks (CD-ROMs, DVDs, and other similar storage media), and tapes. Hard-disk
drives are classified as secondary storage, whereas removable media such as
optical disks and tapes are considered tertiary storage. These devices usually
have a larger capacity, cost less, and provide slower access to data than do
primary storage devices. Data in secondary or tertiary storage cannot be
processed directly by the CPU; first it must be copied into primary storage and
then processed by the CPU.
We first give an overview of the various storage devices used for
primary and secondary storage in Section 17.1.1 and then discuss how databases
are typically handled in the storage hierarchy in Section 17.1.2.
1. Memory Hierarchies
and Storage Devices
In a modern computer system, data resides and is transported throughout
a hierarchy of storage media. The highest-speed memory is the most expensive
and is there-fore available with the least capacity. The lowest-speed memory is
offline tape storage, which is essentially available in indefinite storage
At the primary storage level,
the memory hierarchy includes at the most expensive end, cache memory, which is a static RAM (Random Access Memory). Cache
memory is typically used by the CPU to speed up execution of program
instructions using techniques such as prefetching and pipelining. The next
level of primary stor-age is DRAM (Dynamic RAM), which provides the main work
area for the CPU for keeping program instructions and data. It is popularly
called main memory. The advantage of
DRAM is its low cost, which continues to decrease; the drawback is its
volatility and lower speed compared with static RAM. At the secondary and tertiary storage
level, the hierarchy includes magnetic disks, as well as mass
storage in the form of CD-ROM (Compact Disk–Read-Only
Memory) and DVD (Digital Video Disk or Digital Versatile Disk) devices, and
finally tapes at the least expensive end of the hierarchy. The storage capacity is measured in
kilobytes (Kbyte or 1000 bytes), megabytes (MB or 1 million bytes), gigabytes
(GB or 1 billion bytes), and even terabytes (1000 GB). The word petabyte (1000
terabytes or 10**15 bytes) is now becoming relevant in the context of very
large repositories of data in physics, astronomy, earth sciences, and other
Programs reside and execute in DRAM. Generally, large permanent
databases reside on secondary storage, (magnetic disks), and portions of the
database are read into and written from buffers in main memory as needed.
Nowadays, personal computers and workstations have large main memories of
hundreds of megabytes of RAM and DRAM, so it is becoming possible to load a
large part of the database into main memory. Eight to 16 GB of main memory on a
single server is becoming common-place. In some cases, entire databases can be
kept in main memory (with a backup copy on magnetic disk), leading to main memory databases; these are
particularly useful in real-time applications that require extremely fast
response times. An example is telephone switching applications, which store
databases that contain routing and line information in main memory.
Between DRAM and magnetic disk storage, another form of memory, flash memory, is becoming common,
particularly because it is nonvolatile. Flash memories are high-density, high-performance memories using EEPROM (Electrically
Erasable Programmable Read-Only Memory) technology. The advantage of flash
memory is the fast access speed; the disadvantage is that an entire block must
be erased and written over simultaneously. Flash memory cards are appearing as
the data storage medium in appliances with capacities ranging from a few
megabytes to a few gigabytes. These are appearing in cameras, MP3 players,
cell phones, PDAs, and so on. USB (Universal Serial Bus) flash drives have
become the most portable medium for carrying data between personal computers;
they have a flash memory storage device integrated with a USB interface.
CD-ROM (Compact Disk – Read Only Memory) disks store data optically and
are read by a laser. CD-ROMs contain prerecorded data that cannot be
overwritten. WORM (Write-Once-Read-Many) disks are a form of optical storage
used for archiving data; they allow data to be written once and read any number
of times without the possibility of erasing. They hold about half a gigabyte of
data per disk and last much longer than magnetic disks. Optical jukebox memories use an array of CD-ROM platters, which are
loaded onto drives on demand. Although optical jukeboxes have capacities in the
hundreds of gigabytes, their retrieval times are in the hundreds of
milliseconds, quite a bit slower than magnetic disks. This type of storage is
continuing to decline because of the rapid decrease in cost and increase in
capacities of magnetic disks. The DVD is another standard for optical disks
allowing 4.5 to 15 GB of storage per disk. Most personal computer disk drives
now read CD-ROM and DVD disks. Typically, drives are CD-R (Compact Disk
Recordable) that can create CD-ROMs and audio CDs (Compact Disks), as well as
record on DVDs.
Finally, magnetic tapes are
used for archiving and backup storage of data. Tape jukeboxes—which
contain a bank of tapes that are catalogued and can be automat-ically loaded
onto tape drives—are becoming popular as tertiary
storage to hold terabytes of data. For example, NASA’s EOS (Earth
Observation Satellite) system stores archived databases in this fashion.
Many large organizations are already finding it normal to have
terabyte-sized data-bases. The term very
large database can no longer be precisely defined because disk storage
capacities are on the rise and costs are declining. Very soon the term may be
reserved for databases containing tens of terabytes.
2. Storage of Databases
Databases typically store large amounts of data that must persist over
long periods of time, and hence is often referred to as persistent data. Parts of this data are accessed and processed
repeatedly during this period. This contrasts with the notion of transient data that persist for only a
limited time during program execution. Most databases are stored permanently
(or persistently) on magnetic disk
secondary storage, for the following reasons:
Generally, databases are too
large to fit entirely in main memory.
The circumstances that cause
permanent loss of stored data arise less frequently for disk secondary storage
than for primary storage. Hence, we refer to disk—and other secondary storage
devices—as nonvolatile storage,
whereas main memory is often called volatile
The cost of storage per unit of
data is an order of magnitude less for disk secondary storage than for primary
Some of the newer technologies—such as optical disks, DVDs, and tape
juke-boxes—are likely to provide viable alternatives to the use of magnetic
disks. In the future, databases may therefore reside at different levels of the
memory hierarchy from those described in Section 17.1.1. However, it is
anticipated that magnetic disks will continue to be the primary medium of
choice for large databases for years to come. Hence, it is important to study
and understand the properties and characteristics of magnetic disks and the
way data files can be organized on disk in order to design effective databases
with acceptable performance.
Magnetic tapes are frequently used as a storage medium for backing up
databases because storage on tape costs even less than storage on disk.
However, access to data on tape is quite slow. Data stored on tapes is offline; that is, some intervention by
an operator—or an automatic loading device—to load a tape is needed before the
data becomes available. In contrast, disks are online devices that can be accessed directly at any time.
The techniques used to store large amounts of structured data on disk
are important for database designers, the DBA, and implementers of a DBMS.
Database designers and the DBA must know the advantages and disadvantages of
each storage technique when they design, implement, and operate a database on
a specific DBMS. Usually, the DBMS has several options available for organizing
the data. The process of physical
database design involves choosing the particular data organization
techniques that best suit the given application requirements from among the
options. DBMS system implementers must study data organization techniques so
that they can implement them efficiently and thus provide the DBA and users of
the DBMS with sufficient options.
Typical database applications need only a small portion of the database
at a time for processing. Whenever a certain portion of the data is needed, it
must be located on disk, copied to main memory for processing, and then rewritten
to the disk if the data is changed. The data stored on disk is organized as files of records. Each record is a collection of data values that can be
interpreted as facts about entities, their attributes, and their relationships.
Records should be stored on disk in a manner that makes it possible to locate
them efficiently when they are needed.
There are several primary file
organizations, which determine how the file records are physically placed on the disk, and hence how the records can be accessed.
A heap file (or
unordered file) places the records on
disk in no particular order by appending new records at the end of the file,
whereas a sorted file (or sequential file) keeps the records
ordered by the value of a particular field (called the sort key). A hashed file
uses a hash function applied to a particular field (called the hash key) to determine a record’s
placement on disk. Other primary file organizations, such as B-trees, use tree structures. We discuss
primary file organizations in Sections 17.6 through 17.9. A secondary organization or auxiliary access structure allows
efficient access to file records based on alternate
fields than those that have been used for the primary file organization.
Most of these exist as indexes and will be discussed in Chapter 18.