Cost Components for Query Execution
The cost of executing a query includes the following components:
Access cost to secondary storage. This is the cost of transferring (reading and writing) data blocks between secondary disk storage and main memory buffers. This is also known as disk I/O (input/output) cost. The cost of searching for records in a disk file depends on the type of access structures on that file, such as ordering, hashing, and primary or secondary indexes. In addition, factors such as whether the file blocks are allocated contiguously on the same disk cylinder or scattered on the disk affect the access cost.
Disk storage cost. This is the cost of storing on disk any intermediate files that are generated by an execution strategy for the query.
Computation cost. This is the cost of performing in-memory operations on the records within the data buffers during query execution. Such operations include searching for and sorting records, merging records for a join or a sort operation, and performing computations on field values. This is also known as CPU (central processing unit) cost.
Memory usage cost. This is the cost pertaining to the number of main memory buffers needed during query execution.
Communication cost. This is the cost of shipping the query and its results from the database site to the site or terminal where the query originated. In distributed databases (see Chapter 25), it would also include the cost of transferring tables and results among various computers during query evaluation.
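The first component, access cost to secondary storage, depends heavily on the access structures on a file. The following Python sketch computes rough, commonly used estimates of the number of block accesses needed to locate a record by an equality search in a file of b blocks. The function names are hypothetical, and the estimates assume no hash overflow chains, a known number of primary index levels, and a search on a key attribute unless stated otherwise; seek and rotational delays (which is where contiguous versus scattered allocation matters) are not modeled.

```python
import math

# Rough block-access estimates for an equality search on a file of b blocks.
# Hypothetical helper names; assumes no hash overflow chains and a known
# number of index levels. Seek/rotational delays are ignored.

def linear_search_blocks(b: int, on_key: bool) -> float:
    """Unordered file: a search on a key attribute stops at the first match,
    reading about half the blocks on average; a non-key search reads all b."""
    return b / 2 if on_key else b

def binary_search_blocks(b: int) -> int:
    """File ordered on the (key) search attribute: binary search over its blocks."""
    return max(1, math.ceil(math.log2(b)))

def hash_search_blocks() -> int:
    """Static hash file on the search key, no overflow: one bucket (block) read."""
    return 1

def primary_index_blocks(levels: int) -> int:
    """Multilevel primary index: one block per index level plus the data block."""
    return levels + 1

b = 2000
print(linear_search_blocks(b, on_key=True))   # 1000.0
print(binary_search_blocks(b))                # 11
print(hash_search_blocks())                   # 1
print(primary_index_blocks(levels=3))         # 4
```

Under these assumptions, locating one record in a 2,000-block file takes about 1,000 block reads with a linear search but a single read with hashing, which is why the choice of access structure dominates disk I/O cost.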
For large databases, the main emphasis is often on minimizing the access cost to secondary storage. Simple cost functions ignore other factors and compare different query execution strategies in terms of the number of block transfers between disk and main memory buffers. For smaller databases, where most of the data in the files involved in the query can be completely stored in memory, the emphasis is on minimizing computation cost. In distributed databases, where many sites are involved (see Chapter 25), communication cost must also be minimized. It is difficult to include all the cost components in a (weighted) cost function because suitable weights are hard to assign to the individual components. That is why some cost functions consider only a single factor: disk access. In the next section we discuss some of the information that is needed for formulating cost functions.
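The trade-off between a single-factor cost function and a weighted one can be sketched in Python. The two strategies, their component estimates, and the weights are invented purely for illustration, and the class, function, and field names are hypothetical. The sketch shows only that the strategy with the fewest block transfers need not be the cheapest once other components are weighted in, and that the outcome hinges on weights that are hard to justify.

```python
from dataclasses import dataclass

@dataclass
class StrategyCost:
    """Hypothetical estimates of the cost components for one execution strategy."""
    name: str
    block_transfers: int  # access cost to secondary storage (disk I/O)
    temp_blocks: int      # disk storage for intermediate files
    cpu_ops: int          # computation (CPU) cost
    buffers: int          # main memory buffers needed
    messages: int         # communication cost (e.g., blocks shipped between sites)

def disk_only_cost(s: StrategyCost) -> float:
    """Single-factor cost function: counts block transfers and ignores the rest."""
    return s.block_transfers

def weighted_cost(s: StrategyCost, w: dict[str, float]) -> float:
    """Weighted sum of all components; the weights themselves are assumptions."""
    return (w["io"] * s.block_transfers
            + w["storage"] * s.temp_blocks
            + w["cpu"] * s.cpu_ops
            + w["memory"] * s.buffers
            + w["comm"] * s.messages)

# Invented figures for two strategies on a single (non-distributed) site.
strategy_a = StrategyCost("A", block_transfers=500, temp_blocks=0,
                          cpu_ops=10_000, buffers=5, messages=0)
strategy_b = StrategyCost("B", block_transfers=400, temp_blocks=100,
                          cpu_ops=2_000_000, buffers=50, messages=0)
strategies = [strategy_a, strategy_b]
weights = {"io": 1.0, "storage": 0.1, "cpu": 0.0001, "memory": 0.01, "comm": 0.0}

best_disk = min(strategies, key=disk_only_cost)
best_weighted = min(strategies, key=lambda s: weighted_cost(s, weights))
print(best_disk.name)      # B: 400 vs. 500 block transfers
print(best_weighted.name)  # A: about 501.05 vs. 610.5 weighted units
```

Lowering the CPU weight to 0.00001 makes B cheaper again (about 430.5 versus 500.15), so the "best" strategy flips with a single weight. That sensitivity is one reason some optimizers fall back on counting disk accesses alone.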