An Overview of Database Tuning in Relational Systems
After a database is deployed and is in operation, actual use of the applications, transactions, queries, and views reveals factors and problem areas that may not have been accounted for during the initial physical design. The inputs to physical design listed in Section 20.1.1 can be revised by gathering actual statistics about usage patterns. Resource utilization as well as internal DBMS processing, such as query optimization, can be monitored to reveal bottlenecks, such as contention for the same data or devices. Volumes of activity and sizes of data can be better estimated. Therefore, it is necessary to monitor and revise the physical database design constantly, an activity referred to as database tuning. The goals of tuning are as follows:
- To make applications run faster.
- To improve (lower) the response time of queries and transactions.
- To improve the overall throughput of transactions.
The dividing line between physical design and tuning is very thin. The same design decisions that we discussed in Section 20.1.2 are revisited during database tuning, which is a continual adjustment of the physical design. We give a brief overview of the tuning process below. The inputs to the tuning process include statistics related to the same factors mentioned in Section 20.1.1. In particular, DBMSs can internally collect the following statistics:
- Sizes of individual tables.
- Number of distinct values in a column.
- The number of times a particular query or transaction is submitted and executed in an interval of time.
- The times required for different phases of query and transaction processing (for a given set of queries or transactions).
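The first two statistics reduce to ordinary aggregate queries. A minimal sketch follows, using SQLite from Python; the EMPLOYEE table and its contents are invented for illustration, and real DBMSs maintain such statistics internally in catalog tables (refreshed by commands such as ANALYZE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Dno INTEGER, Salary REAL)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [(str(i), i % 5, 30000.0 + i * 10) for i in range(1000)])

# Size of an individual table (number of rows).
row_count = cur.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]

# Number of distinct values in a column; the optimizer uses this to
# estimate the selectivity of predicates on Dno.
distinct_dnos = cur.execute("SELECT COUNT(DISTINCT Dno) FROM EMPLOYEE").fetchone()[0]

print(row_count, distinct_dnos)  # 1000 5
```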
These and other statistics create a profile of
the contents and use of the database. Other information obtained from
monitoring the database system activities and processes includes the following:
- Storage statistics. Data about allocation of storage into tablespaces, indexspaces, and buffer pools.
- I/O and device performance statistics. Total read/write activity (paging) on disk extents and disk hot spots.
- Query/transaction processing statistics. Execution times of queries and transactions, and optimization times during query optimization.
- Locking/logging related statistics. Rates of issuing different types of locks, transaction throughput rates, and log records activity.
- Index statistics. Number of levels in an index, number of noncontiguous leaf pages, and so on.
Some of the above statistics relate to transactions, concurrency control, and recovery, which are discussed in Chapters 21 through 23. Tuning a database involves dealing with the following types of problems:
- How to avoid excessive lock contention, thereby increasing concurrency among transactions.
- How to minimize the overhead of logging and unnecessary dumping of data.
- How to optimize the buffer size and scheduling of processes.
- How to allocate resources such as disks, RAM, and processes for most efficient utilization.
Most of the previously mentioned problems can be solved by the DBA by setting appropriate physical DBMS parameters, changing configurations of devices, changing operating system parameters, and other similar activities. The solutions tend to be closely tied to specific systems, and DBAs are typically trained to handle these tuning problems for the specific DBMS. We briefly discuss the tuning of various physical database design decisions below.
1. Tuning Indexes
The initial choice of indexes may have to be revised for the following reasons:
- Certain queries may take too long to run for lack of an index.
- Certain indexes may not get utilized at all.
- Certain indexes may undergo too much updating because the index is on an attribute that undergoes frequent changes.
Most DBMSs have a command or trace facility,
which can be used by the DBA to ask the system to show how a query was
executed—what operations were performed in what order and what secondary access
structures (indexes) were used. By analyzing these execution plans, it is
possible to diagnose the causes of the above problems. Some indexes may be
dropped and some new indexes may be created based on the tuning analysis.
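As an illustration of such a facility, SQLite exposes its execution plan through EXPLAIN QUERY PLAN (other systems offer analogous commands, such as EXPLAIN PLAN in Oracle). In this sketch, with invented table and index names, the same query switches from a full table scan to an index search once a suitable index is created:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Dno INTEGER, Salary REAL)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [(str(i), i % 50, 30000.0 + i) for i in range(5000)])

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step.
    return " | ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT Ssn FROM EMPLOYEE WHERE Dno = 7"
before = plan(query)   # without an index on Dno: a full table scan
cur.execute("CREATE INDEX Emp_dno_idx ON EMPLOYEE (Dno)")
after = plan(query)    # now an index search using Emp_dno_idx
print(before)
print(after)
```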
The goal of tuning is to dynamically evaluate the requirements, which sometimes fluctuate seasonally or during different times of the month or week, and to reorganize the indexes and file organizations to yield the best overall performance. Dropping and building new indexes is an overhead that can be justified in terms of performance improvements. Updating of a table is generally suspended while an index is dropped or created; this loss of service must be accounted for. Besides dropping or creating indexes and changing from a nonclustered to a clustered index and vice versa, rebuilding the index may improve performance. Most RDBMSs use B+-trees for an index. If there are many deletions on the index key, index pages may contain wasted space, which can be reclaimed during a rebuild operation. Similarly, too many insertions may cause overflows in a clustered index that affect performance. Rebuilding a clustered index amounts to reorganizing the entire table ordered on that key.
The available options for indexing, and the way indexes are defined, created, and reorganized, vary from system to system. As an illustration, consider the sparse and dense indexes of Chapter 18. A sparse index such as a primary index (see Section 18.1) will have one index pointer for each page (disk block) in the data file; a dense index such as a unique secondary index will have an index pointer for each record. Sybase provides clustering indexes as sparse indexes in the form of B+-trees, whereas INGRES provides sparse clustering indexes as ISAM files and dense clustering indexes as B+-trees. In some versions of Oracle and DB2, the option of setting up a clustering index is limited to a dense index (with many more index entries), and the DBA has to work with this limitation.
2. Tuning the Database Design
In Section 20.1.2, we discussed the need for a
possible denormalization, which is a departure from keeping all tables as BCNF
relations. If a given physical database design does not meet the expected
objectives, the DBA may revert to the logical database design, make adjustments
such as denormalizations to the logical schema, and remap it to a new set of
physical tables and indexes.
As discussed, the entire database design has to
be driven by the processing requirements as much as by data requirements. If
the processing requirements are dynamically changing, the design needs to
respond by making changes to the conceptual schema if necessary and to reflect
those changes into the logical schema and physical design. These changes may
be of the following nature:
- Existing tables may be joined (denormalized) because certain attributes from two or more tables are frequently needed together. This reduces the normalization level from BCNF to 3NF, 2NF, or 1NF.
- For the given set of tables, there may be alternative design choices, all of which achieve 3NF or BCNF. We illustrated alternative equivalent designs in Chapter 16. One normalized design may be replaced by another.
- A relation of the form R(K, A, B, C, D, ...), with K as a set of key attributes, that is in BCNF can be stored in multiple tables that are also in BCNF, for example, R1(K, A, B), R2(K, C, D), R3(K, ...), by replicating the key K in each table. Such a process is known as vertical partitioning. Each table groups sets of attributes that are accessed together. For example, the table EMPLOYEE(Ssn, Name, Phone, Grade, Salary) may be split into two tables: EMP1(Ssn, Name, Phone) and EMP2(Ssn, Grade, Salary). If the original table has a large number of rows (say 100,000) and queries about phone numbers and salary information are totally distinct and occur with very different frequencies, then this separation of tables may work better.
- Attribute(s) from one table may be repeated in another even though this creates redundancy and a potential anomaly. For example, Part_name may be replicated in tables wherever Part# appears (as a foreign key), but there may be one master table called PART_MASTER(Part#, Part_name, ...) where the Part_name is guaranteed to be up to date.
- Just as vertical partitioning splits a table vertically into multiple tables, horizontal partitioning takes horizontal slices of a table and stores them as distinct tables. For example, product sales data may be separated into ten tables based on ten product lines. Each table has the same set of columns (attributes) but contains a distinct set of products (tuples). If a query or transaction applies to all product data, it may have to run against all the tables and the results may have to be combined.
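The vertical partitioning of the EMPLOYEE example above can be sketched as follows (in SQLite from Python; the fragment names EMP1 and EMP2 follow the text, and the sample row is invented). The key Ssn is replicated in each fragment, so the original rows remain recoverable by a join on the key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The original wide table.
cur.execute("""CREATE TABLE EMPLOYEE
               (Ssn TEXT PRIMARY KEY, Name TEXT, Phone TEXT,
                Grade INTEGER, Salary REAL)""")
cur.execute("INSERT INTO EMPLOYEE VALUES ('123', 'Smith', '555-1234', 3, 55000.0)")

# Vertical partitioning: each fragment replicates the key Ssn.
cur.execute("CREATE TABLE EMP1 AS SELECT Ssn, Name, Phone FROM EMPLOYEE")
cur.execute("CREATE TABLE EMP2 AS SELECT Ssn, Grade, Salary FROM EMPLOYEE")

# The original rows remain recoverable by joining the fragments on the key.
row = cur.execute("""SELECT EMP1.Ssn, Name, Phone, Grade, Salary
                     FROM EMP1 JOIN EMP2 ON EMP1.Ssn = EMP2.Ssn""").fetchone()
print(row)  # ('123', 'Smith', '555-1234', 3, 55000.0)
```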
These types of adjustments designed to meet the high volume of queries or transactions, with or without sacrificing the normal forms, are commonplace in practice.
3. Tuning Queries
We already discussed how query performance is
dependent upon the appropriate selection of indexes, and how indexes may have
to be tuned after analyzing queries that give poor performance by using the
commands in the RDBMS that show the execution plan of the query. There are
mainly two indications that suggest that query tuning may be needed:
- A query issues too many disk accesses (for example, an exact match query scans an entire table).
- The query plan shows that relevant indexes are not being used.
Some typical instances of situations prompting
query tuning include the following:
- Many query optimizers do not use indexes in the presence of arithmetic expressions (such as Salary/365 > 10.50), numerical comparisons of attributes of different sizes and precision (such as Aqty = Bqty where Aqty is of type INTEGER and Bqty is of type SMALLINT), NULL comparisons (such as Bdate IS NULL), and substring comparisons (such as Lname LIKE '%mann').
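The arithmetic-expression case can often be worked around by rewriting the predicate so that the indexed column stands alone on one side of the comparison. A sketch in SQLite (table, data, and the index name Emp_sal_idx are invented): Salary/365 > 10.50 hides the index from the optimizer, while the algebraically equivalent Salary > 10.50 * 365 exposes it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary REAL)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [(str(i), 3000.0 + i * 2) for i in range(2000)])
cur.execute("CREATE INDEX Emp_sal_idx ON EMPLOYEE (Salary)")

def plan(sql):
    return " | ".join(row[3] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

# Arithmetic on the indexed column forces a full scan ...
blocked = plan("SELECT Ssn FROM EMPLOYEE WHERE Salary / 365 > 10.50")
# ... while an equivalent comparison on the bare column can use the index.
usable = plan("SELECT Ssn FROM EMPLOYEE WHERE Salary > 10.50 * 365")
print(blocked)
print(usable)
```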
- Indexes are often not used for nested queries using IN; for example, the following query:

SELECT Ssn FROM EMPLOYEE
WHERE Dno IN ( SELECT Dnumber FROM DEPARTMENT
               WHERE Mgr_ssn = '333445555' );

may not use the index on Dno in EMPLOYEE, whereas writing it as a single-block query with the join condition Dno = Dnumber in the WHERE clause may cause the index to be used.
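Whether the index is actually used is optimizer-specific (SQLite, for instance, often handles IN subqueries well), but the rewrite itself is easy to check: the nested form and the single-block join form return the same rows. A sketch with invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Mgr_ssn TEXT)")
cur.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Dno INTEGER)")
cur.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)",
                [(5, '333445555'), (4, '987654321')])
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                [('111', 5), ('222', 4), ('333', 5)])

nested = """SELECT Ssn FROM EMPLOYEE
            WHERE Dno IN (SELECT Dnumber FROM DEPARTMENT
                          WHERE Mgr_ssn = '333445555')"""
unnested = """SELECT Ssn FROM EMPLOYEE, DEPARTMENT
              WHERE Dno = Dnumber AND Mgr_ssn = '333445555'"""

# Both forms retrieve the employees of the department managed by 333445555.
assert sorted(cur.execute(nested).fetchall()) == sorted(cur.execute(unnested).fetchall())
print(sorted(cur.execute(unnested).fetchall()))  # [('111',), ('333',)]
```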
- Some DISTINCTs may be redundant and can be avoided without changing the result. A DISTINCT often causes a sort operation and should be avoided as much as possible.
- Unnecessary use of temporary result tables can be avoided by collapsing multiple queries into a single query, unless the temporary relation is needed for some intermediate processing.
- In some situations involving the use of correlated queries, temporaries are useful. Consider the following query, which retrieves the highest paid employee in each department:
SELECT Ssn
FROM EMPLOYEE E
WHERE Salary = ( SELECT MAX (Salary)
                 FROM EMPLOYEE AS M
                 WHERE M.Dno = E.Dno );
This has the potential danger of searching all
of the inner EMPLOYEE table M for each
tuple from the outer EMPLOYEE table E. To make the execution more efficient, the
process can be broken into two queries, where the first query just computes the
maximum salary in each department as follows:
SELECT MAX (Salary) AS High_salary, Dno INTO TEMP
FROM EMPLOYEE
GROUP BY Dno;

SELECT EMPLOYEE.Ssn
FROM EMPLOYEE, TEMP
WHERE EMPLOYEE.Salary = TEMP.High_salary
  AND EMPLOYEE.Dno = TEMP.Dno;
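The two-step decomposition can be run end to end as a sketch in SQLite (which has no SELECT ... INTO; CREATE TEMP TABLE ... AS plays the same role, and the temporary table is named HIGH_SAL here because TEMP is a SQLite keyword; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Dno INTEGER, Salary REAL)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                [('111', 5, 50000.0), ('222', 5, 70000.0), ('333', 4, 60000.0)])

# Step 1: materialize the per-department maxima once.
cur.execute("""CREATE TEMP TABLE HIGH_SAL AS
               SELECT MAX(Salary) AS High_salary, Dno
               FROM EMPLOYEE GROUP BY Dno""")

# Step 2: join against the materialized maxima instead of re-running
# the correlated subquery for every outer tuple.
rows = cur.execute("""SELECT EMPLOYEE.Ssn
                      FROM EMPLOYEE, HIGH_SAL
                      WHERE EMPLOYEE.Salary = HIGH_SAL.High_salary
                        AND EMPLOYEE.Dno = HIGH_SAL.Dno""").fetchall()
print(sorted(rows))  # [('222',), ('333',)]
```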
- If multiple options for a join condition are possible, choose one that uses a clustering index and avoid those that contain string comparisons. For example, assuming that the Name attribute is a candidate key in EMPLOYEE and STUDENT, it is better to use EMPLOYEE.Ssn = STUDENT.Ssn as a join condition rather than EMPLOYEE.Name = STUDENT.Name if Ssn has a clustering index in one or both tables.
- One idiosyncrasy with some query optimizers is that the order of tables in the FROM clause may affect the join processing. If that is the case, one may have to switch this order so that the smaller of the two relations is scanned and the larger relation is used with an appropriate index.
- Some query optimizers perform worse on nested queries compared to their equivalent unnested counterparts. There are four types of nested queries:
  - Uncorrelated subqueries with aggregates in an inner query.
  - Uncorrelated subqueries without aggregates.
  - Correlated subqueries with aggregates in an inner query.
  - Correlated subqueries without aggregates.
Of the four types above, the first one typically presents no problem, since most query optimizers evaluate the inner query once. However, for a query of the second type, such as the example in item 2, most query optimizers may not use an index on Dno in EMPLOYEE. However, the same optimizers may do so if the query is written as an unnested query. Transformation of correlated subqueries may involve setting up temporary tables. Detailed examples are outside our scope here.
- Finally, many applications are based on views that define the data of interest to those applications. Sometimes, these views become overkill, because a query may be posed directly against a base table rather than going through a view that is defined by a JOIN.
4. Additional Query Tuning Guidelines
Additional techniques for improving queries apply in certain situations as follows:
1. A query with multiple selection conditions that are connected via OR may not prompt the query optimizer to use any index. Such a query may be split up and expressed as a union of queries, each with a condition on an attribute that causes an index to be used. For example,
SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Age > 45 OR Salary < 50000;
may be executed using a sequential scan, giving poor performance. Splitting it up as
SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Age > 45
UNION
SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Salary < 50000;
may utilize indexes on Age as well as on
Salary.
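Whether the split actually helps is optimizer-specific (some systems, including SQLite, can already combine two indexes for an OR), but the two forms can at least be checked for equivalence. Note that UNION removes duplicates, so the rewrite is safe only when the selected rows are distinct. A sketch with invented data and index names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE EMPLOYEE (Fname TEXT, Lname TEXT, Salary REAL, Age INTEGER)")
cur.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
                [("F%d" % i, "L%d" % i, 40000.0 + i * 500, 25 + i % 40)
                 for i in range(200)])
cur.execute("CREATE INDEX Emp_age_idx ON EMPLOYEE (Age)")
cur.execute("CREATE INDEX Emp_sal_idx ON EMPLOYEE (Salary)")

combined = ("SELECT Fname, Lname, Salary, Age FROM EMPLOYEE "
            "WHERE Age > 45 OR Salary < 50000")
split = ("SELECT Fname, Lname, Salary, Age FROM EMPLOYEE WHERE Age > 45 "
         "UNION "
         "SELECT Fname, Lname, Salary, Age FROM EMPLOYEE WHERE Salary < 50000")

# The rows here are all distinct, so UNION does not lose duplicates.
rows_combined = sorted(cur.execute(combined).fetchall())
rows_split = sorted(cur.execute(split).fetchall())
print(len(rows_combined), rows_combined == rows_split)
```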
2. To help expedite a query, the following transformations may be tried:
   - A NOT condition may be transformed into a positive expression.
   - Embedded SELECT blocks using IN, = ALL, = ANY, and = SOME may be replaced by joins.
   - If an equality join is set up between two tables, the range predicate (selection condition) on the joining attribute set up in one table may be repeated for the other table.
3. WHERE conditions may be rewritten to utilize the indexes on
multiple columns. For example,
SELECT Region#, Prod_type, Month, Sales
FROM SALES_STATISTICS
WHERE Region# = 3 AND ((Prod_type BETWEEN 1 AND 3) OR (Prod_type BETWEEN 8 AND 10));
may use an index only on Region# and search through all leaf pages of the index
for a match on Prod_type. Instead, using
SELECT Region#, Prod_type, Month, Sales
FROM SALES_STATISTICS
WHERE (Region# = 3 AND (Prod_type BETWEEN 1 AND 3))
   OR (Region# = 3 AND (Prod_type BETWEEN 8 AND 10));
may use a composite index on (Region#, Prod_type) and work much more efficiently.
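A sketch of this rewrite in SQLite follows (Region# is renamed Region because # is not a legal unquoted identifier there; all data and the index name are invented). Both forms return the same rows, and the rewritten form is phrased so that each OR branch is an equality-plus-range match on the composite index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE SALES_STATISTICS "
            "(Region INTEGER, Prod_type INTEGER, Month INTEGER, Sales REAL)")
cur.executemany("INSERT INTO SALES_STATISTICS VALUES (?, ?, ?, ?)",
                [(r, p, m, 100.0 * p)
                 for r in range(1, 6) for p in range(1, 11) for m in range(1, 13)])
cur.execute("CREATE INDEX Sales_region_prod_idx "
            "ON SALES_STATISTICS (Region, Prod_type)")

original = """SELECT Region, Prod_type, Month, Sales FROM SALES_STATISTICS
              WHERE Region = 3 AND ((Prod_type BETWEEN 1 AND 3)
                                 OR (Prod_type BETWEEN 8 AND 10))"""
rewritten = """SELECT Region, Prod_type, Month, Sales FROM SALES_STATISTICS
               WHERE (Region = 3 AND (Prod_type BETWEEN 1 AND 3))
                  OR (Region = 3 AND (Prod_type BETWEEN 8 AND 10))"""

rows_original = cur.execute(original).fetchall()
rows_rewritten = cur.execute(rewritten).fetchall()
# Region 3, product types 1-3 and 8-10, 12 months each: 6 * 12 = 72 rows.
print(len(rows_rewritten))  # 72
```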
In this section, we have covered many of the common instances where the inefficiency of a query may be fixed by some simple corrective action, such as using a temporary table, avoiding certain types of query constructs, or avoiding the use of views. The goal is to have the RDBMS use existing single-attribute or composite-attribute indexes as much as possible. This avoids full scans of data blocks or the scanning of entire index leaf nodes. Redundant processes like sorting should be avoided wherever possible. The problems and the remedies will depend upon the workings of the query optimizer within an RDBMS. RDBMS vendors publish detailed tuning guidelines for database administration. Major relational DBMS vendors like Oracle, IBM, and Microsoft encourage their large customers to share tuning ideas at annual expos and other forums so that the entire industry benefits from performance enhancement techniques. These techniques are typically available in trade literature and on various Web sites.