The idea of using extra space to facilitate faster access to a given data set is partic-ularly important if the data set in question contains a very large number of records that need to be stored on a disk.

**B-Trees**

The idea of using extra space
to facilitate faster access to a given data set is partic-ularly important if
the data set in question contains a very large number of records that need to
be stored on a disk. A principal device in organizing such data sets is an ** index**,
which provides some information about the location of records with indicated
key values. For data sets of structured records (as opposed to “unstruc-tured”
data such as text, images, sound, and video), the most important index
organization is the

In the B-tree version we
consider here, all data records (or record keys) are stored at the leaves, in
increasing order of the keys. The parental nodes are used for indexing.
Specifically, each parental node contains ** n** − 1 ordered keys

In addition, a B-tree of
order ** m** ≥ 2 must satisfy the following structural
properties:

The root is either a leaf or
has between 2 and ** m** children.

Each node, except for the
root and the leaves, has between ** m/**2 and

The tree is (perfectly)
balanced, i.e., all its leaves are at the same level.

An example of a B-tree of
order 4 is given in Figure 7.8.

Searching in a B-tree is very
similar to searching in the binary search tree, and even more so in the 2-3
tree. Starting with the root, we follow a chain of pointers to the leaf that
may contain the search key. Then we search for the search key among the keys of
that leaf. Note that since keys are stored in sorted order, at both parental
nodes and leaves, we can use binary search if the number of keys at a node is
large enough to make it worthwhile.

It is not the number of key
comparisons, however, that we should be con-cerned about in a typical
application of this data structure. When used for storing a large data file on
a disk, the nodes of a B-tree normally correspond to the disk pages. Since the
time needed to access a disk page is typically several orders of magnitude
larger than the time needed to compare keys in the fast computer mem-ory, it is
the number of disk accesses that becomes the principal indicator of the
efficiency of this and similar data structures.

How many nodes of a B-tree do
we need to access during a search for a record with a given key value? This
number is, obviously, equal to the height of the tree plus 1. To estimate the
height, let us find the smallest number of keys a B-tree of order ** m** and positive height

After a series of standard
simplifications (see Problem 2 in this section’s exercises), this inequality
reduces to

which, in turn, yields the
following upper bound on the height ** h** of the B-tree of order

Inequality (7.7) immediately
implies that searching in a B-tree is a ** O(**log

Keep in mind that the table’s
entries are upper estimates for the number of disk accesses. In actual
applications, this number rarely exceeds 3, with the B-tree’s root and
sometimes first-level nodes stored in the fast memory to minimize the number of
disk accesses.

The operations of insertion
and deletion are less straightforward than search-ing, but both can also be
done in ** O(**log

The most straightforward
algorithm for inserting a new record into a B-tree is quite similar to the
algorithm for insertion into a 2-3 tree outlined in Section 6.3. First, we
apply the search procedure to the new record’s key ** K** to find the appropriate leaf for the new
record. If there is room for the record in that leaf, we place it there (in an
appropriate position so that the keys remain sorted) and we are done. If there
is no room for the record, the leaf is split in half by sending the second half
of the records to a new node. After that, the smallest key

You should be aware that there
are other algorithms for implementing inser-tions into a B-tree. For example,
to avoid the possibility of recursive node splits, we can split full nodes
encountered in searching for an appropriate leaf for the new record. Another
possibility is to avoid some node splits by moving a key to the node’s sibling.
For example, inserting 65 into the B-tree in Figure 7.8 can be done by moving
60, the smallest key of the full leaf, to its sibling with keys 51 and 55, and
replacing the key value of their parent by 65, the new smallest value in

the second child. This
modification tends to save some space at the expense of a slightly more
complicated algorithm.

A B-tree does not have to be
always associated with the indexing of a large file, and it can be considered
as one of several search tree varieties. As with other types of search
trees—such as binary search trees, AVL trees, and 2-3 trees—a B-tree can be
constructed by successive insertions of data records into the initially empty
tree. (The empty tree is considered to be a B-tree, too.) When all keys reside
in the leaves and the upper levels are organized as a B-tree comprising an
index, the entire structure is usually called, in fact, a *B*^{+}** -tree**.

**Exercises 7.4**

**
**Give examples of using an index in real-life applications that do
not involve computers.

**a. **Prove the equality

which was used in the
derivation of upper bound (7.7) for the height of a B-tree.

**
**Complete the derivation of inequality (7.7).

**
**Find the minimum order of the B-tree that guarantees that the
number of disk accesses in searching in a file of 100 million records does not
exceed 3. Assume that the root’s page is stored in main memory.

**
**Draw the B-tree obtained after inserting 30 and then 31 in the
B-tree in Figure 7.8. Assume that a leaf cannot contain more than three items.

**
**Outline an algorithm for finding the largest key in a B-tree.

**a. **A** top-down 2-3-4 tree **is a B-tree of order 4 with the following
modifica-tion of the

encounters a full node (i.e.,
a node with three keys), the node is split into two nodes by sending its middle
key to the node’s parent, or, if the full node happens to be the root, the new
root for the middle key is created. Construct a top-down 2-3-4 tree by
inserting the following list of keys in the initially empty tree:

10** ,** 6

**
**What is the principal advantage of this insertion procedure
compared with the one used for 2-3 trees in Section 6.3? What is its
disadvantage?

**
****a. **Write a program implementing a key insertion
algorithm in a B-tree.

**
**Write a program for visualization of a key insertion algorithm in a
B-tree.

**SUMMARY**

Space and time trade-offs in
algorithm design are a well-known issue for both theoreticians and
practitioners of computing. As an algorithm design technique, trading space for
time is much more prevalent than trading time for space.

*Input enhancement *is one of the two principal varieties of
trading space for* *time in algorithm
design. Its idea is to preprocess the problem’s input, in whole or in part, and
store the additional information obtained in order to accelerate solving the
problem afterward. Sorting by distribution counting and several important
algorithms for string matching are examples of algorithms based on this
technique.

*Distribution counting *is a special method for sorting lists of
elements from a* *small set of possible
values.

*Horspool’s algorithm *for string matching can be considered a
simplified* *version of the *Boyer-Moore algorithm.* Both algorithms
are based on the ideas of input enhancement and right-to-left comparisons of a
pattern’s characters. Both algorithms use the same *bad-symbol shift table*; the Boyer-Moore also uses a second table,
called the *good-suffix shift table.*

*Prestructuring—*the second type of technique that exploits space-for-time* *trade-offs—uses extra space to
facilitate a faster and/or more flexible access to the data. Hashing and B^{+}-trees are important examples of
prestructuring.

*Hashing *is a very efficient approach to implementing dictionaries. It is
based* *on the idea of mapping keys
into a one-dimensional table. The size limitations of such a table make it
necessary to employ a *collision
resolution* mechanism. The two principal varieties of hashing are *open hashing* or *separate chaining* (with keys stored in linked lists outside of the
hash table) and *closed hashing*

or *open addressing* (with keys stored inside the table)*.* Both enable searching, insertion, and
deletion in ** (**1

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Introduction to the Design and Analysis of Algorithms : Space and Time Trade-Offs : B-Trees Algorithms |

**Related Topics **

Privacy Policy, Terms and Conditions, DMCA Policy and Compliant

Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.