HASHING
In a hash file organization, a hash function is computed on some attribute of each record; the result
specifies in which block of the file the record should be placed.
Static Hashing
A bucket is a unit of storage containing one or more records (a bucket is typically a disk block). A hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
The hash function is used to locate records for access, insertion, and deletion.
Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record.
Example of Hash File Organization
Hash file organization of the account file, using branch-name as the key:
o There are 10 buckets.
o The binary representation of the ith letter of the alphabet is assumed to be the integer i.
o The hash function returns the sum of the binary representations of the characters modulo 10.
o E.g. h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3
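The example hash function above can be sketched in a few lines of Python. This is only an illustration: the names letter_value and h are chosen here, and it assumes that upper- and lower-case letters share the same value (a = 1, ..., z = 26) and that spaces contribute nothing, which is the reading that reproduces the three values quoted in the example.

```python
def letter_value(ch: str) -> int:
    """Map the i-th letter of the alphabet to the integer i; anything else counts as 0."""
    if ch.isascii() and ch.isalpha():
        return ord(ch.lower()) - ord("a") + 1
    return 0  # assumption: spaces and other characters are ignored


def h(branch_name: str, num_buckets: int = 10) -> int:
    """Sum the letter values of the key and reduce modulo the number of buckets."""
    return sum(letter_value(ch) for ch in branch_name) % num_buckets


for name in ("Perryridge", "Round Hill", "Brighton"):
    print(name, "->", h(name))
# Perryridge -> 5
# Round Hill -> 3
# Brighton -> 3
```

Round Hill and Brighton land in the same bucket even though their keys differ, which is why a lookup has to scan the whole bucket.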
Hash Functions
o The worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file.
o An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all possible values.
o An ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file.
o Typical hash functions perform computation on the internal binary representation of the search key.
o For example, for a string search key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned.
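The difference between the worst case and a more reasonable hash function can be seen by counting how many keys fall into each bucket. The sketch below is illustrative only: the key set is made up, and char_sum_hash uses the character codes returned by ord() as the "binary representation", which is an assumption made for this example.

```python
from collections import Counter

NUM_BUCKETS = 10  # assumed number of buckets


def worst_hash(key: str) -> int:
    """Degenerate hash function: every key maps to bucket 0."""
    return 0


def char_sum_hash(key: str) -> int:
    """Add the character codes of the key and take the sum modulo the bucket count."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS


def bucket_loads(hash_fn, keys):
    """Return a list whose b-th entry is the number of keys hashed to bucket b."""
    loads = Counter(hash_fn(k) for k in keys)
    return [loads.get(b, 0) for b in range(NUM_BUCKETS)]


keys = [f"key{n}" for n in range(100)]  # made-up search keys
print(bucket_loads(worst_hash, keys))     # all 100 keys end up in bucket 0
print(bucket_loads(char_sum_hash, keys))  # keys are spread over several buckets
```

With worst_hash a lookup must scan every record, whereas a function that spreads the keys keeps each bucket, and therefore each sequential scan, short.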
Handling of Bucket Overflows
o Bucket overflow can occur because of:
    o Insufficient buckets
    o Skew in the distribution of records. This can occur for two reasons:
        o multiple records have the same search-key value
        o the chosen hash function produces a non-uniform distribution of key values
o Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
o Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list.
o The above scheme is called closed hashing.
o An alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications.
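Overflow chaining as described above can be sketched as follows. This is a minimal in-memory model, not a disk-based implementation: BUCKET_CAPACITY, NUM_BUCKETS, the class names and the character-sum hash function are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

BUCKET_CAPACITY = 2  # records per bucket (assumed; a real bucket is a disk block)
NUM_BUCKETS = 4      # fixed number of primary buckets (assumed)


@dataclass
class Bucket:
    records: List[Tuple[str, str]] = field(default_factory=list)
    overflow: Optional["Bucket"] = None  # next bucket in the overflow chain


def h(key: str) -> int:
    """Illustrative hash function: character-code sum modulo the bucket count."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS


class StaticHashFile:
    def __init__(self) -> None:
        self.buckets = [Bucket() for _ in range(NUM_BUCKETS)]

    def insert(self, key: str, record: str) -> None:
        """Walk the chain to the first bucket with room, adding an overflow bucket if needed."""
        bucket = self.buckets[h(key)]
        while len(bucket.records) >= BUCKET_CAPACITY:
            if bucket.overflow is None:
                bucket.overflow = Bucket()
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def lookup(self, key: str) -> List[str]:
        """Scan the primary bucket and every overflow bucket chained to it."""
        found: List[str] = []
        bucket: Optional[Bucket] = self.buckets[h(key)]
        while bucket is not None:
            found.extend(rec for k, rec in bucket.records if k == key)
            bucket = bucket.overflow
        return found

    def chain_length(self, key: str) -> int:
        """Number of buckets (primary plus overflow) a lookup for `key` must scan."""
        count, bucket = 0, self.buckets[h(key)]
        while bucket is not None:
            count += 1
            bucket = bucket.overflow
        return count


f = StaticHashFile()
for n in range(5):
    f.insert("Perryridge", f"record-{n}")  # five records with the same search key
print(f.chain_length("Perryridge"))  # 3: one primary bucket plus two overflow buckets
```

Five records with the same search-key value cannot fit in a bucket of capacity 2, so the chain grows regardless of how good the hash function is; this is the first kind of skew listed above.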
Hash Indices
o Hashing can be used not only for file organization, but also for index-structure creation.
o A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
o Strictly speaking, hash indices are always secondary indices.
o If the file itself is organized using hashing, a separate primary hash index on it using the same search key is unnecessary.
o However, we use the term hash index to refer to both secondary index structures and hash-organized files.
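A hash index can be pictured as a hash structure that stores (search key, record pointer) pairs rather than the records themselves. The sketch below is a simplified illustration: the account rows are made-up sample data, a list position stands in for a record pointer, and the names build_hash_index and index_lookup are chosen here.

```python
NUM_BUCKETS = 8  # assumed number of index buckets

# The "file": made-up sample rows (account-number, branch-name, balance).
# A row's position in the list plays the role of its record pointer.
account_file = [
    ("A-1", "Downtown", 500),
    ("A-2", "Mianus", 700),
    ("A-3", "Perryridge", 400),
    ("A-4", "Downtown", 600),
]


def h(key: str) -> int:
    """Illustrative hash function: character-code sum modulo the bucket count."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS


def build_hash_index(records, key_of):
    """Place a (search key, record pointer) pair for every record into its bucket."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for pointer, record in enumerate(records):
        key = key_of(record)
        buckets[h(key)].append((key, pointer))
    return buckets


def index_lookup(buckets, records, key):
    """Hash the key, scan that one bucket, and follow the matching pointers."""
    return [records[ptr] for k, ptr in buckets[h(key)] if k == key]


branch_index = build_hash_index(account_file, key_of=lambda row: row[1])
print(index_lookup(branch_index, account_file, "Downtown"))
# [('A-1', 'Downtown', 500), ('A-4', 'Downtown', 600)]
```

The file itself stays in its original order; only the index is organized by hash value, which is what makes this a secondary index.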
Example of Hash Index
Deficiencies of Static Hashing
o In static hashing, function h maps search-key values to a fixed set B of bucket addresses.
o Databases grow with time. If the initial number of buckets is too small, performance will degrade due to too many overflows.
o If the file size at some point in the future is anticipated and the number of buckets is allocated accordingly, a significant amount of space will be wasted initially.
o If the database shrinks, again space will be wasted.
o One option is periodic re-organization of the file with a new hash function, but it is very expensive.
o These problems can be avoided by using techniques that allow the number of buckets to be modified dynamically.
Dynamic Hashing
Good for databases that grow and shrink in size. Allows the hash function to be modified dynamically.
Extendable hashing is one form of dynamic hashing.
The hash function generates values over a large range, typically b-bit integers, with b = 32.
At any time, only a prefix of the hash value is used to index into a table of bucket addresses.
Let the length of the prefix be i bits, 0 ≤ i ≤ 32. Bucket address table size = 2^i. Initially i = 0.
The value of i grows and shrinks as the size of the database grows and shrinks.
Multiple entries in the bucket address table may point to the same bucket. Thus, the actual number of buckets may be less than 2^i.
The number of buckets also changes dynamically due to coalescing and splitting of buckets.
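Taking the i-bit prefix of a b-bit hash value is a single right shift. The sketch below assumes b = 32 as in the text; hash32 is a hypothetical stand-in built on zlib.crc32, since the notes do not prescribe a particular hash function.

```python
import zlib

B = 32  # number of bits produced by the hash function


def hash32(key: str) -> int:
    """Hypothetical stand-in for h: any function yielding a 32-bit integer would do."""
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


def table_index(hash_value: int, i: int) -> int:
    """Return the first i high-order bits of a B-bit hash value (0 when i == 0)."""
    return hash_value >> (B - i)


x = 0b1011 << 28          # a 32-bit value whose four high-order bits are 1011
print(table_index(x, 0))  # 0  (table has 2**0 = 1 entry)
print(table_index(x, 1))  # 1
print(table_index(x, 2))  # 2  (binary 10)
print(table_index(x, 4))  # 11 (binary 1011)
```

Because the index is always below 2^i, growing i by one doubles the number of table entries without changing the hash function itself.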
General Extendable Hash Structure
Use of Extendable Hash Structure
o Each bucket j stores a value i_j; all the entries that point to the same bucket have the same values on the first i_j bits.
o To locate the bucket containing search-key K_j:
    o 1. Compute h(K_j) = X
    o 2. Use the first i high-order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket
o To insert a record with search-key value K_j:
    o Follow the same procedure as look-up and locate the bucket, say j.
    o If there is room in bucket j, insert the record in the bucket.
    o Else the bucket must be split and the insertion re-attempted.
    o Overflow buckets are used instead in some cases.
Updates in Extendable Hash Structure
o To split a bucket j when inserting a record with search-key value K_j:
o If i > i_j (more than one pointer to bucket j):
    o Allocate a new bucket z, and set i_j and i_z to the old i_j + 1.
    o Make the second half of the bucket address table entries pointing to j point to z.
    o Remove and reinsert each record in bucket j.
    o Recompute the new bucket for K_j and insert the record in the bucket (further splitting is required if the bucket is still full).
o If i = i_j (only one pointer to bucket j):
    o Increment i and double the size of the bucket address table.
    o Replace each entry in the table by two entries that point to the same bucket.
    o Recompute the new bucket address table entry for K_j. Now i > i_j, so use the first case above.
o When inserting a value, if the bucket is full after several splits (that is, i reaches some limit b), create an overflow bucket instead of splitting the bucket address table further.
o To delete a key value, locate it in its bucket and remove it.
    o The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table).
    o Coalescing of buckets can be done (a bucket can coalesce only with a "buddy" bucket having the same value of i_j and the same i_j - 1 prefix, if it is present).
    o Decreasing the bucket address table size is also possible.
    o Note: decreasing the bucket address table size is an expensive operation and should be done only if the number of buckets becomes much smaller than the size of the table.
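The look-up, insertion and splitting rules above can be put together in a short in-memory sketch. It is an illustration under stated assumptions, not a reference implementation: the class and parameter names are chosen here, zlib.crc32 is a stand-in hash function, max_depth plays the role of the limit on i, and an "overflow bucket" is modelled simply by letting a bucket hold more than bucket_size records. Deletion and coalescing are not shown.

```python
import zlib

B = 32  # hash values are B-bit integers


def h(key: str) -> int:
    """Stand-in hash function (an assumption; the notes do not prescribe one)."""
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


class Bucket:
    def __init__(self, local_depth: int) -> None:
        self.local_depth = local_depth  # i_j in the text
        self.records = []               # list of (key, value) pairs


class ExtendableHash:
    def __init__(self, bucket_size: int = 2, hash_fn=h, max_depth: int = 16) -> None:
        self.bucket_size = bucket_size
        self.hash_fn = hash_fn
        self.max_depth = max_depth  # limit on i, after which buckets overflow
        self.i = 0                  # current prefix length
        self.table = [Bucket(0)]    # bucket address table with 2**i entries

    def _index(self, key) -> int:
        """First i high-order bits of the hash value."""
        return self.hash_fn(key) >> (B - self.i)

    def lookup(self, key):
        bucket = self.table[self._index(key)]
        return [v for k, v in bucket.records if k == key]

    def insert(self, key, value) -> None:
        while True:
            bucket = self.table[self._index(key)]
            if len(bucket.records) < self.bucket_size:
                bucket.records.append((key, value))
                return
            same_hash = all(self.hash_fn(k) == self.hash_fn(key) for k, _ in bucket.records)
            if same_hash or bucket.local_depth >= self.max_depth:
                # Splitting cannot help (or the limit is reached): stand-in for an overflow bucket.
                bucket.records.append((key, value))
                return
            self._split(bucket)  # then re-attempt the insertion

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth == self.i:
            # Only one entry points to the bucket: replace each entry by two copies.
            self.table = [b for b in self.table for _ in range(2)]
            self.i += 1
        # Now i > i_j: allocate z and give both buckets the old i_j + 1.
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        entries = [idx for idx, b in enumerate(self.table) if b is bucket]
        for idx in entries[len(entries) // 2:]:
            self.table[idx] = new_bucket  # second half of the entries now point to z
        old_records, bucket.records = bucket.records, []
        for k, v in old_records:          # remove and reinsert each record
            self.table[self._index(k)].records.append((k, v))


eh = ExtendableHash(bucket_size=2)
for branch in ("Brighton", "Downtown", "Downtown", "Mianus", "Perryridge"):
    eh.insert(branch, f"account at {branch}")
print(eh.i, len(eh.table), eh.lookup("Downtown"))
```

Scanning the whole table to find the entries that point to a bucket keeps the sketch short; a real implementation would compute that range directly from the bucket's i_j-bit prefix.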
Use of Extendable Hash Structure: Example
o Initial hash structure, bucket size = 2
o Hash structure after insertion of one Brighton and two Downtown records
o Hash structure after insertion of Mianus record
o Hash structure after insertion of three Perryridge records
o Hash structure after insertion of Redwood and Round Hill records
Extendable Hashing vs. Other Schemes
Benefits of extendable hashing:
o Hash performance does not degrade with growth of the file
o Minimal space overhead
Disadvantages of extendable hashing:
o Extra level of indirection to find the desired record
o Bucket address table may itself become very big (larger than memory)
    o Need a tree structure to locate the desired record in the structure!
o Changing the size of the bucket address table is an expensive operation
Linear hashing is an alternative mechanism which avoids these disadvantages at the possible cost of more bucket overflows.