Static Hashing
In static hashing, the hash function always generates the same bucket’s address. For example, if we have a data record for employee_id = 107, the hash function is mod-5 which is – H(x) % 5, where x = id. Then the operation will take place like this:
H(106) % 5 = 1.
This indicates that the data record should be placed or searched in the 1st bucket (or 1st hash index) in the hash table.
Example:
The primary key is used as the input to the hash function and the hash function generates the output as the hash index (bucket’s address) which contains the address of the actual data record on the disk block.
Static Hashing has the following Properties
- Data Buckets: The number of buckets in memory remains constant. The size of the hash table is decided initially and it may also implement chaining that will allow handling some collision issues though, it’s only a slight optimization and may not prove worthy if the database size keeps fluctuating.
- Hash function: It uses the simplest hash function to map the data records to its appropriate bucket. It is generally modulo-hash function
- Efficient for known data size: It’s very efficient in terms when we know the data size and its distribution in the database.
- It is inefficient and inaccurate when the data size dynamically varies because we have limited space and the hash function always generates the same value for every specific input. When the data size fluctuates very often it’s not at all useful because collision will keep happening and it will result in problems like – bucket skew, insufficient buckets etc.
To resolve this problem of bucket overflow, techniques such as – chaining and open addressing are used. Here’s a brief info on both:
1. Chaining
Chaining is a mechanism in which the hash table is implemented using an array of type nodes, where each bucket is of node type and can contain a long chain of linked lists to store the data records. So, even if a hash function generates the same value for any data record it can still be stored in a bucket by adding a new node.
However, this will give rise to the problem bucket skew that is, if the hash function keeps generating the same value again and again then the hashing will become inefficient as the remaining data buckets will stay unoccupied or store minimal data.
2. Open Addressing/Closed Hashing
This is also called closed hashing this aims to solve the problem of collision by looking out for the next empty slot available which can store data. It uses techniques like linear probing, quadratic probing, double hashing, etc.
Hashing in DBMS
Hashing in DBMS is a technique to quickly locate a data record in a database irrespective of the size of the database. For larger databases containing thousands and millions of records, the indexing data structure technique becomes very inefficient because searching a specific record through indexing will consume more time. This doesn’t align with the goals of DBMS, especially when performance and date retrieval time are minimized. So, to counter this problem hashing technique is used. In this article, we will learn about various hashing techniques.