CSC 241- Set #14


Hash Table

Hash table is a great data structure for implementing a Set. There are a number variations in implementing hash tables. What we will discuss here will involve collision resolution by chaining which is only one of the many techniques for addressing collisions.

The basic concept here is that you have objects identified by a unique key that need to be maintained in a structure. The key of such an object is mapped to an array cell based on a hash function that is designed to distribute these objects in the array. Collision resolution with chaining is a technique for dealing with multiple keys mapping to the same array cell; this technique allows us to create a linked list of nodes with keyed objects. The only way to avoid needing a collision resolution technique is to have enough cells where each key maps to a unique cell in the array; which is typically not practical. Let us look at an example to understand how it all works.

Lets assume that our array has seven cells in it, thus the hash function needs to generate a number from 0 to 6. Assuming a simple int key, we can use modulus arithmetic to map the key to an int number from 0 to 6. key % 7 yields a number from 0 to 6. A few examples: if key is 10, the function will yield a 3. If key is 45, the function yields a 3. If key is 19, the function yields a 5. If key is 63, the function yields a 0.

lets put these keys in our table assuming they are inserted in the sequence shown above:


hash table
  ---     ----
0| --|-->|63--|-->
 |---|    ----
1| --|-->
 |---|
2| --|-->
 |---|    ----     ----
3| --|-->|45--|-->|10--|-->
 |---|    ----     ----
4| --|-->
 |---|    ----    
5| --|-->|19--|-->
 |---|    ----
6| --|-->
 |---|
7| --|-->
  ---

Hash tables are great for maintaining data with keys but notice that the information here is really not ordered in a way that is useful directly. In other words, we can't just generate a sorted list of keys by visiting all nodes in the table. This is an important consideration when deciding on implementations. For example, if the application needs the ability to readyly generate an ordered list of employees, the BST may be a better choice. But, with a reasonable number of cells put aside and a hash function that distributes values well, hash table compares well with BST when performing the operations required in Set.

Hash Table Implementation

We first need a node class to implement the linked list that we need for representing chains:


class hashNode {
  elementType element_;
   hashNode next_;
   hashNode (elementType e, hashNode n) {
     next_=n;
     element_=e;
   }
}

hashNode is a simple linked list node. with an element and a reference to the next node.

In looking at the hashTable class, note that there are three fields:

The constructor sets arraySize and assigns null to all cells of the array.

hash method

hash is a private method, not accessible outside the class. Math.abs(x.hashCode()) % arraySize_; is what this function returns; it really is the same as the function I shared with you earlier in our example. hashCode() is a method in the String class that returns a whole number that is unique for the string held in x; however, this number may be negative which is why we use Math.abs() to make sure we have its absolute value. % provides the remainder of performing a division of Math.abs(x.hashCode()) by arraySize_.

add method

Very simple process! We again use find to make sure the key is not already in the table as we did with our tree implementation. if we know that the value doesn't exist, we simply insert a new node in front of the chain that we hash into and increment size_. The instruction: t_[hash(x.key())]=new hashNode(x,t_[hash(x.key())]); accomplishes this. hash(x.key()) provides the index of the cell that we hash into, the rest should make sense as we are inserting in front of the chain.

remove method

The remove method is designed to hash into the appropriate cell, locate the node that needs to be removed and change references to accomplish this.

Again, we use find to detect the situation where the key is not in the list; if the key is not in the table, we throw our exception. Once, we know that the key does exist, we can decrement size_. We then check to see if the element that we are looking for is front of the chain, if it is, the cell that we hashed into will now reference the next node in the chain (t[hash(key)]=t[hash(key)].next;)

Once we establish that we need to traverse the chain, we establish a temp node reference. temp will always point to the node before the one we are considering. Once the node with the object being deleted is located, next_ in temp is replaced by the next_ in the node with our key, effectively removing that node from the chain. The for loop:
for (hashNode temp=t_[hash(k)];temp.next_ != null;temp=temp.next_) should keep going until the element correct node is found and deleted.

size method

The size method returns size_. As a side note, think about about what would happen if we don't keep track of size with size_. Clearly, in this method, we would need to go through the whole hash table and count all elements. So, by simply keeping a size_ field, the algorithm here is O(1). The alternative would have O(n).

dump method

Displays all elements using their stringView methods. With the 1st row displaying the chain stemming from cell 0, the 2nd row displaying the chain stemming from the cell 1, and so on.

toArray method

This method goes through the table the same way as dump did, except that each element is placed in an array of objects and returned when filled up. The array of objects is created at the beginning of the method. Observe that this method, unlike the toArray method in the Tree class does not produce an ordered list. If having an ordered list is important in an application, it is best to use the Tree implementation of Set; if the hashTable implementation is chosen, the array returned will need to be sorted.