CSC 241- Set #14


Hash Table

Hash table is a great data structure for implementing a Set. hash tables like BSTs allow us to store/retrieve unique objects.

Each object is mapped to an array cell based on a hash function. Hash functions are designed to distribute these objects across the whole array evenly. The range of unique identities for object is generally much larger than the number of cells that we allocate for the hash table; thus, causing what we refer to as collisions. Collisions occur when 2 or more objects are mapped into one cell of the array.

The only way to avoid needing a collision resolution technique in a hash table is to have enough cells where each object maps to a unique cell in the array. For example, out aside 999,999,999 cells in so that each employee SS# maps into its own cell; this is typically not practical.

There are a number variations in implementing hash tables; what we will discuss here will involve collision resolution by chaining. In this technique, multiple objects mapped into the same array cell are maintained in a linked list that stem from that cell.

Let us look at an example to understand how it all works. Assume that our array has seven cells in it, thus the hash function needs to generate a number from 0 to 6. For this example we will use simple ints. Modulus arithmetic is used to map each value to an index from 0 to 6. value % 7 yields a number from 0 to 6. For example: if value is 10, the function will yield a 3. If value is 45, the function yields a 3. If value is 19, the function yields a 5. If value is 63, the function yields a 0.

lets put these values in our table assuming they are inserted in the sequence shown above:


hash table
  ---     ----
0| --|---|63--|---
 |---|    ----
1| --|---
 |---|
2| --|---
 |---|    ----     ----
3| --|---|45--|---|10--|---
 |---|    ----     ----
4| --|---
 |---|    ----    
5| --|---|19--|---
 |---|    ----
6| --|---
 |---|
7| --|---
  ---

Notice that the information here is really not ordered in a way that is useful directly. In other words, we can't just generate a sorted list of values by visiting all nodes in the table. This is an important consideration when deciding on implementations. For example, if the application needs the ability to readyly generate an ordered list of employees, the BST may be a better choice. But, with a reasonable number of cells put aside and a hash function that distributes values well, hash table compares well with BST when performing the operations required in Set.

Hash Table Implementation

We first need a hashNode class to implement the linked list that we need for representing chains. hashNode is a simple linked list node. with an element and a reference to the next node like the ones we have already seen for dynamicStack, dynamicQueue and intList. The only difference is that hashNode holds a Comparable element.


class hashNode {
  Comparable element_;
   hashNode next_;
   hashNode (Comparable e, hashNode n) {
     next_=n;
     element_=e;
   }
}

In looking at the hashTable class, note that there are three fields:

The constructor sets arraySize_ and assigns null to all cells of the array.

hash method

hash is a private method, not accessible outside the class. Math.abs(x.hashCode()) % arraySize_; is what this function returns; it really is the same as the function I shared with you earlier in our example.

hashCode() is a method that all objects in Java have. If a class does not override this method, it is designed to return a unique number for an object based on its address. We assume that each object that we store in our hashTable will have a hashCode() method that uniquely identifies it based on its content. For example, String objects already do that. hashCode() may return a negative number which is why we use Math.abs() to make sure we have its absolute value. % provides the remainder of performing a division of Math.abs(x.hashCode()) by arraySize_.

add method

Very simple process! We again use find to make sure the object is not already in the table as we did with our tree implementation. if we know that the value doesn't exist, we simply insert a new node in front of the chain that we hash into and increment size_. The instruction: t_[hash(x)]=new hashNode(x,t_[hash(x)]); accomplishes this. hash(x) provides the index of the cell that we hash into. The assignment statement puts the new node in front of the chain that stems from the array cell (i.e binding the cell with the new node with what was in the array cell before, now is the next node). If the array cell used to reference null, now it references the new node with our new object.

remove method

The remove method is designed to hash into the appropriate cell, locate the node that needs to be removed and change references to accomplish this.

Again, we use find to detect the situation where the object is not in the table. If the object is found:size_ is decremented first. We then check to see if the element that we are looking for is front of the chain. if the object is in front, the cell that we hashed into will now reference the next node in the chain t[hash(e)]=t[hash(e)].next;. If not in front, we need to traverse the chain. Establish a temp referencing the first element of the chain. temp will always point to the node before the one we are considering. Once the node with the object being deleted is located, next_ in temp is replaced by the next_ in the node being removed. for (hashNode temp=t_[hash(e)];temp.next_ != null;temp=temp.next_) should keep going until the element correct node is found and deleted. This loop won't terminate by hitting the end of the chain since; since it would mean that the element is not in our chain.

size method

The size method returns size_. As a side note, think about about what would happen if we don't keep track of size with size_. Clearly, in this method, we would need to go through the whole hash table and count all elements. So, by simply keeping a size_ field, the algorithm here is O(1). The alternative would have O(n).

dump method

Displays all elements. All objects in java have a toString() method, if a class does not override this method, it returns a string with the address of the object. We expect that our objects have a toString() method that actually display their content.

The method display each index and the the chain stemming from it. The first row of output will be 0 and the chain stemming from cell 0. The 2nd row of output displays 1 and the chain stemming from the cell 1, and so on.

toArray method

This method goes through the table the same way as dump did, except that each element is placed in an array of objects and returned when filled up. The array of objects is created at the beginning of the method. Observe that this method, unlike the toArray method in the Tree class does not produce an ordered list. If having an ordered list is important in an application, it is best to use the Tree implementation of Set; if the hashTable implementation is chosen, the array returned will need to be sorted.


Writing Applications with Sets

Applications that maintain a set of unique elements are candidates for untilizing the Set interface and one of its implementations. They will need a class that represents the objects that they need to store which follows the following framework:


public class CLASS-NAME implements Comparable {
  SOME FIELDS  

  SOME METHODS

  public int compareTo (Object c) {

  }

  public int hashCode () {
   
  }

  public String toString() {
 
  }
 
  public CLASS-NAME (...) {
  }
}

So, its not enough for this class to just implement Comparable. We want such classes to have a toString and a hashCode method as well. For example, consider an application that needs to maintain a set of employees. Here is what an Employee class would look like:


public class Employee implements Comparable {
  String SS_;    the employee SS#   
  String name;   the employee name
  double salary; the employee salary

  public String SS() {return SS_;}
  public String name() {return name_;}
  public double salary() {return salary_;}

  /*compare SS#s of employees */
  public int compareTo (Object c) {
    return SS_.compareTo(((Employee)c).SS_);
  }

  /* return hashCode of the ss# of the employee */
  public int hashCode () {return SS_.hashCode();}

  /* return the object in the form of a string */
  public String toString() {return "SS# is: " + SS_ + 
                                "  Name: " + name_ + " Salary: " + salary_;}
  
  public Employee (String s, String n, double sal) {
    SS_=s;
    name_=n;
    salary_=sal;                                                            
  }
}

the field that uniquely identifies each employee is SS_. In CompareTo that is the field that is checked against the same field in the incoming object (c). Note that we first cast the object c as an Employee before we compare the fields. SS_ is a String which has a compareTo method. The hashCode method invokes hashCode from SS_. hashCode method for String objects is designed to return the same code when the content of two strings is the same, so unlike the hashCode available in Object, we can track down an employee based on his/her SS# and not the address of the employee object. We can't write an application where we would expect the user to know the address of an employee in memory. The toString is cool, when an object has such a method, when you print the object, its toString method gets invoked automatically; in this case, you can see that we return a string with the fields and their labels.

Now, lets discuss some of the ways we use Set and its methods for an employee application.