Lab 5

CSC 241- Lab 5 (Due 12/12, 2003) [60 Points]

Theme ...

You will learn ...

about two very important data structures: Hash Tables and Binary Search Trees.
to compare and contrast behavior of these data structures in the implementation of sets.
about performance tuning techniques.

In this lab, you will work with the Set Package. You start by copying the Set interface and its two implementations, Tree and hashTable, along with their supporting node classes. You will also copy the test programs I have designed for these classes.

Set objects act as containers for unique entities. All elements stored in a Set must implement the Comparable interface. String is an example of a class that implements Comparable. Comparables are compared to other like objects with their compareTo method. This characteristic is important when dealing with sets: we need to be able to distinguish and compare the objects that we store in them. With the ability to identify elements uniquely and to compare them to see if they are the same or if one is less than the other, we can store them in a way that lets us find them quickly.
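As a minimal sketch of what this means in practice, the class below orders its objects by an integer key. The names KeyedItem and CompareDemo are hypothetical and are not part of the Set package. The generic form Comparable<KeyedItem> assumes a newer Java compiler; the course code may use the raw Comparable interface instead.

```java
// Hypothetical element type; KeyedItem is an illustrative name, not a class in the Set package.
class KeyedItem implements Comparable<KeyedItem> {
    private final int key;

    KeyedItem(int key) {
        this.key = key;
    }

    // Negative if this key is smaller, zero if the keys are equal, positive if it is larger.
    @Override
    public int compareTo(KeyedItem other) {
        return Integer.compare(key, other.key);
    }
}

class CompareDemo {
    public static void main(String[] args) {
        // String already implements Comparable, so it can be compared directly.
        System.out.println("apple".compareTo("banana") < 0);                    // true
        // Two items with the same key compare as equal, so a set would keep only one.
        System.out.println(new KeyedItem(7).compareTo(new KeyedItem(7)) == 0);  // true
    }
}
```

A result of zero is what lets a set treat two objects as the same element; a negative or positive result is what lets a tree decide whether to go left or right.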

1-Create directories and copy some files

cd public-html/classes/csc241
cp -r /csclass/mohammad/public-html/classes/csc241/Set/ . -- don't forget the dot or the -r.

From the public-html/classes/csc241 directory, type ls -R Set. The list should look like:
Set:
Set.java hashNode.java treeNode.java Tree.java hashTable.java testSet

Set/testSet:
elementForTest.java test1.java test2.java

2-Compile all Java files and run the test program

In the testSet directory, test1 and test2 are test programs. test1 randomly generates String objects, stores them in a Set object, and then calls the dump() method of the Set object. test2 randomly generates elementForTest objects and stores them in a Set object. It also randomly invokes other operations, like find and remove, to test them as well. elementForTest objects are Comparable; they have a key_ field that uniquely identifies them.

Use javac *.java in each directory to compile all of the above .java files.
Run the programs that test the two implementations of Set.
- You need to be in the directory public-html/classes/csc241/Set/testSet.
- Execute test1 with java test1 tree and java test1 hash 10
- Execute test2 with java test2 tree and java test2 hash 10

3-Compare Tree against hash table

First, let's perform a few experiments to compare the performance of hash tables and binary search trees. We need a bigger pool of data than what test1 and test2 use to test these data structures; just inserting 100 or so elements and performing 100 or so update operations on these structures will not give you a definitive conclusion on their performance.

The test2 program is already designed to calculate the elapsed time (i.e. how long it took to perform the operations); modify it to insert 10,000 elements and then perform 1,000 randomly chosen update operations after the insertions are complete. Two things that you should not overlook:

When generating the key (the field in the comparable object used for identifying it), the range of values generated was set to be 10 to 110. When generating 10,000 values, if this range remains the same, the majority will be duplicates and won't get inserted (i.e. your sample will have at most 100 values instead of thousands). The range of numbers used for generating the key must expand to allow 5-digit values to be generated.
At this point, we should be able to trust our operations, so we should not need to print any echoes of what operation is being performed or whether it was successful. Performing output, in general, gets in the way of performance comparisons. So, don't perform any dumps at all. You shouldn't do any print or println anywhere in the test program, except for outputting the time it took to run at the bottom.
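The two points above can be sketched roughly as follows. This is not the code in test2: the class TimingSketch, its methods, and the key range 10000 to 99999 are assumptions made for illustration, and java.util.TreeSet stands in for the course Set implementations so that the sketch runs on its own.

```java
import java.util.Random;
import java.util.TreeSet;

// Hypothetical timing harness; it does not use the lab's Set, Tree, or hashTable classes.
class TimingSketch {
    // Keys are drawn from 10000..99999, so every key has five digits (an assumed range).
    static int nextKey(Random rng) {
        return 10000 + rng.nextInt(90000);
    }

    // Runs the insert phase and then the update phase silently; returns elapsed milliseconds.
    static long timeRun(long seed, int inserts, int updates) {
        Random rng = new Random(seed);
        TreeSet<Integer> set = new TreeSet<>();  // stand-in for the course Set
        long start = System.currentTimeMillis();
        for (int i = 0; i < inserts; i++) {
            set.add(nextKey(rng));
        }
        for (int i = 0; i < updates; i++) {
            int key = nextKey(rng);
            if (rng.nextBoolean()) {
                set.add(key);
            } else {
                set.remove(key);
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // The only output is the elapsed time, printed once at the end.
        System.out.println("elapsed ms: " + timeRun(42L, 10000, 1000));
    }
}
```

Note that nothing is printed inside the loops; the clock is read once before the work starts and once after it ends.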

Run the test program with tree as its argument to test the tree version and collect the elapsed time for 5 runs and record their average.

Run the test program with hash as its argument and a size of 10, again collecting the elapsed times for 5 runs and recording their average.

What happens if you use a size argument that is larger than 10 when testing the hash version? Is there a point where the hash table performs better than the BST? When testing larger hash table sizes, be sure to average the results of 5 runs in each case before recording them.
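To get a feel for why the size argument matters, here is a small arithmetic sketch. It assumes the hash table resolves collisions by chaining nodes in each slot and that keys spread evenly over the slots; neither is confirmed here, so check the hashTable code. The class ChainLengthSketch and the sizes listed in main are illustrative only.

```java
// Hypothetical helper for the experiment; not part of the Set package.
class ChainLengthSketch {
    // Average number of elements per slot, assuming chaining and an even spread of keys.
    static double averageChainLength(int elements, int tableSize) {
        return (double) elements / tableSize;
    }

    // Mean of the elapsed times recorded for one configuration (for example, 5 runs).
    static double average(long[] elapsedMillis) {
        long sum = 0;
        for (long t : elapsedMillis) {
            sum += t;
        }
        return (double) sum / elapsedMillis.length;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 100, 1000, 10000};  // illustrative sizes, not prescribed by the lab
        for (int size : sizes) {
            System.out.println(size + " slots -> about "
                    + averageChainLength(10000, size) + " elements per chain");
        }
    }
}
```

With 10,000 elements and a size of 10, each chain would hold about 1,000 elements under these assumptions, which is why a larger size is worth trying.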

In response to the questions in this part, include in your email:

the average elapsed time for the BST,
a table showing the hash table sizes you tested and the average elapsed time for each, and
what your analysis yields when comparing the performance of these data structures.

4-Tune the implementations

In this section, you will modify some of the code for both data structures and, again through experimentation, respond to a performance question.

The way things work now, the actual process of inserting or deleting elements doesn't have to bother with the possibility of the element not being there when it should be, or being there when it shouldn't be. The idea is simple: by using find, you traverse the data structure looking for the element. In remove, if it is not there, we do nothing; in add, we do the opposite: if it is already there, we do nothing. Because add and remove use find, del and insert don't have to deal with an object not existing when it should, or existing when it shouldn't.

Remove the calls to the find method. The insert and del methods in the Tree version need to be modified to deal with the described anomalies.

insert: you must consider that the element may already be in the tree; if it is, you must simply return root.

del: you must consider that the element might not be in the tree; if it is not, just return null. You only determine that an object is not in the tree when your root is null.
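The general shape of these two cases can be sketched on a generic binary search tree of int keys. This is only an illustration under assumptions: BstSketch and its Node class are hypothetical, they are not the lab's Tree and treeNode classes, and your methods must work with Comparable objects and the existing method signatures.

```java
// Hypothetical tree of int keys; names are illustrative, not the lab's Tree/treeNode classes.
class BstSketch {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Returns the root of the subtree after inserting key; duplicates leave it unchanged.
    static Node insert(Node root, int key) {
        if (root == null) {
            return new Node(key);              // empty spot: key was not present
        }
        if (key < root.key) {
            root.left = insert(root.left, key);
        } else if (key > root.key) {
            root.right = insert(root.right, key);
        }
        // key == root.key: already in the tree, so simply return root
        return root;
    }

    // Returns the root of the subtree after removing key, if it is present.
    static Node del(Node root, int key) {
        if (root == null) {
            return null;                       // ran off the tree: key was not present
        }
        if (key < root.key) {
            root.left = del(root.left, key);
        } else if (key > root.key) {
            root.right = del(root.right, key);
        } else if (root.left == null) {
            return root.right;                 // zero or one child: splice the node out
        } else if (root.right == null) {
            return root.left;
        } else {
            // Two children: copy the smallest key of the right subtree, then remove it there.
            Node min = root.right;
            while (min.left != null) {
                min = min.left;
            }
            root.key = min.key;
            root.right = del(root.right, min.key);
        }
        return root;
    }

    // Counts the nodes in the subtree; used only to check the sketch.
    static int size(Node root) {
        return root == null ? 0 : 1 + size(root.left) + size(root.right);
    }
}
```

In both methods the tree is traversed only once, which is the point of dropping the separate call to find.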

hashTable needs to be updated in a similar way. The difference is that add and remove don't call any recursive methods; thus, they themselves have to be changed to find the elements.
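One possible shape for this, assuming the table keeps a linked chain of nodes in each slot, is sketched below. ChainedTableSketch and its Node class are hypothetical and use int keys; they are not the lab's hashTable and hashNode classes, whose fields and hashing scheme may differ.

```java
// Hypothetical chained hash table of int keys; not the lab's hashTable/hashNode classes.
class ChainedTableSketch {
    static class Node {
        int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    private final Node[] buckets;
    private int count;

    ChainedTableSketch(int size) {
        buckets = new Node[size];
    }

    // Maps a key to a slot; floorMod keeps the index non-negative for negative keys.
    private int indexFor(int key) {
        return Math.floorMod(key, buckets.length);
    }

    // Walks the chain once; adds at the head only if the key was not found.
    void add(int key) {
        int i = indexFor(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.key == key) {
                return;                        // already present: do nothing
            }
        }
        buckets[i] = new Node(key, buckets[i]);
        count++;
    }

    // Walks the chain once; unlinks the node if found, otherwise does nothing.
    void remove(int key) {
        int i = indexFor(key);
        Node prev = null;
        for (Node n = buckets[i]; n != null; prev = n, n = n.next) {
            if (n.key == key) {
                if (prev == null) {
                    buckets[i] = n.next;       // removing the head of the chain
                } else {
                    prev.next = n.next;        // bypass the node in the middle or at the end
                }
                count--;
                return;
            }
        }
    }

    int size() {
        return count;
    }
}
```

Here the search and the update share a single pass over the chain, instead of find walking the chain first and add or remove walking it again.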

Test the changes that you make with test2 (i.e. with output) to make sure these methods still work correctly. There is no sense in doing a "new" vs. "old" comparison if you have made mistakes in changing the insertion or deletion processes in either Tree or hashTable.

Once you know they work correctly, you need to repeat the analysis that you performed in the previous section (section 3 above) in order to respond to the following question:

Was the performance of either Tree or hashTable improved after making the specified changes?

Submit the code for both Tree and hashTable with the modifications.