Lab 5

CSC 241- Lab 5 (Due May 5, 2000) [60 Points]

Theme ...

You will learn ...

about two very important data structures: Hash Tables and Binary Search Trees.
to compare and contrast behavior of these data structures in the implementation of sets.
about performance tuning techniques.
to utilize such data structures in applications that require collections or sets.
more about practical usages of class hierarchy in Object Oriented design.

In this lab, you will work with the Set Package. I developed two interfaces that you need to become familiar with. I have already implemented the Set interface with a Tree class and a hashTable class. This interface represents collection classes that hold unique objects.

The elementType interface needs to be implemented by any class where its objects are going to be stored in our in a set. Such objects in adhering to the elementType interface must provide a key method that uniquely identify them. This characteristic is important when dealing with sets, we need to be able to distinguish and compare our objects that we store in them. With the ability identify elements uniquely and compare them, we are enabled to store them in a way where we can find them quickly.

In completing this lab, You will also build an on-line dictionary application that will utilize our tree or hashTable class. Building this application will require you to implement elementType in a class that will hold a word/meaning.

Create directories and copy some files

cd public-html/classes/csc241
cp -r ~mohammad/public-html/classes/csc241/Set/ . -- don't forget the dot or the -r.

From public-html/classes/csc241 directory, type ls -R Set. The list should look like:
Set: Set.java duplicateException.java hashNode.java notFoundException.java treeNode.java Tree.java elementType.java hashTable.java testSet Set/testSet: elementForTest.java test1.java lab5.dat test2.java

Compile all Java files and run the test program

Use javac *.java in each directory to compile all of the above .java files.
Run the program that tests the two implementation of Set.
- You need to be in the directory public-html/classes/csc241/Set/testSet.
- Execute test1 with java test1 tree and java test1 hash 10
- Execute test2 with java test2 tree and java test2 hash 10

Compare Tree against hash table

First lets perform a few experiments to compare performances for hash tables and binary search trees. We need a bigger pool of data than what test1 and test2 use to test these data structures with; just inserting 100 or so elements and performing 100 or so update operations on these structures will not give you a definitive conclusion on their performance.

Modify the test2.java program to test for inserting 10,000 elements and 1,000 randomly chosen update operations after the insertion are complete. Two thing that you should not over look:

When generating the key, the range of values generated was set to be 10 to 110. When generating 10,000 values, if this range remains the same, the majority will be duplicates and won't get inserted (i.e. your sample will have at most 100 values instead of thousands). The range of number used for generating the key must expand to allow 5 digit values to be generated.
At this point, we should be able to trust our operations, so we should not need to print any echos for what operation is being performed or if it was successful or not. performing output, in general, gets in the way of performance comparisons. So, don't perform any dumps at all. You shouldn't do any print or println anywhere in the test program, except for outputing the time it took to run at the bottom.

Run the test program with tree as its argument to test the tree version and collect the elapsed time for 5 runs and record their average.

Run the test program with hash as its argument and the size 10 again collecting the elapsed times for 5 runs and recording their average.

What happens if you use a size argument that is larger than just 10 when testing the hash version? Is there a point where hashtable performs better than BST. When testing for higher hash table sizes, be sure to average the results of 5 runs in each case before recording them.

In response to the questions in this part, in your email, include:

what the average elapsed time for the BST was.
a table showing what sizes you tested the hash table for and
what does your analysis yield when comparing the performance of these data structures.

Tune the implementations

In this section, you will modify some of the code for both data structures and again through experimentation respond to a performance question.

The way things work now, the actual process of inserting or deleting elements don't have to bother with the possibility of either the element not being there, when it should be, or it being there, when it shouldn't be. The idea here is simple, by using find, you traverse the data structure looking for the element; in remove, if it is not there, we throw an exception; in add, we do the opposite, if it is there, we throw an exception. This way, the rest of these methods don't have to deal with exceptions.

Remove the calls to the find method. Throw no exception in the add or remove methods of the Tree class. The insert and del methods in the Tree version need to be modified to account for their respective exceptions. Both insert and del need a throws with the appropriate exception that they should now throw in their method header. In insert you must consider that the element may already be in the tree and if it is, throw the duplicateException. In del you must consider that the element might not be in the tree and if it is not, throw the notFoundException. add and remove methods still need their "throws ..." clause in their header. Also, no try/catch is necessary since the respective calls to insert and del methods could cause exceptions to be thrown, add/remove will implicitly throw the needed exceptions.

hashTable need to updated in a similar way. the difference is that add and remove don't call any recursive methods, thus, they themselves have to be changed to find the elements.

test changes that you make with test2 (i.e. with output), to make sure these methods still work correctly.

Once you know they work correctly, you need to repeat the analysis that you performed in the previous section in order to respond to the following question:

Can the performance of either Tree or hashTable be improved by making the changes specified here.

Submit the code for both Tree and hashTable with the modifications.

Develop a Dictionary Application

Build a dictionary application program. Your dictionary program needs to load word/meanings from a file and provide users with the following options:

Find a particular word's meaning. Program should say something reasonable when the word is not found.
Add a new word and its meaning. Of course, the word must not exist before this operation.
Remove a word and its meaning from the dictionary.
Change the meaning of a word

You should have already copied a file into your testSet named lab5.dat. This file contains words and meanings in a simple format. This is the data file that you will use to initially load words and meanings.

Before developing the application program, you will need to first develop a dictionaryEntry class that implements elementType. This class must have two fields: word and meaning. It needs to provide the appropriate methods required by the elementType interface. The key method must return the word. The stringView must return a string that follows this template: The word is:XXXXXX with the meaning:XXXXXXXX Write a simple, but meaningful constructor for this class.

Then, write the dictionary application program that uses dictionaryEntry to store word/meanings loaded from the input file in a hashTable or a Tree. Since, the data file has words that are already sorted, the tree version will not work optimally, do you know why? Your application should follow the algorithm given below:

Build a set from the words and meanings in the data file
loop forever
Provide a menu (1. Find Word, ..., 4. Quit) and get choice
switch choice
case 1: prompt for and read a word; find; output meaning or a NOT FOUND message. break
case 2: prompt for and read a word and meaning; add an entry. output success or failure. break.
case 3: prompt for and read a word to delete. remove an entry. output success or failure. break.
case 4: prompt for and read a word and a new meaning. remove the old entry. insert a new entry. output success or failure. break.
case 5: return.

Reading from a file

The following is extracted from testNormal2.java file that we used in testing our Tank class and shows you how to set a file up for reading strings. Of course, you only need to do this once to initially read the word and meanings before you begin interacting with the user.

Also, notice how the file name is read from the keyboard. Your program interactions will require input from the user as well. For example, you may prompt them for a word to be deleted, so your program needs to read that word from the keyboard. String word=inp.readLine(); should allow you to read the word from the keyboard.


    try {
      String text;
      DataInputStream inp = new DataInputStream(System.in);
      System.out.println("Enter File Name:");
      String fname = inp.readLine();
      InputStream ist = new FileInputStream(fname);
      DataInputStream istream = new DataInputStream(ist);
      text = istream.readLine();
      while(text != null) {
	.
	.
	.
        text = istream.readLine();
      }
    }
    catch (FileNotFoundException e) {
      System.err.println("File does not exist");
    }
    catch (IOException e) {
      System.err.println("Unsuccessful read");
    }

Submit a copy of this application program, as well as, your dictionaryEntry class.

Extra Credit--10 extra points

Add an additional option to the on-line dictionary built as part of lab5. This option enables the user to either see all words/meanings or a partial set of them.

The user needs to be prompted for a partial word/*. Lets assume the user response is stored in w. If w is a "*", all word/meanings are displayed, otherwise, the subset of the elements that match the content of w at their beginning are displayed. For example, if w is "la", all words in the dictionary that start with the substring la are displayed. Keep in mind that the toArray returns all elements from the set in an array, so you can use it to return all word/meanings and selectively display what you need.