CSC 241- Week 9 (March 19, 1997)


Quick Sort

Quicksort is a divide and conquer method of sorting. It works by partitioning a list into two parts, then sorting the parts independently. The Algorithm has the following recursive structure:

quickSort (int leftIndex, int rightIndex) {
   if (rightIndex>leftIndex) {
      int i = partition(leftIndex, rightIndex);
      quickSort(leftIndex,i-1);
      quickSort(i+1,rightIndex);
   }
}

The parameters leftIndex and rightIndex delimit the sublist within the original list that is to be sorted. We initiate the sort process by calling this method via quickSort(0,n-1), where n is the number of elements in the list. Similar to Selection sort it is assumed that the list of elements is available to us in this method and the partition method. The crux of the method is the partition method which must rearrange the list to make the following three conditions hold:

  1. the element data[i] is in its final place in the array for some i,
  2. all elements in data[leftIndex],...,data[i-1] are less than or equal to data[i],
  3. all elements in data[i+1],...,data[rightIndex] are greater than or equal to data[i].

partition

All implementations of partition require the notion of a pivot value. Pivot is chosen from the sublist of elements to be sorted. You could arbitrarily choose any element you want to act as the pivot, left most element, right most element, or as we will, the element in the middle. Our algorithm moves the pivot value to the location it belongs in the completely sorted list and all elements smaller than or equal to it fall in the left partition and the rest in the right partition. This way, when we apply the same algorithm to the left sublist and the right sublist and sort them both, we are ensured that the whole list is sorted.

Once we have a pivot value, we scan the list from the leftIndex until an element greater than or equal to pivot is found. we then scan from rightIndex until an element less than or equal to pivot is found. The two elements which stopped the scans are exchanged. We continue these left and right scans until the indexes pass each other. The pivot value should ends up in it final position. Here is one implementation quickSort with the code for partition embedded in it.

void Quicksort (int leftIndex,int rightIndex) {
   int i = leftIndex;
   int j = rightIndex;
   int pivot = data[(leftIndex + rightIndex)/2];
   do {
        while(data[i] < pivot) i++;
        while(pivot < data[j]) j--;
        if (i <= j) {
           int tmp = data[i];
           data[i] = data[j];
           data[j] = tmp;
           i++;
           j--;
        }
   } while (i <= j);
   if (leftIndex < j) Quicksort(data,leftIndex,j);
   if (i < rightIndex) Quicksort(data,i,rightIndex);
}



Consider this list and follow the trace shown when the sort algorithm is applied:

leftIndex=0, rightIndex=3, pivot=12
      -------------------      -------------------      -------------------
     |11 | 12  | 9  | 10 |==>|11 | 10  | 9  | 12 |==>  |11 | 10  | 9  | 12 | 
      -------------------     -------------------      --------------------
       0    1    2    3        0    1    2    3         0    1     2    3      
    i=0^           j=3^          i=1^      j=3^                 j=2^  i=3^

Note that 12, the pivot value, is in the correct spot, all elements to its left are smaller and, as it turns out, there is nothing to its right. Now, we apply quickSort to the left sublist. Note that there won't be any calls for a right sublist here.

leftIndex=0, rightIndex=2, pivot=10
     --------------     --------------     --------------
    |11 | 10  | 9  |==>| 9 | 10  |11  |==>| 9 | 10  |11  |
     --------------     --------------     --------------
      0    1    2        0    1    2        0    1    2   
    i=0^     j=2^     i=0^      j=2^     j=0^         i=2^

Note that 10, the pivot value, is in the correct spot, the element to its left is smaller and the element to its right is larger. The sublist is sorted, we return from the recursive call.

leftIndex=0, rightIndex=3, pivot=12
      ------------------- 
     | 9 | 10  |11  | 12 |         is returned!
      ------------------- 
       0    1    2    3   

Analysis of Quick Sort

quickSort is an O(nlog2n) algorithm. Sort Applet Demo is design to show some sort algorithms working side by side. quickSort is compared against a couple of versions of bubble sort that is a O(n2) algorithm.

I recently downloaded this Sort Applet from the web, it is designed to show how a number of different sort algorithms work. Be aware that you need a big screen for this one!

Why is quickSort O(nlog2n)? It is easy to show mathematically for the best case where we can always break the list in half:
n + 2 . 1/2 . n + 4 . 1/4 . n + ... + 2log(n-1) . 1/2log(n-1) . n =
n + n + n + ... + n

How many ns are there? log2n. Thats the reason for O(nlog2n).

Empirically, it has been shown that quick sort performs very close to nlog2n even in average cases.

Why is nlog2n vs. n2 important? With n as 1000, quick sort requires about 10,000 cell visits whereas selection sort requires 1000,000 visits. As n gets larger this difference becomes more and more pronounced.


A List class

intList class maintains a list of integers. I have included some of the functionality that you get with the Vector Class, here is the definition of the class:


public class intList {
  public intList()
  public boolean empty()
  public int find(int val)
  public void append(int val) 
  public int addBefore(int pos, int val) 
  public int addAfter(int pos, int val) 
  public int remove(int pos) 
  public int valAt (int pos)
  public int last() 
  public String dump ()
}

Certainly, the definition does not give away the fact that we implemented it with a linked list, as you can see, the concept of position is used everywhere. The Node class used is basically the same as the one we used for intStack, so, we have a simple singlely-linked list.

What is the rationale for having a class defined that implies we are using an array, but, we actually implement with a linked-list? I'll give you four reasons:

  1. Providing functionality such finding elements based on their position regardless of data structures used is useful.
  2. You need more experience with linked-lists.
  3. It is important to see implementation that seem out of the ordinary, just so that you know there is never only one way to do something.
  4. Last, but not least, this example enables me to reiterates the impact data structures choices on the performance of algorthms.

To demonstrate the last point made, consider the following Selection Sort. The algorithm is the same as what was show last week, except that data is an intList object containing our values.

   for (int i=0; i<n-1; i++) {
       int min = i;
       for (int j=i+1; j<n; j++)
           if (data.valAt(j) < data.valAt(min)) {
              min = j;
           }
       tmp = data.valAt(i);
       //Note that there is no "replace" method shown above in intList,
       //but, you were asked to add it to this class as part of lab4.
       data.replace(i,data.valAt(min));
       data.replace(min,tmp);
   }
}

Is this an O(n2) algorithm? No, everytime, we say data.valAt(j), the code associated with valAt actually visits every cell leading to the jth node? without getting into the mathematical details of this one, we are looking at an O(n3) algorithm. Thats not that different is it, well, with 1000 elements, instead of 1000,000 cell visits, we will have 1000,000,000 cell visits, it gets ugly more quickly!

Details of the implementation

Let us consider the code for a few of the methods written for intList:

Lets first look at valAt:

  public int valAt (int pos) {
    if (head == null) return -1;
    Node temp;
    int last;
    for (temp=head,last=0;temp != null;temp=temp.next,last++)
        if (last == pos)
           return temp.element;
    return -1;
  }

To understand the code here, we need to make sure the for statement is understood. In parenthesis, there are three parts separated by ;.

The first part performs initializations. temp=head sets a temporary variable (temp) reference the first node in the list. last=0 starts a counter variable (last) as zero which keeps track of the position of the node to be visited.

the second part, temp!=null gives the condition that stops the loop, so, the loop stops only when there are no more nodes to consider.

The third part gives us what needs to change after each cycle of the loop. temp=temp.next moves our temp pointer to the next node in the list. last++ increments the counter.

Inside the loop, we simply check to see if we have hit the correct position or not, if we have, the element in the node referenced by temp is returned. The return is occurring inside the loop which acts as a loop exit, no other instructions in this method are performed once this return is executed.

Note that return -1; is NOT in the for loop, and only occurs if the loop ends due to temp being null. Returning -1 implies an unsuccessful call to valAt. This means that the list must not have a -1 as an element or we are in trouble, keep in mind, this is just an example.

Lets look at addAfter:

  public int addAfter(int pos, int val) {
    if (head==null) return -1;
    Node temp;
    int last;
    for (temp=head,last=0;temp != null;temp=temp.next,last++)
      if (last == pos) {
        temp.next = new Node (temp.next,val);
        return 0;
      }
    return -1;
  }

Notice how similar the approach is in addAfter to that of valAt. The loop keeps going until the node in a particular position is found. However, our intent is different in that we wish to add a new node to the list, thus, once the node in the correct position is located, temp.next = new Node (temp.next,val); is executed. This one statement does a lot of work, it creates a new node, puts the new element in it with its next pointer as temp's next pointer. The assignment statement causes the next pointer of temp to now point this new node. Lets show what happens with an example:

suppose we make the following call: addAfter(2,12), which means we wish to add a new node holding 12 after the node in the position 2. Lets assume the following list before the addAfter is performed:

head --> 9 --> 10 --> 11 --> 13 --> 14

The for statement loops until temp is pointing to the node containing 11. temp.next = new Node (temp.next,val); performs the following:

head --> 9 --> 10 --> 11--         -->13 --> 14
                          |       |
                           -->12--