CSC 241- Set #11


Algorithm Analysis

Now it's time to consider performance issues in a more analytical way. Thinking about how we can improve the performance of systems has always been interesting to me. Analysis of algorithms is the formal study of what is at the core of system performance: the algorithms that perform the tasks we want done. Algorithm complexity is typically discussed in computer science using asymptotic time complexity analysis (Big-O notation). Here we are interested in the order of magnitude of an algorithm's complexity. For example, if you have a set of values in an array, how many cells do you have to visit in order to find the element you are interested in? As it turns out, our choice of data structures is also very critical when considering performance. Algorithms are often driven by the way we decide to maintain our data.

When we discussed circular queues and used the circular queue concept in implementing our fixedQueue, I discussed how it saved us from having to move all the elements back on each dequeue.

We had a singly-linked list for the dynamic version of the queue, with only one reference to the front of the list. You should have noticed that for each enqueue, we started at the beginning and had to find the end of the queue to add the new element. We would have had a more efficient dynamicQueue implementation with a doubly-linked list that was made circular; this way, the last node in the queue references the first and the first node references back to the last. During an enqueue with such an implementation, since we start with the front node, and it knows where the rear of the list is, we don't have to traverse the list looking for the back. Of course, the implementation is a bit more complicated this way, but significantly more efficient.
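A circular doubly-linked queue along these lines could be sketched as follows. This is only an illustration (the class and field names here are made up for this sketch, not the exact dynamicQueue code from class): since the front node's prev reference is the rear, enqueue never has to traverse the list.

```java
// Sketch of a circular doubly-linked queue: front.prev is always the
// rear node, so both enqueue and dequeue are O(1).
public class CircularDeque {
    private static class Node {
        int data;
        Node next, prev;
        Node(int data) { this.data = data; }
    }

    private Node front;   // front.prev is the rear of the queue

    public boolean isEmpty() { return front == null; }

    // O(1): the rear is reachable directly as front.prev, no traversal
    public void enqueue(int value) {
        Node n = new Node(value);
        if (front == null) {
            n.next = n.prev = n;      // single node references itself
            front = n;
        } else {
            Node rear = front.prev;
            rear.next = n;
            n.prev = rear;
            n.next = front;
            front.prev = n;
        }
    }

    // O(1): unlink the front node and advance front
    public int dequeue() {
        if (front == null) throw new IllegalStateException("queue is empty");
        int value = front.data;
        if (front.next == front) {
            front = null;             // last element removed
        } else {
            Node rear = front.prev;
            front = front.next;
            front.prev = rear;
            rear.next = front;
        }
        return value;
    }
}
```

Note how both operations touch only a constant number of references regardless of how many elements are in the queue.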

Now, where does the order of magnitude come into play? enqueue and dequeue in our fixedQueue, and dequeue in our dynamic version, are considered to be O(1). These methods have no loops and execute their instructions just once.

The enqueue method in dynamicQueue is considered to be an O(n) algorithm. When the complexity is O(n), the more elements we have, the longer it will take to perform the task. In our enqueue, we iterate through the same set of instructions multiple times, depending on the number of elements already in the queue.

Big-O Notation

Understanding why the above algorithms are O(1) or O(n) is not too hard. But what about cases where we need to visit a list of elements twice, or where we need to visit one half of a list? How about cases where we need to do a few preliminary operations and then iterate through a list? Simple: constant multipliers and additions don't get counted when determining complexity. We are interested in magnitude.
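For example, consider a hypothetical task (an illustrative helper, not one of our queue methods) that makes two full passes over a list after a few preliminary operations. Counting the steps gives 2n + 3, and dropping the constant multiplier and the constant addition leaves O(n):

```java
// Counting "steps" shows why constant factors are dropped:
// two passes over n elements plus fixed setup is 2n + 3, still O(n).
public class StepCount {
    public static int steps(int n) {
        int count = 3;                          // a few preliminary operations
        for (int i = 0; i < n; i++) count++;    // first pass over the list
        for (int i = 0; i < n; i++) count++;    // second pass over the list
        return count;                           // total: 2n + 3
    }
}
```

Doubling n roughly doubles the step count whether the constant is there or not; that proportionality is what O(n) captures.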

A search algorithm that needs to look for something sequentially in a list with n elements has a complexity of O(n). If we are dealing with an ordered list, on average, it will need to check half of the elements before it can decide whether the element is in the list. However, its performance is still considered to be O(n).

Suppose we need to read a whole list of elements in, sort the list, and then write the newly ordered elements back to the same file. If we use a sort algorithm like selection sort, as we will discuss later, we will have O(n^2) behavior. So, it is O(n) for reading the data in, O(n^2) for sorting the data, and O(n) for writing the data back. What is the order of magnitude in cases such as this, where there are several subtasks involved? It is O(n^2); the subtask with the highest order of magnitude determines the complexity of the whole. In this case, it makes NO sense to say something like "this program's complexity is O(n^2) + 2O(n)".
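A quick way to see why the highest-order subtask dominates is to count the steps of each phase (a rough illustrative tally, treating the sort as taking n^2 steps):

```java
// Rough step counts for "read, sort, write" on n elements:
// the n^2 sort phase swamps the two O(n) phases as n grows.
public class PhaseCount {
    public static long total(long n) {
        long read  = n;        // one step per element read in
        long sort  = n * n;    // selection-sort-like work, about n^2 steps
        long write = n;        // one step per element written back
        return read + sort + write;   // n^2 + 2n, i.e. O(n^2)
    }
}
```

For n = 1000, the total is 1,002,000 steps, and the sort alone accounts for 1,000,000 of them; the two O(n) phases barely register.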

Time vs. Space complexity

We need to discuss the space requirements of algorithms as well as their timing. When we analyze algorithms, we need to be concerned about both.

Suppose we have a list of employee records that we wish to keep in memory. Let us assume that the records are kept in order of SS# and we wish to locate them based on it.

We can obviously keep the employee records in an array of a reasonable size based on the number of employees we have. One way to search for a specific employee is to search sequentially. As mentioned before, since the list is kept in order, on average we need to visit half of the list to locate the employee (O(n) complexity). If we have 1000 employees, we need to visit 500 cells on average.

An alternative and superior approach is to utilize a binary search technique. Binary search performance is O(log2 n); more about that later. So, for 1000 employees, we can find a particular SS# within 10 iterations.
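To see where the 10 comes from, we can count how many times 1000 can be halved before nothing is left (a small illustrative helper, not part of our search code):

```java
// Worst-case number of probes binary search needs for n elements:
// each probe discards half of what remains, until the range is empty.
public class LogDemo {
    public static int probes(int n) {
        int count = 0;
        while (n > 0) {
            n /= 2;       // halve the remaining range
            count++;
        }
        return count;     // about log2 n
    }
}
```

Halving 1000 repeatedly gives 500, 250, 125, 62, 31, 15, 7, 3, 1, 0: ten halvings, hence at most about 10 probes.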

There is yet another alternative that is even faster. Imagine that, instead of putting aside say 1000 spaces in an array for our employees, we put aside 1,000,000,000 spaces, one for every possible 9-digit SS#. Now each SS# has its own little cell, so if the SS# that we want is 546324532, we can go directly to that cell and process the employee. What is our performance? As far as timing is concerned, it is O(1), which is great. But we had to allocate a million times more space than we needed: the space is proportional to the size of the key range, not to the number of employees. Even if we only kept 1000 bytes of information for each employee, we would need to put aside a terabyte of memory for the employee array! Have you ever worked on a computer with 1000 gigabytes of RAM? I have never seen one!
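This direct-addressing idea can be sketched in a few lines. To make the array actually fit in memory, this sketch uses hypothetical 4-digit employee IDs instead of full 9-digit SS#s; the class and method names are illustrative:

```java
// Direct-address table: one cell per possible key. Lookup is a single
// array access (O(1)), but the space is proportional to the key range,
// not to the number of employees actually stored.
public class DirectAddress {
    private final String[] table = new String[10_000];  // 4-digit IDs: 0000-9999

    public void put(int id, String name) {
        table[id] = name;        // the ID itself is the array index
    }

    public String get(int id) {
        return table[id];        // one array access; null means "no such employee"
    }
}
```

With real 9-digit SS#s the same table would need a billion cells, which is exactly the space problem described above.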

The point here is simple: speed is not the only issue at hand when considering complexity. How much space is needed also affects your choice of algorithms.

Binary Search

As mentioned before, binary search assumes an ordered list. The algorithm considers the middle of the list first. If the value being searched for is in the middle, obviously, you are done searching. If it is not, the search value is compared to see whether it is smaller or greater than what is in the middle. Suppose it is less; then the whole right half no longer needs to be considered. You treat the left half the same way you treated the whole list: go to its middle and check the same way, and continue considering smaller and smaller sections of the list until you find the element or decide that it is not in the list.

Let us now look carefully at why binary search is O(log2 n). Here is the algorithm for binary search; note that the method returns true if the search value is found and false otherwise. We assume that an ordered list l, along with a variable size that keeps track of the number of cells occupied, is available to us in this method.


boolean find (int searchVal) {
   int first = 0;              // first cell still under consideration
   int last  = size - 1;       // last cell still under consideration
   while (first <= last) {
       int middle = (last + first) / 2;   // middle of the remaining range
       if (l[middle] == searchVal)
          return true;                    // found it
       if (l[middle] < searchVal)
          first = middle+1;               // discard the left half
       else
          last = middle-1;                // discard the right half
   }
   return false;                          // range is empty: not in the list
}

Consider the following list:


 size=10
 ------------------------------------------------
|10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
 ------------------------------------------------
  0    1    2    3    4    5    6   7    8    9

Let's trace the algorithm when searching for 60:

first=0                                    last=9
 ------------------------------------------------
|10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
 ------------------------------------------------
  0    1    2    3    4    5    6   7    8    9
             middle=4 ^

                        first=5             last=9
 ------------------------------------------------
|10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
 ------------------------------------------------
  0    1    2    3    4    5    6   7    8    9
                           middle=7 ^              return true;

Let's trace the algorithm when searching for 18:

first=0                                    last=9
 ------------------------------------------------
|10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
 ------------------------------------------------
  0    1    2    3    4    5    6   7    8    9
             middle=4 ^

 first=0      last=3
 ------------------------------------------------
|10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
 ------------------------------------------------
  0    1    2    3    4    5    6   7    8    9
       ^ middle=1              

        first=2 last=3
 ------------------------------------------------
|10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
 ------------------------------------------------
  0    1    2    3    4    5    6   7    8    9
   middle=2 ^             
  
   last=1 first=2
 ------------------------------------------------
|10 | 17 | 20 | 30 | 35 | 40 | 43 | 60 | 65 | 90 |
 ------------------------------------------------
  0    1    2    3    4    5    6   7    8    9
                                                    return false;

In the first example only 2 cells had to be examined, and in the second case only 3. Binary search locates an element with about log2 n or fewer cell visits; 1000 elements would require about 10 cells examined. Try it out. Each time we iterate, we dismiss half of the list or sublist being considered; this logarithmic behavior is one we see time and again in algorithm analysis and design.
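You can check these counts with an instrumented copy of the find method above; this version (an illustrative helper, not part of the original code) returns how many cells were examined before the search ended:

```java
// Instrumented binary search: same logic as find, but returns the
// number of cells examined instead of true/false.
public class FindCount {
    public static int probes(int[] l, int size, int searchVal) {
        int first = 0, last = size - 1, count = 0;
        while (first <= last) {
            int middle = (last + first) / 2;
            count++;                          // one more cell examined
            if (l[middle] == searchVal) return count;   // found
            if (l[middle] < searchVal)
                first = middle + 1;           // discard the left half
            else
                last = middle - 1;            // discard the right half
        }
        return count;                         // not found
    }
}
```

Running it on the list from the traces confirms 2 probes for 60 and 3 probes for 18.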

Selection Sort

Let us consider selection sort again (we looked at it a few weeks ago as an example of manipulating intList objects). The basic approach in this sorting technique is to find the cell with the smallest value and interchange its content with what is in the first cell. Then you locate the second smallest and interchange it with what is in the second cell. This process continues until all elements are placed where they belong.

Here is a code segment that implements selection sort. data is our array of elements and n tells us how many elements are in the list.

   for (int i=0; i<n-1; i++) {
       int min = i;                     // assume cell i holds the smallest so far
       for (int j=i+1; j<n; j++)
           if (data[j] < data[min]) {
              min = j;                  // found a smaller value at cell j
           }
       int tmp = data[i];               // swap the smallest into cell i
       data[i] = data[min];
       data[min] = tmp;
   }

Consider this list and follow the trace shown when the sort algorithm is applied:

      -------------------     -------------------     -------------------
     |10 | 12  | 9  | 11 |==>|10 | 12  | 9  | 11 |==>|10 | 12  | 9  | 11 |==>
      -------------------     -------------------     -------------------
       0    1    2    3        0    1    2    3        0    1    2    3 
min=i=0^ j=1^           min=i=0^      j=2^          i=0^  min=j=2^  

      -------------------     -------------------     -------------------
     |10 | 12  | 9  | 11 |==>| 9 | 12  |10  | 11 |==>| 9 | 12  |10  | 11 |==>
      -------------------     -------------------     -------------------
       0    1    2    3        0    1    2    3        0    1    2    3 
    i=0^    min=2^ j=3^     i=0^                     min=i=1^ j=2^  

     -------------------     -------------------     -------------------
    | 9 | 12  |10  | 11 |==>| 9 | 12  |10  | 11 |==>| 9 | 10  |12  | 11 |==>
     -------------------     -------------------     -------------------
      0    1    2    3        0    1    2    3        0    1    2    3 
      i=1^ min=j=2^            i=1^ min=2^ j=3^         i=1^

     -------------------     -------------------     -------------------
    | 9 | 10  |12  | 11 |==>| 9 | 10 | 12  | 11 |==>| 9 | 10 | 11  | 12 |
     -------------------     -------------------     -------------------
      0    1    2    3        0    1   2     3        0    1   2     3
         min=i=2^ j=3^             i=2^min=j=3^           D O N E

How many cells were visited? All 4 to find the smallest, the 3 remaining cells for the next number, and 2 cells for the 3rd number: 9 visits were required.

In general, we need to think:

   n + (n-1) + (n-2) + ... + 2 =
   n(n+1)/2 - 1 =
   (1/2)n^2 + (1/2)n - 1

This is considered to be O(n^2). Think magnitude!
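We can verify the sum with a small counting sketch (an illustrative helper): pass i of the sort scans cells i through n-1, so the total number of visits is n + (n-1) + ... + 2, which matches n(n+1)/2 - 1.

```java
// Count the cells visited by selection sort and compare against the
// closed-form expression n(n+1)/2 - 1.
public class VisitCount {
    public static int visits(int n) {
        int count = 0;
        for (int i = 0; i < n - 1; i++)      // n-1 passes
            for (int j = i; j < n; j++)      // pass i looks at cells i..n-1
                count++;
        return count;
    }

    public static int formula(int n) {
        return n * (n + 1) / 2 - 1;          // closed form of the sum
    }
}
```

For n = 4 both give 9, matching the hand count above, and for n = 1000 both give 500,499: the n^2 term is what matters.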