Analysis of Algorithms I

Intro

In this lesson and the next we'll examine how the efficiency of an algorithm is analyzed, and how different algorithms (solving the same problem!) are compared in terms of efficiency. We'll try to develop analytic tools for these things, but we'd like to keep it grounded by looking at empirical data as well.

The examples below use Stop_Watch.h (written by Prof Chris Brown), which defines a C++ class that makes timing code easy. What Stop_Watch gives you is the time spent executing your code, which means that your timing results shouldn't depend (too much) on how many other people are using the same computer as you or what they're doing. This stop watch has a granularity as well, which you need to be conscious of. Imagine a digital stop watch with a minutes-and-seconds read-out. If you tried to time something that took less than a second, it would read that 0 time had passed. This doesn't mean that the event literally took 0 time, but that the stop watch couldn't measure events to an accuracy under 1 second. Such a stop watch would have a granularity of 1 second. The Stop_Watch class's granularity is machine dependent, but it's probably not capable of measuring events under 1 millisecond ... probably closer to 10 milliseconds, in fact. In that case, you have to run the code you want to time over and over again - say 1000 times - and time the duration of the 1000-run trial. The time per run would then be the total divided by 1000.

What do we mean by an algorithm's "efficiency"?

In order to discuss an algorithm's "efficiency", let's develop an algorithm that computes the function f(x,k) defined as follows:

            x^8     x^8            x^8      x^8
f(x,k)  =  ----- + ----- + ... + ------- + -----
             1!      2!           (k-1)!     k!

As with most programming challenges, there are several solutions! But which one is best? We're going to be interested in comparing them based on speed - essentially asking which algorithm is fastest. This sounds clear, but it isn't as clear-cut as you might think!

Take two algorithms for this problem: Alg1 and Alg2. Experimentally, we determine that to compute f(3.1,8) takes 1.32 microseconds with Alg1 and 0.23 microseconds with Alg2. On the other hand, computing f(0.75,12) takes Alg1 1.81 microseconds and takes Alg2 8.02 microseconds. Which is better? Pretty hard to say, isn't it? When we talk about the running time of algorithms, which input are we talking about? Different inputs take different amounts of time ... there must be some way to compare these algorithms!

Let's look at one specific algorithm and try to run some experiments with it. The function f_alg0 below implements a pretty straightforward algorithm for computing f(x,k):

ex0.cpp

double f_alg0(double x, int k) {
  double S = 0.0;
  for(int n = 1; n <= k; n++) {
    // Compute x^8
    double x8 = 1.0;
    for(int i = 0; i < 8; i++)
      x8 = x8*x;

    // Compute n!
    double f = 1.0;
    for(int i = 1; i <= n; i++)
      f = f*i;

    // Add next term
    S += x8/f;
  }
  return S;
}

If you run some experiments on this, you'll find that the time required does not depend on x at all, only on k. This tells us important thing number 1: Any single arithmetic operation on the computer takes the same amount of time regardless of the specific values of the operands. In other words, it takes just as long to compute 13.4*5.2 as 30003.8798798*0.00003782. What matters in this case is k, because k affects how many times we go through our main loop - the bigger k is, the longer the computation takes. Thus, no one number gives us "the running time" of f_alg0. Instead, "the running time" is a function of k - let's call it T0(k). Below you see a plot of this function for values of k from 1 to 20.

[Plot: T0(k) in microseconds, for k = 1 to 20]

So, important thing number 2: "Running time" is a function, not a number - it's a function of the input. Typically, though not always, as the input gets "bigger", the computation requires more time. When we talk about the efficiency of an algorithm, we're talking about this function!

How do we compare running time functions?

When we look at a different algorithm for solving the same problem, we want to compare its running time function to that of the algorithm we already have. Here are implementations of four more algorithms for solving our problem, along with a plot of the running time functions for all 5 algorithms:

Alg1 :

We notice that x^8 is getting recomputed needlessly every time through the outer loop, so we compute it once, before the loop.

double f_alg1(double x, int k) {
  // Compute x^8
  double x8 = 1.0;
  for(int i = 0; i < 8; i++)
    x8 = x8*x;

  double S = 0.0;
  for(int n = 1; n <= k; n++) {
    // Compute n!
    double f = 1.0;
    for(int i = 1; i <= n; i++)
      f = f*i;

    // Add next term
    S += x8/f;
  }
  return S;
}

Alg2:

We notice that each time through the loop the new value of the variable f is the old value of f times n, so we simply remember the old value and reuse it rather than recompute it.

double f_alg2(double x, int k) {
  double S = 0.0, f = 1.0;
  for(int n = 1; n <= k; n++) {
    // Compute x^8
    double x8 = 1.0;
    for(int i = 0; i < 8; i++)
      x8 = x8*x;

    // Compute n!
    f = f*n;

    // Add next term
    S += x8/f;
  }
  return S;
}

Alg3:

We compute x^8 before the main loop, compute it with only three multiplications (rather than an 8-iteration loop), and we factor x^8 out of the sum.

double f_alg3(double x, int k) {
  // Compute x^8
  double x2 = x*x;
  double x4 = x2*x2;
  double x8 = x4*x4;

  double S = 0.0;
  for(int n = 1; n <= k; n++) {
    // Compute n!
    double f = 1.0;
    for(int i = 1; i <= n; i++)
      f = f*i;

    // Add next term
    S += 1/f;
  }
  return x8*S;
}      

Alg4:

We use all the best features of the previous three!

double f_alg4(double x, int k) {
  // Compute x^8
  double x2 = x*x;
  double x4 = x2*x2;
  double x8 = x4*x4;

  double S = 0.0, f = 1.0;
  for(int n = 1; n <= k; n++) {
    // Compute n!
    f = f*n;

    // Add next term
    S += 1/f;
  }
  return x8*S;
}    

And here's what they all look like when run using the same input values and plotted against one another:

So, which algorithm is best? Hopefully you'd say Alg4. But how much better? How would you rank them? The key thing to notice is that Alg0, Alg1 and Alg3 all curve up, whereas Alg2 and Alg4 are straight lines. To really see how important this distinction is, let's plot the computing times (in milliseconds now) over a much wider range of k values - from 100 up to 1000.

What we see is that there are two classes of algorithms: those whose computing time is a straight line, and those whose computing time curves up (actually, they're parabolic), and there's simply no comparison between the two families. Even though Alg3 was faster than Alg2 for small values of k, the fact that Alg2's running time was only growing linearly while Alg3's was curving up, growing faster and faster, meant that Alg3 was doomed. Alg2 was destined to catch up to it and surpass it, and by the time you look at values of k in the thousands ... well, just forget about it!

What we're really interested in when we compare algorithms is the shape of the running time function's curve. That tells us how fast the running time grows as the input gets bigger and bigger. The difference between Alg0 and Alg3, for example, is irrelevant when you compare them to Alg2. Alg0 and Alg3's running times differ by a constant multiple - roughly T0(k) = 3/2 T3(k) would be my guess from the earlier graph. That's nice, but they both look equally slow on the last graph when we compare them to Alg2. Similarly, Alg4's running time differs from Alg2's by a constant multiple - roughly T2(k) = 4 T4(k) would be my guess from the earlier graph. That's nice too, but the difference is irrelevant when you compare those two to all the other algorithms. Constant multiple differences don't change the shape of the running time function graphs; they don't change the growth rate of the running time functions.

So, important thing number 3: what we really care about is the growth rate of an algorithm's running time function. This is justified by the above discussion comparing several algorithms solving our f(x,k) problem, but it has other justifications as well. The actual time in seconds for an implementation of an algorithm depends on the skill of the programmer, on whether or not you're using compiler optimization, on the language used, on the architecture of the computer ... on all sorts of things. The growth rate is independent of these things. All of them change the running time function by a constant multiple; they don't change the growth rate, they don't change the fundamental shape of the curve.

Hierarchy of growth rates

The "fast" algorithms from the previous sections grew like lines, which is to say they grew like n. We say that a function that grows like n is "Θ(n)" (pronounced "theta of n"), so we'd write that T2(k) = Θ(k) and T4(k) = Θ(k). On the other hand, the other algorithms are Θ(k^2), which is to say they grow like k^2. Clearly, k^2 grows faster than k, so we prefer the Θ(k) algorithms. There's a whole hierarchy of growth rates, including:

n!  <------ Grows fast!
2^n
n^3
n^2
n log(n)
n
sqrt(n)
log(n) <---- Grows slow!
1  <---- Constant, doesn't grow!

Ultimately what we'd like to do is look at an algorithm and place it on the growth rate hierarchy without even having to implement it! That way we can compare all the different algorithms we dream up for a problem, pick the best, and only bother to sit down and program the best.

When two inputs of the same "size" take different amounts of time

In the example we've been considering, we used k to measure the "size" of the input, and for fixed k all of our algorithms took the same amount of time regardless of which particular input of size k we had - i.e. regardless of x. That doesn't always happen. For example, let's consider the problem of finding the index i such that the element A[i] in a vector of integers has value x. In other words: the input consists of A and x, and the output is i (we'll assume that you know that x is in A; you simply don't know where). Clearly, the size of the vector, which we'll call N_A, is the "size" of our input. If we have an algorithm like this:

int find(const vector<int> &A, int x) {
  int i = 0;
  while(i < (int)A.size() && A[i] != x)
    i++;
  return i;
}

We'd like the running time as a function of N_A. Well, now for a given input size N_A there's a whole range of possible running times, from almost no time at all (if x is sitting in the first cell of A) to however long it takes to look through the whole vector. Here's the plot of all the possible running times as N_A varies from 10 to 100 in increments of 5, and over all possible x values (remember we're assuming x is in the vector somewhere).

So now, the running time is no longer a function of the input size - because there is not a unique time value for each value of N_A. However, if you take the maximum time for each N_A, you get a function T_worst(N_A), the worst case running time of the algorithm. If you take the minimum time for each N_A, you get a function T_best(N_A), the best case running time of the algorithm. And if you take the average time for each N_A, you get a function T_ave(N_A), the average case running time of the algorithm. Here, T_worst(N_A) = Θ(N_A); T_best(N_A) = Θ(1), i.e. it grows like the constant 1, which is to say it doesn't grow at all; and T_ave(N_A) = Θ(N_A). Typically, we'll talk about the best, worst and average case running times of algorithms as functions of their input size.