You have probably seen big-Oh notation before, but it's certainly worthwhile to recap it. In addition, we'll see a couple of other related asymptotic notations. Chapter 4 of the textbook covers this material as well. I strongly suggest reading it.
Remember back to linear search and binary search. Both are algorithms to search for a value in an array with n elements. Linear search marches through the array, from index 0 through the highest index, until either the value is found in the array or we run off the end of the array. Binary search, which requires the array to be sorted, repeatedly discards half of the remaining array from consideration, considering subarrays of size n, n/2, n/4, n/8, …, 1, 0 until either the value is found in the array or the size of the remaining subarray under consideration is 0.
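To make the two algorithms concrete, here is a minimal Java sketch of both searches (an illustration, not code from the course materials):

```java
public class Search {
    // Linear search: examine positions 0, 1, 2, ... until we find value or run off the end.
    // Returns the index where value was found, or -1 if it is not present.
    public static int linearSearch(int[] a, int value) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == value) {
                return i;
            }
        }
        return -1;
    }

    // Binary search: requires a to be sorted in increasing order.
    // Repeatedly discards half of the remaining subarray a[low..high].
    public static int binarySearch(int[] a, int value) {
        int low = 0, high = a.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (a[mid] == value) {
                return mid;
            } else if (a[mid] < value) {
                low = mid + 1;   // discard the lower half
            } else {
                high = mid - 1;  // discard the upper half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 3, 5, 7, 11, 13, 17, 19};
        System.out.println(linearSearch(sorted, 13)); // 5
        System.out.println(binarySearch(sorted, 13)); // 5
        System.out.println(binarySearch(sorted, 4));  // -1
    }
}
```

The loop in linearSearch runs at most n times, while the loop in binarySearch halves the remaining range on each iteration, so it runs at most about log2 n + 1 times.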
The worst case for linear search arises when the value being searched for is not present in the array. The algorithm examines all n positions in the array. If each test takes a constant amount of time—that is, the time per test is a constant, independent of n—then linear search takes time c1n + c2, for some constants c1 and c2. The additive term c2 reflects the work done before and after the main loop of linear search. Binary search, on the other hand, takes c3 log2 n + c4 time in the worst case, for some constants c3 and c4. (Recall that when we repeatedly halve the size of the remaining array, after at most log2 n + 1 halvings, we've gotten the size down to 1.) Base-2 logarithms arise so frequently in computer science that we have a notation for them: lg n = log2 n.
Where linear search has a linear term, binary search has a logarithmic term. Recall that lg n grows much more slowly than n; for example, when n = 1,000,000,000 (a billion), lg n is approximately 30.
If we consider only the leading terms and ignore the coefficients for running times, we can say that in the worst case, linear search's running time "grows like" n and binary search's running time "grows like" lg n. This notion of "grows like" is the essence of the running time. Computer scientists use it so frequently that we have a special notation for it: "big-Oh" notation, which we write as "O-notation."
For example, the running time of linear search is always at most some linear function of the input size n. Ignoring the coefficients and low-order terms, we write that the running time of linear search is O(n). You can read the O-notation as "order." In other words, O(n) is read as "order n." You'll also hear it spoken as "big-Oh of n" or even just "Oh of n."
Similarly, the running time of binary search is always at most some logarithmic function of the input size n. Again ignoring the coefficients and low-order terms, we write that the running time of binary search is O(lg n), which we would say as "order log n," "big-Oh of log n," or "Oh of log n."
In fact, within our O-notation, if the base of a logarithm is a constant (such as 2), then it doesn't really matter. That's because of the formula
$\displaystyle \log_a n = \frac{\log_b n}{\log_b a}$

for all positive n and all positive real bases a and b (other than 1). In other words, if we compare log_a n and log_b n, they differ by a factor of log_b a, and this factor is a constant if a and b are constants. Therefore, even when we use the "lg" notation within O-notation, it's irrelevant that we're really using base-2 logarithms.
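For example, converting base-10 logarithms to base-2:

$\displaystyle \log_{10} n = \frac{\lg n}{\lg 10} \approx \frac{\lg n}{3.32}$

so log_10 n and lg n differ only by the constant factor lg 10 ≈ 3.32, and O(log_10 n) and O(lg n) describe the same rate of growth.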
O-notation is used for what we call "asymptotic upper bounds." By "asymptotic" we mean "as the argument (n) gets large." By "upper bound" we mean that O-notation gives us a bound from above on how high the rate of growth is.
Here's the technical definition of O-notation, which will underscore both the "asymptotic" and "upper-bound" notions:
A running time is O(n) if there exist positive constants n0 and c such that for all problem sizes n ≥ n0, the running time for a problem of size n is at most cn.
Here's a helpful picture:
The "asymptotic" part comes from our requirement that we care only about what happens at or to the right of n0, i.e., when n is large. The "upper bound" part comes from the running time being at most cn. The running time can be less than cn, and it can even be a lot less. What we require is that there exists some constant c such that for sufficiently large n, the running time is bounded from above by cn.
For an arbitrary function f(n), which is not necessarily linear, we extend our technical definition:
A running time is O(f(n)) if there exist positive constants n0 and c such that for all problem sizes n ≥ n0, the running time for a problem of size n is at most c f(n).
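For example, a running time of 3n + 5 is O(n): choose c = 4 and n0 = 5. Then for all n ≥ n0, we have 5 ≤ n, and so

3n + 5 ≤ 3n + n = 4n = cn,

which is exactly what the definition requires.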
A picture:
Now we require that there exist some constant c such that for sufficiently large n, the running time is bounded from above by c f(n).
Actually, O-notation applies to functions, not just to running times. But since our running times will be expressed as functions of the input size n, we can express running times using O-notation.
In general, we want as slow a rate of growth as possible, since if the running time grows slowly, that means that the algorithm is relatively fast for larger problem sizes.
We usually focus on the worst-case running time, for several reasons.
You might think that it would make sense to focus on the "average case" rather than the worst case, which is exceptional. And sometimes we do focus on the average case. But often it makes little sense. First, you have to determine just what is the average case for the problem at hand. Suppose we're searching. In some situations, you find what you're looking for early. For example, a video database will put the titles most often viewed where a search will find them quickly. In some situations, you find what you're looking for on average halfway through all the data…for example, a linear search with all search values equally likely. In some situations, you usually don't find what you're looking for.
It is also often true that the average case is about as bad as the worst case. Because the worst case is usually easier to identify than the average case, we focus on the worst case.
Computer scientists use notations analogous to O-notation for "asymptotic lower bounds" (i.e., the running time grows at least this fast) and "asymptotically tight bounds" (i.e., the running time is within a constant factor of some function). We use Ω-notation (that's the Greek letter "omega") to say that the function grows "at least this fast". It is almost the same as big-Oh notation, except that it has an "at least" instead of an "at most":
A running time is Ω(f(n)) if there exist positive constants n0 and c such that for all problem sizes n ≥ n0, the running time for a problem of size n is at least c f(n).
We use Θ-notation (that's the Greek letter "theta") for asymptotically tight bounds:
A running time is Θ(f(n)) if there exist positive constants n0, c1, and c2 such that for all problem sizes n ≥ n0, the running time for a problem of size n is at least c1 f(n) and at most c2 f(n).
Pictorially,
In other words, with Θ-notation, for sufficiently large problem sizes, we have nailed the running time to within a constant factor. As with O-notation, we can ignore low-order terms and constant coefficients in Θ-notation.
Note that Θ-notation subsumes O-notation in that
If a running time is Θ(f(n)), then it is also O(f(n)).
The converse (O(f(n)) implies Θ(f(n))) does not necessarily hold.
The general term that we use for O-notation, Θ-notation, and Ω-notation is asymptotic notation.
Asymptotic notations provide ways to characterize the rate of growth of a function f(n). For our purposes, the function f(n) describes the running time of an algorithm, but it really could be any old function. Asymptotic notation describes what happens as n gets large; we don't care about small values of n. We use O-notation to bound the rate of growth from above to within a constant factor, and we use Θ-notation to bound the rate of growth to within constant factors from both above and below. (We won't use Ω-notation much in this course.)
We need to understand when we can apply each asymptotic notation. For example, in the worst case, linear search runs in time proportional to the input size n; we can say that linear search's worst-case running time is Θ(n). It would also be correct, but slightly less precise, to say that linear search's worst-case running time is O(n). Because in the best case, linear search finds what it's looking for in the first array position it checks, we cannot say that linear search's running time is Θ(n) in all cases. But we can say that linear search's running time is O(n) in all cases, since it never takes longer than some constant times the input size n.
Although the definitions of O-notation and Θ-notation may seem a bit daunting, these notations actually make our lives easier in practice. There are two ways in which they simplify our lives.
I won't go through the math that follows in class. You may read it, in the context of the formal definitions of O-notation and Θ-notation, if you wish. For now, the main thing is to get comfortable with the ways that asymptotic notation makes working with a function's rate of growth easier.
Constant multiplicative factors are "absorbed" by the multiplicative constants in O-notation (c) and Θ-notation (c1 and c2). For example, the function 1000n² is Θ(n²), since we can choose both c1 and c2 to be 1000.
Although we may care about constant multiplicative factors in practice, we focus on the rate of growth when we analyze algorithms, and the constant factors don't matter. Asymptotic notation is a great way to suppress constant factors.
When we add or subtract low-order terms, they disappear when using asymptotic notation. For example, consider the function n² + 1000n. I claim that this function is Θ(n²). Clearly, if I choose c1 = 1, then I have n² + 1000n ≥ c1 n², and so this side of the inequality is taken care of.

The other side is a bit tougher. I need to find a constant c2 such that for sufficiently large n, I'll get that n² + 1000n ≤ c2 n². Subtracting n² from both sides gives 1000n ≤ c2 n² − n² = (c2 − 1) n². Dividing both sides by (c2 − 1) n gives $\displaystyle \frac{1000}{c_2 - 1} \leq n$. Now I have some flexibility, which I'll use as follows. I pick c2 = 2, so that the inequality becomes $\displaystyle \frac{1000}{2-1} \leq n$, or 1000 ≤ n. Now I'm in good shape, because I have shown that if I choose n0 = 1000 and c2 = 2, then for all n ≥ n0, I have 1000 ≤ n, which we saw is equivalent to n² + 1000n ≤ c2 n².
The point of this example is to show that adding or subtracting low-order terms just changes the n0 that we use. In our practical use of asymptotic notation, we can just drop low-order terms.
In combination, constant factors and low-order terms don't matter. If we see a function like 1000n² − 200n, we can ignore the low-order term 200n and the constant factor 1000, and therefore we can say that 1000n² − 200n is Θ(n²).
As we have seen, we use O-notation for asymptotic upper bounds and Θ-notation for asymptotically tight bounds. Θ-notation is more precise than O-notation. Therefore, we prefer to use Θ-notation whenever it's appropriate to do so.
We shall see times, however, in which we cannot say that a running time is tight to within a constant factor both above and below. Sometimes, we can bound a running time only from above. In other words, we might only be able to say that the running time is no worse than a certain function of n, but it might be better. In such cases, we'll have to use O-notation, which is perfect for such situations.
Θ-notation can help predict running times for big data sets based on the run times for small data sets. Suppose you have a program that runs the insertion sort algorithm, whose worst-case and average-case run times are Θ(n²), where n is the number of items to sort. Someone has given you a list of 10,000 randomly ordered things to sort. Your program takes 0.1 seconds. Then you are given a list of 1,000,000 randomly ordered things to sort. How long do you expect your program to take? Should you wait? Get coffee? Go to dinner? Go to bed? How can you decide?
The expected run time for insertion sort on random data is Θ(n²). This means that for large n, the run time is approximately cn² for some constant c. An input of 10,000 is fairly big, so the lower-order terms shouldn't have much effect. Therefore we approximate the actual run-time function by f(n) = cn².
There are two ways we can solve this. The more straightforward way (and the way that will work for run times like Θ(n log n)) is to solve for c:
f(10,000) = c · 10,000² = 0.1

c = 0.1 / (10⁴)² = 10⁻⁹

Therefore we can approximate f(n) as:

f(n) = 10⁻⁹ n²

Plugging in n = 1,000,000 we get:

f(1,000,000) = 10⁻⁹ · (10⁶)² = 1000
We therefore estimate 1000 seconds, or about 16.7 minutes. Time enough to get a cup of coffee, but not to go to dinner.
We can use a shortcut for polynomials. If the run time is approximately $cn^k$ for some constant k, then what happens to the run time if we change the input size by a factor of p? That is, the old input size is n and the new input size is pn. Thus if

$f(n) = cn^k$

then

$f(pn) = c(pn)^k = p^k \cdot cn^k = p^k f(n)$
Therefore changing the input size by a factor of p changes the run time by a factor of pk. In the problem I gave you, p = 1,000,000/10,000 = 100 and k = 2, so the run time changes by a factor of 10,000. So the time is 10,000 times 0.1, or 1000 seconds.
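As a hedged illustration (not course code; the class, method, and parameter names here are made up), the shortcut translates directly into a small Java helper:

```java
public class RunTimeEstimate {
    // Estimate the run time on newN items, given a measured time on oldN items,
    // assuming the run time grows like c * n^k for some constant k.
    public static double estimatePolynomial(double oldTimeSeconds, double oldN,
                                            double newN, double k) {
        double p = newN / oldN;                 // factor by which the input size changed
        return Math.pow(p, k) * oldTimeSeconds; // run time changes by a factor of p^k
    }

    public static void main(String[] args) {
        // Insertion sort example: 0.1 seconds on 10,000 items, k = 2.
        // Prints approximately 1000 seconds.
        System.out.println(estimatePolynomial(0.1, 10_000, 1_000_000, 2));
    }
}
```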
Suppose that you were running mergesort instead of insertion sort in the example above. Merge sort's run time is Θ(n log n). We approximate the run time as f(n) = cn log n. Note that log₂ 10,000 ≈ 13.29. Solving for c gives:

f(10,000) = c · 10,000 · log 10,000 = c · 10,000 · 13.29 = 0.1

c = 0.1 / 132,900 ≈ 7.52 × 10⁻⁷

f(n) = (7.52 × 10⁻⁷) n log n
Plugging in n = 1,000,000 and using the fact that log 1,000,000 is a bit less than 20 gives:
f(1,000,000) = (7.52 × 10⁻⁷) · 1,000,000 · log 1,000,000 ≈ 0.752 × 20 ≈ 15

So the run time in this case is about 15 seconds. Might as well wait for it to finish.

A stack is a LIFO (Last In, First Out) data structure. The book compares a stack to a spring-loaded stack of cafeteria trays or a PEZ dispenser. (Though I have to question how well the authors remember PEZ. They are candies, but definitely not mint.) The abstract data type (ADT) Stack has at least the following operations (in addition to a constructor to create an empty stack):

push: add to the top of the stack
pop: remove the top element from the stack and return it
peek: return the top element but do not remove it
isEmpty: return a boolean indicating whether the stack is empty

So what good is a stack? It has many, many applications. You already know that the run-time stack handles allocating memory in method calls and freeing memory on returns. (It also allows recursion.) A stack provides an easy way to reverse a list or string: push each element on the stack, then pop them all off; they come off in the opposite order that they went on. Stacks are also good for matching parentheses, braces, or square brackets, or open and close tags in HTML.
A stack is also how HP calculators handle reverse Polish notation. In this notation the operator follows both operands, and no parentheses are needed. So the expression (3 + 5) * (4 - 6) becomes 3 5 + 4 6 - *. To evaluate it, push operands onto the stack when you encounter them. When you reach an operator, pop the top two values from the stack and apply the operator to them. (The first popped becomes the second operand in the operation.) Push the result of the operation back on the stack. At the end there is a single value on the stack, and that is the value of the expression.
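Here is a small sketch of that evaluation in Java, using java.util.ArrayDeque as the stack. It is only an illustration: it assumes a valid, space-separated postfix expression with the four basic operators.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    // Evaluate a space-separated reverse Polish (postfix) expression, e.g. "3 5 + 4 6 - *".
    public static double evaluate(String expression) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : expression.trim().split("\\s+")) {
            switch (token) {
                case "+": case "-": case "*": case "/":
                    double second = stack.pop();  // first popped is the second operand
                    double first = stack.pop();
                    switch (token) {
                        case "+": stack.push(first + second); break;
                        case "-": stack.push(first - second); break;
                        case "*": stack.push(first * second); break;
                        default:  stack.push(first / second); break;
                    }
                    break;
                default:
                    stack.push(Double.parseDouble(token)); // an operand: push it
            }
        }
        return stack.pop(); // the single remaining value is the value of the expression
    }

    public static void main(String[] args) {
        System.out.println(evaluate("3 5 + 4 6 - *")); // (3 + 5) * (4 - 6) = -16.0
    }
}
```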
A stack is also how you can do depth-first search of a maze or a graph. Let's consider a maze. Start by pushing the start square onto the stack. Then repeatedly do the following: pop a square off the stack, mark it as visited, and push its unvisited open neighbors onto the stack. Quit when you reach the goal square.
Because Stack is an ADT, an interface should specify its operations. The CS10Stack interface in CS10Stack.java contains the operations given above.

The class java.util.Stack also has these operations, but instead of the name isEmpty it uses empty. It also has an additional operation, size.

The book has its own version of the ADT, but instead of the name peek they use top, and they also add size. You would think that computer scientists could agree on a standard set of names. Yeah, not so much. At least we all agree on push and pop.
One question is how to handle pop or peek on an empty stack. Both Java and the book throw an exception. That seems a bit harsh, and so CS10Stack is more forgiving: it returns null.
How do we implement a stack? One simple option is to use an array. The implementation has two instance variables: an array called stack and an int called top that keeps track of the position of the top of the stack. In an empty stack, top equals −1. To push, add 1 to top and save the value pushed in stack[top]. To peek, just return stack[top] (after checking that top is nonnegative). pop is peek but with top--.
This implementation is fast (all operations take O(1) time), and it is space efficient (except for the unused part of the array). The drawback is that the array can fill up, and when it does, you get an exception on push.
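Here is a minimal sketch of the array-based implementation just described (the class name and the capacity passed to the constructor are illustrative choices, not the course's CS10Stack code):

```java
public class ArrayStack<T> {
    private T[] stack;   // holds the elements
    private int top;     // index of the top element; -1 when the stack is empty

    @SuppressWarnings("unchecked")
    public ArrayStack(int capacity) {
        stack = (T[]) new Object[capacity];
        top = -1;
    }

    public boolean isEmpty() {
        return top == -1;
    }

    public void push(T item) {
        if (top == stack.length - 1) {
            throw new IllegalStateException("stack is full"); // the array-filling drawback
        }
        stack[++top] = item;
    }

    public T peek() {
        return isEmpty() ? null : stack[top];   // return null on an empty stack
    }

    public T pop() {
        return isEmpty() ? null : stack[top--]; // peek, but also move top down
    }
}
```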
An alternative that avoids that problem uses a linked list. A singly linked list suffices. The top of the stack is the head of the list. The push operation adds to the front of the list, and the pop removes from the front of the list. All operations take O(1) time in this implementation, also. You need to have space for the links in the linked list, but you never have empty space as you do in the array implementation.
Another way that avoids the problem of the array being full is to use an ArrayList. To push, you add to the end of the ArrayList, and to pop you remove the last element. The ArrayList can grow, so it never becomes full. The code for this implementation is in ArrayListStack.java. Note that you don't even need to keep track of the top; the ArrayList does it for you.
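Here is a sketch along those lines; it may not match ArrayListStack.java exactly, but it captures the idea (the class name here is just for illustration):

```java
import java.util.ArrayList;

public class SimpleArrayListStack<T> {
    private ArrayList<T> stack = new ArrayList<T>(); // the last element is the top

    public boolean isEmpty() {
        return stack.isEmpty();
    }

    public void push(T item) {
        stack.add(item);  // add at the end; the ArrayList grows as needed
    }

    public T peek() {
        return isEmpty() ? null : stack.get(stack.size() - 1);
    }

    public T pop() {
        return isEmpty() ? null : stack.remove(stack.size() - 1);
    }
}
```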
Do these operations all take O(1) time? It looks like it, as long as add and remove at the end of the ArrayList take O(1) time. The remove operation certainly takes O(1) time. The add usually does, but sometimes it can take longer.
To understand why an add operation can take more than constant time, we need to look at how an ArrayList is implemented. The underlying data structure is an array. There is also a variable to keep track of the size, from which we can easily get the last occupied position in the array. Adding to the end just increases the size by 1 and assigns the object added to the next position. However, what happens when the array is full? A new array is allocated and everything is copied to the new array. Doing so takes time Θ(n), where n is the number of elements in the ArrayList.
If we had to copy the entire ArrayList upon each add operation, the process would be very slow. It would in fact take time O(n²) to add n elements to the end of the ArrayList. That would be too slow. So instead, when the ArrayList is full, the new array allocated is not just one position bigger than the old one, but much bigger. One option is to double the size of the array. Then a lot of add operations can happen before the array needs to be copied again.
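Here is a simplified sketch of how such a growable array might implement add with doubling (an illustration of the strategy, not the actual java.util.ArrayList source):

```java
import java.util.Arrays;

public class GrowableArray<T> {
    private Object[] elements = new Object[4]; // start small so growth is easy to see
    private int size = 0;                      // number of occupied positions

    public void add(T item) {
        if (size == elements.length) {
            // Array is full: allocate a new array twice as big and copy everything over.
            // The copy takes time proportional to the current size, but it happens
            // rarely enough that the amortized cost per add is still constant.
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        elements[size++] = item;
    }

    @SuppressWarnings("unchecked")
    public T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (T) elements[index];
    }

    public int size() {
        return size;
    }
}
```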
With this approach, n add operations will take O(n) time. In other words, the average time per operation is only O(1). We call this the amortized time. Amortization is what accountants do when saving up to buy an expensive item like a computer. Suppose that you want to buy a computer every 3 years and it costs 1500 dollars. One way to think about this is to have no cost the first two years and 1500 dollars the third year. An alternative is to set aside 500 dollars each year. In the third year you can take the accumulated 1500 dollars and spend it on the computer. So the computer costs 500 dollars a year, amortized. (In tax law it goes the other direction: you spend the 1500 dollars, but instead of being able to deduct the whole thing the first year, you have to spread it over the 3-year life of the computer, and you deduct 500 dollars a year.)
For the ArrayList case, we can think in terms of tokens that can pay for copying something into the array. Suppose that we have just doubled the array size from n to 2n, which means that we have just copied n elements; let's call these n elements that were copied the "old elements." We spend all our tokens copying the old elements, so that immediately after copying, we have no tokens available. By the time the array has 2n elements and the array size doubles again, we must have one token for each of the 2n elements to pay for copying those 2n elements.
Here's how we do it. We charge three tokens for each add:

One token pays for adding the new element to the array.
One token is saved with the new element, to pay for copying it the next time the array is copied.
One token goes to one of the old elements, which has no token left, to pay for copying it the next time the array is copied.
Therefore, by the time the array has 2n elements, every element has a token, which pays for copying it.
By thinking about it in this way, we see that the cost per add operation is a constant (three tokens).

The array size does not have to double when the array fills. For example, it could increase by a factor of 3/2, in which case you can modify the argument to show that charging four tokens per add operation works.
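As a quick sanity check of that claim (using the same token accounting as above, which is my reading of the argument, not the textbook's wording): just after the array grows from capacity n to 3n/2, the n old elements have no tokens, and only n/2 adds can happen before the next copy, which must move all 3n/2 elements. So each add must supply

$\displaystyle \frac{n + n/2}{n/2} = 3$

copy tokens, plus one token to pay for the add itself, for a total of four.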
A Queue is a FIFO (First In, First Out) data structure. "Queueing up" is a mostly British way of saying standing in line. And a Queue data structure mimics a line at a grocery store or bank, where people join at the back, are served when they get to the front, and nobody is allowed to cut into the line. The ADT Queue has at least the following operations (in addition to a constructor to create an empty Queue):

enqueue: add an element at the rear of the queue
dequeue: remove and return the first element in the queue
front: return the first element but don't remove it
isEmpty: return a boolean indicating whether the queue is empty

What do we use a Queue for? An obvious answer is that it is useful in simulations of lines at banks, toll booths, etc. But more important are the queues within computer systems for things like printers. When you submit a print job, you are enqueued. When the print job gets to the front of the queue, it is dequeued and printed. Time-sharing systems use round-robin scheduling. The first job is dequeued and run for a fixed period of time or until it blocks (i.e., has to wait) for I/O or some other reason. Then it is enqueued again. This process repeats as long as there are jobs in the queue. New jobs are enqueued. Jobs that finish leave the system instead of being enqueued. In this way, every job gets a fair share of the CPU. The book shows how to solve the Josephus problem using a queue: it is basically a round-robin scheduler, where every kth job is killed instead of being enqueued again.
A queue can also be used to search a maze. The same process is used as for the stack, but with a queue as the ADT. This leads to breadth-first search, and will find the shortest path through the maze.
An obvious way to implement a queue is to use a linked list. A singly linked list suffices, if it includes a tail pointer. Enqueue at the tail and dequeue from the head. All operations take Θ(1) time.
If you use a circular, doubly linked list with a sentinel, you can organize the list the opposite way: enqueue at the head and dequeue from the tail. If you were to try it this way for a singly linked list, you would keep having to run down the entire list to find the predecessor to the tail when dequeuing, and so this operation would take Θ(n) time.
The textbook presents a Queue interface and part of an implementation using a singly linked list. They also include a size method. The interface CS10Queue in CS10Queue.java has the methods given above, and LinkedListQueue.java is an implementation that uses a SentinelDLL. It could be changed to use an SLL by changing one declaration and one constructor call. All operations would still take Θ(1) time.
Java also has a Queue interface. It does not use the conventional names. Instead of enqueue it has add and offer. Instead of front it has element and peek. Instead of dequeue it has remove and poll. Why two choices for each? The first choice in each pair throws an exception if it fails. The second fails more gracefully, returning false if offer fails (because the queue is full) and null if peek or poll is called on an empty queue. At least isEmpty and size keep their conventional names.
A deque (pronounced "deck") is a double-ended queue. You can add or delete from either end. A minimal set of operations is

addFirst: add to the head
addLast: add to the tail
removeFirst: remove and return the first element
removeLast: remove and return the last element
isEmpty: return a boolean indicating whether the deque is empty

Additional operations include

getFirst: return the first element but do not remove it
getLast: return the last element but do not remove it
size: return the number of elements in the deque

A deque can be used as a stack, as a queue, or as something more complex. In fact, the Java documentation recommends using a Deque instead of the legacy Stack class when you need a stack. This is because the Stack class, which extends the Vector class, includes non-stack operations (e.g., searching through the stack for an item). (Vector was superseded by ArrayList and is now considered a legacy class.)
You can implement a deque with a SentinelDLL. If you look at its methods, you will see that all of these operations are already there except for the two remove operations and size(). The remove operations can be implemented by calling either getFirst or getLast and then calling remove. The size operation can be left out or can be implemented by adding a count instance variable to keep track of the number of items in the deque. Each of these operations requires Θ(1) time.
Once again, Java provides two versions of each deque operation. The two "add" operations have corresponding "offer" operations (offerFirst and offerLast). The two "remove" operations have corresponding "poll" operations, and the two "get" operations have corresponding "peek" operations. These alternate operations do not throw exceptions.
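For example, java.util.ArrayDeque implements Java's Deque interface, so one object can serve as a deque, a stack, or a queue:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    public static void main(String[] args) {
        Deque<Integer> deque = new ArrayDeque<Integer>();

        // Deque operations: add and remove at either end.
        deque.addFirst(2);                        // deque: 2
        deque.addLast(3);                         // deque: 2 3
        deque.offerFirst(1);                      // deque: 1 2 3 ("offer" versions don't throw)
        System.out.println(deque.getFirst());     // 1
        System.out.println(deque.pollLast());     // 3, leaving: 1 2

        // Used as a stack (LIFO): push, pop, and peek work at the head.
        deque.push(10);                           // deque: 10 1 2
        System.out.println(deque.peek());         // 10
        System.out.println(deque.pop());          // 10, leaving: 1 2

        System.out.println(deque.removeFirst());  // 1
        System.out.println(deque.removeLast());   // 2
        System.out.println(deque.isEmpty());      // true
    }
}
```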