CS 10: Spring 2014

Lecture 11, April 16

Important organizational notes

There will be no class this Friday (April 18). Also, office hours for Thursday, April 17, are canceled.

Code discussed in lecture

Short Assignment 8

Short Assignment 8 is due Friday.

Stacks

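A stack is a LIFO (Last In, First Out) data structure. The book compares a stack to a spring-loaded stack of cafeteria trays or a PEZ dispenser. (Though I have to question how well the authors remember PEZ. They are candies, but definitely not mint.) The abstract data type (ADT) Stack has at least the following operations (in addition to a constructor to create an empty stack):

  1. push(x): add the item x to the top of the stack.
  2. pop(): remove the item at the top of the stack and return it.
  3. peek(): return the item at the top of the stack without removing it.
  4. isEmpty(): return true if the stack has no items, false otherwise.
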
So what good is a stack? It has many, many applications. You already know that the run-time stack handles allocating memory in method calls and freeing memory on returns. (It also makes recursion possible.) A stack provides an easy way to reverse a list or string: push each element onto the stack, then pop them all off. They come off in the opposite order from the one in which they went on. Stacks are also good for matching parentheses, braces, square brackets, or open and close tags in HTML.
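The bracket-matching and reversal ideas can each be sketched in a few lines of Java. This is a minimal illustration, not course code: it uses java.util.ArrayDeque as the stack (rather than CS10Stack), and the class name BracketMatcher is made up. Each opener is pushed; each closer must match the opener popped from the top.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BracketMatcher {
    /** Returns true if every (, [, { in s is closed by the matching bracket in the right order. */
    static boolean isBalanced(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                stack.push(c);                       // remember the opener
            } else if (c == ')' || c == ']' || c == '}') {
                if (stack.isEmpty()) return false;   // closer with no opener
                char open = stack.pop();             // most recent unmatched opener
                if ((c == ')' && open != '(') || (c == ']' && open != '[')
                        || (c == '}' && open != '{')) {
                    return false;                    // wrong kind of bracket
                }
            }
            // all other characters are ignored
        }
        return stack.isEmpty();                      // leftover openers are unmatched
    }

    /** Reverses a string by pushing every character and then popping them all. */
    static String reverse(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) stack.push(c);
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) sb.append(stack.pop());
        return sb.toString();
    }
}
```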

A stack is also how HP calculators handle reverse Polish notation. In this notation the operator follows both operands, and no parentheses are needed. So the expression (3 + 5) * (4 - 6) becomes 3 5 + 4 6 - *. To evaluate it, push operands onto the stack when you encounter them. When you reach an operator, pop the top two values from the stack and apply the operator to them. (The first popped becomes the second operand in the operation.) Push the result of the operation back on the stack. At the end there is a single value on the stack, and that is the value of the expression.
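Here is a minimal sketch of the evaluation procedure just described, assuming integer operands and space-separated tokens (the class name RpnEvaluator is hypothetical). Note the order of the two pops: the first value popped is the right-hand operand.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RpnEvaluator {
    /** Evaluates space-separated RPN tokens over integers, e.g. "3 5 + 4 6 - *". */
    static int evaluate(String expr) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "-": case "*": case "/": {
                    int right = stack.pop();   // first popped is the SECOND operand
                    int left = stack.pop();
                    int result;
                    switch (tok) {
                        case "+": result = left + right; break;
                        case "-": result = left - right; break;
                        case "*": result = left * right; break;
                        default:  result = left / right; break; // integer division
                    }
                    stack.push(result);        // push the result back on the stack
                    break;
                }
                default:
                    stack.push(Integer.parseInt(tok)); // an operand
            }
        }
        if (stack.size() != 1) throw new IllegalArgumentException("malformed expression: " + expr);
        return stack.pop();                    // the single remaining value
    }
}
```

For example, evaluate("3 5 + 4 6 - *") computes (3 + 5) * (4 - 6) = -16.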

A stack is also how you can do depth-first search of a maze or a graph. Let's consider a maze. Start by pushing the start square onto the stack. Then repeatedly do the following:

  1. Pop the stack to get the next square to visit.
  2. Push all unvisited adjacent squares onto the stack.

Quit when you reach the goal square. If the stack becomes empty first, there is no path from the start to the goal.
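The maze procedure above might look like the following sketch. It assumes, purely for illustration, that the maze is a rectangular array of strings in which '#' is a wall, 'S' is the start, and 'G' is the goal; the class name MazeDfs is hypothetical. Because a square can be pushed more than once before it is visited, the sketch skips squares that have already been visited when they are popped.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MazeDfs {
    private static final int[][] MOVES = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

    /** Returns true if G is reachable from S moving up/down/left/right through non-'#' squares. */
    static boolean reachable(String[] maze) {
        int rows = maze.length, cols = maze[0].length();
        boolean[][] visited = new boolean[rows][cols];
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(find(maze, 'S'));                 // push the start square
        while (!stack.isEmpty()) {
            int[] sq = stack.pop();                  // 1. pop the next square to visit
            int r = sq[0], c = sq[1];
            if (visited[r][c]) continue;             // already visited via another route
            visited[r][c] = true;
            if (maze[r].charAt(c) == 'G') return true;  // reached the goal
            for (int[] m : MOVES) {                  // 2. push unvisited open neighbors
                int nr = r + m[0], nc = c + m[1];
                if (nr >= 0 && nr < rows && nc >= 0 && nc < cols
                        && maze[nr].charAt(nc) != '#' && !visited[nr][nc]) {
                    stack.push(new int[] {nr, nc});
                }
            }
        }
        return false;                                // stack empty: no path
    }

    static int[] find(String[] maze, char ch) {
        for (int r = 0; r < maze.length; r++) {
            int c = maze[r].indexOf(ch);
            if (c >= 0) return new int[] {r, c};
        }
        throw new IllegalArgumentException("no '" + ch + "' in maze");
    }
}
```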

Implementing a stack

Because Stack is an ADT, an interface should specify its operations. The CS10Stack interface in CS10Stack.java contains the operations given above.

The class java.util.Stack also has these operations, but it uses the name empty instead of isEmpty. It also has an additional operation, size.

The book has its own version of the ADT, but instead of the name peek they use top, and they also add size. You would think that computer scientists could agree on a standard set of names. Yeah, not so much. At least we all agree on push and pop.

One question is how to handle pop or peek on an empty stack. Both Java and the book throw an exception. That seems a bit harsh, and so CS10Stack is more forgiving: it returns null.

How do we implement a stack? One simple option is to use an array. The implementation has two instance variables: an array called stack and an int called top that keeps track of the position of the top of the stack. In an empty stack, top equals -1. To push, add 1 to top and save the pushed value in stack[top]. To peek, just return stack[top] (after checking that top is nonnegative). pop is like peek, except that it also decrements top after reading stack[top].
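A minimal sketch of the array implementation follows. It is not the course's CS10Stack code; the class name ArrayStack and the choice to throw IllegalStateException when the array is full are assumptions. Like CS10Stack, pop and peek return null on an empty stack.

```java
class ArrayStack<T> {
    private T[] stack;     // holds the items; stack[0] is the bottom
    private int top = -1;  // index of the top item; -1 means the stack is empty

    @SuppressWarnings("unchecked")
    ArrayStack(int capacity) {
        stack = (T[]) new Object[capacity];  // generic array creation needs a cast
    }

    boolean isEmpty() { return top < 0; }

    void push(T item) {
        if (top == stack.length - 1) throw new IllegalStateException("stack is full");
        stack[++top] = item;                 // add 1 to top, then store
    }

    T peek() {
        return isEmpty() ? null : stack[top];  // null on empty, like CS10Stack
    }

    T pop() {
        if (isEmpty()) return null;
        T item = stack[top];
        stack[top--] = null;                 // drop the reference, then decrement top
        return item;
    }
}
```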

This implementation is fast (all operations take O(1) time), and it is space efficient (except for the unused part of the array). The drawback is that the array can fill up, and when it does, you get an exception on push.

An alternative that avoids that problem uses a linked list. A singly linked list suffices. The top of the stack is the head of the list. The push operation adds to the front of the list, and pop removes from the front of the list. All operations take O(1) time in this implementation, too. You need space for the links in the linked list, but you never have empty space as you do in the array implementation.
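A corresponding sketch of the linked-list version, with the hypothetical name LinkedStack and a private node class. Both push and pop touch only the head, which is why every operation is O(1).

```java
class LinkedStack<T> {
    private static class Node<T> {
        T data;
        Node<T> next;
        Node(T data, Node<T> next) { this.data = data; this.next = next; }
    }

    private Node<T> head;  // the top of the stack; null when empty

    boolean isEmpty() { return head == null; }

    void push(T item) { head = new Node<>(item, head); }  // add at the front

    T peek() { return isEmpty() ? null : head.data; }

    T pop() {
        if (isEmpty()) return null;
        T item = head.data;
        head = head.next;                                  // remove from the front
        return item;
    }
}
```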

Another way that avoids the problem of the array being full is to use an ArrayList. To push, you add to the end of the ArrayList, and to pop you remove the last element. The ArrayList can grow, so it never becomes full. The code for this implementation is in ArrayListStack.java. Note that you don't even need to keep track of the top. The ArrayList does it for you.
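A sketch in the spirit of ArrayListStack.java (not that file itself): the end of the ArrayList serves as the top, so the list keeps track of the top for us. Returning null on an empty stack follows the CS10Stack convention described above.

```java
import java.util.ArrayList;

class ArrayListStack<T> {
    private final ArrayList<T> items = new ArrayList<>();  // the end of the list is the top

    boolean isEmpty() { return items.isEmpty(); }

    void push(T item) { items.add(item); }                 // append at the end

    T peek() { return isEmpty() ? null : items.get(items.size() - 1); }

    T pop() { return isEmpty() ? null : items.remove(items.size() - 1); }  // remove the last
}
```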

Do these operations all take O(1) time? It looks like it, as long as add and remove at the end of the ArrayList take O(1) time. The remove operation certainly takes O(1) time. The add usually does, too, but sometimes it can take longer.

To understand why an add operation can take more than constant time, we need to look at how an ArrayList is implemented. The underlying data structure is an array. There is also a variable to keep track of the size, from which we can easily get the last occupied position in the array. Adding to the end just increases the size by 1 and assigns the object added to the next position. However, what happens when the array is full? A new array is allocated and everything is copied to the new array. Doing so takes time Θ(n), where n is the number of elements in the ArrayList.

If we had to copy the entire ArrayList upon each add operation, the process would be very slow: adding n elements to the end of the ArrayList would take time O(n²), because the copies alone would take 1 + 2 + ... + (n - 1) = n(n - 1)/2 steps. So instead, when the ArrayList is full, the new array allocated is not just one position bigger than the old one, but much bigger. One option is to double the size of the array. Then many add operations can happen before the array needs to be copied again.

With this approach, n add operations will take O(n) time. In other words, the average time per operation is only O(1). We call this the amortized time. Amortization is what accountants do when saving up to buy an expensive item like a computer. Suppose that you want to buy a computer every 3 years and it costs 1500 dollars. One way to think about this is to have no cost the first two years and 1500 dollars the third year. An alternative is to set aside 500 dollars each year. In the third year you can take the accumulated 1500 dollars and spend it on the computer. So the computer costs 500 dollars a year, amortized. (In tax law it goes the other direction: you spend the 1500 dollars, but instead of being able to deduct the whole thing the first year, you have to spread it over the 3-year life of the computer and deduct 500 dollars a year.)
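To see the amortized bound concretely, the following hypothetical GrowableArray (not java.util.ArrayList, whose actual growth policy is an implementation detail) starts with capacity 1, doubles when full, and counts every element it copies.

```java
class GrowableArray {
    private Object[] data = new Object[1];  // assumed initial capacity of 1
    private int size = 0;
    long copies = 0;                        // total element copies performed during resizes

    void add(Object x) {
        if (size == data.length) {          // full: allocate an array twice as big
            Object[] bigger = new Object[2 * data.length];
            for (int i = 0; i < size; i++) {
                bigger[i] = data[i];
                copies++;
            }
            data = bigger;
        }
        data[size++] = x;                   // the ordinary O(1) part of add
    }

    int size() { return size; }
    int capacity() { return data.length; }
}
```

After 1000 adds, the array has been copied when it held 1, 2, 4, ..., 512 elements, for a total of 1023 copies, fewer than 2n = 2000. Counting one write per add as well, the total work is under 3n, which matches the three tokens per add in the token argument that follows.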

For the ArrayList case, we can think in terms of tokens that can pay for copying something into the array. Suppose that we have just doubled the array size from n to 2n, which means that we have just copied n elements; let's call these n elements that were copied the "old elements." We spend all our tokens copying the old elements, so that immediately after copying, we have no tokens available. By the time the array has 2n elements and the array size doubles again, we must have one token for each of the 2n elements to pay for copying 2n elements.

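Here's how we do it. We charge three tokens for each add:

  1. One token pays for writing the new element into the array.
  2. One token is placed on the new element itself.
  3. One token is placed on one of the old elements that does not yet have a token.
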
Therefore, by the time the array has 2n elements, every element has a token, which pays for copying it.

By thinking about it in this way, we see that the cost per add operation is a constant (three tokens).

The array size does not have to double when the array fills. For example, it could increase by a factor of 3/2, in which case you can modify the argument to show that charging four tokens per add operation works.

Queues

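A Queue is a FIFO (First In, First Out) data structure. "Queueing up" is a mostly British way of saying standing in line. A Queue data structure mimics a line at a grocery store or bank, where people join at the back, are served when they get to the front, and nobody is allowed to cut into the line. The ADT Queue has at least the following operations (in addition to a constructor to create an empty Queue):

  1. enqueue(x): add x at the back of the queue.
  2. dequeue(): remove and return the item at the front of the queue.
  3. front(): return the item at the front of the queue without removing it.
  4. isEmpty(): return true if the queue has no items, false otherwise.
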
What do we use a Queue for? An obvious answer is that it is useful in simulations of lines at banks, toll booths, etc. But more important are the queues within computer systems for things like printers. When you submit a print job you are enqueued. When the print job gets to the front of the queue, it is dequeued and printed. Time-sharing systems use round-robin scheduling. The first job is dequeued and run for a fixed period of time or until it blocks (i.e., has to wait) for I/O or some other reason. Then it is enqueued. This process repeats as long as there are jobs in the queue. New jobs are enqueued. Jobs that finish leave the system instead of being enqueued. In this way, every job gets a fair share of the CPU. The book shows how to solve the Josephus problem using a queue. It is basically a round-robin scheduler, where every kth job is killed instead of being enqueued again.
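The Josephus idea can be sketched with java.util.ArrayDeque used through the Queue interface. This is an illustrative version, not the book's code, and the class name Josephus is made up: people numbered 1 to n stand in the queue, the first k - 1 are dequeued and enqueued again, and the kth is dequeued for good.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class Josephus {
    /** People 1..n stand in a circle; every kth person is removed. The last entry is the survivor. */
    static List<Integer> eliminationOrder(int n, int k) {
        Queue<Integer> queue = new ArrayDeque<>();
        for (int i = 1; i <= n; i++) queue.add(i);
        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            for (int i = 1; i < k; i++) {
                queue.add(queue.remove());  // this person survives the count: back of the line
            }
            order.add(queue.remove());      // the kth person leaves for good
        }
        return order;
    }
}
```

For n = 7 and k = 3, the removal order is 3, 6, 2, 7, 5, 1, and person 4 survives.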

A queue can also be used to search a maze. The process is the same as for the stack, but with a queue in place of the stack. The result is breadth-first search, which finds a shortest path through the maze (one with the fewest moves).
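A breadth-first version of the maze search, under illustrative assumptions about the maze format ('#' walls, 'S' start, 'G' goal in a rectangular array of strings); the class name MazeBfs is hypothetical. It marks a square when it is enqueued and records its distance from the start, so the first time the goal is dequeued its distance is the length of a shortest path.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

class MazeBfs {
    private static final int[][] MOVES = {{-1, 0}, {1, 0}, {0, -1}, {0, 1}};

    /** Returns the fewest moves from S to G, or -1 if G is unreachable. */
    static int shortestPathLength(String[] maze) {
        int rows = maze.length, cols = maze[0].length();
        int[][] dist = new int[rows][cols];
        for (int[] row : dist) Arrays.fill(row, -1);   // -1 means not yet discovered
        Queue<int[]> queue = new ArrayDeque<>();
        for (int r = 0; r < rows; r++) {               // enqueue the start square
            int c = maze[r].indexOf('S');
            if (c >= 0) { dist[r][c] = 0; queue.add(new int[] {r, c}); }
        }
        while (!queue.isEmpty()) {
            int[] sq = queue.remove();                 // dequeue the next square
            int r = sq[0], c = sq[1];
            if (maze[r].charAt(c) == 'G') return dist[r][c];
            for (int[] m : MOVES) {
                int nr = r + m[0], nc = c + m[1];
                if (nr >= 0 && nr < rows && nc >= 0 && nc < cols
                        && maze[nr].charAt(nc) != '#' && dist[nr][nc] < 0) {
                    dist[nr][nc] = dist[r][c] + 1;     // mark when enqueued
                    queue.add(new int[] {nr, nc});
                }
            }
        }
        return -1;                                     // queue empty: no path
    }
}
```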

Implementing a queue

An obvious way to implement a queue is to use a linked list. A singly linked list suffices, if it includes a tail pointer. Enqueue at the tail and dequeue from the head. All operations take Θ(1) time.
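A minimal sketch of the singly linked queue with a tail pointer. The class name LinkedQueue is illustrative, and returning null on an empty queue is an assumption borrowed from CS10Stack's convention. The only subtle case is keeping tail consistent when the queue becomes empty or stops being empty.

```java
class LinkedQueue<T> {
    private static class Node<T> {
        final T data;
        Node<T> next;
        Node(T data) { this.data = data; }
    }

    private Node<T> head;  // front: dequeue here
    private Node<T> tail;  // back: enqueue here

    boolean isEmpty() { return head == null; }

    void enqueue(T item) {
        Node<T> node = new Node<>(item);
        if (tail == null) head = node;   // queue was empty
        else tail.next = node;
        tail = node;
    }

    T front() { return isEmpty() ? null : head.data; }

    T dequeue() {
        if (isEmpty()) return null;
        T item = head.data;
        head = head.next;
        if (head == null) tail = null;   // queue became empty
        return item;
    }
}
```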

If you use a circular, doubly linked list with a sentinel, you can organize the list the opposite way: enqueue at the head and dequeue from the tail. If you were to try it this way for a singly linked list, you would keep having to run down the entire list to find the predecessor to the tail when dequeuing, and so this operation would take Θ(n) time.

The textbook presents a Queue interface and part of an implementation using a singly linked list. They also include a size method. The interface CS10Queue in CS10Queue.java has the methods given above, and LinkedListQueue.java is an implementation that uses a SentinelDLL. It could be changed to use an SLL by changing one declaration and one constructor call. All operations would still take Θ(1) time.

Java also has a Queue interface. It does not use the conventional names. Instead of enqueue it has add and offer. Instead of front it has element and peek. Instead of dequeue it has remove and poll. Why two choices for each? The first choice in each pair throws an exception if it fails. The second fails more gracefully, returning false if offer fails (because the queue is full) and null if peek or poll is called on an empty queue. At least isEmpty and size keep their conventional names.
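The following sketch exercises each pair of methods in java.util.Queue; the class name QueueMethodPairs is made up. ArrayDeque is unbounded, so to show offer returning false it also uses java.util.concurrent.ArrayBlockingQueue with a capacity of 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

class QueueMethodPairs {
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Queue<String> q = new ArrayDeque<>();
        q.add("a");                                   // enqueue; throws on failure
        q.offer("b");                                 // enqueue; returns false on failure
        log.add("element=" + q.element());            // front; throws if empty
        log.add("peek=" + q.peek());                  // front; null if empty
        log.add("remove=" + q.remove());              // dequeue; throws if empty
        log.add("poll=" + q.poll());                  // dequeue; null if empty
        log.add("poll on empty=" + q.poll());
        try {
            q.remove();
        } catch (NoSuchElementException e) {
            log.add("remove on empty threw");
        }
        Queue<String> bounded = new ArrayBlockingQueue<>(1);  // holds at most one item
        log.add("offer=" + bounded.offer("x"));
        log.add("offer when full=" + bounded.offer("y"));
        try {
            bounded.add("z");
        } catch (IllegalStateException e) {
            log.add("add when full threw");
        }
        return log;
    }
}
```

Calling demo() returns the entries element=a, peek=a, remove=a, poll=b, poll on empty=null, remove on empty threw, offer=true, offer when full=false, and add when full threw.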

Deques

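A deque (pronounced "deck") is a double-ended queue. You can add to or delete from either end. A minimal set of operations is

  1. addFirst(x): add x at the front.
  2. addLast(x): add x at the back.
  3. removeFirst(): remove and return the item at the front.
  4. removeLast(): remove and return the item at the back.

Additional operations include

  1. getFirst(): return the item at the front without removing it.
  2. getLast(): return the item at the back without removing it.
  3. isEmpty(): return true if the deque has no items.
  4. size(): return the number of items in the deque.
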
A deque can be used as a stack, as a queue, or as something more complex. In fact, the Java documentation recommends using a Deque instead of the legacy Stack class when you need a stack. This is because the Stack class, which extends the Vector class, includes non-stack operations (e.g., searching through the stack for an item). (Vector has been superseded by ArrayList; it is considered a legacy class, although it is not formally deprecated.)
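Here is a small illustration of that recommendation using java.util.ArrayDeque; the class name DequeAsStackAndQueue is made up. In the Deque interface, push and pop are equivalent to addFirst and removeFirst, so the same kind of object gives LIFO or FIFO behavior depending on which end you add to.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class DequeAsStackAndQueue {
    static String asStack(String... items) {
        Deque<String> stack = new ArrayDeque<>();
        for (String s : items) stack.push(s);                     // push == addFirst
        StringBuilder sb = new StringBuilder();
        while (!stack.isEmpty()) sb.append(stack.pop());          // pop == removeFirst: LIFO
        return sb.toString();
    }

    static String asQueue(String... items) {
        Deque<String> queue = new ArrayDeque<>();
        for (String s : items) queue.addLast(s);                  // enqueue at the back
        StringBuilder sb = new StringBuilder();
        while (!queue.isEmpty()) sb.append(queue.removeFirst());  // dequeue from the front: FIFO
        return sb.toString();
    }
}
```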

Implementing a deque

One way to implement a deque is with a SentinelDLL. If you look at its methods, you will see that all of these operations are already there except for the two remove operations and size. The remove operations can be implemented by calling either getFirst or getLast and then calling remove. The size operation can be left out or can be implemented by adding a count instance variable that keeps track of the number of items in the deque. Each of these operations requires Θ(1) time.
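A standalone sketch of a deque built on a circular, doubly linked list with a sentinel. It is not the course's SentinelDLL (that class is not reproduced here); the name SentinelDeque is illustrative, gets and removes on an empty deque return null by assumption, and size uses a count instance variable as suggested above.

```java
class SentinelDeque<T> {
    private static class Element<T> {
        T data;
        Element<T> previous, next;
        Element(T data) { this.data = data; }
    }

    private final Element<T> sentinel = new Element<>(null);
    private int size = 0;  // count instance variable, so size() is Θ(1)

    SentinelDeque() {
        sentinel.next = sentinel;      // empty circular list: sentinel points to itself
        sentinel.previous = sentinel;
    }

    private void insertAfter(Element<T> x, T obj) {
        Element<T> e = new Element<>(obj);
        e.previous = x;
        e.next = x.next;
        x.next.previous = e;
        x.next = e;
        size++;
    }

    private T unlink(Element<T> e) {   // caller guarantees e is not the sentinel
        e.previous.next = e.next;
        e.next.previous = e.previous;
        size--;
        return e.data;
    }

    void addFirst(T obj) { insertAfter(sentinel, obj); }
    void addLast(T obj)  { insertAfter(sentinel.previous, obj); }
    T getFirst()    { return sentinel.next.data; }      // sentinel's data (null) if empty
    T getLast()     { return sentinel.previous.data; }
    T removeFirst() { return isEmpty() ? null : unlink(sentinel.next); }
    T removeLast()  { return isEmpty() ? null : unlink(sentinel.previous); }
    boolean isEmpty() { return sentinel.next == sentinel; }
    int size() { return size; }
}
```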

Once again, Java provides two versions of each deque operation. The two "add" operations have corresponding "offer" operations (offerFirst and offerLast). The two "remove" operations have corresponding "poll" operations (pollFirst and pollLast), and the two "get" operations have corresponding "peek" operations (peekFirst and peekLast). Instead of throwing an exception when they fail, these alternate operations return a special value: false for offer, and null for poll or peek on an empty deque.

Iterators

In recent lectures we saw linked lists in which the notion of the current element was part of the SentinelDLL or SLL object. Although in some ways this is convenient, in others it is not. For example, if some method is going through the list and passes the linked list to another method, that method can change the current element. It would be nice if there were some way that each method could have its own independent concept of the element in the list that it is currently dealing with.

In fact, we don't have to incorporate current as an instance variable of SentinelDLL or SLL. We'll focus on modifying the SentinelDLL class today, and we'll see how to make a separate object that knows how to traverse and modify a given list. By making it a separate object, we can have any number of them active at any time. In other words, we could have 0, 1, 2, 3, or any other number of such objects around, and each could have its own notion of the current element of the list. Our modification of the SentinelDLL class will not have the instance variable current, nor will it have anything like current. Therefore it will not have get, remove, next, hasNext, previous, hasPrevious, add, or set methods.

This style of going through a data structure is so common that there's a name for it: an iterator. In fact, it's the basis of one of the standard interfaces in Java: the Iterator interface.

The Iterator interface

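The Iterator interface consists of three methods:

  1. hasNext(): return true if the iteration has more elements.
  2. next(): return the next element in the iteration and advance past it.
  3. remove(): remove from the underlying collection the element most recently returned by next. (This operation is optional; an iterator that does not support it throws UnsupportedOperationException.)
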
Iterators apply to lots of different data structures, not just linked lists. There is a general style of using the Iterator interface. To demonstrate it, we need a class that allows us to get an Iterator for the contents of the collection. The ArrayList class is one such class. The driver in IteratedArrayList.java shows how.

If the Iterator interface is implemented properly, then creating an object that implements the Iterator interface starts an iteration. In IteratedArrayList.java, the iterator for an ArrayList needs a reference to the ArrayList, and it starts the iteration. Then we typically have a while-loop, whose header calls the hasNext method. Within the body of the while-loop, a call to next fetches the next element in the data structure, and the call to next may be followed by a call to remove. IteratedArrayList.java has two iterations through the ArrayList: one to print out all elements and remove every other one, and one to show that the first iteration removed every other element.
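Since IteratedArrayList.java itself is not reproduced here, the following is a hypothetical reconstruction of its first loop (the class name IteratedArrayListSketch is made up). It prints every element and removes the first, third, fifth, and so on; which elements the original removed is an assumption.

```java
import java.util.Iterator;
import java.util.List;

class IteratedArrayListSketch {
    /** Prints every element and removes every other one (the 1st, 3rd, 5th, ...). */
    static void removeEveryOther(List<String> list) {
        Iterator<String> iter = list.iterator();  // starts an iteration
        boolean removeThis = true;
        while (iter.hasNext()) {
            String s = iter.next();
            System.out.println(s);
            if (removeThis) iter.remove();        // removes the element just returned by next
            removeThis = !removeThis;
        }
    }
}
```

A second loop over the list afterward would show only the elements that survived.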

We will sometimes see iterators used in for-loops rather than while-loops, but that's OK. After all, a for-loop is just a while-loop in disguise.

You might see similarities between iterators and the foreach-loops that we used to run through arrays. In fact, a foreach-loop over a collection of objects translates into code that uses iterators! Foreach-loops work for arrays, ArrayLists, and anything else that is Iterable. However, an iterator gives us one power that foreach-loops do not: it allows us to remove items.
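As an illustration of what Iterable means, the hypothetical class Range below implements java.lang.Iterable, so it can appear in a foreach-loop. The second method shows, roughly, the iterator-based loop that the foreach-loop is translated into.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** A hypothetical Iterable: the integers lo, lo + 1, ..., hi - 1. */
class Range implements Iterable<Integer> {
    private final int lo, hi;

    Range(int lo, int hi) { this.lo = lo; this.hi = hi; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int nextValue = lo;  // the value that next() will return

            @Override public boolean hasNext() { return nextValue < hi; }

            @Override public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                return nextValue++;
            }

            @Override public void remove() {  // a range has nothing to remove
                throw new UnsupportedOperationException();
            }
        };
    }

    /** Sums the range with a foreach-loop. */
    static int sum(Range r) {
        int total = 0;
        for (int x : r) total += x;
        return total;
    }

    /** The same loop written with an explicit iterator. */
    static int sumWithIterator(Range r) {
        int total = 0;
        Iterator<Integer> it = r.iterator();
        while (it.hasNext()) {
            int x = it.next();
            total += x;
        }
        return total;
    }
}
```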

Implementing an iterator for an ArrayList

How might we implement an iterator for an ArrayList? Here is one way to do it.

Create an object that has three instance variables:

We can then implement the Iterator methods as follows:
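The three instance variables are not spelled out in these notes, so the following sketch makes an assumption: it keeps a reference to the list, the index of the next element to return, and a flag recording whether remove is currently legal. The class name ArrayListIterator is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.NoSuchElementException;

class ArrayListIterator<T> implements Iterator<T> {
    private final ArrayList<T> list;   // the list being traversed
    private int nextIndex = 0;         // index of the element next() will return
    private boolean canRemove = false; // true only right after a call to next()

    ArrayListIterator(ArrayList<T> list) { this.list = list; }

    @Override public boolean hasNext() { return nextIndex < list.size(); }

    @Override public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        canRemove = true;
        return list.get(nextIndex++);
    }

    @Override public void remove() {
        if (!canRemove) throw new IllegalStateException("call next() before remove()");
        list.remove(--nextIndex);      // remove the element just returned; later ones shift left
        canRemove = false;
    }
}
```

Note that remove must back nextIndex up by one, because the elements after the removed one shift left; that shifting is also why remove on an ArrayList costs time proportional to the number of later elements.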

An iterator interface for a linked list

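When we use an iterator in a linked list, we often want more functionality than the standard Iterator interface provides. In fact, Java supplies a standard ListIterator interface. Its concept of "current" is different from the one we have seen. It has a "cursor position" between two elements in the list. A call to next returns the item after the cursor and moves the cursor forward. A call to previous returns the item before the cursor and moves the cursor backward. Because of the way this works, alternating calls to next and previous will keep returning the same element. In addition to the methods in Iterator, the ListIterator interface requires the following methods:

  1. hasPrevious(): return true if there is an element before the cursor.
  2. previous(): return the element before the cursor and move the cursor backward.
  3. nextIndex() and previousIndex(): return the index of the element that a subsequent call to next or previous, respectively, would return.
  4. add(x): insert x immediately before the cursor, so that a subsequent call to next is unaffected and a subsequent call to previous returns x.
  5. set(x): replace the element most recently returned by next or previous with x.
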
Calls to the remove and set methods are invalid if there has never been a call to next or previous or if remove or add has been called since the most recent call to next or previous.

The ArrayList class also has a method that returns a ListIterator. There is a separate class LinkedList, which behaves like our circular doubly linked list with a sentinel. Both implement the interface List, which requires a number of methods, including all that we saw for ArrayList plus methods that return an Iterator and a ListIterator. They differ in the amount of time operations take. For instance, a get, set, or add at a given index on a LinkedList requires time proportional to the distance of that index from the nearest end of the list. That means that an add to either the front or the end of a LinkedList takes constant time, unlike an ArrayList, where adding at the front takes time proportional to the number of items. If a ListIterator is used on a LinkedList, every method in the interface takes constant time. For an ArrayList, the time for an add or remove is proportional to the number of items after the item added or removed, even if using a ListIterator.
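Here is a short illustration of the cursor model using the real java.util.ListIterator on a LinkedList; the class and method names are made up. Because add inserts just before the cursor, the loop does not revisit the inserted copy, and set replaces the element most recently returned by previous.

```java
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    /** Inserts a copy of each element right after it, e.g. [a, b] becomes [a, a, b, b]. */
    static void duplicateEach(List<String> list) {
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();   // cursor moves past s
            it.add(s);              // inserted before the cursor, so next() will not return it
        }
    }

    /** Replaces each element with its upper-case version while walking backward. */
    static void upperCaseBackward(List<String> list) {
        ListIterator<String> it = list.listIterator(list.size());  // cursor after the last element
        while (it.hasPrevious()) {
            it.set(it.previous().toUpperCase());  // set replaces what previous() returned
        }
    }
}
```

On a LinkedList each of these iterator calls takes constant time; on an ArrayList each add would shift the later elements.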

Because the conventions and operations are different from what we have implemented in SentinelDLL, we will show how to implement a ListIterator using this new concept of the current element. We extend the Iterator interface by declaring the CS10ListIterator interface in CS10ListIterator.java.

Because we have removed some of the methods from the SentinelDLL class, we need a new interface for the list class to implement. This new interface, CS10IteratedList in CS10IteratedList.java, is similar to the LinkedList interface in CS10LinkedList.java. The methods add, remove, get, next, and hasNext—all of which require access to the current instance variable—are gone.

There is one new method: listIterator. This method returns an object that can iterate through the list whose class implements CS10IteratedList. Creating this returned object starts an iteration.

The SentinelDLLIterator class

SentinelDLLIterator.java is a modified version of the circular, doubly linked list with a sentinel that includes an iterator. The first thing to notice is that the SentinelDLLIterator class implements the CS10IteratedList interface, and so the methods that were in CS10LinkedList but not in CS10IteratedList are missing from SentinelDLLIterator.

The second thing to notice is that the SentinelDLLIterator class has just sentinel as an instance variable; there is no current instance variable, as there was in SentinelDLL.

But the most salient feature of our SentinelDLLIterator class implementation is the inner class DLLIterator, which implements the CS10ListIterator interface. The DLLIterator class is private. Users of SentinelDLLIterator can still get a DLLIterator by calling the listIterator method. Moreover, because DLLIterator implements the public CS10ListIterator interface, once any part of any program has a reference to a DLLIterator, it can call the public methods in CS10ListIterator on it. The constructor is private, however, so the only way to create a DLLIterator object is to call the method listIterator on a SentinelDLLIterator object.

And, perhaps most importantly, by making DLLIterator an inner class of SentinelDLLIterator, the methods of DLLIterator can access anything that the methods of SentinelDLLIterator can access. That would include the instance variable sentinel, as well as anything that is public in the Element class (such as data, next, and previous).

The DLLIterator class has two instance variables:

Given how we've defined current, next must advance current before returning an object, whereas previous must determine the object to return before moving current backward. For everything to work, current initially references the sentinel (rather than, say, sentinel.next).
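The notes do not list DLLIterator's two instance variables, so the following standalone sketch is a reconstruction under stated assumptions, not SentinelDLLIterator.java. It assumes the second instance variable is lastReturned, the element that remove and set act on, and it uses a hypothetical SketchListIterator interface in place of CS10ListIterator. Here current is the element just before the cursor, and it starts at the sentinel.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for CS10ListIterator: Iterator plus backward movement, add, and set. */
interface SketchListIterator<T> extends Iterator<T> {
    boolean hasPrevious();
    T previous();
    void add(T obj);
    void set(T obj);
}

class SketchDLL<T> {
    private static class Element<T> {
        T data;
        Element<T> previous, next;
        Element(T data) { this.data = data; }
    }

    private final Element<T> sentinel = new Element<>(null);  // the only instance variable

    SketchDLL() { sentinel.next = sentinel; sentinel.previous = sentinel; }

    SketchListIterator<T> listIterator() { return new DLLIterator(); }

    private class DLLIterator implements SketchListIterator<T> {
        private Element<T> current = sentinel;  // element just before the cursor
        private Element<T> lastReturned = null; // target of remove/set; null when invalid

        private DLLIterator() { }

        @Override public boolean hasNext() { return current.next != sentinel; }

        @Override public T next() {
            if (!hasNext()) throw new NoSuchElementException();
            current = current.next;             // advance first, then return
            lastReturned = current;
            return current.data;
        }

        @Override public boolean hasPrevious() { return current != sentinel; }

        @Override public T previous() {
            if (!hasPrevious()) throw new NoSuchElementException();
            lastReturned = current;             // determine the object first, then move back
            current = current.previous;
            return lastReturned.data;
        }

        @Override public void add(T obj) {      // insert just before the cursor (after current)
            Element<T> e = new Element<>(obj);
            e.previous = current;
            e.next = current.next;
            current.next.previous = e;
            current.next = e;
            current = e;                        // cursor ends up after the new element
            lastReturned = null;
        }

        @Override public void set(T obj) {
            if (lastReturned == null) throw new IllegalStateException();
            lastReturned.data = obj;
        }

        @Override public void remove() {
            if (lastReturned == null) throw new IllegalStateException();
            if (lastReturned == current) current = current.previous;  // removing after next()
            lastReturned.previous.next = lastReturned.next;
            lastReturned.next.previous = lastReturned.previous;
            lastReturned = null;
        }
    }

    @Override public String toString() {        // uses its own, independent iterator
        StringBuilder sb = new StringBuilder("[");
        Iterator<T> iter = listIterator();
        while (iter.hasNext()) {
            sb.append(iter.next());
            if (iter.hasNext()) sb.append(", ");
        }
        return sb.append("]").toString();
    }
}
```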

I have included an equals method in DLLIterator, and it is set so that two DLLIterator objects are considered equal if they are currently referencing the same Element. The code checks to ensure that both objects involved are DLLIterator objects, and it returns false if they're not.

Returning to the SentinelDLLIterator class, there is a new method listIterator. It creates a new DLLIterator for the SentinelDLLIterator object and returns a reference to it. This listIterator method is meant to be called from outside the SentinelDLLIterator class, and because it returns a reference to a DLLIterator, its return value may be assigned to a variable of type CS10ListIterator or even Iterator (since CS10ListIterator extends Iterator).

Using the iterator

In the SentinelDLLIterator class, the toString method now uses the iterator. Notice how toString uses the iteration paradigm from before, with a while-loop whose test includes the call iter.hasNext and whose body includes the call iter.next.

The DLLIterator created in toString is independent of any other DLLIterator in existence. The position of one DLLIterator's current has no effect at all on the position of another DLLIterator's current.

We can really see this independence in ListTestIterator.java. Here, our test driver creates a DLLIterator by the line

CS10ListIterator<String> iter = theList.listIterator();

The current instance variable of this DLLIterator is moved by next and previous and used by add. But when we call theList.toString, the DLLIterator created and used by toString does not affect the DLLIterator in main.

Similarly, the DLLIterators created and used in calls to addFirst and addLast are independent of all others. Therefore, adding to the front or back of a list does not change the current item in iter.

I have also added a "clear" option that iterates through the list, removing all objects. (I could have used the clear method but chose not to.) I have also added a "print reversed" option that advances to the end of the list and then runs through it backward.

The "nested print" option really shows the power of separate iterators. Here, we have two DLLIterators, outer and inner. For each list object traversed by outer, we perform a full traversal of the list with inner. This task would be impossible if we were limited only to the methods we had in our original linked list implementations.
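The nested traversal looks like this with two independent iterators. For brevity the sketch uses a java.util.List rather than the course's list class, and the names NestedPrint and allPairs are hypothetical; the point is that inner is recreated for each element and never disturbs outer.

```java
import java.util.Iterator;
import java.util.List;

class NestedPrint {
    /** Pairs every element with every element, using two independent iterators. */
    static String allPairs(List<String> list) {
        StringBuilder sb = new StringBuilder();
        Iterator<String> outer = list.iterator();
        while (outer.hasNext()) {
            String x = outer.next();
            Iterator<String> inner = list.iterator();  // a fresh, independent iterator each time
            while (inner.hasNext()) {
                sb.append(x).append(inner.next()).append(' ');
            }
        }
        return sb.toString().trim();
    }
}
```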

Multiple iterators can still be dangerous

Having multiple iterators on the same object can be very useful, as we just saw. As long as none of them modifies the list, everything is fine. Problems may arise, however, if any of the iterators modifies the list. In particular, if one iterator removes an element that is the current element of another iterator, things can get very messy. Even changing the list by using addFirst and addLast can change how things work, and calling clear is definitely a problem!

Multiple threads (streams of control) can really cause problems. Suppose that you are on the second-to-last item in the list, you call hasNext and true is returned, and then you call next. Should be safe, right? Well, not if somebody in another thread removed the last item between the two calls. (Maybe somebody clicked on a button or a Timer went off between the calls, and the method registered with the listener changed the list.)

Because of this potential, a bulletproof iterator should throw an exception if the list has been modified in any way except via the iterator's own operations. We won't worry about these situations for now.
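The standard Java collections take a version of this approach for single-threaded misuse: their iterators are fail-fast, throwing ConcurrentModificationException when they detect that the list was changed by something other than the iterator itself. The documentation describes this as a best-effort check, so it is meant for finding bugs, not for guaranteeing correctness. The sketch below (class name hypothetical) triggers it deliberately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FailFastDemo {
    /** Returns true if modifying the list behind an active iterator makes the iterator throw. */
    static boolean detectsOutsideModification() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> iter = list.iterator();
        iter.next();       // the iteration is under way
        list.add(4);       // a modification NOT made through iter
        try {
            iter.next();   // the iterator notices and refuses to continue
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```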