Keeping order
We've stored data using the general-purpose List ADT, which is basically just a collection of objects that we can access, add to, and remove from. (We saw that there are efficiency trade-offs in patterns of access and modification, depending on implementation.) Today we will look at other list-like ADTs that have more specialized semantics of what it means to add and remove, and thus how we should do that. They maintain a particular type of order for different applications.
All the code files for today: ArrayListStack.java; MatchParens2.java; SLLQueue.java; SLLStack.java; SimpleQueue.java; SimpleStack.java
Stacks
A stack is a LIFO (Last In, First Out) data structure. The book compares a stack to a spring-loaded stack of cafeteria trays or a pez machine — the last one on is the first one off. A Stack ADT, as in our interface SimpleStack.java, has at least the following operations (in addition to a constructor to create an empty stack):
- push - add to top of stack
- pop - remove the top item on the stack and return it
- peek - return the top item but don't remove it
- isEmpty - return true iff the stack is empty
The class java.util.Stack also has these operations, but instead of the name isEmpty() they use empty(). It also has an additional operation, size(). The Java Stack is actually somewhat deprecated, and the documentation suggests the use of Deque (double-ended queue) as a better choice; it calls the operations addLast and removeLast if we think of the tail as the "top" of the stack, or addFirst/removeFirst if we think of the head as the top. The book has its own version of the ADT, but instead of the name peek() they use top(), and they also add size(). You would think that computer scientists could agree on a standard set of names! Well, at least we all agree on push and pop!
So what good is a stack? It has many, many applications. You already know that the run-time stack handles allocating memory in method calls and freeing memory on returns. (It also allows recursion). A stack is an easy way to reverse a list or string: push each item on the stack then pop them all off. They come off in the opposite order that they went on. We will see later that they support depth-first search of a maze or a graph (or think back to PS-1...).
They are good for matching parentheses or braces or square brackets or open and close tags in HTML. They nest, so that the most-recently-seen open paren should match the curren close paren (LIFO). Here's an example with a few different types of parentheses: MatchParens.java, for HTML, just handle and keep track of tags instead of parens (see the book). Each time we see an open paren, push it on the stack. Each time we see a close paren, make sure it has a partner at the top of the stack.
How do we implement a stack? A simple option is to use an array. The implementation has two instance variables: the array stack and an int called top that keeps track of the position of the top of the stack. In an empty stack top==-1. (Draw the picture.) To push() add 1 to top and save the thing pushed in stack[top]. To peek() just return stack[top] (after checking that top >= 0). pop() is peek() but with a top--.
This implementation is fast (all operations are O(1)) and it is space efficient (except for the unused part of the array). The drawback is that the array can fill up, and we might get an exception on push() as well.
To avoid the problem of the array being full, we can use an ArrayList. To push(), add to the end of the ArrayList and to pop(), remove the last item. The ArrayList can grow, so it never becomes full. The code to do this is in ArrayListStack.java. Note that we don't even need to keep track of the top! The ArrayList does it for us.
Do these operations all take O(1) time? It looks like it, as long as add() and remove() at the end of the ArrayList are O(1) time. The remove is O(1). The add usually is, but sometimes can take longer. Recall our analysis of ArrayList.
Now that we're thinking about list implementations, a singly-linked list works perfectly for a stack. We could just use an instance of the previous SLL class, or it's easy enough to code up from scratch: SLLStack.java. The top of the stack is the head of the list. The push() operation adds to the front of the list and pop() removes from the front. All operations are O(1) in this implementation. We need to have space for the links in the linked list, but we never have empty space as in the array implementation.
Queues
Queues are FIFO (First In, First Out). "Queueing up" is a mostly British way of saying standing in line. And a Queue data structure mimics a line at a grocery store or bank where people join at the back, are served when they get to the front, and nobody is allowed to cut into the line. A Queue ADT, like SimpleQueue.java has at at least the following operations (in addition to a constructor to create an empty Queue):
- enqueue - add at rear of the queue
- dequeue - remove and return the first item in the queue
- front - return at the first item but don't remove it
- isEmpty - return true iff the queue is empty
Java also has a Queue interface. It does not use the conventional names. Instead of enqueue() it has add() and offer(). Instead of front() it has element() and peek(). Instead of dequeue() it has remove() and poll(). Why two choices for each? The first choice in each pair throws an exception if it fails. The second fails more gracefully, returning false if offer() fails (because the queue is full) and null if peek() or poll() is called on an empty queue. At least isEmpty() and size() keep their conventional names.
What good is queue? An obvious answer is that it is useful in simulations of lines at banks, toll booths, etc. But more important are the queues within computer systems for things like printers. When you submit a print job you are enqueued. When you get to the front of the line you are dequeued and get to print. Time sharing systems use round-robin scheduling. The first job is dequeued and run for a fixed period of time or until it blocks for I/O or some other reason. Then it is enqueued. This repeats as long as there are jobs in the queue. New jobs are enqueued. Jobs that finish leave the system instead of enqueueing. This way every job gets a fair share of the CPU.
We'll also later explore the use of queues in searching; as with stacks, they would have been just fine for our region growing in PS-1 — keep pulling points from the queue and adding their (unvisited, correctly-colored) neighbors to the queue.
Let's think about implementing a queue with a linked list. As we saw, an SLL easily implements a stack, by adding to and removing from the head. So for a queue we need to add to one end and remove from the other. It's easy to do either add() or remove() from the head, so we need to figure out how to do the other from the tail. Suppose in addition to the head we keep track of the tail. Then we can add to the tail by setting its next, and then updating the tail pointer to that new element. (There's no easy way to remove from the tail, unless we have a doubly-linked list — with an SLL we'd have to march all the way down the list to find the element before the tail, in order to splice out the tail.) SLLQueue.java does just that. All operations are Θ(1).
There are a number of Java implementations: ArrayBlockingQueue, ConcurrentLinkedQueue, DelayQueue, LinkedBlockingQueue, and LinkedList among them. LinkedList is probably the most common implementation used for normal purposes.
A blocking queue is a bounded buffer used for interprocess communication. The Unix operating system makes it easy to connect programs using "pipes", where the output of the first program becomes the input of the second. The first program is the producer and the second program is the consumer. The producer calls offer() to enqueue and the consumer calls poll() to dequeue. If the queue is full when the producer calls offer the producer blocks (waits) until something is removed from the queue. If the queue is empty when the consumer calls poll the consumer blocks until something is added to the queue.
It may seem strange to have an ArrayBlockingQueue. An array or an ArrayList seem unpromising as a data structure for implementing a queue. We could have an array (or ArrayList) where we enqueue at the end and dequeue from the front, but each dequeue requires moving everything else in the array (or ArrayList) one position forward. Enqueuing at the front and dequeueing from the end makes dequeuing fast, but now enqueueing requires moving everything else in the array (or ArrayList) one postion back.
We can make things more efficient if when dequeueing from the front of an array we simply leave the first spot empty and remember that the front of the queue is at the second position. By having two indicies f and r (for front and rear) we can make both enqueue and dequeue Θ(1) time. (Demonstrate with a diagram on the board.) However, when r reaches the end of the array we are stuck.
Or are we? There are probably a lot of free spaces at the front of the array. Suppose that we wrap around and start using them to hold items in the queue? If r < f this indicates that the queue has wrapped around. The book shows how to implement this. They have f contain the index of the first item in the queue and r contain the index of the space beyond the last item (the place were a new item should be enqueued). Then the way to indicate an empty queue is to have f == r. But this would be the same thing as a full queue! Therefore we always have to leave an empty space. An interesting thing is that if N is the size of the array, then (N - f + r) % N is the size of the queue. This works whether the queue has f < r, f == r, or f > r.