Last time, we started looking at the heap data structure to implement a priority queue. Let's see how to implement one. An implementation of a min-heap in an ArrayList is in HeapMinPriorityQueue.java. It shows how to implement the operations described above. Switching between max-heaps and min-heaps shouldn't throw you.
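To give a feel for what such an implementation might look like, here is a minimal sketch of a min-heap stored in an ArrayList. It is not the code in HeapMinPriorityQueue.java: the class name MinHeapSketch, the private swap helper, and the choice to throw NoSuchElementException on an empty queue are all assumptions made for illustration. Only the four operation names (isEmpty, insert, minimum, extractMin) come from the lecture.

```java
import java.util.ArrayList;
import java.util.NoSuchElementException;

// Hypothetical sketch; not the course's HeapMinPriorityQueue.java.
class MinHeapSketch<E extends Comparable<E>> {
    private final ArrayList<E> heap = new ArrayList<>();

    public boolean isEmpty() {
        return heap.isEmpty();
    }

    public void insert(E element) {
        heap.add(element);                    // becomes the new last leaf
        int i = heap.size() - 1;
        // Bubble up while the element is smaller than its parent.
        while (i > 0 && heap.get(i).compareTo(heap.get((i - 1) / 2)) < 0) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    public E minimum() {
        if (heap.isEmpty()) {
            throw new NoSuchElementException("empty priority queue");
        }
        return heap.get(0);
    }

    public E extractMin() {
        E min = minimum();
        E last = heap.remove(heap.size() - 1); // removing the last slot is cheap
        if (!heap.isEmpty()) {
            heap.set(0, last);                 // last leaf becomes the new root
            int i = 0;
            while (true) {                     // bubble the new root down
                int left = 2 * i + 1;
                int right = 2 * i + 2;
                int smallest = i;
                if (left < heap.size()
                        && heap.get(left).compareTo(heap.get(smallest)) < 0) {
                    smallest = left;
                }
                if (right < heap.size()
                        && heap.get(right).compareTo(heap.get(smallest)) < 0) {
                    smallest = right;
                }
                if (smallest == i) {
                    break;                     // no larger than either child
                }
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        E tmp = heap.get(i);
        heap.set(i, heap.get(j));
        heap.set(j, tmp);
    }

    public static void main(String[] args) {
        MinHeapSketch<Integer> pq = new MinHeapSketch<>();
        for (int x : new int[] {5, 3, 8, 1}) {
            pq.insert(x);
        }
        while (!pq.isEmpty()) {
            System.out.print(pq.extractMin() + " ");   // prints 1 3 5 8
        }
    }
}
```

Both loops move one level per iteration, which is where the height-of-the-heap bound on the number of swaps comes from.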
Let's look at the worst-case running times of the min-priority queue operations in this implementation. We express them in terms of the number n of elements that are in the min-priority queue when the operations occur.
isEmpty just returns a boolean indicating whether the size of the ArrayList is zero. This method takes constant time, or Θ(1).
insert first adds a new reference at the end of the ArrayList, which takes Θ(1) amortized time. It then has to bubble the value up the heap until it is no smaller than its parent or it reaches the root. Each swap takes constant time, and the number of swaps is bounded by the height of the heap. Thus, insert takes O(lg n) time.
minimum just returns what is in position 0 of the ArrayList, taking Θ(1) time.
extractMin returns the element in position 0 and puts the last element in its place, taking Θ(1) time. It then has to restore the heap property, however, and so it has to bubble the new root down until it is no larger than either of its children or is a leaf. Like insert, this procedure takes O(lg n) time.
We can also use a heap as the basis of a sorting algorithm called heapsort. Its running time is O(n lg n), and it sorts in place. That is, it needs no additional space for copying values (as merge sort does) or for a stack of recursive calls (as needed in quicksort and merge sort).
Heapsort has two major phases. You can see all the steps in this PowerPoint presentation. First, given an array of values in an unknown order, we have to rearrange the values to obey the max-heap property. That is, we have to build a heap. Then, once we've built the heap, we repeatedly pick out the maximum value in the heap—which we know is at the root—swap it with the last leaf in the heap, and restore the max-heap property. When we put the maximum value into the array position that had held the last leaf, we consider that array position to no longer be part of the heap.
The code for heapsort is in Heapsort.java. We've written it to sort an array, rather than an ArrayList, but you can easily modify it to sort an ArrayList. Or you can use the overloaded version that takes an ArrayList, converts it to an array, sorts the array, and then copies the sorted array back into the ArrayList. At the bottom, you can see some private methods that help out other methods in the class: swap, leftChild, and rightChild.
The obvious way to build a heap is to start with an unordered array. The first element, taken by itself, is a valid heap. We can then insert the second element into the heap, then the third, etc. After we have inserted the last element, we have a valid heap. This idea works fine and leads to an O(n lg n)-time heapsort. We can avoid implementing the insert code and speed up the algorithm a bit, however, by building the heap from the bottom up rather than from the top down and using the same idea as when we restore the max-heap property during the extractMax operation.
The code to restore the max-heap property is in the maxHeapify method. It takes three parameters: the array a holding the heap and indices i and lastLeaf into the array. The maxHeapify method assumes that, when it is called, if you look at the subarray a[i..lastLeaf] (the subarray starting at index i and going through index lastLeaf), the max-heap property holds everywhere in this subarray, except possibly among node i and its children. maxHeapify restores the max-heap property everywhere in the subarray.
maxHeapify works as follows. It computes the indices left and right of the left and right children of node i, if it has such children. Node i has a left child if the index left is no greater than the index lastLeaf of the last leaf in the entire heap, and similarly for the right child.
maxHeapify then determines which node, out of node i and its children, has the greatest key value, storing the index of this node in the variable largest. First, if there's a left child, then the index of whichever of node i and its left child has the larger value is stored in largest; if there's no left child, largest is simply i. Then, if there's a right child, the index of whichever of the winner of the previous comparison and the right child has the larger value is stored in largest.
Once largest indexes the node with the largest value among node i and its children, we check to see whether we need to do anything. If largest equals i, then the max-heap property already is satisfied, and we're done. Otherwise, we swap the values in node i and node largest. By swapping, however, we have put a new, smaller value into node largest, which means that the max-heap property might be violated among node largest and its children. We call maxHeapify recursively, with largest taking on the role of i, to correct this possible violation.
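The steps just described can be sketched as follows. This is an illustration, not the method in Heapsort.java: it assumes an int array with the root at index 0 (so the children of node i are at 2i + 1 and 2i + 2), and it inlines the index arithmetic and the swap rather than calling the leftChild, rightChild, and swap helpers. The class name MaxHeapifySketch is made up for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of the recursive procedure described above.
class MaxHeapifySketch {
    // Restores the max-heap property in a[i..lastLeaf], assuming it can be
    // violated only among node i and its children.
    static void maxHeapify(int[] a, int i, int lastLeaf) {
        int left = 2 * i + 1;      // index of the left child, if it exists
        int right = 2 * i + 2;     // index of the right child, if it exists
        int largest = i;

        if (left <= lastLeaf && a[left] > a[largest]) {
            largest = left;
        }
        if (right <= lastLeaf && a[right] > a[largest]) {
            largest = right;
        }

        if (largest != i) {
            int tmp = a[i];        // swap node i with its larger child
            a[i] = a[largest];
            a[largest] = tmp;
            maxHeapify(a, largest, lastLeaf);  // fix any violation one level down
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 9, 8, 4, 7};  // only the root violates the max-heap property
        maxHeapify(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));  // prints [9, 7, 8, 4, 1]
    }
}
```

In the example, the value 1 is swapped with 9 and then with 7, moving down one level per call until it reaches a leaf.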
Notice that in each recursive call of maxHeapify, the value taken on by i is one level further down in the heap. The total number of recursive calls we can make, therefore, is at most the height of the heap, which is Θ(lg n). Because we might not go all the way down to a leaf (remember that we stop once we find a node that does not violate the max-heap property), the total number of recursive calls of maxHeapify is O(lg n). Each call of maxHeapify takes constant time, not counting the time for the recursive calls. The total time for a call of maxHeapify, therefore, is O(lg n).
Now that we know how to correct a single violation of the max-heap property, we can build the entire heap from the bottom up. Suppose we were to call maxHeapify on each leaf. Nothing would change, because the only way that maxHeapify changes anything is when there's a violation of the max-heap property among a node and its children. Now suppose we called maxHeapify on each node that has at least one child that's a leaf. Then afterward, the max-heap property would hold at each of these nodes. But it might not hold at the parents of these nodes. So we can call maxHeapify on the parents of the nodes that we just fixed up, and then on the parents of these nodes, and so on, up to the root.
That's exactly how the buildMaxHeap method in Heapsort.java works. It computes the index lastNonLeaf of the highest-indexed non-leaf node, and then runs maxHeapify on nodes by decreasing index, all the way up to the root.
You can see how buildMaxHeap works on our example heap, including all the changes made by maxHeapify, by running the slide show in the PowerPoint presentation. Run it for 17 transitions, until you see the message "Heap is built."
Let's analyze how long it takes to build a heap. We run maxHeapify on at most half of the nodes, or at most n/2 nodes. We have already established that each call of maxHeapify takes O(lg n) time. The total time to build a heap, therefore, is O(n lg n).
Because we are shooting for a sorting algorithm that takes O(n lg n) time, we can be content with the analysis that says it takes O(n lg n) time to build a heap. It turns out, however, that a more rigorous analysis shows that the total time to run the buildMaxHeap method is only O(n). Notice that most of the calls of maxHeapify made by buildMaxHeap are on nodes close to a leaf. In fact, about half of the nodes are leaves and take no time, a quarter of the nodes are parents of leaves and require at most 1 swap, an eighth of the nodes are parents of the parents of leaves and take at most 2 swaps, and so on. If we sum the total number of swaps, it ends up being O(n).
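Here is a back-of-the-envelope version of that sum, using the approximate fractions just given. It is a sketch rather than a rigorous proof, and the symbol h (the number of levels a node sits above the leaves) is introduced only for this calculation.

```latex
\frac{n}{4}\cdot 1 + \frac{n}{8}\cdot 2 + \frac{n}{16}\cdot 3 + \cdots
  \;=\; n \sum_{h=1}^{\infty} \frac{h}{2^{h+1}}
  \;=\; \frac{n}{2} \sum_{h=1}^{\infty} \frac{h}{2^{h}}
  \;=\; \frac{n}{2} \cdot 2
  \;=\; n
```

The only outside fact used is that the series 1/2 + 2/4 + 3/8 + 4/16 + ... sums to 2. So the total number of swaps is at most about n, which is O(n).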
The second phase of sorting is the while-loop in the heapsort method in Heapsort.java. After heapsort calls buildMaxHeap so that the array obeys the max-heap property, the while-loop sorts the array. You can see how it works on the example by running the rest of the slide show in the PowerPoint presentation.
Let's think about the array once the heap has been built. We know that the largest value is in the root, node 0. And we know that the largest value should go into the position currently occupied by the last leaf in the heap. So we swap these two values, and declare that the last position (where we just put the largest value) is no longer in the heap. That is, the heap occupies the first n − 1 slots of the array, not the first n. The local variable lastLeaf indexes the last leaf, and so we decrement it. By swapping a different value into the root, we might have caused a violation of the max-heap property at the root. Fortunately, we haven't touched any other nodes, and so we can call maxHeapify on the root to restore the max-heap property.
We now have a heap with n − 1 nodes. The nth slot of the array, a[n-1], contains the largest element from the original array, and this slot is no longer in the heap. So we can now do the same thing, but now with the last leaf in a[n-2]. Afterward, the second-largest element is in a[n-2], this slot is no longer in the heap, and we have run maxHeapify on the root to restore the max-heap property. We continue on in this way, until the only node still in the heap is node 0, the root. By then, it must contain the smallest value, and we can just declare that we're done. (This idea is analogous to how we finish up selection sort, where we put the n − 1 smallest values into the first n − 1 slots of the array. We then declared that we were done, since the only remaining value must be the largest, and it's already in its correct place.)
Analyzing this second phase is easy. The while-loop runs n − 1 times (once for each node other than node 0). In each iteration, swapping node values and decrementing lastLeaf take constant time. Each call of maxHeapify takes O(lg n) time, for a total of O(n lg n) time. Adding in the O(n lg n) time to build the heap gives a total sorting time of O(n lg n).
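Putting the two phases together, a heapsort along these lines might look like the sketch below. It is not the code in Heapsort.java. It assumes an int array, computes lastNonLeaf as the parent of the last leaf, and (as a deliberate variation) writes maxHeapify as a loop instead of a recursive method, which does the same work without any recursive calls. The class name HeapsortSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of the two-phase heapsort described above.
class HeapsortSketch {
    static void heapsort(int[] a) {
        buildMaxHeap(a);                  // phase 1: make the array a max-heap
        int lastLeaf = a.length - 1;
        while (lastLeaf > 0) {            // phase 2: n - 1 iterations
            swap(a, 0, lastLeaf);         // maximum goes to its final slot
            lastLeaf--;                   // that slot leaves the heap
            maxHeapify(a, 0, lastLeaf);   // repair the root
        }
    }

    static void buildMaxHeap(int[] a) {
        int lastNonLeaf = (a.length - 2) / 2;   // parent of the last leaf
        for (int i = lastNonLeaf; i >= 0; i--) {
            maxHeapify(a, i, a.length - 1);
        }
    }

    // Iterative variant of maxHeapify: same comparisons and swaps as the
    // recursive description, with a loop in place of the recursive call.
    static void maxHeapify(int[] a, int i, int lastLeaf) {
        while (true) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            int largest = i;
            if (left <= lastLeaf && a[left] > a[largest]) {
                largest = left;
            }
            if (right <= lastLeaf && a[right] > a[largest]) {
                largest = right;
            }
            if (largest == i) {
                return;                   // max-heap property holds here
            }
            swap(a, i, largest);
            i = largest;                  // continue one level down
        }
    }

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 9, 1, 7, 3};
        heapsort(a);
        System.out.println(Arrays.toString(a));  // prints [1, 2, 3, 5, 7, 9]
    }
}
```

Apart from a few local variables, the sort uses no storage beyond the array itself, which is what sorting in place means.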
Java has a group of interfaces for holding collections of objects and classes that implement them. We have briefly touched on List, which is an interface with two Java-provided implementations: ArrayList and LinkedList. Today we will look at two other interfaces for holding collections of objects: Set and Map. Each has two Java-provided implementations. Set is implemented by HashSet and TreeSet. Map is implemented by HashMap and TreeMap. We will be looking at their underlying data structures, hash tables and binary search trees, in the next few lectures.
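List interface
Among other methods, the List<E> interface provides the following:
boolean add(E o): Appends the specified element to the end of this list. Returns true.
void add(int index, E o): Inserts the specified element at position index of this list.
void clear(): Removes all of the elements from this list.
boolean contains(Object o): Returns true if this list contains the specified element, false otherwise.
E get(int index): Returns the element at position index.
boolean isEmpty(): Returns true if this list contains no elements, false otherwise.
int indexOf(Object o): Returns the index of the first occurrence of the specified element, or -1 if this list does not contain it.
Iterator<E> iterator(): Returns an iterator over the elements of this list, in order.
ListIterator<E> listIterator(): Returns a list iterator over the elements of this list, in order.
E remove(int index): Removes and returns the element at position index.
boolean remove(Object o): Removes the first occurrence of the specified element. Returns true if the element is present, false otherwise.
E set(int index, E element): Replaces the element at position index with the specified element and returns the element that was there before.
int size(): Returns the number of elements in this list.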
If both ArrayList and LinkedList implement this set of operations, why have both? Efficiencies differ. Access operations (set and get) take constant time in an ArrayList, but require time proportional to the distance to the nearest end for a LinkedList. (The LinkedList is a doubly linked list, and it's smart enough to start at the closest end.) On the other hand, modification operations (add and remove at a given index) require time proportional to the number of elements after the index in an ArrayList, because all of these elements must be copied. But for a LinkedList, they take constant time after the time to access the index (distance from nearest end). Therefore, all Iterator or ListIterator operations take constant time for a LinkedList, but add and remove operations take time proportional to the number of remaining elements for an ArrayList.
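As a small illustration of modifying a list through a ListIterator, here is a sketch that inserts a new element in front of every occurrence of a marker in a single pass. The class name ListIteratorSketch and the method insertBefore are made up for this example. The same code runs on either implementation; the difference is cost: on a LinkedList each iter.add takes constant time, whereas on an ArrayList each one shifts all of the remaining elements.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical example of inserting through a ListIterator.
class ListIteratorSketch {
    // Inserts value immediately before every element equal to marker.
    static void insertBefore(List<String> list, String marker, String value) {
        ListIterator<String> iter = list.listIterator();
        while (iter.hasNext()) {
            if (iter.next().equals(marker)) {
                iter.previous();   // step back so the cursor sits before marker
                iter.add(value);   // insert at the cursor position
                iter.next();       // move past marker again
            }
        }
    }

    public static void main(String[] args) {
        List<String> list = new LinkedList<>(Arrays.asList("a", "b", "c", "b"));
        insertBefore(list, "b", "x");
        System.out.println(list);  // prints [a, x, b, c, x, b]
    }
}
```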
Set interface
A Set differs from a List in that a List has a linear order, whereas a Set does not. Furthermore, an element can appear multiple times in a List but only once in a Set.
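Here are the primary operations on a Set<E>:
boolean add(E o): Adds o to the set if it is not already there. Returns true if o was not in the set, false if o was already in the set.
void clear(): Removes all of the elements from this set.
boolean contains(Object o): Returns true if this set contains the specified element, false otherwise.
boolean isEmpty(): Returns true if this set contains no elements, false otherwise.
Iterator<E> iterator(): Returns an iterator over the elements in this set.
boolean remove(Object o): Removes o from the set if it is there. Returns true if o was present, false otherwise.
int size(): Returns the number of elements in this set.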
All of these methods are also part of the List interface. So why have a separate interface?
The main reason is implementation efficiency. The contains operation on either an ArrayList or a LinkedList with n elements takes O(n) time, and for an ArrayList the remove operation can take O(n) time. For applications such as a dictionary for a spell checker, these running times are too slow.
There are two implementations of Set in the Java class library. Both implement the contains operation more efficiently than it can be implemented for a List.
The first implementation is TreeSet, which uses a data structure called a balanced binary search tree to store the data. You can think of it as a little like a linked list on which you can do binary search. We will talk about this data structure soon. The important point is that the add, remove, and contains methods all take O(lg n) time for a set with n elements. It works only on Comparable objects, unless you supply a Comparator when you create the set. The iterator is guaranteed to return the elements in increasing order by compareTo and takes O(n) time to iterate through the entire set. Getting the first element from the iterator takes O(lg n) time.
The second is HashSet, which uses a data structure called a hash table. We will talk about hash tables next time.
If the hash table is used properly, then the add, remove, and contains operations all take O(1) time on average (although it is possible that they could take Θ(n) time if you were extremely unlucky). The iterator returns the elements in a somewhat arbitrary order.
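A short sketch shows the set behaviors mentioned so far: add reports whether the element was new, duplicates are kept only once, and a TreeSet iterates in increasing order. The class name SetSketch and the helper sortedDistinct are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demonstration of TreeSet and HashSet.
class SetSketch {
    // Returns the distinct words in increasing order, using a TreeSet.
    static List<String> sortedDistinct(String... words) {
        Set<String> set = new TreeSet<>();
        for (String w : words) {
            set.add(w);            // a repeated word is simply ignored
        }
        return new ArrayList<>(set);   // copies in the TreeSet's sorted order
    }

    public static void main(String[] args) {
        Set<String> keywords = new HashSet<>();
        System.out.println(keywords.add("while"));       // prints true
        System.out.println(keywords.add("class"));       // prints true
        System.out.println(keywords.add("class"));       // prints false: already there
        System.out.println(keywords.size());             // prints 2
        System.out.println(keywords.contains("int"));    // prints false

        // Prints [class, int, while]
        System.out.println(sortedDistinct("while", "class", "int", "class"));
    }
}
```

Printing the HashSet itself would show the same two words, but in an order that should not be relied on.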
As an example of the use of sets, consider the program SetDemo.java. It creates a set consisting of all of the keywords in Java. It then uses an iterator to go through the set and print each of the words. (Note that an iterator on a Set is identical to an iterator on a List.) Finally, it lets the user type words and determines if they are keywords by using contains to see if they are in the set.
Map interface
The Map interface describes a data structure that can be thought of as a set where each element has associated data. Each data element is associated with a key. By looking up the key, you can get the associated data, just like a dictionary in Python. A key is typically something like your student ID number, and the associated data might be your student record. A Map can be implemented using a balanced binary tree or a hash table, just like a Set.
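Where K is the generic type for the key and V is the generic type for the associated data, the primary operations in a Map<K,V> are the following:
void clear(): Removes all of the mappings from this map.
boolean containsValue(Object value): Returns true if this map maps one or more keys to the specified value, false otherwise.
V get(Object key): Returns the value associated with key, or null if key is not in the map.
boolean isEmpty(): Returns true if this map contains no key-value mappings.
Set<K> keySet(): Returns a Set containing the keys contained in this map.
V put(K key, V value): Associates value with key. Returns the value previously associated with key, or null if key was not in the map.
V remove(Object key): Removes the mapping for key. Returns the value that was associated with key (or null if key is not in the map).
int size(): Returns the number of key-value mappings in this map.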
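To make the return values of these operations concrete, here is a small sketch that exercises them on a TreeMap. The class name MapSketch and the sample data are invented for the example; a HashMap would give the same results, except that keySet would then come back in an arbitrary order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demonstration of the Map operations listed above.
class MapSketch {
    public static void main(String[] args) {
        Map<String, String> sounds = new TreeMap<>();

        System.out.println(sounds.isEmpty());             // prints true
        System.out.println(sounds.put("cow", "moo"));     // prints null: new key
        System.out.println(sounds.put("cow", "MOO"));     // prints moo: old value
        sounds.put("dog", "woof");

        System.out.println(sounds.get("dog"));            // prints woof
        System.out.println(sounds.get("cat"));            // prints null: no such key
        System.out.println(sounds.containsValue("woof")); // prints true

        System.out.println(sounds.remove("cow"));         // prints MOO
        System.out.println(sounds.keySet());              // prints [dog]
        System.out.println(sounds.size());                // prints 1
    }
}
```

Note that put does double duty: it adds a new mapping or replaces an existing one, and its return value tells you which happened.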
For an example of the use of a map, consider AnimalSounds.java. This program allows the user to insert animal names as keys and the sounds that they make as the associated data. The user can then ask for the sound that a given animal makes, or to remove an animal from the map.
Note the way the print operation works. The code for this is
if (animalMap.isEmpty())
    System.out.println("The map is empty");
else {
    System.out.println("Here are the animals and their sounds:");
    // Get the set of keys (the animal names) and an iterator for it.
    Set<String> animalNames = animalMap.keySet();
    Iterator<String> iter = animalNames.iterator();
    while (iter.hasNext()) {
        animal = iter.next();
        // Look up the sound stored for this animal and print the pair.
        System.out.println(toTitleCase(getArticle(animal)) + " "
            + animal + " says " + animalMap.get(animal) + ".");
    }
}
Note that the first step is to call keySet to get all of the keys in the map. Then we create an iterator for the set, and we use it to iterate through the set, printing each key and the value returned by get for that key.
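For comparison, here is another way to walk through a map that AnimalSounds.java does not use: iterating over entrySet with a for-each loop, which hands back each key together with its value so that no separate get call is needed. The class name EntrySetSketch and the method describe are hypothetical, and the output format is simplified (no article or title-casing).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical alternative to the keySet-plus-get loop.
class EntrySetSketch {
    // Builds one line of text per key-value pair in the map.
    static String describe(Map<String, String> animalMap) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : animalMap.entrySet()) {
            sb.append(entry.getKey())
              .append(" says ")
              .append(entry.getValue())
              .append(".\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> animalMap = new TreeMap<>();
        animalMap.put("dog", "woof");
        animalMap.put("cow", "moo");
        // Prints "cow says moo." and then "dog says woof." (TreeMap key order).
        System.out.print(describe(animalMap));
    }
}
```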
We probably won't have time to get to this example in class, but it's worth going through on your own.
The method of voting in which the candidate with the most votes wins the election has some drawbacks. If two conservatives get in a race against a liberal in a conservative district they could split the conservative vote and the liberal gets elected, even though he is the third choice of the majority of the voters in the election. Also, third parties have a hard time getting established, because voting for a third-party candidate can be throwing away your vote. If about a third of the 22,000 New Hampshire voters who voted for Nader in the Bush-Gore election had voted for Gore instead, he would have won the state and the presidency. Florida, and its hanging chads, would not have mattered.
Some states solve these problems by having a runoff election between the top two candidates if nobody gets a majority of the votes. But a runoff election costs time and money. A popular alternative suggestion is the instant runoff election.
In an instant runoff election, the voters fill out a ballot with an ordered list of candidates, from most favorable to least favorable. The election takes place in rounds. In the first round, each ballot awards a vote to the first candidate on the ballot. If nobody has a majority, then the candidate with the fewest votes is dropped from the election. (In case of ties we will choose one at random.) Then another round is run. This time, each ballot's vote is awarded to the first candidate in its list who has not been eliminated. The bottom candidate is dropped, and the process repeats until one candidate has a majority. (In fact, it can repeat until there is just one candidate left and get the same result. Once someone has a majority they will never be eliminated.)
How could we write a program to determine the winner of an instant runoff election? The first step is to determine what objects appear in the problem and how they interact with one another. One obvious choice is a ballot. We could say, "Oh, that is just a list" and not create an object for it. But let's take an object-oriented approach and say that there should be a Ballot class.
Another object would be the set of all the ballots in the election. We could just say, "Make a set of lists," but let's make Election a class, also.
A final object that might be less obvious is one that represents the results of the voting. Let's create a VoteTally class. The alternative is to use a map from candidate names to the number of votes that they received.
We could have a class to keep track of the current set of candidates, but a Set seems to do everything that we are likely to need. Unless we discover an action that we need to do that a Set doesn't handle, we will just use a Set.
What actions do we need to perform? We first need to get our set of candidates. Note that we can limit this set to candidates who get at least one first-round vote. Others will have zero votes and will be dropped before any of the candidates who got first-round votes. It sounds like Election is the class that has access to the data to perform this, with help from the Ballot class to get the first element on each ballot.
Next, we have to run a round of the election. This task requires going through all of the ballots, determining to whom each vote should go, and increasing that candidate's tally by 1. The Ballot class has the data to determine who should get the vote. The Election class has the ballots. The VoteTally class should update itself by adding a vote for the candidate.
After running a round, we have to find the candidate with the fewest votes. The VoteTally class has the information to do so. But what if there is a tie? Maybe we find a list of candidates who share the lowest vote total. We pick one at random and eliminate that candidate from the current candidate set.
We have to repeat running a round of the election and eliminating the candidate with fewest votes until we have only one candidate left. This procedure does not seem to be appropriate for any class. A method in a new class, InstantRunoff, can do this.
So what sorts of things do we want to be able to do with a Ballot object?
addCandidate method to add candidates to the ballot.
There are many other possible things we could do with a ballot. Getting all of the candidates in order is one possibility, and so we could supply an iterator. A toString method could be useful. A way of getting the number of candidates on the ballot could be useful. But for now we will do the minimum. We can always come back later to add new methods.
What should we do with an Election object? We need to create it, plus perform the jobs mentioned above.
Election. A constructor to create an empty Election and an addBallot method could take care of this.
What about the VoteTally object?
The code in Ballot.java, Election.java, and VoteTally.java implements these operations. InstantRunoffOO.java supplies the method to loop through the rounds and the main method for testing.
You can run this code using ballots.txt as input, a file I made with 200 ballots generated randomly according to a probability distribution. You'll need to modify the string in ballotFileName for your own computer.
An alternate approach is InstantRunoffProc.java. This code does the same thing as InstantRunoffOO.java, but through fixed data structures and static methods. It has less code, which is a plus. There are longer lists of parameters, as all of the data must be passed around "bare." We see data declarations such as List<List<String>> ballots. These declarations are not easy to read and take getting used to. In short, there is no data encapsulation, which is a minus.
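To show what the "bare" procedural style looks like, here is a compact sketch of the instant runoff procedure written as a single static method over a List<List<String>>. It is not the contents of InstantRunoffProc.java: the class name InstantRunoffSketch, the method winner, and the decision to pass in a Random for tie-breaking are assumptions made for this illustration. It runs rounds until one candidate is left, which the rules above say gives the same result as stopping at a majority.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical procedural sketch; not the course's InstantRunoffProc.java.
class InstantRunoffSketch {
    // Returns the winner, or null if no ballot names any candidate.
    static String winner(List<List<String>> ballots, Random rng) {
        // Start with the candidates who received at least one first-choice vote.
        Set<String> candidates = new TreeSet<>();
        for (List<String> ballot : ballots) {
            if (!ballot.isEmpty()) {
                candidates.add(ballot.get(0));
            }
        }

        while (candidates.size() > 1) {
            // Run a round: each ballot votes for its first surviving candidate.
            Map<String, Integer> tally = new TreeMap<>();
            for (String c : candidates) {
                tally.put(c, 0);
            }
            for (List<String> ballot : ballots) {
                for (String choice : ballot) {
                    if (candidates.contains(choice)) {
                        tally.put(choice, tally.get(choice) + 1);
                        break;
                    }
                }
            }

            // Find everyone tied for the fewest votes and drop one at random.
            int fewest = Collections.min(tally.values());
            List<String> lowest = new ArrayList<>();
            for (String c : candidates) {
                if (tally.get(c) == fewest) {
                    lowest.add(c);
                }
            }
            candidates.remove(lowest.get(rng.nextInt(lowest.size())));
        }
        return candidates.isEmpty() ? null : candidates.iterator().next();
    }

    public static void main(String[] args) {
        List<List<String>> ballots = new ArrayList<>();
        for (int i = 0; i < 4; i++) ballots.add(Arrays.asList("A", "B", "C"));
        for (int i = 0; i < 3; i++) ballots.add(Arrays.asList("B", "C", "A"));
        for (int i = 0; i < 2; i++) ballots.add(Arrays.asList("C", "B", "A"));
        // Round 1: A=4, B=3, C=2, so C is dropped. Round 2: A=4, B=5.
        System.out.println(winner(ballots, new Random()));  // prints B
    }
}
```

In the example, A has the most first-choice votes but loses once C's ballots transfer to B, which is exactly the situation instant runoff is meant to handle.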
In a program this short, encapsulation and data hiding aren't that important. On the other hand, I originally had a Set of ballots instead of a List. The Set of Ballot objects in InstantRunoffOO.java wasn't a problem, because Ballot did not override equals. But in InstantRunoffProc.java, it was a problem, because two ballots with choices "Romney Huntsman" were entered into an election but Romney got only one vote. The two ArrayList objects ended up being equal, so only one was kept in the set. Changing from Set<List<String>> to List<List<String>> required five changes spread out over four methods. Finding all of the appropriate changes in a much bigger program (and avoiding changes where the Set<List<String>> wasn't dealing with ballots and may have been correct as it was) would have been tedious and error-prone. In contrast, making the change in InstantRunoffOO.java required two changes in Ballot.java: declaring the instance variable and initializing it in the constructor. I would have only needed to make those two changes, even if the program had millions of lines.
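The lost-vote problem is easy to reproduce in a few lines. In the sketch below, the nested Ballot class is a stripped-down stand-in for the real one (it is hypothetical and, like the real one as described above, does not override equals), and the class and method names are invented for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical reproduction of the duplicate-ballot problem.
class BallotEqualitySketch {
    // Stand-in for Ballot: it inherits equals and hashCode from Object,
    // so two distinct Ballot objects are never equal.
    static class Ballot {
        final List<String> choices;

        Ballot(String... names) {
            choices = new ArrayList<>(Arrays.asList(names));
        }
    }

    // Two identical "bare" ballots collapse into one set element, because
    // lists with the same contents are equal and hash alike.
    static int bareBallotCount() {
        Set<List<String>> ballots = new HashSet<>();
        ballots.add(new ArrayList<>(Arrays.asList("Romney", "Huntsman")));
        ballots.add(new ArrayList<>(Arrays.asList("Romney", "Huntsman")));
        return ballots.size();
    }

    // Two Ballot objects with the same choices both stay in the set.
    static int wrappedBallotCount() {
        Set<Ballot> ballots = new HashSet<>();
        ballots.add(new Ballot("Romney", "Huntsman"));
        ballots.add(new Ballot("Romney", "Huntsman"));
        return ballots.size();
    }

    public static void main(String[] args) {
        System.out.println(bareBallotCount());     // prints 1
        System.out.println(wrappedBallotCount());  // prints 2
    }
}
```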
Scanner class
Note the use of the Scanner class in the programs examined today. You can open it on an input stream (usually System.in) or even on a String. Then you can read any type of data. The next method reads the next token as a String. (Recall that tokens are like words, separated by white space. But you can also change the separator. The class is very flexible.) You can also call nextLine, nextInt, nextLong, nextDouble, nextFloat, nextBoolean, nextBigInteger, nextBigDecimal, nextShort, and nextByte. It will read characters from the input and convert them to the corresponding type. There is also a "has" version of each of these that returns true if the next thing in the input can be converted to the corresponding type (hasNext, hasNextLine, hasNextInt, etc.).
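As a quick illustration of pairing a "has" method with its "next" method, here is a sketch that opens a Scanner on a String and adds up the integer tokens, skipping everything else. The class name ScannerSketch and the method sumOfInts are made up for the example.

```java
import java.util.Scanner;

// Hypothetical example of reading tokens with a Scanner.
class ScannerSketch {
    // Adds up the tokens in text that can be read as ints; skips the rest.
    static int sumOfInts(String text) {
        int sum = 0;
        try (Scanner in = new Scanner(text)) {   // a Scanner opened on a String
            while (in.hasNext()) {
                if (in.hasNextInt()) {
                    sum += in.nextInt();         // safe: hasNextInt was true
                } else {
                    in.next();                   // consume a non-integer token
                }
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfInts("3 apples and 4 pears"));  // prints 7
    }
}
```

Checking hasNextInt first matters: calling nextInt on a token such as "apples" throws an InputMismatchException.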