CS 33 (Spring 2009): Homework 6: Extendable Hashing
Deadline: 10:00pm sharp on May 27, 2009
Please read through this page entirely and carefully, before
beginning your work.
There is just one problem in this homework. This is because you
are expected to be working on the course project as well.
- Implement the extendable hashing scheme from the textbook,
using buckets that can hold up to 3 records each. Use your
implementation to build a hash index for the data in
~cs33/data/film.txt on the film_name attribute,
reading in the records one by one, in the order in which they appear
in the file. The single quotes are not part of the film name.
Each record in each bucket should be a 256-byte string and should
hold the corresponding line of the data file verbatim (quotes,
commas and terminating newline included).
[30 points]
- Use the hash function cs33_hash found in
~cs33/data/murmur_hash.c to hash the keys (i.e., the film
names) to 32-bit values. Of course, you will only be using a prefix
of this hash value, consisting of its most significant few bits.
- Your program should behave in one of the following ways,
depending on the command line.
- If invoked with no command line argument: Immediately
after inserting the 10th record in the hash table, display the
current state of the table (both the bucket address table and the
contents of all allocated buckets) on standard output. You should
write a suitable pretty-printing procedure/function for this
purpose. Display the pointers in the bucket address table in
"%p" format (in printf notation). When displaying the
contents of a bucket, simply print out each record as a string.
After processing the entire file, print (on standard output) the
hash prefix length used in the bucket address table and the total
number of buckets created to hold the records. Use this format:
"Final hash prefix length = %d\nNumber of buckets = %d\n"
[25 points]
- If invoked with one command line argument: Process the
entire file silently, then use your hash table to search for the
film names given on the command line. Print (on standard output) the
entire contents of the bucket containing the record for this film
name (as specified above). If the film name is not found, print an
appropriate error message. Note: Since the film names have spaces in
them, you will have to invoke the program in one of the following
ways:
$ progname MUPPET\ MILE
$ progname 'MUPPET MILE'
[20 points]
- If invoked with two or more command line arguments:
Ignore all arguments except for the first one and behave as above.
- You must program in a language where you can control
memory allocation yourself: C and C++ are best for this. Perl and
Python are not too good for this. If you wish to use a language
other than C/C++, get clearance from the course staff first.
What to turn in
- Write a single program with all of the above functionality and
turn in its source code. You may use tar/gzip or zip if you have
multiple source files.
- Submit exactly one file using the homework
submission form. Keep in mind: this homework is numbered
"Homework 6".