CS 33 (Spring 2010): Homework 7: Extendable Hashing
Deadline: 10:00pm sharp on May 24, 2010
Please read through this page entirely and carefully, before
beginning your work.
There is just one problem in this homework. This is because you
are expected to be working on the course project as well.
- Implement the extendable hashing scheme from the textbook,
using buckets that can hold up to 3 records each. Use your
implementation to build a hash index for the data in
~cs33/data/film.txt on the film_name attribute,
reading in the records one by one, in the order in which they appear
in the file. The single quotes are not part of the film name.
Each record in each bucket should be a 256-byte string and should
hold the corresponding line of the data file verbatim (quotes,
commas and terminating newline included). You may assume that each
line of the data file is at most 255 characters long.
The grading of your implementation will take into account
the clarity and beauty of your code, in addition to its correctness.
[30 points]
- Use the hash function cs33_hash found in
~cs33/data/murmur_hash.c to hash the keys (i.e., the film
names) to 32-bit values. Of course, you will only be using a prefix
of this hash value, consisting of its most significant few bits.
- Your program should read exactly one argument from the command
line (report an error if the number of arguments is not one) and then
behave in one of the following ways, depending on the argument type.
- If the argument is a positive integer, n: Immediately
after inserting the nth record in the hash table (or the last
record, if the file contains fewer than n records in total),
display the current state of the table (both the bucket address table and
the contents of all allocated buckets) on standard output. You should
write a suitable pretty-printing procedure/function for this purpose.
Display the pointers in the bucket address table in "%p" format
(in printf notation). When displaying the contents of a bucket, simply
print out each record as a string. Continue processing the file, up to
the end. After processing the entire file, print (on standard output)
the hash prefix length used in the bucket address table and the total
number of buckets created to hold the records. Use this format:
"Final hash prefix length = %d\nNumber of buckets = %d\n"
[25 points]
- If the argument is any string other than a positive integer:
Treat the argument as the name of a film. Process the entire file
silently, then use your hash table to search for the film names given on
the command line. Print (on standard output) the entire contents of the
bucket containing the record for this film name (as specified above). If
the film name is not found, print an appropriate error message. Note:
Since the film names have spaces in them, you will have to invoke the
program in one of the following ways:
$ progname MUPPET\ MILE
$ progname 'MUPPET MILE'
[20 points]
- You must program in a language where you can control memory
allocation yourself: C and C++ are best for this (Java works too, though
you'll have to translate murmur_hash.c yourself, first). Perl and
Python are not too good for this. If you wish to use a language other than
C/C++/Java, get clearance from the course staff first.
What to turn in
- Write a single program with all of the above functionality and
turn in its source code. You may use tar/gzip or zip if you have
multiple source files.
- Provide instructions to compile and run your program on
"sunapee", either as comments on top of the source code, on in a
separate file with a ".txt" extension. If you wish, you may
provide a Makefile.
- Submit your files (there's a limit of 3) using the homework
submission form. Keep in mind: this homework is numbered
"Homework 7".