Lab 3 - Data structures in C

Due Thursday, April 21, at 10pm

In this lab you will develop some general-purpose data structures that, with modular design, can be re-used for other labs - most notably, the Tiny Search Engine.

Assignment
- Overview
- bag
- counters
- set
- hashtable
General notes
Hints
What to hand in, and how
- Submitting your lab

Grading will focus on CS50 coding style - including consistent formatting, selection of identifier names, and use of meaningful comments - in addition to correctness and testing.

Your C code must compile without producing any compiler warnings. You will lose points if the compiler produces warnings when using our CS50-standard compiler flags.

If your submitted code fails to compile, or triggers a segmentation fault, you will fail all/some of our correctness tests. Write defensive code: each function should check its pointer parameters for NULL, and take some appropriate (safe) action. Write solid unit tests, test drivers, and use regression testing as your development proceeds.

Assignment

Accept the assignment, which includes a starter kit that implements the bag, readlinep, and memory modules.

Clone the starter kit to your own work place:

[MacBook ~]$ ssh cs50
[plank ~]$ cd cs50/labs
[plank ~]$ git clone git@github.com:cs50-spring-2022/lab3-XXXXX.git
  Cloning into 'lab3-XXXXX'...

The clone step will create a new directory ~/cs50/labs/lab3-XXXXX.

Add three new modules, each of which defines a different data structure.

(30 points) set
(30 points) counters
(40 points) hashtable

Overview

The specific data structures are defined in the sections below.

In the table below, we compare these data structures with the two we explored in class. Most of these data structures allow you to store a collection of “items”. Both the set and hashtable are examples of an abstract data structure called a dictionary, which provide methods like insert(key, item) and item = retrieve(key); the key allows the structure to distinguish among the items it stores.

Behavior	linked list	bag	set	counters	hashtable
stores an item	yes	yes	yes	no	yes
uses a key	no	no	yes	yes	yes
keeps items in order	yes	no	no	no	no
retrieval	first item	any item	by key	by key	by key
insertion of duplicates	allowed	allowed	error	increment count	error

Notice that

a list keeps items in order, but a bag or a set does not.
a set and hashtable allow you to retrieve a specific item (indicated by its key) whereas a bag might return any item.
because the bag and list don’t distinguish among items they store, they can hold duplicates; the others cannot.
the counters data structure maintains a set of counters, each identified by a key, but it stores no items. Instead, it keeps a counter for each key; inserting a duplicate key increments the counter.

bag

A bag is an unordered collection of (items). The bag starts empty, grows as the caller adds one item at a time, and shrinks as the caller extracts one item at a time. It could be empty, or could contain hundreds of items. Items are indistinguishable, so the extract function is free to return any item from the bag.

The starter kit includes our bag module, in which bag.c implements a bag of void*, and exports exactly the following functions through bag.h:

/* Create a new (empty) bag; return NULL if error. */
bag_t *bag_new(void);

/* Add new item to the bag; a NULL bag is ignored; a NULL item is ignored. */
void bag_insert(bag_t *bag, void *item);

/* Return any data item from the bag; return NULL if bag is NULL or empty. */
void *bag_extract(bag_t *bag);

/* Print the whole bag; provide the output file and func to print each item.
 * If fp==NULL; do nothing. If bag==NULL, print (null).
 * If itemprint==NULL, print nothing for each item.
 */
void bag_print(bag_t *bag, FILE *fp,
               void (*itemprint)(FILE *fp, void *item));

/* Iterate over the whole bag; call the given function on each item,
 * passing both the item and an argument. Ignore if NULL bag or NULL itemfunc.
 */
void bag_iterate(bag_t *bag, void *arg,
                 void (*itemfunc)(void *arg, void *item) );

/* Delete the whole bag; ignore NULL bag.
 * Provide a function that will delete each item (may be NULL).
 */
void bag_delete(bag_t *bag, void (*itemdelete)(void *item) );

counters

A counter set is a set of counters, each distinguished by an integer key. It’s a set - each key can only occur once in the set - but instead of storing (key,item) pairs, it tracks a counter for each key. It starts empty. Each time counters_add is called on a given key, that key’s counter is incremented. The current counter value can be retrieved by asking for the relevant key.

Your counters.c should implement a set of integer counters with int keys (where keys must be non-negative) and export exactly the following functions through counters.h:

/* Create a new (empty) counter structure; return NULL if error. */
counters_t *counters_new(void);

/* Increment the counter indicated by key; key must be >= 0.
 * If the key does not yet exist, create a counter for it and initialize to 1.
 * Ignore if ctrs is NULL or key is negative.
 * Return the new value of the counter related to the inserted key
 */
int counters_add(counters_t *ctrs, const int key);

/* Return current value of counter associated with the given key;
 * return 0 if ctrs is NULL or if key is not found.
 */
int counters_get(counters_t *ctrs, const int key);

/* Set the current value of counter associated with the given key;
 * If the key does not yet exist, create a counter for it and initialize to
 * the given value. Return false if ctrs is NULL, if key < 0, or count < 0, or if out of memory, otherwise return true.
 */
bool counters_set(counters_t *ctrs, const int key, int count);

/* Print all counters; provide the output file.
 * Ignore if NULL fp. Print (null) if NULL ctrs.
 */
void counters_print(counters_t *ctrs, FILE *fp);

/* Iterate over all counters in the set (in undefined order):
 * call itemfunc for each item, with (arg, key, count).
 * If ctrs==NULL or itemfunc==NULL, do nothing.
 */
void counters_iterate(counters_t *ctrs, void *arg,
                      void (*itemfunc)(void *arg, const int key, int count));

/* Delete the whole counters. ignore NULL ctrs. */
void counters_delete(counters_t *ctrs);

set

A set maintains an unordered collection of (key,item) pairs; any given key can only occur in the set once. It starts out empty and grows as the caller inserts new (key,item) pairs. The caller can retrieve items by asking for their key, but cannot remove or update pairs. Items are distinguished by their key.

Your set.c should implement a set of void* with char* keys, and export exactly the following functions through set.h:

/* Create a new (empty) set; return NULL if error. */
set_t *set_new(void);

/* Insert item, identified by a key (string), into the given set.
 * The key string is copied for use by the set.
 * Return false if key exists, any parameter is NULL, or error;
 * return true iff new item was inserted.
 */
bool set_insert(set_t *set, const char *key, void *item);

/* Return the item associated with the given key;
 * return NULL if set is NULL, key is NULL, or key is not found.
 */
void *set_find(set_t *set, const char *key);

/* Print the whole set; provide the output file and func to print each item.
 * Ignore if NULL fp. Print (null) if NULL set.
 * Print a set with no items if NULL itemprint.
*/
void set_print(set_t *set, FILE *fp,
               void (*itemprint)(FILE *fp, const char *key, void *item) );

/* Iterate over all items in the set, in undefined order.
 * Call the given function on each item, with (arg, key, item).
 * If set==NULL or itemfunc==NULL, do nothing.
 */
void set_iterate(set_t *set, void *arg,
                 void (*itemfunc)(void *arg, const char *key, void *item) );

/* Delete the whole set; ignore NULL set.
 * Provide a function that will delete each item (may be NULL).
 */
void set_delete(set_t *set, void (*itemdelete)(void *item) );

hashtable

A hashtable is a set of (key,item) pairs. It acts just like a set, but is far more efficient for large collections.

Your hashtable.c should implement a set of void* with char* keys, and export exactly the following functions through hashtable.h:

/* Create a new (empty) hashtable; return NULL if error. */
hashtable_t *hashtable_new(const int num_slots);

/* Insert item, identified by key (string), into the given hashtable.
 * The key string is copied for use by the hashtable.
 * Return false if key exists in ht, any parameter is NULL, or error;
 * return true iff new item was inserted.
 */
bool hashtable_insert(hashtable_t *ht, const char *key, void *item);

/* Return the item associated with the given key;
 * return NULL if hashtable is NULL, key is NULL, key is not found.
 */
void *hashtable_find(hashtable_t *ht, const char *key);

/* Print the whole table; provide the output file and func to print each item.
 * Ignore if NULL fp. Print (null) if NULL ht.
 * Print a table with no items if NULL itemprint.
 */
void hashtable_print(hashtable_t *ht, FILE *fp,
                     void (*itemprint)(FILE *fp, const char *key, void *item));

/* Iterate over all items in the table; in undefined order.
 * Call the given function on each item, with (arg, key, item).
 * If ht==NULL or itemfunc==NULL, do nothing.
 */
void hashtable_iterate(hashtable_t *ht, void *arg,
               void (*itemfunc)(void *arg, const char *key, void *item) );

/* Delete the whole hashtable; ignore NULL ht.
 * Provide a function that will delete each item (may be NULL).
 */
void hashtable_delete(hashtable_t *ht, void (*itemdelete)(void *item) );

The starter kit provides code for the hash function.

General notes

Your modules must implement exactly the above interface. Do not modify those function prototypes.
If you need helper functions or data types (likely struct types), those should be defined within your module.c and marked static so they are not visible to users of the module.
Your modules must print nothing (except, of course, in the xxx_print() method). If you want to add debugging printfs, they must be protected by something like #ifdef DEBUG or #ifdef TEST. (You can see some examples in bag.c where I’ve protected some debugging code with #ifdef MEMTEST, and a spot in the bag/Makefile that controls that flag from the compiler command line.)
Your modules must have no global variables.
Your modules must have no main() function; as modules, they are meant to be used by other programs. Recall how the module defined by bag.c and bag.h is used by a test program bagtest.c.
Your modules store void* items; this type is C’s way of describing a “pointer to anything”.
- The caller (user of your module) must pass a pointer (address of some item) to you; your data structure holds that pointer, and later returns it to the caller in response to an ‘extract’ or ‘find’ operation.
- Your module doesn’t know, or doesn’t care, what kind of things are these items. Your module doesn’t allocate memory for items, free memory for items, or copy items - it just tracks the pointer to the item.
- The caller is responsible for the item pointer, which must be allocated (somehow) by the caller. The modules’ _delete function (like a destructor) allows the caller to provide a custom itemdelete function that knows how to free any memory consumed by an item.
- For this reason, the caller must avoid inserting the same item (same address) multiple times; later, the itemdelete method would be called multiple times on that item… which could lead to trouble.
Both set and hashtable work with string-type keys. When adding a new item with set_insert or hashtable_insert, both modules make their own copy of the string - presumably in memory allocated by malloc().
- The module is then responsible for this memory - and later freeing it - just like any other memory it allocates. This ‘copy’ semantics is convenient for the caller, who need not worry about how to allocate and manage the key string after inserting it into the set or hashtable.
- You may assume that a non-NULL key is a proper C string; that is, it is null-terminated.
Your code must have no memory leaks. We will check!
- You may find the memory module helpful, or use the native malloc/calloc and free.
- You may find valgrind useful.
Your modules must each have a Makefile to compile and test the module code.
Your module must have a unit-test mechanism, either included within the module code (see, for example, readlinep.c) or as a test driver (see, for example, bagtest.c in the starter kit).

Hints

You are encouraged to follow the style and layout of the bag module when developing new modules.

You can also learn a lot from our binary trees example. You are welcome to copy snippets of code from this (or any other) CS50 example code as long as you add a comment indicating you’ve done so.

We suggest implementing the set and counters as simplified linked lists, much like we did for bag. Each should be a separate implementation because they differ in detail and semantics.

Your hashtable module, on the other hand, should make use of the set data structure. Indeed, your hashtable will be an array of pointers to sets. Allocating an array of pointers can be tricky; consider these tips.

Linked lists were demonstrated in names5.c, although for a specialized case; you will need to generalize. They were also covered in CS10. Hashtables were also covered in CS10: slides.

What to hand in, and how

Make sure to compile and test your solutions on plank server before submission. If you choose to develop your solutions on some other system, you are responsible to ensure that the work you submit runs correctly on a CS system — which is where where we will test it!

In addition to your code, each of the four sub-directories of lab3 must include following files:

README.md, which describes how your program should be compiled and executed, along with an explanation of any assumptions you made about the assignment. See bag/README.md for an example.
TESTING.md, which describes how you tested the program, including any test inputs, test programs, and test scripts; these test files should be included in your submission.
Makefile, with the default target building an executable test program, and additional target make test that will run the test program with appropriate parameters and inputs.
.gitignore, which should list any files you do not want git to commit, ever. Notably, this should include the name of any executable binary programs your Makefile produces, and scratch files produced by your testing programs.

Your lab3 directory must contain the following:

├── Makefile    # provided by starter kit
├── README.md   # customized with your name and username
├── bag
│   ├── .gitignore  # provided by starter kit
│   ├── Makefile  # provided by starter kit
│   ├── README.md # provided by starter kit
│   ├── bag.c   # provided by starter kit
│   ├── bag.h   # provided by starter kit
│   ├── bagtest.c # provided by starter kit
│   └── TESTING.md # provided by starter kit
├── counters
│   ├── .gitignore
│   ├── Makefile
│   ├── README.md
│   ├── counters.c
│   ├── counters.h  # provided by starter kit
│   ├── counterstest.c  # required by testing
│   └── TESTING.md
├── hashtable
│   ├── .gitignore
│   ├── Makefile
│   ├── README.md
│   ├── hashtable.c
│   ├── hashtable.h # provided by starter kit
│   ├── hashtabletest.c # required by testing
│   ├── jhash.c   # provided by starter kit
│   ├── jhash.h   # provided by starter kit
│   └── TESTING.md
├── lib
│   ├── Makefile  # provided by starter kit
│   ├── README.md # provided by starter kit
│   ├── memory.c  # provided by starter kit
│   ├── memory.h  # provided by starter kit
│   ├── readlinep.c # provided by starter kit
│   └── readlinep.h # provided by starter kit
└── set
    ├── .gitignore
    ├── Makefile
    ├── README.md
    ├── set.c
    ├── set.h     # provided by starter kit
    ├── settest.c # required by testing
    └── TESTING.md

Submitting your lab

Make sure you will commit the correct set of files:

git status

Commit your changes:

git commit -m "SUBMIT"

This command will commit your files and mark this commit with a message “SUBMIT”, indicating that we should grade this commit.

Push your changes to GitHub:

git push

Make sure you left nothing unexpected behind:

git status

If you need to make updates, repeat the add, commit, push sequence.

You can git commit -m "SUBMIT" again, before the deadline, if you realize that you’d forgotten something. We will grade the last commit with message “SUBMIT” pushed before the deadline.

To see your Lab submissions, as we see them, visit classroom.github.com and then click on Lab 3. You will then be able to click on your specific Lab 3 repository, and see the files and commit history as GitHub has them. Remember that it is the commit that has a commit message, not files.

If you decide to submit late: If you did not push a commit with the message “SUBMIT”, before the deadline, we will look again at 24h, 48h, and 72h after the deadline. We will grade the first commit with the message “SUBMIT”. We will apply late penalty accordingly.

Late commits will be ignored if they follow any commit with message “SUBMIT”.