Lab 6 - Querier iterators
Assignment description | Requirements Spec | Example output | Chopping | Expressions | Iterators | Fuzz testing
In this unit, we discuss Iterators and their uses:
- what is an Iterator, and why to use an Iterator
- how to code an Iterator in the context of our data structures
- how it leverages function pointers in C
- passing arguments to iterators
- several uses of iterators
- using iterators to merge sets
Why iterators?
Iterators are a powerful concept, especially when writing an abstract data structure that represents a collection. We have seen several such data structures - tree, bag, set, hashtable, and counters. If the collection implements an iterator, we can apply some function to every item in that collection… e.g., for printing, counting, modifying, and even combining multiple collections.
In Lab 3 we provide an _iterate()
method in the bag
module and you add a similar method for your set
, hashtable
, and counters
modules.
In Lab 5 these iterators are helpful in writing the contents of the Index to the index file.
In Lab 6 these iterators are helpful in combining the set of matches for one word with the set of matches for another word.
Coding an iterator
Recall the bag
module, which has a bag_iterate()
method:
void
bag_iterate(bag_t* bag, void* arg, void (*itemfunc)(void* arg, void* item) )
{
if (bag != NULL && itemfunc != NULL) {
// call itemfunc with arg, on each item
for (bagnode_t* node = bag->head; node != NULL; node = node->next) {
(*itemfunc)(arg, node->item);
}
}
}
Notice that the code begins with defensive programming - in case the caller accidentally calls it with NULL parameters.
Otherwise, the function is a simple for
loop, stepping through each item in the bag.
This iterator makes no promise about the order in which it processes items; after all, a ‘bag’ is an unordered, unlabeled collection of ‘things’.
Function pointers
Recall our discussion of function pointers from an earlier unit.
The second parameter to bag_iterate
declares itemfunc
as a pointer to a function that itself takes two parameters: an arg
and a data
.
Both are void pointers, that is, pointers to some unspecified type.
Because the iterator receives a function pointer from its caller, and the arg/data parameters are arbitrary pointers, this iterator can work on items of any type, and compute any sort of function on those items, making it truly general-purpose.
Look inside the for
loop, where we call itemfunc
.
Here, we dereference the function pointer to get a function, then call it with two parameters: the arg
provided to us, and the data for this item.
(Syntactically, we have to wrap the dereference in parentheses, but otherwise, it’s just like any other function call.)
Arguments
Sometimes, though not always, the caller will need a way of communicating other information to the itemfunc
- not just the information about the item our iterator can provide.
Thus, the iterator takes arg
, a pointer to arbitrary something, and passes it right on through to the itemfunc
.
This mechanism is general-purpose:
- pass
arg=NULL
if the caller has no need to send additional arguments to itsitemfunc
; - pass a pointer to a simple variable if the caller just needs to get information into the
itemfunc
; - indeed, in that case, the variable is passed by reference and thus the
itemfunc
can update the variable if needed; - furthermore, if the caller needs to send multiple things to the
itemfunc
, it can pass a pointer to astruct
holding those things.
Let’s look at some examples.
Examples with iterators
Printing.
First, let’s suppose we don’t have (or don’t like) the set_print()
method, which tends to print some text/formatting around each item.
Our settest.c
uses the iterator to print just the keys:
...
printf("\nSimpleprint:\n");
set_iterate(set, stdout, simpleprint);
printf("\n");
...
/* print the given item to the given file;
* just print the key
*/
static void
simpleprint(void* arg, const char* key, void* item)
{
FILE* fp = arg;
if (fp != NULL) {
fprintf(fp, "%s ", key);
}
}
Here, we pass the file pointer through the arg
parameter.
Notice how simpleprint
immediately copies its arg
parameter into a local variable of the right type.
In this case, simpleprint
ignores the item
pointer; that’s fine.
Counting.
Even simpler, we could just count the items.
But where do we put the counter?
Define a local variable and pass its address as the arg
as seen in bagtest.c
:
printf("\nCount (should be %d): ", namecount);
bagcount = 0;
bag_iterate(bag1, &bagcount, itemcount);
printf("%d\n", bagcount);
...
/* count the non-null items in the bag.
* note here we don't care what kind of item is in bag.
*/
static void itemcount(void* arg, void* item)
{
int* nitems = arg;
if (nitems != NULL && item != NULL)
(*nitems)++;
}
Multiple arguments.
What if we want two counters?
In my version of settest
I read in weather reports, storing a struct weather
as the item for each weather station; that struct includes a member tempF
representing the current temperature in Fahrenheit.
Let’s write an iterator to count the number of hot spots and cold spots.
// a little structure to carry two counters.
struct hotcold {
int hot;
int cold;
};
...
printf("\nSummary: ");
struct hotcold counts = {0, 0};
set_iterate(set, &counts, tempcount);
printf("%d hot, %d cold\n", counts.hot, counts.cold);
...
static void
tempcount(void* arg, const char* key, void* item)
{
if (arg != NULL && item != NULL) {
struct hotcold* ctrs = arg;
struct weather* wp = item;
if (wp->tempF > 80) ctrs->hot++;
if (wp->tempF < 32) ctrs->cold++;
}
}
Once the structure is defined, our use of the iterator is much like the previous example.
We define and initialize a local variable, pass a pointer to that variable as our arg
, and then the itemfunc
copies that pointer into a pointer of the relevant type so it can access (and update) the contents.
Notice that it is not necessary to malloc
space in order to pass a pointer to the iterator - in this example, &counts
is a pointer to a local variable.
That variable just happens to be a struct
instead of an int
.
Merging sets with set_iterate()
The ‘bag’ module is nice because it is very simple, but the ‘set’ module allows us to do more interesting things when the items have a key as well. Let’s look at some more complex examples.
Merging two sets.
We’ll start with a simple case.
Here the sets represent schools, where the key
is the name of the school.
See set_iterate1.c.
(Note also set_iterate.makefile.)
set_t* setA;
set_t* setB;
set_t* result;
... initialize each set with set_new
... fill setA and setB with set_insert
printf("\nMerge of setA into result: \n");
set_merge(result, setA);
set_print(result, stdout, itemprint);
putchar('\n');
printf("\nMerge of setB into result: \n");
set_merge(result, setB);
set_print(result, stdout, itemprint);
putchar('\n');
...
/* Merge the second set into the first set;
* the second set is unchanged.
*/
static void
set_merge(set_t* setA, set_t* setB)
{
set_iterate(setB, setA, set_merge_helper);
}
/* Consider one item for insertion into the other set.
*/
static void
set_merge_helper(void* arg, const char* key, void* item)
{
set_t* setA = arg;
if (set_insert(setA, key, item))
printf("\t%s added\n", key);
else
printf("\t%s exists\n", key);
}
Notice how the above approach iterates over one set (setB) and, for each item in that set, tries to insert or update its value in the first set (setA). At the end, setB is unchanged but setA should have all items from both sets.
Merging two sets and their data. Now a more interesting case, in which the set items each hold data we want to combine. In this simple test, that datum is just an integer - actually, a pointer to an integer. See set_iterate2. (Note also set_iterate.makefile.)
In this example, setA and setB each contain a set of school names, and a number for each school (say, perhaps, the number of people you know at each school). When merging two sets you want the data in the resulting set to represent the sum of the values in each set.
set_insert(setA, "Dartmouth", intsave(20));
...
printf("\nMerge of setA into result: \n");
set_merge(result, setA);
set_print(result, stdout, itemprint);
putchar('\n');
printf("\nMerge of setB into result: \n");
set_merge(result, setB);
set_print(result, stdout, itemprint);
putchar('\n');
...
/* Merge the second set into the first set;
* the second set is unchanged.
*/
static void
set_merge(set_t* setA, set_t* setB)
{
set_iterate(setB, setA, set_merge_helper);
}
/* Consider one item for insertion into the other set.
* If the other set does not contain the item, insert it;
* otherwise, update the other set's item with sum of item values.
*/
static void
set_merge_helper(void* arg, const char* key, void* item)
{
set_t* setA = arg;
int* itemB = item;
// find the same key in setA
int* itemA = set_find(setA, key);
if (itemA == NULL) {
// not found: insert it
set_insert(setA, key, intsave(*itemB));
printf("\t%s added\n", key);
} else {
// add to the existing value
*itemA += *itemB;
printf("\t%s exists\n", key);
}
}
static int*
intsave(int item)
{
int* saved = mem_assert(malloc(sizeof(int)), "intsave");
*saved = item;
return saved;
}
The overall structure of the code is identical to the prior example; the difference here is that we need to look up the key in the destination set first and then perhaps update its data - because we get a pointer to the data, we can easily reach in and update its value!
Merging counter sets with counters_intersect
Intersection is tricky (and inefficient!) with the functions available in the counters module. To accomplish an intersection of two counters, we need pseudo code like this:
result = counters_new()
for each (key,item) in c1
if key found in c2
add (key,item) to result
else
add (key, 0) to result
That second line is like calling counters_iterate(c1, ...)
, but then how does its helper function accomplish the contents of that loop?
It needs access to both c2
and result
.
Remember how a previous lecture passed two sets into an iterator and hence into the helper function?
We can do the same here; declare a small struct
to hold pointers to both result
and another
.
struct twocts { counters_t *result, *another } ; //store result in first counter
struct twocts args = { c1, c2 };
counters_iterate(c1, &args, intersect_helper);
The pointer to args
will pass through counters_iterate
and be available to intersect_helper
and thus that function can implement remainder of our pseudocode above.
Intersecting two sets and their data
/******** local data types *******/
struct twocts {
counters_t *result;
counters_t *another;
};
/******** local function prototypes *******/
void counters_intersect(counters_t* ct1, counters_t* ct2);
void intersect_helper(void *arg, const int key, const int count);
static inline int min(const int a, const int b) {
return (a < b ? a : b);
}
/******** main *******/
int main()
{
// create two counters for demo
counters_t *c1 = mem_assert(counters_new(), "counters_new() failed");
counters_t *c2 = mem_assert(counters_new(), "counters_new() failed");
// init counters 1
counters_set(c1, 3, 6);
counters_set(c1, 4, 7);
counters_set(c1, 5, 1);
counters_set(c1, 7, 4);
// init counters 2
counters_set(c2, 1, 3);
counters_set(c2, 3, 2);
counters_set(c2, 5, 4);
counters_set(c2, 6, 6);
counters_set(c2, 7, 3);
// take the intersection, store the results in c1
counters_intersect(c1, c2);
counters_print(c1, stdout);
printf("\n");
// clean up
counters_delete(c1);
counters_delete(c2);
return 0;
}
void counters_intersect(counters_t* ct1, counters_t* ct2)
{
mem_assert(ct1, "counters 1 invalid");
mem_assert(ct2, "counters 2 invalid");
struct twocts args = {ct1, ct2};
counters_iterate(ct1, &args, intersect_helper);
}
void intersect_helper(void *arg, const int key, const int count)
{
struct twocts *two = arg;
counters_set(two->result, key, min(count, counters_get(two->another, key)));
}