With your group, revisit the Lab 5 indexer with the principles of cohesion and coupling in mind.

The question to ask: how do we decompose indexer.c into functions so that each function groups cohesive operations while the coupling between functions stays loose?

Lab 5 review

From the DESIGN document: The indexer (indexer.c) will run as follows:

parse the command line, validate parameters
call indexBuild, passing pageDirectory
save index to file
clean up data structures
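
One way to keep main cohesive is to give each of these four steps its own function, so main only coordinates them. Below is a minimal sketch of the first step alone. The name `parseArgs`, its out-parameters, and the second argument `indexFilename` are assumptions made for illustration; the excerpt above only says that the index is saved to a file. Any check that pageDirectory was really produced by the crawler is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: validate the command line and hand the two
 * arguments back through out-parameters. Returns true only when exactly
 * two non-empty arguments were supplied. */
bool parseArgs(int argc, char *argv[],
               char **pageDirectory, char **indexFilename)
{
  assert(pageDirectory != NULL && indexFilename != NULL);

  if (argc != 3) {
    fprintf(stderr, "usage: indexer pageDirectory indexFilename\n");
    return false;
  }
  if (argv[1][0] == '\0' || argv[2][0] == '\0') {
    fprintf(stderr, "indexer: arguments must not be empty\n");
    return false;
  }

  /* The out-parameters point into argv; nothing is copied. */
  *pageDirectory = argv[1];
  *indexFilename = argv[2];
  return true;
}
```

Because parseArgs reports failure through its return value instead of calling exit itself, main stays the only place that decides the exit status, which keeps the coupling between the two functions narrow.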

where indexBuild takes a pageDirectory parameter and returns an index data structure:

  creates a new 'index' object
  loops over document ID numbers, counting from 1
    loads a webpage from the document file 'pageDirectory/id'
    if successful, 
      passes the webpage and docID to indexPage
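
The loop in indexBuild can be sketched without the webpage module by looking only at how document files are located. This is a rough illustration, assuming the loop stops at the first ID whose file cannot be opened; `docPath` and `countDocuments` are hypothetical helper names, not part of the Lab 5 specification.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "pageDirectory/id" in freshly allocated
 * memory. The caller must free() the result. Returns NULL on failure. */
char *docPath(const char *pageDirectory, int docID)
{
  assert(pageDirectory != NULL);

  /* First pass measures the formatted length, second pass writes it. */
  int len = snprintf(NULL, 0, "%s/%d", pageDirectory, docID);
  if (len < 0) {
    return NULL;
  }
  char *path = malloc((size_t)len + 1);
  if (path != NULL) {
    snprintf(path, (size_t)len + 1, "%s/%d", pageDirectory, docID);
  }
  return path;
}

/* Walk document IDs 1, 2, 3, ... and stop at the first file that cannot
 * be opened. Returns the number of document files found. */
int countDocuments(const char *pageDirectory)
{
  int docID = 1;
  for (;; docID++) {
    char *path = docPath(pageDirectory, docID);
    if (path == NULL) {
      break;
    }
    FILE *fp = fopen(path, "r");
    free(path);
    if (fp == NULL) {
      break;
    }
    /* A real indexBuild would load the webpage here and pass it,
     * together with docID, to indexPage. */
    fclose(fp);
  }
  return docID - 1;
}
```

Pulling the path construction into its own function is one small decomposition step: indexBuild no longer needs to know how a docID maps to a filename.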

where indexPage:

 steps through each word of the webpage,
   skips trivial words (fewer than three characters),
   normalizes the word (converts to lower case),
   looks up the word in the index,
     adding the word to the index if needed
   increments the count of occurrences of this word in this docID
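
The two per-word checks in indexPage are small enough to live in their own functions. The sketch below is one possible shape; `isTrivialWord` and `normalizeWord` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* A word is trivial when it has fewer than three characters. */
bool isTrivialWord(const char *word)
{
  assert(word != NULL);
  return strlen(word) < 3;
}

/* Convert the word to lower case in place. */
void normalizeWord(char *word)
{
  assert(word != NULL);
  for (char *p = word; *p != '\0'; p++) {
    /* The cast to unsigned char avoids undefined behavior in tolower()
     * when char is signed and the byte value is negative. */
    *p = (char)tolower((unsigned char)*p);
  }
}
```

Inside indexPage, each word taken from the webpage would first be tested with isTrivialWord, then passed through normalizeWord, and only then handed to the index.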

To implement this functionality, the Lab 5 documentation offers this advice under Hints and tips:

We strongly recommend you add an index module to the common library – a module to implement an abstract index_t type that represents an index in memory, and supports functions like index_new(), index_delete(), index_save(), and so forth. Tip: much of it is a wrapper for a hashtable.

The index module implements the index data structure. From the Lab 5 DESIGN document, under major data structures:

The key data structure is the index, mapping from word to (docID, #occurrences) pairs. The index is a hashtable keyed by word and storing counters as items. The counters is keyed by docID and stores a count of the number of occurrences of that word in the document with that ID.
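
To make the mapping concrete, here is a self-contained sketch of the same shape. It is not the recommended implementation: plain linked lists stand in for the hashtable and counters modules so the example compiles on its own. `index_new` and `index_delete` are named in the Lab 5 hints; `index_add` and `index_get` are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for one counters entry: a (docID, count) pair. */
typedef struct docCount {
  int docID;
  int count;
  struct docCount *next;
} docCount_t;

/* Stand-in for one hashtable entry: word -> list of (docID, count). */
typedef struct wordEntry {
  char *word;
  docCount_t *docs;
  struct wordEntry *next;
} wordEntry_t;

typedef struct index {
  wordEntry_t *words;
} index_t;

index_t *index_new(void)
{
  index_t *index = malloc(sizeof(index_t));
  if (index != NULL) {
    index->words = NULL;
  }
  return index;
}

/* Return the entry for word, or NULL if the word is not in the index. */
static wordEntry_t *findWord(const index_t *index, const char *word)
{
  for (wordEntry_t *e = index->words; e != NULL; e = e->next) {
    if (strcmp(e->word, word) == 0) {
      return e;
    }
  }
  return NULL;
}

/* Hypothetical: increment the count for (word, docID), adding the word
 * or the docID if needed. Returns the new count, or 0 on failure. */
int index_add(index_t *index, const char *word, int docID)
{
  assert(index != NULL && word != NULL);
  wordEntry_t *e = findWord(index, word);
  if (e == NULL) {
    e = malloc(sizeof(wordEntry_t));
    if (e == NULL) {
      return 0;
    }
    e->word = malloc(strlen(word) + 1);
    if (e->word == NULL) {
      free(e);
      return 0;
    }
    strcpy(e->word, word);
    e->docs = NULL;
    e->next = index->words;
    index->words = e;
  }
  for (docCount_t *d = e->docs; d != NULL; d = d->next) {
    if (d->docID == docID) {
      return ++d->count;
    }
  }
  docCount_t *d = malloc(sizeof(docCount_t));
  if (d == NULL) {
    return 0;
  }
  d->docID = docID;
  d->count = 1;
  d->next = e->docs;
  e->docs = d;
  return 1;
}

/* Hypothetical: occurrences of word in docID, or 0 if none recorded. */
int index_get(const index_t *index, const char *word, int docID)
{
  assert(index != NULL && word != NULL);
  wordEntry_t *e = findWord(index, word);
  for (docCount_t *d = (e ? e->docs : NULL); d != NULL; d = d->next) {
    if (d->docID == docID) {
      return d->count;
    }
  }
  return 0;
}

void index_delete(index_t *index)
{
  if (index == NULL) {
    return;
  }
  for (wordEntry_t *e = index->words; e != NULL; ) {
    wordEntry_t *nextWord = e->next;
    for (docCount_t *d = e->docs; d != NULL; ) {
      docCount_t *nextDoc = d->next;
      free(d);
      d = nextDoc;
    }
    free(e->word);
    free(e);
    e = nextWord;
  }
  free(index);
}
```

Callers see only the index_ functions, so swapping the linked lists for a hashtable of counters would not change indexPage at all. That is the loose coupling a wrapper module is meant to provide.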

Activity

With your group, discuss how you would implement such a wrapper and how you would use it to implement the indexer described above. Also consider how you would further decompose indexBuild into functions, keeping cohesion and coupling in mind.

Solution

A potential solution