CS 50 Software Design and Implementation

Lab5 Rubric

Remember: a program or a function should do one thing and do it well. Write good, readable code from the start. Break your programs into meaningful functions. Be defensive when dealing with user input. Exit gracefully, and if an error has been encountered (for example, I/O failed), inform the user in a meaningful but not verbose way. Remember the flip side: silence is golden.

Defensive coding: always assume that the user will enter incorrect input. How are you going to deal with it? Tip: check the number of parameters entered against what you expect, and check that each parameter is valid.
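As an illustration only, the argument checks described above might look like the sketch below. The function names isDirectory() and validateArgs(), the return codes, and the assumed usage "indexer <data directory> <index file>" are hypothetical and are not taken from the lab description; adapt them to the arguments your program actually takes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path names an existing directory, else 0. */
int isDirectory(const char *path)
{
    struct stat sb;

    if (path == NULL) {
        return 0;
    }
    return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/*
 * Hypothetical validator. ASSUMPTION: the program is invoked as
 *     indexer <data directory> <index file>
 * Returns 0 if the arguments look valid, 1 for a wrong argument count,
 * and 2 if the first argument is not a directory.
 */
int validateArgs(int argc, char *argv[])
{
    assert(argv != NULL);

    if (argc != 3) {
        fprintf(stderr, "usage: %s <data directory> <index file>\n",
                argc > 0 ? argv[0] : "indexer");
        return 1;
    }
    if (!isDirectory(argv[1])) {
        fprintf(stderr, "%s: not a directory\n", argv[1]);
        return 2;
    }
    return 0;
}
```

Calling a validator like this first thing in main() and exiting with its nonzero result gives the test script a distinct exit status to check for each kind of bad input.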

Modular code: break down your code into meaningful functions, e.g., convertAddress(). This makes it more readable, reusable, and easier to test and debug. The functional decomposition of crawler covers: file processing, (url) list processing, dictionary processing, html processing (e.g., parsing). Make sure that the major functional (processing) decomposition is reflected in meaningful source files (e.g., dictionary.[ch]) and functions (e.g., getDataWithKey()).

Debugging tricky bugs with valgrind and gdb: You will encounter many different bugs while coding crawler. Use valgrind and gdb to track down bugs associated with dynamic memory management and pointers.

Exit gracefully: clean up and inform the user, assert conditions.
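A minimal sketch of what exiting gracefully can mean for a single I/O operation is shown below. The function name writeTextFile() and its return convention are hypothetical; the point is the pattern: assert preconditions, report a short meaningful message on failure, and release the resource on every path.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes text to the file at path.
 * Returns 0 on success and -1 on failure. On failure it prints one
 * short message to stderr; on success it stays silent.
 */
int writeTextFile(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);   /* programmer errors, not user errors */

    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
        return -1;
    }

    int status = 0;
    if (fputs(text, fp) == EOF) {
        fprintf(stderr, "write to %s failed\n", path);
        status = -1;
    }
    /* Clean up on every path; a failed close can also mean lost data. */
    if (fclose(fp) == EOF) {
        status = -1;
    }
    return status;
}
```

The caller can then turn a -1 into a nonzero exit status after freeing whatever it has allocated.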

Here is the detailed rubric:

Correctness (1-10) [70%]

WARNING: DO NOT SUBMIT SEGFAULTED CODE:

In all cases below, no grade is given for a program that segfaults. A student will be asked to resubmit their segfaulted code, and the resubmitted code will be graded out of 50% of the original grade. This is a significant penalty, so double-check that your code compiles and runs correctly without segfaults. Make sure your code is defensive against incorrect user input.

The indexer operates in two modes: 1) building the index and saving it into a file; and 2) rebuilding the index from the saved file. Check that your indexer_test.sh script implements all the required tests as discussed in the lab description and takes care of user input.

– Is the program correct (i.e., does it work) and error free?

1. write a makefile to build crawler [6%]

– make -4%

– make clean - 1%

– make debug -1%

2. write a bash test script (index_test.sh) [12%]

to test command line arguments and run crawler on the SEED directory at level 0, 1, 2 and 3. Your script should:

– Create a log file to save all output of the test script -2%

– Do a clean build (make clean followed by a make) -2%

– Do input argument testing on crawler

— TEST1: check for wrong number of arguments -2%

— TEST2: check for wrong directory -2%

- Index a directory, record the index in a file (index.dat), and sort it. Then read index.dat into memory and write it back out to check that the program can read and write the index storage file correctly. -4%

In all cases, exit the script if $? holds an unexpected exit status.

3. Your indexer should build the index successfully on the data crawled at http://www.cs.dartmouth.edu/~campbell/ and rebuild and save it. [20%]

– The indexer should write the index, read index.dat to rebuild the index, and then save the rebuilt index -5%

– A “diff” of index.dat (the saved index) and the rebuilt, re-saved index should exit with status 0 (that is, the two files should be identical) - 5%

- If the number of words in your index.dat differs from the solution's by more than 10% -5%

4. Entries for the same word should be on the same line. All words should be in lowercase. [5%]

5. The index should not contain any strange characters indicating errors in getNextWord processing. [2%]

6. The index should contain document counts. [5%]

7. You should have a good file decomposition and modular design. [10%]

– no design and implementation spec - 2%

– no pseudocode description of functions -2%

– little file and function decomposition - 2%

– no file decomposition - 4%
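Two of the correctness items above (lowercase words in item 4, and identical saved and rebuilt index files in item 3) lend themselves to small helper functions. The sketch below is illustrative only: normalizeWord() and filesIdentical() are hypothetical names, and filesIdentical() is a simple byte-by-byte stand-in for running diff in the test script, not a replacement for it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: converts word to lowercase in place. */
void normalizeWord(char *word)
{
    assert(word != NULL);

    for (char *p = word; *p != '\0'; p++) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        *p = (char)tolower((unsigned char)*p);
    }
}

/*
 * Hypothetical helper: compares two files byte by byte.
 * Returns 1 if identical, 0 if they differ, -1 if either cannot be opened.
 */
int filesIdentical(const char *pathA, const char *pathB)
{
    FILE *a = fopen(pathA, "rb");
    FILE *b = fopen(pathB, "rb");
    int result = 1;

    if (a == NULL || b == NULL) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {      /* also catches one file ending early */
                result = 0;
                break;
            }
        } while (ca != EOF);
    }

    /* Close whatever was opened, on every path. */
    if (a != NULL) {
        fclose(a);
    }
    if (b != NULL) {
        fclose(b);
    }
    return result;
}
```

Applying normalizeWord() to each word before it is inserted into the index keeps "Index" and "index" from producing separate entries.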

Clarity (1-10) [10%]

– Is the code easy to read and well commented, and does it use good names for variables and functions? In essence, is it easy to understand and use?

– [K&P] Clarity makes sure that the code is easy to understand for people and machines.

– Too many mysterious variable names with comments -

Simplicity (1-10) [10%]

– Is the code as simple as possible?

– [K&P] Simplicity keeps the program short and manageable.

– A program or function should do one thing and do it well.

Generality (1-10) [10%]

– Can the program easily adapt to changes or modifications?

– [K&P] Generality means the code can work well in a broad range of situations and adapt

Extra Credit [10%]

There should be no memory leaks in your code.