Lab 2 - C and Bash

Due Tuesday, April 12, at 10pm

This second lab consists of a bash script and three small C programs.

Reminder
Preparation
Assignment
What to hand in, and how

Reminder

Grading will focus on CS50 coding style - including consistent formatting, selection of identifier names, and use of meaningful comments - in addition to correctness and testing.

Your C code must compile without producing any compiler warnings. You will lose points if the compiler produces warnings when using our CS50-standard compiler flags.

Preparation

[MacBook ~]$ ssh cs50
[plank ~]$ cd cs50/labs

Clone the starter kit: visit GitHub Classroom, accept the assignment, and clone the repository to your labs directory. It will look something like this, assuming your GitHub username is XXXXX:

$ git clone git@github.com:cs50-spring-2022/lab2-XXXXX.git
  Cloning into 'lab2-XXXXX'...

The clone step will create a new directory ~/cs50/labs/lab2-XXXXX.

Create sub-directories for this lab:

$ cd lab2-XXXXX
$ mkdir regress chill words histo

Next, edit README.md to remove the instructions and add your name and username. You can use this file to provide any overall comments you want to convey to the grader.

Remember that you can preview Markdown files with various Markdown-editing or -rendering tools (see: Markdown resources) but we will read it on GitHub.com, so before you make your final submission decision be sure to check it there.

Assignment

Please follow the CS50 coding style.

Design, write, document, and fully test the following three separate C programs and one bash script.

Point breakdown:

(25 points) regress.sh
(25 points) chill.c
(25 points) words.c
(25 points) histo.c

regress.sh

Regression testing is important to any quality software-development process. As a software project evolves, each new revision is tested against a thorough suite of test cases to ensure that the new revision still performs correctly where it had before. As new functionality is added, new tests are added to the suite.

Write a bash script regress.sh to perform regression testing. Its command line looks like

./regress.sh dirname testfilename...

where dirname is the name of a directory containing test results, and
where each testfilename is the name of a file containing bash command(s) for one test case. Note that test files can be in other directories.

The script verifies the validity of its arguments (exit with non-zero status on any error):

there must be at least two arguments;
if something by the name dirname exists, it must be a directory;
each testfilename must be a regular file and be readable.

After checking its arguments, the script creates a new directory whose name has the form YYYYMMDD.HHMMSS, representing the current date and time, in the current directory. (For example, 20220404.203038.) If any error occurs, the script exits with non-zero status.

The script then runs each test case with bash, redirecting stdin from /dev/null, producing three files for each case:

YYYYMMDD.HHMMSS/testfilename.test - a copy of testfilename
YYYYMMDD.HHMMSS/testfilename.status - the exit status of bash testfilename
YYYYMMDD.HHMMSS/testfilename.output - the stdout and stderr from bash testfilename

If the directory dirname does not exist, YYYYMMDD.HHMMSS is renamed dirname. Exit 0 if success, non-zero if any error.

If the directory dirname already exists, YYYYMMDD.HHMMSS is compared with dirname to provide a brief listing of any differences - or the simple statement “no differences”. Exit 0 if no differences, non-zero if differences.

In typical usage, the first time the developer runs regress.sh, the script creates a directory by name dirname; in subsequent runs, it compares the new test results with those from the prior run. Over time, directories YYYYMMDD.HHMMSS accumulate, providing a historical record of test results.

Output and Exit:

Brief but helpful progress/outcome messages.

Useful error messages, if any errors occur.

Non-zero if any error, or any differences from the earlier dirname directory.

Zero if dirname created successfully, or there were no differences from an existing dirname.

Example:

Suppose I used regress.sh to support development of my query.sh. My regress/testdir directory contains the script and four test files, each with a one-line command. regress.sh is under the regress/ directory. I start out by listing the four test cases, then I run regress.sh twice, then I add a test case, then I change a test case. Finally, I test some erroneous cases.

[lab2/regress/testdir]$ ls
query.sh*  test0  test1  test2  test3
[lab2/regress/testdir]$ cat test?
cat query.sh
./query.sh
./query.sh MA
./query.sh MA 02/01/2022
[lab2/regress/testdir]$ ../regress.sh base test?
saved test results in base
[lab2/regress/testdir]$ ../regress.sh base test?
saved test results in 20220404.204718
comparing 20220404.204718 with base...
no differences
[lab2/regress/testdir]$  echo "./query.sh TX 02/02/2022" > test4
[lab2/regress/testdir]$ ls
20220404.204718/  base/  query.sh*  test0  test1  test2  test3  test4
[lab2/regress/testdir]$ ../regress.sh base test?
saved test results in 20220404.205006
comparing 20220404.205006 with base...
Only in 20220404.205006: test4.output
Only in 20220404.205006: test4.status
Only in 20220404.205006: test4.test
[lab2/regress/testdir]$ ls
20220404.204718/  base/      test0  test2  test4
20220404.205006/  query.sh*  test1  test3
[lab2/regress/testdir]$ 
######## now some error cases
[lab2/regress/testdir]$ ../regress.sh 
usage: regress.sh dirname testfilename...
[lab2/regress/testdir]$ ../regress.sh base
usage: regress.sh dirname testfilename...
[lab2/regress/testdir]$ ../regress.sh test?
first argument ('test0') is not a directory
[lab2/regress/testdir]$ ../regress.sh /base test?
mv: cannot create directory '/base': Permission denied
failed to save test results in /base; they remain in 20220404.205113
[lab2/regress/testdir]$ ../regress.sh base testing
test case 'testing' is not a file (or not readable)
[lab2/regress/testdir]$ ../regress.sh base base
test case 'base' is not a file (or not readable)
[lab2/regress/testdir]$ chmod -r test?
[lab2/regress/testdir]$ ../regress.sh base test?
test case 'test0' is not a file (or not readable)
[lab2/regress/testdir]$

Note above how I use the bash globbing syntax ? to indicate a wildcard that matches any single character; thus, test? expands to

test0 test1 test2 test3 test4

Note how test0 simply prints the current copy of query.sh, which adds nicely to the historic record.

The names of test files are not important to regress.sh, but a development team may want to agree on a naming convention. For example, suppose you chose to name them all with extension .test. If you had saved the first run of regress.sh in a directory named base, you could then run future tests as
./regress.sh base *.test

Just to be clear, each testfile contains bash command(s), and your regress.sh script should execute those commands by running bash and providing testfilename as an argument. But you should run each test only once within any given run of regress.sh – not only is that more efficient, it’s possible that the commands within some test files might actually not be amenable to being run multiple times (e.g., if they have side effects like creating or removing files).

It’s easily possible to redirect the stdin, stdout, and stderr, all in one run of a test - and to catch the exit status of that run.

Assumptions:

dirname may be a pathname referring to any directory on the computer - not necessarily a subdirectory of the current directory.
the testfilenames are pathnames referring to files anywhere - not necessarily files in the current directory.
regress.sh itself might not be in the current directory; it could be in the PATH or could be launched by providing its pathname.

Hints:

Check out the date command and its + option.

Check out the basename command.

Check out the diff --brief command form.

Check out the shift built-in bash command and this example.

chill.c

Write a program to calculate “wind chill” based on the current temperature and wind speed. The standard formula for this calculation is:

    Wind Chill = 35.74 + 0.6215T - 35.75(V^0.16) + 0.4275T(V^0.16)

where T is the temperature in degrees Fahrenheit (when less than 50 and greater than -99) and V is the wind velocity in miles per hour. The ^ character denotes exponentiation. Note that the above formula is not in C programming language syntax.

Input:

No input files; stdin is ignored.

The user may run your program with no arguments, one argument, or two arguments as explained below.

Output (no arguments):

If the user provides no arguments to your program, it should print out a table of temperatures (from -10 to +40 by 10’s) and wind speeds (from 5 to 15 by 5’s). Your output should look similar to the following, with nice columns and titles:

    $ ./chill
    Temp    Wind    Chill
    ----    ----    -----
     -10       5    -22.3
     -10      10    -28.3
     -10      15    -32.2

       0       5    -10.5
       0      10    -15.9
       0      15    -19.4

      10       5      1.2
      10      10     -3.5
      10      15     -6.6

      20       5     13.0
      20      10      8.9
      20      15      6.2

      30       5     24.7
      30      10     21.2
      30      15     19.0

      40       5     36.5
      40      10     33.6
      40      15     31.8

Output (one argument):

If the user provides one argument, it will be assumed to be a temperature (expressed as a floating-point number). If that temperature is less than 50 and greater than -99, it is acceptable; chill then prints a table of wind speeds (from 5 to 15 by 5’s) and the calculated wind chills for that temperature only. Your program’s output for one argument should look like this:

    $ ./chill 32
     Temp   Wind   Chill
    -----   ----   -----
      32      5     27.1
      32     10     23.7
      32     15     21.6

Output (two arguments):

If the user provides two arguments, they will be temperature and velocity, respectively (expressed as floating-point numbers). The temperature must be less than 50 and greater than -99. The velocity must be greater than or equal to 0.5.

If the arguments are acceptable, then your program should calculate and print the wind chill for that temperature and velocity only.

Your program’s output for two arguments should look like this:

    $ ./chill 5 20
     Temp    Wind   Chill
     -----   ----   -----
        5     20    -15.4

If either argument is out of range, your program should issue a message and exit. Here’s an example:

    $ ./chill 55
    Temperature must be less than 50 degrees Fahrenheit

    $ ./chill 10 0
    Wind velocity must be greater than or equal to 0.5 MPH

In the preceding examples some values were printed as integers and some as decimal fractions. You may print everything in the format “x.y”, if you wish, but do not print more than one decimal place. Indeed, it may be wise to use this format when the user specifies temperature or windspeed, because the user may specify a non-integral value and it may be misleading to print it as an integer.
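One possible way to turn a command-line argument into a number and check it against the ranges given above is sketched here. The names parse_number, temp_ok, and velocity_ok are hypothetical. Rejecting trailing characters (as in "12x") is an assumption of this sketch, not a stated requirement.

```c
#include <stdbool.h>
#include <stdio.h>

/* Parse str as a floating-point number and store it in *value.
 * Returns true only if the whole string is a number (no trailing junk). */
bool parse_number(const char *str, double *value)
{
  char extra;  /* receives any character that follows the number */

  /* exactly one conversion means a number and nothing after it */
  return sscanf(str, "%lf%c", value, &extra) == 1;
}

/* Range rules taken from the assignment text. */
bool temp_ok(double temp)        { return temp > -99.0 && temp < 50.0; }
bool velocity_ok(double velocity) { return velocity >= 0.5; }
```

In main, each of argv[1] and argv[2] (when present) could be passed through parse_number and the matching range check before any table is printed.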

Output (more than two arguments):

Print a “usage” line and exit with error status.

Exit:

If the program terminates normally, it exits with a return code of 0. Otherwise, it terminates with a documented, non-zero return code.

Compiling:

You will likely need the math library. To use it, add #include <math.h> to your chill.c file, and add -lm to your mygcc command. (That is “dash ell emm”, which is short for “library math”.)

mygcc chill.c -lm -o chill

words.c

Write a C program called words that breaks its input into a series of words, one per line. It may take input from stdin, or from files whose names are listed as arguments.

Usage:

words [filename]...

Input:

When no filenames are given on the command line, words reads from stdin.

When one or more filenames are given on the command line, words reads from each file in sequence.

If the special filename - is given as one of the filenames, stdin is read at that point in the sequence.
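The input rules above could be handled by a small helper like the one sketched below. The name open_input is hypothetical, and how to report a file that cannot be opened is left to the caller.

```c
#include <stdio.h>
#include <string.h>

/* Return the stream to read for one command-line name:
 * stdin for the special name "-", otherwise the named file opened
 * for reading. Returns NULL if the file cannot be opened. */
FILE *open_input(const char *name)
{
  if (strcmp(name, "-") == 0) {
    return stdin;
  }
  return fopen(name, "r");
}
```

A caller using this helper should fclose the returned stream only when it is not stdin.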

Output:

In any case, the stdout should consist of a sequence of lines, with exactly one word on each output line (i.e., each output line contains exactly one word and no other characters). A “word” is a sequence of one or more letters.

Although you may be tempted to think of the input as a sequence of lines, it may be helpful to think of it as a sequence of characters.

Note it is possible for the output to be empty, if there are no words in any of the input files.

Any error messages are written to stderr.

Exit:

If the program terminates normally, it exits with a return code of 0. Otherwise, it terminates with a documented, non-zero return code.

Hints:

Check out man ctype.

Consider a function that processes a file, given a FILE* as parameter.

Remember that stdin is just a FILE* and can be used anywhere a FILE* might be used for reading. Remember that stdin is not always attached to the keyboard - the input of words may be from a pipe or a file (e.g., ./words < thesis.txt).
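To illustrate the hints above, here is a rough sketch of a function that treats its input as a stream of characters and takes a FILE* parameter. The name print_words and the second parameter (the output stream) are assumptions made so the sketch is easy to test. It is one possible approach, not the required design.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy each maximal run of letters from 'in' to 'out', one run per line. */
void print_words(FILE *in, FILE *out)
{
  int c;                 /* int, not char, so EOF is distinguishable */
  bool in_word = false;  /* true while inside a run of letters */

  while ((c = fgetc(in)) != EOF) {
    if (isalpha(c)) {
      fputc(c, out);
      in_word = true;
    } else if (in_word) {
      fputc('\n', out);  /* first non-letter after a word ends the line */
      in_word = false;
    }
  }
  if (in_word) {
    fputc('\n', out);    /* input ended in the middle of a word */
  }
}
```

Calling print_words(stdin, stdout) would handle the no-argument case; the same function works unchanged on a stream returned by fopen.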

histo.c

Write a program that reads a series of non-negative integers from stdin, and prints out a histogram. There should be 16 bins in your histogram. The catch? You don’t know in advance the range of input values; assume the integers range from 0 to some unknown positive maximum. Thus, you will need to dynamically scale the bin size for your histogram. An example is below.

Usage:

There are no command-line arguments.

Requirements:

You must begin with bin size 1, and double it as needed so all non-negative integers observed on input fit within the histogram.

You must have 16 bins. The number ‘16’ should appear only once in your code.

Input:

Input is read from stdin, whether from the keyboard, redirected from a file, or piped in from another command. Assume the input contains only integers, separated by white space (space, tab, newline). Assume the smallest integer is zero; ignore any negative integers. If there is non-integer non-space content in the file, it is ok for the program to treat that as the end of input; the program should not crash, or enter an infinite loop – it should just silently behave as if there are no more integers. (These assumptions make it easy to use scanf for your input.)
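A minimal sketch of the input loop these assumptions suggest is shown below. It uses fscanf on a FILE* rather than scanf so it can be tried on any stream. The name count_values is hypothetical, and a real histo would record each value in a bin instead of just counting.

```c
#include <stdio.h>

/* Read whitespace-separated integers from 'in' until end of input or the
 * first non-integer token; return how many non-negative values were seen. */
int count_values(FILE *in)
{
  int value;
  int count = 0;

  /* fscanf returns 1 only when it converted an integer; a bad token (0)
   * or end of input (EOF) ends the loop, so the loop cannot spin forever. */
  while (fscanf(in, "%d", &value) == 1) {
    if (value >= 0) {
      count++;  /* negative values are ignored */
    }
  }
  return count;
}
```

The key point is comparing the return value with 1 rather than testing for EOF, since a non-integer token makes fscanf return 0 without consuming it.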

As always, any other assumptions you make should be documented in your README file and your testing procedure should be documented in your TESTING file.

Output:

See examples below.

Exit:

This program has no arguments and does not check its input for errors, so it should always exit with zero status.

Examples:

Here I compile and run the program, then type a set of numbers (spread over three lines, though the layout does not matter as long as a space or newline separates the numbers), ending with ctrl-D at the beginning of a line. (That sends EOF to the program.) The program then prints a histogram, labeling each line with the range of values assigned to that bin and printing the count of values that fell into that bin.

$ mygcc histo.c -o histo
$ ./histo
16 bins of size 1 for range [0,16)
3 -4 5 1 7 0
8 0 15 12 3 5 
3 3 3 3 3 
^D
[ 0: 0] 2
[ 1: 1] 1
[ 2: 2] 
[ 3: 3] 7
[ 4: 4] 
[ 5: 5] 2
[ 6: 6] 
[ 7: 7] 1
[ 8: 8] 1
[ 9: 9] 
[10:10] 
[11:11] 
[12:12] 1
[13:13] 
[14:14] 
[15:15] 1
$

The notation [a,b) includes all values x such that a <= x < b, that is, the range includes a but does not include b. For example, [0,4) = {0, 1, 2, 3}.

Now watch what happens if I input a number outside the original range of [0,16).

$ ./histo
16 bins of size 1 for range [0,16)
3 -4 5 1 7 0
8 0 15 12 3 5 
18
16 bins of size 2 for range [0,32)
19 20 30 7 12
50
16 bins of size 4 for range [0,64)
34
32
19
44
^D
[ 0: 3] 5
[ 4: 7] 4
[ 8:11] 1
[12:15] 3
[16:19] 3
[20:23] 1
[24:27] 
[28:31] 1
[32:35] 2
[36:39] 
[40:43] 
[44:47] 1
[48:51] 1
[52:55] 
[56:59] 
[60:63] 
$

Each time it sees a number outside the current range, it doubles the range and doubles the size of each bin. (Notice also the [low:high] labels in the histogram; this notation includes both low and high and everything in between.) It might have to repeat the doubling if I put in a number well past the current range:

$ ./histo
16 bins of size 1 for range [0,16)
150
16 bins of size 2 for range [0,32)
16 bins of size 4 for range [0,64)
16 bins of size 8 for range [0,128)
16 bins of size 16 for range [0,256)
^D
[  0: 15] 
[ 16: 31] 
[ 32: 47] 
[ 48: 63] 
[ 64: 79] 
[ 80: 95] 
[ 96:111] 
[112:127] 
[128:143] 
[144:159] 1
[160:175] 
[176:191] 
[192:207] 
[208:223] 
[224:239] 
[240:255] 
$

Here’s an example using bash syntax to generate a list of numbers, and piping the output to histo:

$ echo {1..16} 150 | ./histo
16 bins of size 1 for range [0,16)
16 bins of size 2 for range [0,32)
16 bins of size 4 for range [0,64)
16 bins of size 8 for range [0,128)
16 bins of size 16 for range [0,256)
[  0: 15] 15
[ 16: 31] 1
[ 32: 47] 
[ 48: 63] 
[ 64: 79] 
[ 80: 95] 
[ 96:111] 
[112:127] 
[128:143] 
[144:159] 1
[160:175] 
[176:191] 
[192:207] 
[208:223] 
[224:239] 
[240:255] 
$

Although we scale the bin size, I’m not asking you to scale the bin count, which is fixed to be 16.

I took some pains to format the [low:high] range indicators for each row, using a fixed-width field just wide enough to hold the biggest number. It’s a nice touch (read man printf for some clues) but it’s ok if you make a simpler assumption (e.g., always use 6-digit field width).
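As one possible way to get that fixed-width formatting, the sketch below computes the number of digits in the largest value and passes it to printf-style formatting through the "%*d" conversion, which takes its field width from the argument list. The helper names digit_count and format_label are hypothetical.

```c
#include <stdio.h>

/* Number of decimal digits needed to print a non-negative integer. */
int digit_count(int n)
{
  int digits = 1;

  while (n >= 10) {
    n /= 10;
    digits++;
  }
  return digits;
}

/* Write a label such as "[ 16: 31]" into buf; both numbers are
 * right-aligned in a field 'width' columns wide. */
void format_label(char *buf, size_t size, int low, int high, int width)
{
  snprintf(buf, size, "[%*d:%*d]", width, low, width, high);
}
```

With a range of [0,256), the width would be digit_count(255), which is 3, giving labels in the style of the examples above.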

Representing a histogram:

You will need an array of 16 bins to represent the number of integers observed in each bin. You’ll need to keep track of the bin size and the range of the histogram. If you observe a value outside the range, you should double the bin size and range - but first you need to compress the current 16 bins into the first 8 bins. You’ll likely need one loop to compute the new values for the lower half of the bins (each bin receiving the sum of two bins’ counts), and then another to assign the new value (0) to the upper half of the bins.

(Again: the number ‘16’ may only occur once in your code; scattering hard-coded numbers around your code is bad style.)

Notice that the number of bins, bin size, and histogram range are all powers of 2.
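The compression step described under “Representing a histogram” can be sketched as follows. The function names compress_bins and add_value are hypothetical. The sketch leaves out the progress messages shown in the examples, and it assumes values are small enough that the range computation does not overflow an int.

```c
/* Halve the histogram's resolution in place: bin i receives the counts of
 * old bins 2i and 2i+1, and the upper half is reset to zero.
 * Assumes nbins is even. */
void compress_bins(int bins[], int nbins)
{
  int half = nbins / 2;

  for (int i = 0; i < half; i++) {
    bins[i] = bins[2 * i] + bins[2 * i + 1];
  }
  for (int i = half; i < nbins; i++) {
    bins[i] = 0;
  }
}

/* Record one non-negative value, doubling *bin_size (and so the range)
 * as many times as needed until the value fits. */
void add_value(int bins[], int nbins, int *bin_size, int value)
{
  while (value >= nbins * *bin_size) {
    compress_bins(bins, nbins);
    *bin_size *= 2;
  }
  bins[value / *bin_size]++;
}
```

Because the bin count is passed as a parameter, the constant itself can still be defined in exactly one place in the program.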

What to hand in, and how

Make sure to compile and test your solutions on the Thayer plank server before submission. If you choose to develop your solutions on some other system, you are responsible for ensuring that the work you submit runs correctly on the plank server, which is where we will test it.

When you are done, your lab2-XXXXX directory should contain a README.md file and four subdirectories. You should add only the necessary files to your repo. (Do not commit any compiled C programs!) Consider adding a .gitignore file in each sub-directory to ignore binary files (e.g., *.o files and the executable file) and other unrelated files (e.g., temporary files). Your lab2-XXXXX directory should have the following structure:

lab2-XXXXX
├── README.md
├── chill
│   ├── .gitignore
│   ├── README.md (optional)
│   ├── TESTING.md
│   └── chill.c
├── histo
│   ├── .gitignore
│   ├── README.md (optional)
│   ├── TESTING.md
│   └── histo.c
├── regress
│   ├── .gitignore
│   ├── README.md (optional)
│   ├── TESTING.md
│   └── regress.sh
└── words
    ├── .gitignore
    ├── README.md (optional)
    ├── TESTING.md
    └── words.c

This listing was produced by the tree command. Neat, huh?

For example, you might add the chill files with

cd chill; git add README.md .gitignore chill.c TESTING.md

Make sure you will commit the correct set of files:

git status

It will list the files that will be committed, near the top, and the files that will not be committed, or are ‘untracked’, near the bottom. Make sure the desired files are listed at the top! Make sure no binaries (compiled programs) or editor temporary files (like chill.c~) will be committed - they should be ignored by git via .gitignore files.

Study the status output carefully: if you miss adding a file we need, you’ll lose points, and if you add a scratch or binary file that should not be in the repo, you’ll lose points.

Commit your changes:

git commit -m "SUBMIT"

This command will commit your files and mark this commit with a message “SUBMIT”, indicating that we should grade this commit.

Push your changes to GitHub:

git push

Make sure you left nothing unexpected behind:

git status

If you need to make updates, repeat the add, commit, push sequence.

You can git commit -m "SUBMIT" again, before the deadline, if you realize that you’d forgotten something. We will grade the last commit with message “SUBMIT” pushed before the deadline.

To see your Lab submissions, as we see them, visit classroom.github.com and then click on Lab 2. You will then be able to click on your specific Lab 2 repository, and see the files and commit history as GitHub has them. Remember that it is the commit that has a commit message, not files.

If you decide to submit late: If you did not push a commit with the message “SUBMIT” before the deadline, we will look again at 24h, 48h, and 72h after the deadline. We will grade the first commit with the message “SUBMIT” and apply the late penalty accordingly.

Late commits will be ignored if they follow any commit with message “SUBMIT”.