Unit testing
Today we talk about testing. We’ll cover a number of different types of testing, including integration, regression, fuzz and acceptance testing, but we will focus on unit testing.
Read about some famous bugs, some resulting in life-threatening conditions or death, in this lecture extra.
Goals
- Understand different types of testing.
- Become versed in unit testing.
Activity
In today’s activity your group will design a unit test for one of our other modules.
In this unit we emphasize the importance of testing.
We revisit how to turn on/off test code with Makefiles and #ifdef, as mentioned earlier in the unit about conditional compilation.
In subsequent units we’ll look at some specific methods and examples.
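As a quick reminder of the idea, test code can be wrapped in an #ifdef block and enabled from the Makefile (or the gcc command line) with a -D flag. The sketch below is only illustrative; the flag name UNIT_TEST is an example, not a requirement.
// Usage example: gcc -DUNIT_TEST ... (or add -DUNIT_TEST to CFLAGS in the Makefile)
#ifdef UNIT_TEST
// This test driver is compiled only when UNIT_TEST is defined.
int
main(void)
{
  // call the module's functions and check their results here
  return 0;
}
#endif // UNIT_TEST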
Common errors
“A 2016 survey found that null pointers caused more crashes than any other coding error among the top 1,000 Android apps, and on Facebook’s app.” IEEE Spectrum
Image: Facebook, copied from IEEE Spectrum
Debugging vs. testing
“Testing can demonstrate the presence of bugs but not their absence.” – Edsger Dijkstra
This quote is a good one to keep in mind when you are developing code. What is the difference between debugging and testing? You debug when you know or have identified problems in your code. Testing is the art of systematically trying to break code, which you think (hope?) is bug free. We test throughout the software life cycle because it is typically much less expensive to fix a bug earlier in the software’s life than later.
These CS50 units are strongly influenced by The Practice of Programming, by Brian Kernighan and Rob Pike [K&P 1999], an outstanding book that will advance your knowledge on good programming practices. The notes use a number of programming examples from their Chapter 6 on Testing.
Motivational example
Sometimes software updates add new bugs that wouldn’t be caught without running all the tests again. Here’s an example from February 2014 in Apple MacOS X and iOS where hackers could trick those systems into accepting SSL (TLS) certificates that should be rejected.
. . .
hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
hashOut.length = SSL_SHA1_DIGEST_LEN;
if ((err = SSLFreeBuffer(&hashCtx)) != 0)
goto fail;
if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
goto fail;
goto fail; /* MISTAKE! THIS LINE SHOULD NOT BE HERE */
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
goto fail;
err = sslRawVerify(...);
. . .
Yes, C does have a goto statement. It works as expected, providing an immediate transfer to the specified label (but you should not use it!).
Here the programmer is verifying the certificate; if any of the checks fail, the err variable is set non-zero and the code exits to the failure handler. Unfortunately, the programmer inserted two goto fail statements instead of just one. As a result, even if err == 0 the code transfers to the fail section, which returns err as the function's result; because err == 0 means success, the certificate is reported as valid, and ultimately accepted.
This bug could allow website spoofing or the acceptance of a bogus certificate containing a mismatched private key.
This bug was overlooked because the second goto is indented - but it is not part of the if statement. Proper use of CS50 style (putting braces around every code block) would have prevented this bug from manifesting itself!
This bug emphasizes the need for good style and for careful unit testing. For a lengthy but interesting commentary, see Goto Fail, Heartbleed, and Unit Testing Culture.
For more examples, read the next unit about famous bugs.
Types of testing
There are many kinds of testing, each with its own purpose. At a high level we can describe different types of testing in terms of glass-box testing and black-box testing. In glass-box testing you can peek inside the thing being tested and see the source code. In black-box testing you can only access the thing’s public interface: you can send the box input and see only what comes out of the box — you do not have access to the source code.
Unit testing
The goal of unit testing is to verify that each functional unit of the system is working properly. It’s written from the unit developer’s perspective, and aims to test the internal workings of the unit. If this testing isn’t done well, all of the subsequent testing is more painful, slow, and sometimes meaningless! This type of testing is typically glass-box. We focus the latter half of this unit on unit testing.
Functional testing
The goal of functional testing is to verify that the unit, module, sub-system, or system is doing what its user wants. For a system, the ‘user’ is the customer; for a unit, the ‘user’ may be a developer of other components that depend on this unit. This testing usually requires “scaffolding” or a “test harness” to exercise the device or unit under test. Functional testing is typically done as black-box testing.
“Unit tests tell a developer that the code is doing things right; functional tests tell a developer that the code is doing the right things.” – Jeff Canna, Testing, Fun, Really.
Integration testing
After unit testing, the integration test verifies that the units (modules) can communicate properly with each other, and that the interfaces defined by the units (modules) are properly implemented and used. No special channels or connections should be used, even if they make things run faster!
System testing
Also known as “big bang” testing, this is where you put the whole thing together and run tests to ensure that it meets the stated requirements without causing any unexpected side-effects. If you skip over the integration tests, you may encounter serious problems in system tests - problems that may be very expensive to fix.
Regression testing
Regression testing is an automated means of running (and re-running) unit tests, integration tests, and sometimes system tests, each time you prepare a new release. (For unit and integration tests, each time you ‘git push’! Indeed, some source-code control systems automatically run unit and integration tests before a commit is accepted, rejecting the commit if any tests fail.) The goal is to determine if any changes made to the system broke other, previously working, functionality.
Usability testing
This is testing with real users. Not other programmers or engineers, and not friends and neighbors, but the real users. When this isn’t possible, you have to really work hard to find objective testers who will act like the end users. People who have (or can simulate) the same level of experience and biases as the end users are the kinds of usability testers you want.
In some products, you must also conduct accessibility testing to determine whether your software is accessible for people with various disabilities.
Security testing
Security tests verify security properties, e.g., to ensure that sensitive data are always encrypted, passwords and credit-card numbers are masked, and sensitive data is securely erased immediately after use. Sometimes, the team hires outside testers to conduct penetration tests, in which the tester tries to break the system or leak sensitive information. Unfortunately, one can never prove that a system is ‘secure’, that is, it has no bugs that lead to the failure of security or privacy properties. Remember Dijkstra’s quote about testing: one can only demonstrate the presence of bugs, not their absence. This is especially true for security!
Performance testing
Most software needs to perform ‘well’ according to one or more metrics, whether implicit or explicit. Performance metrics include speed (how long does it take to complete a task, sometimes called latency), throughput (how many tasks completed per second), memory (code size, stack size, heap usage), network bandwidth, energy (battery drain), or cost (in dollars, such as when using cloud resources billed by usage). Performance testing subjects the software to various loads, measuring the relevant metrics, to determine whether the system performs within acceptable bounds.
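As a small illustration of measuring one such metric (latency), a test can time the code under test with the standard clock() function. This is just a sketch; do_task() stands in for whatever is being measured.
// Sketch: measure CPU time used by a (hypothetical) function do_task().
#include <stdio.h>
#include <time.h>

static void do_task(void) { /* work being measured goes here */ }

int
main(void)
{
  clock_t start = clock();
  do_task();
  clock_t end = clock();
  double ms = 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
  printf("do_task used %.3f ms of CPU time\n", ms);
  return 0;
}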
Fuzz testing
Usually, previous tests are based on a carefully constructed series of test cases, devised to test all code sequences and push on the edge cases (see unit testing). Such tests, however, are only as good as the test writer - who must study the code (for glass-box testing) or the specs (for black-box testing) to think of the suitable test cases. It’s possible they will miss some important cases.
Another solution, therefore, is fuzz testing, a form of testing in which you fire thousands of random inputs at the program to see how it reacts. The chances of triggering an unconsidered test case is far greater if you try a lot of cases! We look at fuzz testing in more detail in Lab 6.
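As a toy illustration of the idea (Lab 6 uses a different, more realistic approach), the sketch below fires thousands of random printable strings at a hypothetical function parse_input(), relying on crashes, assertion failures, or valgrind reports to reveal problems.
// Sketch: fuzz-style loop feeding random strings to a (hypothetical) function under test.
#include <stdio.h>
#include <stdlib.h>

static void parse_input(const char* s) { (void)s; }  // hypothetical function under test

int
main(void)
{
  srand(42);                                   // fixed seed => reproducible inputs
  for (int trial = 0; trial < 10000; trial++) {
    char buf[64];
    int len = rand() % (int)(sizeof(buf) - 1); // random length, 0..62
    for (int i = 0; i < len; i++)
      buf[i] = (char)(' ' + rand() % 95);      // random printable ASCII character
    buf[len] = '\0';
    parse_input(buf);                          // should never crash, whatever the input
  }
  printf("completed 10000 fuzz trials\n");
  return 0;
}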
Acceptance testing
The ultimate test: whether the customer accepts your product. Most of the time these tests are similar to the system tests. However, you will occasionally encounter new, previously unstated “requirements” during acceptance testing.
Finally, the most important tip of all on testing from The Pragmatic Programmer:
Find bugs once.
Once you or one of your testers (or even a customer) finds a bug, that should be the last time a human finds that bug. Once it is identified and fixed, your automated tests should be extended to check for that same bug every time they are run. Finding a bug during regression testing is a lot better than having it found again the way it was found the first time.
Tips for Testing
Once again, the difference between debugging and testing:
- Testing is a determined, systematic attempt to break a program that you think should be working. As discussed above, tools and a test harness can automate this process.
- Debugging is what you do when you know that the program crashes (e.g., segfaults), fails (e.g., answers queries incorrectly), underperforms (e.g., runs slowly) or acts inconsistently (e.g., intermittently fails on certain conditions). These are all bugs that testing can find. Better to find them and fix them.
Test Early. Test Often. Test Automatically.
That’s a Pragmatic Programmer Tip.
We should begin testing as soon as we have something to test (Makefile, parsing arguments, initialization, scaffolding, first units). The sooner you find your bugs, the cheaper it will be in terms of your time, others’ time, and downstream support costs.
The coding ain’t done ‘til all the tests run.
Furthermore:
- Code that isn’t tested does not work.
- Code that isn’t regression tested eventually stops working.
- Code that isn’t automatically regression tested to prove it works, does not work.
Thus, you should build automated testing code (unit tests, Makefile rules, shell scripts, test data) from the very beginning.
I write bug free code - testing is a waste of time
Many people have the attitude that testing code is a waste of time and boring - why do it? As Fred Brooks said, “All programmers are optimists”!
Consider the software lifecycle: requirements, design, coding, testing, debugging, integration, and delivery. Professional programmers will tell you that testing and debugging take the largest chunk of time. Thus, it’s smart to develop tools to automate the testing of code.
Perhaps the most important tools are for regression testing, which systematically retest the code periodically. Each of those words is important:
- retest: they test older parts of the code even as you develop new parts of the code - just in case your new code broke the old code.
- systematic: they run automatically through a large number of tests, validating the results of each test, so neither carelessness nor laziness causes you to overlook important tests.
- periodic: they run at critical points in the development process; certainly, right before a new release, but in many organizations they run as part of a “nightly build” that compiles and tests the entire system and all of its units. Woe to the programmer who “breaks the nightly build”!
If all the regression tests pass then you have some confidence that your new code did not introduce additional problems. A more accurate statement might be “the changes didn’t reintroduce any of the bugs that you already knew might exist.”
Write the unit-test code first, and keep it with the object/functions
The goal of effective unit testing is to isolate each functional part of the system and to demonstrate that each of those parts is working properly.
“Developers write unit tests to determine if their code is doing things right. Customers write acceptance tests (sometimes called functional tests) to determine if the system is doing the right things.” – Roy W. Miller, Christopher T. Collins, Acceptance testing.
Many developers believe it’s best to write the test code before you write the real code. This ensures that you’re paying attention to the specifications first, and not just leaping into the code. When writing these unit tests, try to avoid any testing or dependence on things outside of the unit. Aim to produce a unit test for every function or method that some other code or user will invoke.
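For example, for a hypothetical function slen() that is specified to behave like strlen(), one might write the test from the spec first, then write the function to make the test pass. A minimal sketch:
// Sketch of test-first development for a hypothetical function slen().
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

// Hypothetical function, specified to behave like strlen().
static size_t
slen(const char* s)
{
  size_t n = 0;
  while (s[n] != '\0')
    n++;
  return n;
}

// This test was written from the spec, before slen() itself.
static void
test_slen(void)
{
  assert(slen("") == 0);       // boundary: empty string
  assert(slen("hello") == 5);
}

int
main(void)
{
  test_slen();
  printf("all slen tests pass\n");
  return 0;
}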
Test as you write code
The earlier you find a problem with your code the better; it will save significant time and your company money. You will also get respect from other programmers.
I expect most of you sit at the terminal and hack at the code to reveal and fix bugs - you hack in some printfs and run your code again, iterating hundreds of times. This brute-force method is wasteful, slow, and error prone - you’ll get no respect in industry for this approach. Read your code, think like a computer, and write test code as you write the real code.
Use C idioms and be consistent when you write code
You already know many C idioms even if we haven’t always labeled them that way. In C, there are many ways to write a simple loop:
// example 1
i = 0;
while (i <= n-1)
array[i++] = 1.0;
// example 2
for (i = 0; i < n; )
array[i++] = 1.0;
// example 3
for (i = n; --i >= 0; )
array[i] = 1.0;
As in any human language, C has idioms, that is, conventional ways that experienced programmers write common pieces of code. Idioms aid understanding because they are immediately recognizable, and any errors in their use are also quickly discernible.
All of the loops above would work… but they do not follow the C idiom for such loops.
The idiomatic C for loop is:
// Idiomatic form
for (i = 0; i < n; i++ )
array[i] = 1.0;
Being consistent when programming C will help enormously. For example, when you see a piece of code that is not idiomatic you should stop and take a close look at that code: code that is not idiomatic suggests poor code (which may be buggy) … or perhaps there is a good reason why the code does not follow a common idiom; such cases should be labeled with an explanatory comment.
Here are some more examples of C idioms that you should be familiar with:
// Infinite loop idioms
for (;;) {
....
}
// or
while (1) {
....
}
// malloc, string copy idiom
char* newp = malloc(strlen(buf)+1);
if (newp != NULL)
strcpy(newp, buf);
else
// handle the malloc error
// copying characters from stdin to stdout until end of file
while ((c = getchar()) != EOF)
putchar(c);
// traversal of a null-terminated string
for (char* p=string; *p != '\0'; p++)
// do something with *p
Write in idioms as much as possible; it makes it easier to spot bugs - and harder to create them.
Test code at its boundaries
So-called boundary bugs occur at the ‘boundary cases’ of a loop or function (sometimes called ‘edge cases’). For example,
- a loop that executes zero times
- a function called with a NULL pointer
- code expecting a string that receives an empty string
- code reading a line of input that receives an empty line
- code handling an integer n where n = 0
- the base case of a recursive function
- a command-line with no arguments
- a function that prints, iterates over, or deletes a data structure that is empty
Every time you write a loop, or a function, think to yourself: “what are the boundary cases? what will this code do in each case?”
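For example, a boundary-focused test deliberately feeds a function the smallest inputs it must handle. In this sketch, a hypothetical sum() function is exercised with zero-element and one-element arrays:
// Sketch: boundary-case tests for a (hypothetical) array-summing function.
#include <assert.h>
#include <stdio.h>

// Hypothetical function under test: sum the first n elements of a[].
static double
sum(const double a[], const int n)
{
  double total = 0.0;
  for (int i = 0; i < n; i++)   // executes zero times when n == 0
    total += a[i];
  return total;
}

int
main(void)
{
  double one[] = { 3.5 };
  assert(sum(NULL, 0) == 0.0);  // boundary: empty array; NULL must never be dereferenced
  assert(sum(one, 1) == 3.5);   // boundary: exactly one element
  printf("boundary tests pass\n");
  return 0;
}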
Break building blocks on purpose
Consider one of your important ‘building block’ functions. Now break it! Have it still operate, but just produce goofy results. For example, in a hashtable implementation, suppose the hash function always returned the same hash value? Would the hashtable work, perhaps more slowly, or would it crash?
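Here is one way to try that, sketched for a hashtable: compile with a test flag (TEST_BAD_HASH is an illustrative name, not part of any CS50 module) that makes the hash function send every key to the same bucket. The table should still produce correct results, only more slowly.
// Sketch of a deliberately broken hash function; enable with gcc -DTEST_BAD_HASH ...
static unsigned int
hash(const char* key, const unsigned int num_buckets)
{
#ifdef TEST_BAD_HASH
  (void)key;                           // silence unused-parameter warnings
  (void)num_buckets;
  return 0;                            // every key collides in bucket 0, on purpose
#else
  unsigned int h = 5381;
  for (const char* p = key; *p != '\0'; p++)
    h = h * 33 + (unsigned char)*p;    // djb2-style hash
  return h % num_buckets;
#endif
}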
Some of you may like to use the mem module to help track the balance of malloc and free calls. If you use its functions for every memory allocation and free, they are a great place to insert debugging code and breakpoints. They are also a great place for testing! For example, you could have mem_malloc() fake an out-of-memory condition after some number of calls (as suggested by Kernighan and Pike):
// Usage example: gcc -DMEMORYTESTLIMIT=50 ...
void*
mem_malloc(size_t size)
{
#ifdef MEMORYTESTLIMIT
if (nmalloc >= MEMORYTESTLIMIT)
return NULL; // fake an out-of-memory condition
#endif
void *ptr = malloc(size);
if (ptr != NULL)
nmalloc++;
return ptr;
}
Test pre- and post-conditions
It is always a good idea to test for pre- and post-conditions: pre-conditions are those you expect to be true before some block of code; post-conditions are those you expect to be true after some block of code executes.
For example, our example code often checks whether input values are within range – an example of pre-condition testing.
Let’s look at another simple example out of [K&P 1999] that computes the average of n elements in an array a[]. Closer inspection of the code reveals that there is a problem if n is less than or equal to 0.
float
avg(float a[], int n)
{
float sum = 0.0;
for (int i = 0; i < n; i++)
sum += a[i];
return sum / n;
}
A natural question is what to do if someone calls avg() with n=0. An array of zero elements does not make much sense, but an average of 0 does. Should our code catch the division by zero, perhaps with an assert or abort, or complain, or be silent? One reasonable approach is to just return 0 as the average if n is less than or equal to zero. While the code is idiomatic in style, we need to tweak it to test the pre-condition, as shown below.
float
avg(float a[], int n)
{
if (n <= 0)
return 0;
float sum = 0.0;
for (int i = 0; i < n; i++)
sum += a[i];
return sum / n;
}
Use ‘assert’
C provides an assertion facility in assert.h that is useful for pre- and post-condition testing. Asserts are usually used for unexpected failures where there is no clean way to recover control of the logic. The avg() function above could use the assert facility:
#include <assert.h>
...
float
avg(float a[], int n)
{
assert(n > 0);
float sum = 0.0;
for (int i = 0; i < n; i++)
sum += a[i];
return sum / n;
}
If the assertion is violated (the condition is false) it will cause an abort and a standard message to be printed out:
Assertion failed: n > 0, file avgtest.c, line 7
Abort(crash)
Assertions should be used only for testing “this should never happen” kinds of conditions… other errors, especially those caused by the user, should be handled more gracefully.
We provide an assert-like function in the mem module, a wrapper for malloc() called mem_malloc_assert():
void*
mem_assert(void* p, const char* message)
{
if (p == NULL) {
fprintf(stderr, "NULL POINTER: %s\n", message);
exit (99);
}
return p;
}
void*
mem_malloc_assert(const size_t size, const char* message)
{
void* ptr = mem_assert(malloc(size), message);
nmalloc++;
return ptr;
}
Normally, it acts just like malloc(). In the rare instance where malloc returns NULL, however, it prints a message to stderr and exits.
I use another assert-like function from the mem module at the top of most functions to verify that all inbound pointers are not NULL (where NULL is not valid or expected):
mem_assert(page, "page_save gets NULL page");
mem_assert(page->url, "page_save gets NULL page->url");
mem_assert(page->html, "page_save gets NULL page->html");
mem_assert(pageDirectory, "page_save gets NULL pageDirectory");
The function mem_assert() checks its first parameter; if it is NULL, it prints a message to stderr and exits non-zero.
Test for the unexpected: aka, defensive programming
When coding, your code should always “expect the unexpected”.
Common examples include checking for NULL pointers, for error conditions returned by library functions, and for bad parameters (like n < 0 in avg()). Another example is below:
if (grade < 0 || grade > 100) // can't happen at Dartmouth
letter = '?';
else if (grade >= 90)
letter = 'A';
else
....
These checks are especially important for physical systems: the controlling software must be aware of the limitations of the physical system. Otherwise, seemingly valid, if perhaps unusual, software interactions could lead to disaster: see this video of the effect of a hack on industrial machinery.
Plan for automatable and repeatable tests
Kernighan and Pike [K&P 1999] recommend that the test output should always contain all of the input parameter settings, so that the results can be matched to the settings that produced them. Furthermore, you need to be able to recreate the conditions of each test as precisely as possible. Some situations complicate this goal. If your program uses random numbers, you will need some way to set and print the starting seed so that you can ensure the same sequence of random numbers for each test run. Alternatively, you can have a file of “random” numbers for use during test - the random number function just reads the next one from the file.
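For example, a randomized test driver might accept the seed as an optional argument and always print the seed it used, so any run can be reproduced exactly. A minimal sketch:
// Sketch: make a randomized test repeatable by printing (and optionally
// accepting) the seed used for the pseudo-random number generator.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main(const int argc, const char* argv[])
{
  unsigned int seed = (argc > 1) ? (unsigned int)atoi(argv[1]) : (unsigned int)time(NULL);
  printf("random seed = %u\n", seed);   // record the seed so the run can be reproduced
  srand(seed);

  for (int i = 0; i < 5; i++)
    printf("test input %d: %d\n", i, rand() % 100);
  return 0;
}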
Another situation that poses a challenge here is testing in the presence of asynchronous processes. These could be as simple as a collection of independent and variable length timers or as complicated as a distributed real-time process control system managing the flow of product through a refinery process.
Always check error returns from functions
A good programmer will always check the return status from functions, system calls, and libraries. If you neglect to look at the return status … how do you know that the function really worked? If your code assumes it did not fail but it did, then the resulting segfault or error will be hard to debug. Better to always check the error status returned by functions.
Another example from Kernighan and Pike [K&P 1999]:
fp = fopen(outfile, "w");
if (fp == NULL) {
// file could not be opened
}
while (some expression) {
fprintf(fp, ...); // but... what if there is an error?
}
if (fclose(fp) == EOF) {
// some output error occurred
}
The error status of fprintf is not checked; what if it fails? The data written may have been lost. In CS50 we often ignore the return status of functions like printf and fclose, but if you are coding production code that must be absolutely robust, your code should check the return value of every function that returns a value.
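For comparison, here is a sketch of the same kind of output loop with every return value checked; the file name and error handling are illustrative.
// Sketch: writing to a file with every return value checked.
#include <stdio.h>

int
main(void)
{
  const char* outfile = "output.txt";
  FILE* fp = fopen(outfile, "w");
  if (fp == NULL) {
    fprintf(stderr, "could not open %s for writing\n", outfile);
    return 1;
  }
  for (int i = 0; i < 10; i++) {
    if (fprintf(fp, "line %d\n", i) < 0) {   // fprintf returns a negative value on error
      fprintf(stderr, "write error on %s\n", outfile);
      fclose(fp);
      return 2;
    }
  }
  if (fclose(fp) == EOF) {                   // errors can surface at close time, too
    fprintf(stderr, "error closing %s\n", outfile);
    return 3;
  }
  return 0;
}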
Summary
- Test incrementally and build confidence in your code.
- Write unit tests that can be re-run once fixes or changes have been made.
- Write self-contained unit tests.
- Test inputs and outputs.
- Test the dataflow through the program.
- Test all the execution paths through the program.
- Stress-test the code; start simple and advance.
- Don’t implement new features if there are known bugs in the system.
- The target runtime environment is as important a design and implementation point as the purpose of the code. Design and test with that environment in mind.
- Test for portability: run code and tests on multiple machines/OSs.
- Before shipping code make sure that any debug/test modes are turned off.
If you follow at least 50% of the tips in these notes you will write better code and it will have considerably fewer bugs.
Unit testing
This unit shows a more sophisticated unit-testing approach – more sophisticated than you will likely need in CS50. We include it here for those who may be interested.
This approach makes sophisticated use of the C preprocessor to implement glass-box unit testing.
The bag unit test used one simple preprocessor macro; in unittest.h we define some far more sophisticated macros.
Each #define defines a code fragment; the first two take parameters. Because each definition must appear on one “line”, I had to use a line-continuation character (a backslash as the last character of the line) to let me format the definitions in a human-readable way. The backslashes are lined up so they all look neat.
// each test should start by setting the result count to zero
#define START_TEST_CASE(name) int _failures=0; char* _testname = (name);
// Check a condition; if false, print warning message.
// e.g., EXPECT(dict->start == NULL).
// note: the preprocessor
// converts __LINE__ into line number where this macro was used, and
// converts "#x" into a string constant for arg x.
#define EXPECT(x) \
if (!(x)) { \
_failures++; \
printf("Fail %s Line %d: [%s]\n", _testname, __LINE__, #x); \
}
// return the result count at the end of a test
#define END_TEST_CASE \
if (_failures == 0) { \
printf("PASS test %s\n\n", _testname); \
} else { \
printf("FAIL test %s with %d errors\n\n", _testname, _failures); \
}
#define TEST_RESULT (_failures)
The preprocessor defines a special macro __LINE__ that is set to the line number of the original source file as each source line is processed; this is great for printing out the line number where our test case failed. The preprocessor also has special syntax #parameter that substitutes a C string constant for the text of the parameter; you can see it right at the end of the EXPECT macro. Thus, EXPECT(tree != NULL) will produce code that ends with the string constant "tree != NULL", enabling us to print the line number and the condition that failed. You can’t do that with C itself, only with the preprocessor!
Warning: I strongly discourage the use of preprocessor macros. There are times, however, where they are the right tool for the job, and this is one of those times.
The macros are meant to be used for constructing small unit tests like this one from the lecture extra about treeA:
/////////////////////////////////////
// create and validate an empty tree
int test_newtree0()
{
START_TEST_CASE("newtree0");
tree_t* tree = tree_new();
EXPECT(tree != NULL);
EXPECT(tree->root == NULL);
EXPECT(tree_find(tree, "hello") == NULL);
tree_delete(tree, NULL);
EXPECT(mem_net() == 0);
END_TEST_CASE;
return TEST_RESULT;
}
In the above test, I create a new (empty) tree, try to find something in it, and delete the tree. Notice, though, that I actually peek inside the struct tree to verify that all its members are set correctly. Note, too, how I used those new macros: START_TEST_CASE() to give the test a name and initialize everything, EXPECT() to indicate the conditions I expect to be true, END_TEST_CASE to print the summary and clean up, and return TEST_RESULT to provide a return value for this function.
Here’s how that code looks after running it through the preprocessor with gcc -DUNIT_TEST -E tree.c:
int test_newtree0()
{
int _failures=0; char* _testname = ("newtree0");;
tree_t* tree = tree_new();
if (!(tree !=
# 244 "tree.c" 3 4
((void *)0)
# 244 "tree.c"
)) { _failures++; printf("Fail %s Line %d: [%s]\n", _testname, 244, "tree != NULL"); };
if (!(tree->root ==
# 245 "tree.c" 3 4
((void *)0)
# 245 "tree.c"
)) { _failures++; printf("Fail %s Line %d: [%s]\n", _testname, 245, "tree->root == NULL"); };
if (!(tree_find(tree, "hello") ==
# 247 "tree.c" 3 4
((void *)0)
# 247 "tree.c"
)) { _failures++; printf("Fail %s Line %d: [%s]\n", _testname, 247, "tree_find(tree, \"hello\") == NULL"); };
tree_delete(tree,
# 249 "tree.c" 3 4
((void *)0)
# 249 "tree.c"
);
if (!(mem_net() == 0)) { _failures++; printf("Fail %s Line %d: [%s]\n", _testname, 250, "mem_net() == 0"); };
if (_failures == 0) { printf("PASS test %s\n\n", _testname); } else { printf("FAIL test %s with %d errors\n\n", _testname, _failures); };
return (_failures);
}
If you look closely, you can see the original bits of code (like tree_delete(tree, NULL), with NULL expanded!) as well as the expanded EXPECT and other macros.
The code in treeA for tree.c contains four such unit-test cases. Then the main() program runs the series of these four unit tests, and prints an error if any of them failed:
int
main(const int argc, const char* argv[])
{
int failed = 0;
failed += test_newtree0();
failed += test_newtree1();
failed += test_treeleft();
failed += test_treefind();
if (failed) {
printf("FAIL %d test cases\n", failed);
return failed;
} else {
printf("PASS all test cases\n");
return 0;
}
}
Here’s what the output looks like when everything passes:
$ make unit
gcc -Wall -pedantic -std=c11 -ggdb -DTESTING -DUNIT_TEST tree.c mem.o -o unittest
./unittest
End of tree_delete: 1 malloc, 1 free, 0 free(NULL), 0 net
PASS test newtree0
After tree_insert: 4 malloc, 1 free, 0 free(NULL), 3 net
End of tree_delete: 4 malloc, 4 free, 0 free(NULL), 0 net
PASS test newtree1
After tree_insert: 7 malloc, 4 free, 0 free(NULL), 3 net
After tree_insert: 9 malloc, 4 free, 0 free(NULL), 5 net
After tree_insert: 11 malloc, 4 free, 0 free(NULL), 7 net
After tree_insert: 13 malloc, 4 free, 0 free(NULL), 9 net
End of tree_delete: 13 malloc, 13 free, 0 free(NULL), 0 net
PASS test treeleft
After tree_insert: 16 malloc, 13 free, 0 free(NULL), 3 net
After tree_insert: 18 malloc, 13 free, 0 free(NULL), 5 net
After tree_insert: 20 malloc, 13 free, 0 free(NULL), 7 net
After tree_insert: 22 malloc, 13 free, 0 free(NULL), 9 net
ann(1)
bob(2)
cheri(3)
dave(4)
End of tree_delete: 22 malloc, 22 free, 0 free(NULL), 0 net
PASS test treefind
PASS all test cases
To see what it looks like when a failure occurs, I could either break the tree code (which I’d rather not do!) or break the test code; I’ll do the latter by changing one line
EXPECT(tree_find(tree, "abcd") == &data);
to
EXPECT(tree_find(tree, "abcd") == NULL);
and run the test again:
$ make unit
gcc -Wall -pedantic -std=c11 -ggdb -DTESTING -DUNIT_TEST tree.c mem.o -o unittest
./unittest
End of tree_delete: 1 malloc, 1 free, 0 free(NULL), 0 net
PASS test newtree0
After tree_insert: 4 malloc, 1 free, 0 free(NULL), 3 net
Fail newtree1 Line 271: [tree_find(tree, "abcd") == NULL]
End of tree_delete: 4 malloc, 4 free, 0 free(NULL), 0 net
FAIL test newtree1 with 1 errors
After tree_insert: 7 malloc, 4 free, 0 free(NULL), 3 net
After tree_insert: 9 malloc, 4 free, 0 free(NULL), 5 net
After tree_insert: 11 malloc, 4 free, 0 free(NULL), 7 net
After tree_insert: 13 malloc, 4 free, 0 free(NULL), 9 net
End of tree_delete: 13 malloc, 13 free, 0 free(NULL), 0 net
PASS test treeleft
After tree_insert: 16 malloc, 13 free, 0 free(NULL), 3 net
After tree_insert: 18 malloc, 13 free, 0 free(NULL), 5 net
After tree_insert: 20 malloc, 13 free, 0 free(NULL), 7 net
After tree_insert: 22 malloc, 13 free, 0 free(NULL), 9 net
ann(1)
bob(2)
cheri(3)
dave(4)
End of tree_delete: 22 malloc, 22 free, 0 free(NULL), 0 net
PASS test treefind
FAIL 1 test cases
Makefile:29: recipe for target 'unit' failed
make: *** [unit] Error 1
Notice how Make exited with error; that’s because unittest exited with non-zero status: note the code at the end of main().
Professional test frameworks
Although we don’t have time to study any professional unit-testing frameworks, they extend our unittest.h considerably. There are several C unit-testing frameworks available on the Internet, such as Check and CUnit. Here’s an example using CUnit.
Google’s C++ Testing Framework puts the goals of testing this way (and it’s clearly applicable to C programming and unit testing as well as functional and system testing):
- Tests should be independent and repeatable
- Tests should be portable and reusable
- When tests fail, they should provide as much information as possible about the failure
- The testing framework should handle all the tracking of successful and unsuccessful tests for you
- Tests should be as fast as possible