CS 50 | Software Design and Implementation

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” - Brian Kernighan

As we turn our attention towards larger, more complex C programs we stress the importance of good style, good documentation, and strong testing. The goal is to avoid bugs through careful design and good style - and to discover what bugs remain through strong testing.

Once you discover the existence of a bug, how do you track it down so you know why the program is misbehaving and then how to fix it?

We strongly recommend learning gdb for debugging C programs. It takes a bit of practice, but its use will save you lots of time in CS50 and subsequent courses.

Goals

Learn about the GNU debugger called gdb:

Set breakpoints
Step into or over code line by line
Examine variables at each step
Squash bugs!

Activity

In today’s activity, use gdb to find and fix bugs in purposely buggy sorting code.

Techniques for limiting those pesky bugs

“Don’t Panic” – Hitchhiker’s Guide to the Galaxy

The trouble with bugs is that no two are the same. Some bugs are simple, such as bad pointers and array subscript errors. Others are difficult to debug: the system might run for days and then fail because of a slow memory leak or a numeric overflow. Still others might depend upon a subtlety in the timing of events, such as messages arriving over the network or the user hitting ‘Enter’ at just the right moment.

Programmers aim to understand the nature of the bug they are trying to swat. Is it reproducible (does it always fail under the same set of conditions)? Does it always manifest itself in the same way? These are clues that help track down those pesky bugs in complex systems.

The complexity of a program is related to the number of interacting components; we have already seen programs with multiple functions and code spread across two or three files, and that use one or more libraries (stdio, stdlib, and the math library). One rule of thumb is that the number of bugs grows with the number of interactions. Reducing the complexity and interactions enables us to focus in on the location of bugs in code. Gordon Bell summed it up this way:

“The cheapest, fastest, and most reliable components of a computer system are the ones that aren’t there.” – Gordon Bell

His point is that the importance of a simple design cannot be overemphasized.

Techniques that help reduce debugging time include:

a good design and design methodology;
consistent style (e.g., use C program idioms as much as possible);
boundary condition tests;
assertions and sanity testing;
defensive programming;
designing for testing;
avoid files that have a large number of functions, and functions that have a large number of lines; Aim for functions that do one thing, and do it well!
limit use of global variables whenever possible; and
leverage desk-checking tools.
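
To make two items from the list above concrete (assertions and defensive programming), here is a minimal sketch in C. The function average and its interface are hypothetical, invented for this illustration; they are not part of the course code. The defensive check guards against bad arguments from the caller, while the assertion checks the function's own logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: compute the average of the first n values.
 * Returns true and stores the average in *result on success;
 * returns false (and leaves *result untouched) on bad arguments.
 */
bool
average(const int values[], size_t n, double *result)
{
  // Defensive programming: validate what the caller hands us,
  // rather than dividing by zero or following a NULL pointer.
  if (values == NULL || result == NULL || n == 0) {
    return false;
  }

  long long sum = 0;
  int min = values[0];
  int max = values[0];
  for (size_t i = 0; i < n; i++) {
    sum += values[i];
    if (values[i] < min) min = values[i];
    if (values[i] > max) max = values[i];
  }
  double avg = (double) sum / (double) n;

  // Assertion (sanity test): an average can never fall outside the
  // range of its inputs. If this fires, the bug is in this function.
  assert(avg >= (double) min && avg <= (double) max);

  *result = avg;
  return true;
}
```

To try it, call average from a main of your own. The design choice here is that conditions a caller could plausibly cause are reported through the return value, whereas conditions that "cannot happen" are asserted, so a violation stops the program right where the logic went wrong.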

Approaches to debugging

Insanity: doing the same thing over and over again and expecting different results.
– Unknown

When tracking down pesky bugs we can think of the following steps to finding and correcting them - a sort of “bug lifecycle”:

Testing: discovering what bugs exist. We have already designed some simple tests for programs in this class.
Stabilization: find a minimal input sequence that reliably reproduces the buggy behavior.
Localization: identify the function/line of the code responsible.
Correction: fix the code!
Verification: re-test the code fix and confirm it works… not just on the sequence that generated the buggy behavior, but on all possible tests to ensure your bug fix did not break some other behavior!
Extrapolation: imagine other examples that are related to the one that caused this bug to occur and test those too.

There are many ways that people approach debugging - not all of which are effective.

Ignore the bug; assume it will never happen again.
Sift through warning/error messages; once all of the messages are gone, assume the program is correct.
Insert printf statements throughout the code to inspect the values of variables and the flow of control.
Use specialized debugging tools (e.g., plugins integrated into your favorite IDE, commandline tools like gdb and valgrind).

The first approach is clearly not a good idea; bugs will recur if given the chance.

Eliminating all of the warnings and errors is a good idea (and indeed is required when submitting assignments in cs50 :). Without proper testing, however, there is no guarantee that your program is correct.

Print-style debugging can be useful for simple situations, but can lead to ugly, unreadable code. If you take this approach, even a little, use discipline to enable/disable such messages with a single switch (as shown below).

Let’s look at better approaches.

Code Inspection

Many times people rush and “hack” the debug phase and sit at the terminal hoping to eventually track down that bug via trial and error. Inexperienced programmers do this as their first resort. You will find this approach to be very time consuming - put more plainly, it will take longer than other techniques.

One of the most effective debug tools is you: sit down and read your code!

Pretend you are a computer and execute the code with a pen and paper. As you read your code, keep some of the following tips in mind:

Draw diagrams! Especially for data structures.
Regarding for/while loops, and recursion, think about the base case, and the boundary conditions, and work inductively toward the general case. Errors most often occur at the base case or at the boundary cases.

Code inspection is very useful. Good programmers closely trace through their code in detail. Look for boundary conditions of structures, arrays, loops, and recursion; bugs often exist at the boundary. Look at edge cases – “empty” or “full” conditions.
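
As an illustration of checking the “empty” and “full” boundaries, here is a small sketch in C. The bounded stack, its function names, and its capacity are all hypothetical, chosen only to keep the example short; the point is the shape of the boundary tests in stack_test_boundaries, not the data structure itself.

```c
#include <stdbool.h>

#define STACK_CAPACITY 4   // deliberately tiny so "full" is easy to reach

/* Hypothetical bounded stack of ints. */
typedef struct intstack {
  int items[STACK_CAPACITY];
  int count;               // number of items currently stored
} intstack_t;

void
stack_init(intstack_t *s)
{
  s->count = 0;
}

/* Push item; return false if the stack is already full. */
bool
stack_push(intstack_t *s, int item)
{
  if (s->count >= STACK_CAPACITY) {
    return false;          // boundary: full
  }
  s->items[s->count++] = item;
  return true;
}

/* Pop into *item; return false if the stack is empty. */
bool
stack_pop(intstack_t *s, int *item)
{
  if (s->count <= 0) {
    return false;          // boundary: empty
  }
  *item = s->items[--s->count];
  return true;
}

/* Exercise the boundaries; return the number of failed checks. */
int
stack_test_boundaries(void)
{
  int failures = 0;
  int item = 0;
  intstack_t s;
  stack_init(&s);

  // Boundary: popping from an empty stack must fail.
  if (stack_pop(&s, &item)) failures++;

  // Fill exactly to capacity; every push should succeed.
  for (int i = 0; i < STACK_CAPACITY; i++) {
    if (!stack_push(&s, i)) failures++;
  }

  // Boundary: one push beyond "full" must fail.
  if (stack_push(&s, 99)) failures++;

  // Drain it; items should come back in reverse order.
  for (int i = STACK_CAPACITY - 1; i >= 0; i--) {
    if (!stack_pop(&s, &item) || item != i) failures++;
  }

  // Boundary: the stack is empty again.
  if (stack_pop(&s, &item)) failures++;

  return failures;
}
```

Call stack_test_boundaries from a main of your own; a nonzero return value means one of the boundary checks failed.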

Once you have read your code and convinced yourself it works, yet bugs remain, you need to instrument your code and start the detective work.

Sometimes while debugging you will discover other, unrelated bugs that haven’t yet manifested themselves. FIX THEM!

Pragmatic Programmer Tip:

Don’t live with broken windows: Fix bad designs, wrong decisions, and poor code when you see them.

Pragmatic Programmer Tip:

Fix the Problem, Not the Blame: It doesn’t really matter whether the bug is your fault or someone else’s – it is still your problem, and it still needs to be fixed.

The printf approach to debugging

“All I need is printf(), right?”

You may have been using print statements to help you debug your code. That method can only get you so far. Sometimes the underlying bug even interferes with printf’s limited contribution to your efforts. For example, suppose a segfault occurs after your printf executes, but the printed string never appears because the process crashes before the output is written. You might conclude that the bug occurs before your printf when really it happens much later. The takeaway is that printf is not your friend in these cases; too often it’s a red herring.

What happens if your system runs for hours and only under a certain set of system conditions the code fails? Working your way through thousands of printf outputs may not help. When a bug is buried deep in the execution of your software you need sophisticated tools to track those down. You need more than printf to attack these bugs.

Just because stdout shows some of your prints were executed doesn’t always mean that the last message written to stdout is from the last printf before the program had a problem.
Output to stdout is usually buffered: the stdio library collects the characters in memory and writes them out only when it is ready (e.g., when the buffer fills, or at the end of each line if stdout is a terminal). This may seem unimportant, but it means that your program may execute the code following the printf() before the output appears, and if the program crashes first, the buffered output may be lost. So, if you are using printf() for debugging, you should follow it with fflush(stdout), which tells the system “print it NOW” before your program continues.

If you do use printf debugging, please use the C preprocessor to conditionally turn on/off the debugging output with one switch; that is, using #ifdef DEBUG and conditional compilation (see slides).
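
A minimal sketch of that single switch might look like the following. The macro name DEBUG_PRINT and the function sum_to are hypothetical, invented for this illustration. When the code is compiled with -DDEBUG, each message is printed and flushed immediately; without it, the macro expands to nothing and the messages disappear from the program.

```c
#include <stdio.h>

/* One switch for all debugging output: compile with -DDEBUG to turn
 * the messages on, for example
 *     gcc -DDEBUG -o prog prog.c
 * and compile without it to turn them all off.
 */
#ifdef DEBUG
#define DEBUG_PRINT(...) \
  do { printf(__VA_ARGS__); fflush(stdout); } while (0)
#else
#define DEBUG_PRINT(...) do { } while (0)
#endif

/* Hypothetical example function: sum the integers 1..n. */
int
sum_to(int n)
{
  int sum = 0;
  for (int i = 1; i <= n; i++) {
    sum += i;
    // Printed (and flushed right away) only when DEBUG is defined.
    DEBUG_PRINT("sum_to: i=%d sum=%d\n", i, sum);
  }
  return sum;
}
```

Wrapping the macro body in do { ... } while (0) lets DEBUG_PRINT(...) be used like an ordinary statement, even inside an if/else without braces.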

The GNU Debugger (gdb)

Note: before using gdb, ensure you compile all C source files with the -ggdb flag; our standard .bash_profile file defines mygcc with this flag, and your Makefile should include this flag in its definition of CFLAGS. This flag tells the compiler to package into your executable the metadata that gdb needs to help you debug your programs.

The gdb debugger is invoked with the shell command gdb; it then prints its own prompt and accepts its own wide range of commands. Once started, it reads commands from the terminal until you tell it to exit with the gdb command quit. You can get online help from gdb itself by using the command help.

$ gdb
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
...
(gdb) help
List of classes of commands:

aliases -- Aliases of other commands
breakpoints -- Making program stop at certain points
data -- Examining data
files -- Specifying and examining files
internals -- Maintenance commands
obscure -- Obscure features
running -- Running the program
stack -- Examining the stack
status -- Status inquiries
support -- Support facilities
tracepoints -- Tracing of program execution without stopping the program
user-defined -- User-defined commands

Type "help" followed by a class name for a list of commands in that class.
Type "help all" for the list of all commands.
Type "help" followed by command name for full documentation.
Type "apropos word" to search for commands related to "word".
Command name abbreviations are allowed if unambiguous.
(gdb) 

You can run gdb with no arguments or options; but the most usual way to start GDB is with one argument, specifying an executable program as the argument:

$ gdb program

GDB demo

In the following examples we will use a lot of the basic gdb commands - break, run, next, step, continue, display, print, and frame (read about stack frames; this is an important concept in C and very useful for debugging and poking around in your code and looking at variables).

I strongly recommend that you go through the sequence of steps below and use these debugging commands. Don’t worry, you can’t break anything. Just as with the shell commands, you’ll only need a subset of the complete set of gdb commands to become an effective debugger.

We will be working with bugsort.c.

/* 
 * bugsort.c - a buggy implementation of insertion sort
 *
 * This program was developed to demonstrate gdb for debugging.
 * 
 * usage: bugsort < inputfile
 *    where the stdin is assumed to include a sequence of numbers.
 * 
 * David Kotz 2019, 2021
 * CS 50, Fall 2022
 */

#include <stdio.h>
#include <stdlib.h>


/* ******************* main ***************** */
int
main()
{
  const int numSlots = 10;  // number of slots in array
  int sorted[numSlots];   // the array of items
  
  /* fill the array with numbers */
  for (int n = 0; n < numSlots; n++) {
    int item;     // a new item
    scanf("%d", &item);   // read a new item
    for (int i = n; i > 0; i--) {
      if (sorted[i] > item) {
        sorted[i+1] = sorted[i]; // bump it up to make room
      } else {
        sorted[i] = item; // drop the new item here
      }
    }
  }
  
  /* print the numbers */
  for (int n = 0; n < numSlots; n++) {
    printf("%d ", sorted[n]);
  }
  putchar('\n');
}

The program is simple: it reads ten integers from stdin and inserts each into an array of integers such that the array is in sorted order. It then prints them out, separated by spaces. Easy, right?

$ mygcc -o bugsort bugsort.c
$ ./bugsort
1
2
3
4
5
6
7
8
9
0
0 9 9 9 9 9 9 9 9 9 

Um, I guess not.

Let’s try running our program in gdb. When gdb starts up it prints out a bunch of information about its version and license, then drops into the gdb “shell.”

$ gdb bugsort
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
...
Reading symbols from bugsort...done.
(gdb)

Notice that last line about “reading symbols”; gdb is reading special debug-related information the compiler produced, about all the “symbols” in the program.

A symbol is a function name, variable name, data type name, etc. This information may be stored inside the executable file (here, bugsort), or (as on MacOS) in an adjacent folder (bugsort.dSYM). The compiler saves this information because our alias mygcc includes the -ggdb argument.

One of a debugger’s most powerful features is the ability to set “breakpoints” in our code; when we run our program and the debugger encounters a breakpoint, the execution of the program stops at that point. Let’s set a few breakpoints:

(gdb) b main
Breakpoint 1 at 0x11b6: file bugsort.c, line 20.
(gdb) list 25
20	{
21	  const int numSlots = 10;  // number of slots in array
22	  int sorted[numSlots];   // the array of items
23	  
24	  /* fill the array with numbers */
25	  for (int n = 0; n < numSlots; n++) {
26	    int item;     // a new item
27	    scanf("%d", &item);   // read a new item
28	    for (int i = n; i > 0; i--) {
29	      if (sorted[i] > item) {
(gdb) break 28
Breakpoint 2 at 0x12ac: file bugsort.c, line 28.
(gdb) break 38
Breakpoint 3 at 0x130d: file bugsort.c, line 38.
(gdb) 

Notice that we can list the code around a line number by specifying that line number. Notice further that you can just hit “enter” at the gdb commandline to mean “do it again”, or in the case of list, “list some more”.

Notice that we can set breakpoints by identifying the name of a function (e.g., “main”), or by specifying a particular line in our source code (e.g., lines 28 and 38).

When you are debugging programs with multiple files you can also set breakpoints in different files by specifying the file as well as the function name/line of code where you’d like to enable a breakpoint.

If you want to see the breakpoints you’ve currently created, run info break (as shown below).

(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000000011b6 in main at bugsort.c:20
2       breakpoint     keep y   0x00000000000012ac in main at bugsort.c:28
3       breakpoint     keep y   0x000000000000130d in main at bugsort.c:38
(gdb) 

You can also delete all of your breakpoints (delete), clear the breakpoints at a specific location (clear function or clear line), or even disable breakpoints so that you can leave them in place, but temporarily disabled.

(gdb) disable 2
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000000011b6 in main at bugsort.c:20
2       breakpoint     keep n   0x00000000000012ac in main at bugsort.c:28
3       breakpoint     keep y   0x000000000000130d in main at bugsort.c:38
(gdb) 

Notice under the “Enb” column the second breakpoint is disabled.

At this point we’ve started gdb and told it about some breakpoints we want set, but we haven’t actually started running our program.

Let’s run our program now:

(gdb) run
Starting program: /thayerfs/home/d84xxxx/cs50/activities/day14/bugsort 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at bugsort.c:20
20	{
(gdb) 

As expected, the debugger started our program running but “paused” the program as soon as it hit the breakpoint that we set at the main function. Once the program has stopped we can “poke around” a bit.

Now let’s step one line of code at a time; we type step and then, for convenience, just hit Enter each time to go one more step.

(gdb) step
21	  const int numSlots = 10;  // number of slots in array
(gdb) step
22	  int sorted[numSlots];   // the array of items
(gdb) 
25	  for (int n = 0; n < numSlots; n++) {
(gdb) 
27	    scanf("%d", &item);   // read a new item
(gdb) 
__isoc99_scanf (format=0x555555556004 "%d") at ./stdio-common/isoc99_scanf.c:25
25	./stdio-common/isoc99_scanf.c: No such file or directory.
(gdb) 

Oops! Stepping line by line is nice but gdb’s step command allowed us to walk right down into the icky details of scanf! It is cool that we can “step” into functions but scanf does a lot of work that we aren’t interested in - and we don’t have the source code anyway. If you find yourself deep down in some function that you accidentally stepped into, use the finish command to start the program running again until just after the function in the current stack frame returns.

(gdb) finish
Run till exit from #0  __isoc99_scanf (format=0x555555556004 "%d")
    at ./stdio-common/isoc99_scanf.c:25
11 22 33 44 55 66 77 88 99 00
main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
Value returned is $1 = 1
(gdb) 

Of course, when scanf continued it expected me to enter some input. I proceeded to enter 10 numbers (11 to 00), just as in our prior experiment. (In this case the input is from the keyboard, and in the prior case it was from a pipeline; either way it is coming via stdin and scanf does not care.)

Now we are back up in the main function, at line 28. gdb also conveniently prints the return value of scanf, i.e., Value returned is $1 = 1 (because scanf successfully read 1 item matching the pattern %d). I can examine what number was read by printing the value of the variable item:

(gdb) print item
$2 = 11

We’re about to enter the inner for loop. Let’s take one step.

(gdb) step
25	  for (int n = 0; n < numSlots; n++) {

Huh? We never entered the inner for loop; we came right back around and are about to re-execute line 25. (Think about why.)

To avoid stepping into functions we can use the alternative gdb command called next which is similar to step in that it executes one line of code and then pauses at the next line of code, however next will step over functions so that we don’t end up deep down in some code that isn’t relevant to us (i.e., deep inside of the details of scanf); let’s try that now:

(gdb) next
27	    scanf("%d", &item);   // read a new item
(gdb) next
28	    for (int i = n; i > 0; i--) {
(gdb) print item
$3 = 22
(gdb) print n
$4 = 1
(gdb) 

Back at the top of the inner loop, and this time item=22 and n=1.

Moving on,

(gdb) next
29	      if (sorted[i] > item) {
(gdb) 
32	        sorted[i] = item; // drop the new item here
(gdb) 
28	    for (int i = n; i > 0; i--) {
(gdb) 
25	  for (int n = 0; n < numSlots; n++) {
(gdb) 
27	    scanf("%d", &item);   // read a new item
(gdb) print sorted[0]
$5 = 0
(gdb) print sorted[1]
$6 = 22
(gdb) 

Aha, this time we went into the inner loop and dropped in our item. We then came back around to the top of the main loop. As you can see, I printed the contents of two elements of the array, too.

We can print the memory address of these variables:

(gdb) print &n
$7 = (int *) 0x7fffffffdde0
(gdb) print &sorted
$8 = (int (*)[10]) 0x7fffffffdda0
(gdb) 

Pretty cool, right? Notice that gdb is nice enough to also give us information about the type of the thing that we are looking at!
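
The two types gdb reported above, int * and int (*)[10], can also be written out in C. The following sketch is only an illustration: the function show_addresses is hypothetical, and it uses a fixed-size array of 10 ints to mirror the sorted array in bugsort.c. It shows that &sorted and sorted refer to the same starting address but have different types, which changes what "plus one" means.

```c
#include <stddef.h>
#include <stdio.h>

/* Print the addresses corresponding to gdb's "print &n" and
 * "print &sorted", and return how many bytes lie between
 * &sorted and &sorted + 1.
 */
size_t
show_addresses(void)
{
  int n = 0;
  int sorted[10] = {0};

  int *np = &n;              // type int *, as gdb reported for &n
  int (*ap)[10] = &sorted;   // type int (*)[10], as gdb reported for &sorted
  int *first = sorted;       // the array "decays" to a pointer to sorted[0]

  printf("&n      = %p\n", (void *) np);
  printf("&sorted = %p\n", (void *) ap);
  printf("sorted  = %p\n", (void *) first);

  // Same starting address, different types: adding 1 to ap skips the
  // whole array, whereas adding 1 to first would skip a single int.
  return (size_t) ((char *) (ap + 1) - (char *) ap);
}
```

The addresses printed will differ from run to run and from those in the gdb session above; the returned distance is the size of the whole array.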

If we step a bit further, and into scanf, I can show you the backtrace command:

(gdb) step
27	    scanf("%d", &item);   // read a new item
(gdb) step
__isoc99_scanf (format=0x555555556004 "%d") at ./stdio-common/isoc99_scanf.c:25
25	./stdio-common/isoc99_scanf.c: No such file or directory.
(gdb) backtrace
#0  __isoc99_scanf (format=0x555555556004 "%d") at ./stdio-common/isoc99_scanf.c:25
#1  0x00005555555552ac in main () at bugsort.c:27
(gdb) 

This shows the function-call stack, from inner to outer. Above we are inside __isoc99_scanf (aka scanf), which was called from main.
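
If you would like a deeper call stack to practice on, a sketch like the following may help. The functions inner, middle, and outer are hypothetical, written only so that a breakpoint in inner sits three frames below whatever calls outer. The gdb commands named in the comments (break, run, backtrace, frame, info locals) are the ones described in these notes.

```c
/* Hypothetical functions that build a small call stack.
 *
 * To experiment: call outer(3) from a main of your own, compile
 * with -ggdb, and in gdb try
 *     break inner
 *     run
 *     backtrace       (frames for inner, middle, outer, then main)
 *     frame 1         (select middle's stack frame)
 *     info locals     (show middle's local variables)
 */
int
inner(int x)
{
  return x * 2;
}

int
middle(int x)
{
  int doubled = inner(x + 1);   // inner's frame sits on top of middle's
  return doubled + 1;
}

int
outer(int x)
{
  int result = middle(x);       // middle's frame sits on top of outer's
  return result;
}
```

Selecting a frame with frame does not change where the program will resume; it only changes which function's variables print and info locals will show.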

Let’s finish scanf:

(gdb) finish
Run till exit from #0  __isoc99_scanf (format=0x555555556004 "%d")
    at ./stdio-common/isoc99_scanf.c:25
main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
Value returned is $9 = 1
(gdb) print item 
$10 = 33
(gdb)

Notice I did not need to type any input because scanf is still chewing on that input I provided the first time it asked me for input.

OK, I’m getting tired of stepping. Rather than stepping line by line, I want to start the program running again (at least until it hits a breakpoint again) so that I can quickly get back to the part of the code I care about. To do this I can simply use the continue command, which will continue the execution of the program until it is stopped again for some reason. First, I’m going to re-enable that breakpoint I disabled earlier.

(gdb) enable 2
(gdb) continue
Continuing.

Breakpoint 2, main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
(gdb) 

It ran a bit further then hit that breakpoint. Let’s automate things a little better, by providing some commands that should be run on certain breakpoints:

(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>print n
>print item
>end
(gdb) continue
Continuing.

Breakpoint 2, main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
$11 = 4
$12 = 55
(gdb) continue
Continuing.

Breakpoint 2, main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
$13 = 5
$14 = 66
(gdb) continue
Continuing.

Breakpoint 2, main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
$15 = 6
$16 = 77
(gdb) 
Continuing.

Breakpoint 2, main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
$17 = 7
$18 = 88
(gdb) 
Continuing.

Breakpoint 2, main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
$19 = 8
$20 = 99
(gdb) 
Continuing.

Breakpoint 2, main () at bugsort.c:28
28	    for (int i = n; i > 0; i--) {
$21 = 9
$22 = 0
(gdb) 
Continuing.

Breakpoint 3, main () at bugsort.c:38
38	  for (int n = 0; n < numSlots; n++) {
(gdb) 

Now I just hit Enter each time, and it took another loop and shows me the values of n and item. Handy! The last one broke out of the initial loop and landed me at breakpoint 3, just before the values will be printed. I can explore a bit more before that loop runs.

(gdb) print sorted[0]
$23 = 0
(gdb) print sorted[9]
$24 = 32767
(gdb) continue
Continuing.
0 99 99 99 99 99 99 99 32767 32767 
[Inferior 1 (process 38631) exited normally]
(gdb) 

At this point we’ve seen some useful gdb commands and you are now equipped to do some debugging on your own. Keep poking at the program and see if you can find the errors.

You may find it helpful to store sample input in a file, e.g.,

$ echo 11 22 33 44 55 66 77 88 99 00 > nums
$ gdb bugsort
...
Reading symbols from bugsort...done.
(gdb) run < nums
Starting program: /thayerfs/home/d84607y/cs50/activities/day14/bugsort < nums
0 99 99 99 99 99 99 99 32767 32767 
[Inferior 1 (process 40061) exited normally]
(gdb) 

Some cool things to note about gdb:

Every time gdb prints a value in response to a command (e.g., print, or the return value shown by finish), it saves that value in its value history under the name $N, where N increments by 1 for each value saved. You can use those values at a later point if you want (e.g., print $3).
gdb supports auto-completion on function names and variable names! Go ahead and try it out!
Also similar to the regular shell, the gdb shell allows you to arrow up/down to revisit past commands.
Many of the gdb commands have abbreviated forms (e.g., run is r, continue is c, next is n); see the gdb quick reference guide to see other commands that have abbreviated forms.
You can re-run the previous command simply by hitting the Enter (return) key.

Frequently used gdb commands

Below are some of the more common gdb commands that you will need. See also this printable gdb quick reference guide.

command	shortcut	purpose
run [arglist]	r	Start your program running (with arglist, if specified)
break [file:]function	b	Set a breakpoint at function (in file)
commands N		Specify a list of commands to run every time breakpoint N is reached
list [file:]function	l	List the source code in the vicinity of function (in file); with no argument, in the vicinity of where the program is presently stopped
backtrace	bt	Display the program call stack
frame [args]	f	Move from one stack frame to another, and print the stack frame you select. args may be either the address of the frame or the stack frame number. Without an argument, frame prints the current stack frame
print expr	p	Display the value of an expression
continue	c	Continue running your program (after stopping, e.g. at a breakpoint)
next	n	Execute next program line (after stopping); step over any function calls in the line
step	s	Execute next program line (after stopping); step into any function calls in the line
info break	i b	List breakpoints
disable N		Disable breakpoint N, where N is the number shown in info break
enable N		Enable breakpoint N, where N is the number shown in info break
delete N		Remove breakpoint N, where N is the number shown in info break
info locals	i loc	Print the values of all local variables
help [name]	h	Show information about GDB command name, or general information about using GDB
quit	q	Exit from GDB