CS 50 Software Design and Implementation

Lecture 5

The C Programming Language

The first four lectures have been a crash course in the shell and shell programming. Now we move to the C language. We will spend the rest of the course developing our C and systems programming skill set, first by understanding the basics of the language and then, through examples, by studying good code and writing our own.

This lecture will serve as an introduction to the C language. Why is C an important language 30 years after its development?

Goals

We plan to learn the following from today’s lecture:

OK. Let’s get started.

Reading from “A First Book on ANSI C”

We intend to use the textbook more as a reference than to work through it in a stepwise fashion; there is no time for that. We will refer to sections and use example code from the book from time to time. I would suggest that students start reading the book; it is very readable. Please read chapters 2 and 3 of the text. If you already have some knowledge of C you can skip this reading assignment. This type of reading is different from the assigned reading of articles, which we typically will discuss in class. Reading from the course book is more of a backup to what we discuss, or a way for you to fill in your knowledge of things we don’t have time to dive deeply into in class. Read as much as time permits.

C

C can be correctly described as a successful, general purpose programming language, a description also given to Java and C++. C is a procedural programming language, not an object-oriented language like Java or C++. Programs written in C can of course be described as “good” programs if they are written clearly, make use of high level programming practices, and are well documented with sufficient comments and meaningful variable names. Of course all of these properties are independent of C and are provided through many high level languages.

C has the high level programming features provided by most procedural programming languages - strongly typed variables, constants, standard (or base) datatypes, enumerated types, a mechanism for defining your own types, aggregate structures, control structures, recursion and program modularization.

C does not support sets of data, Java’s concept of a class or objects, nested functions, nor subrange types and their use as array subscripts, and has only recently added a Boolean datatype.

C does have, however, separate compilation, conditional compilation, bitwise operators, pointer arithmetic and language independent input and output. The decision about whether C, C++, or Java is the best general purpose programming language (if that can or needs to be decided), is not going to be an easy one.

C is the programming language of choice for most systems-level, engineering, and scientific programming. The world’s popular operating systems - Linux, Windows and Mac OS-X, their interfaces and file-systems, are written in C; the infrastructure of the Internet, including most of its networking protocols, web servers, and email systems, are written in C; software libraries providing graphical interfaces and tools, and efficient numerical, statistical, encryption, and compression algorithms, are written in C; and the software for most embedded devices, including those in cars, aircraft, robots, smart appliances, sensors, mobile phones, and game consoles, are written in C.

Operators

Nearly all operators in C are identical to those of Java. However, the role of C in system programming exposes us to much more use of the shift and bit-wise operators than in Java.

Assignment

=

Arithmetic

+, -, *, /, %, unary -

Priorities may be overridden with ( )s.

Relational

>, >=, <, <= (all have same precedence)
== (equality) and != (inequality)

Logical

&& (and), || (or), ! (not)

Pre- and post- decrement and increment

Any (integer, character or pointer) variable may be either incremented or decremented before or after its value is used in an expression.

For example :

--fred will decrement fred before value used.
++fred will increment fred before value used.
fred-- will get (old) value and then decrement.
fred++ will get (old) value and then increment.

Let’s write some C code to look at pre and post increment and decrement.

C code: increment.c

Where to get the C source code examples from: Note, all the source code examples used in these C programming notes can be copied to your local machine from my public_html/cs50 directory. The snippet below creates a local directory called examples and copies the C source code files over. After that you can compile and execute the code as you follow the notes; make sure you do this for all C source code examples in the notes.


$ cd cs50
$ mkdir examples
$ cd examples
$ cp ~campbell/public_html/cs50/*.c .
$ ls
apples.c           command.c             files.c         operator.c
arguments.c        conEchoServer.c       fixed-strcpy.c  pointer-examples.c
array-address.c    crawler.c             getchar.c       pointers.c
array.c            danger.c              hash.c          print_i.c
assert.c           data-types.c          hello.c         random.c
auto.c             deadlock.c            hello-class.c   scope.c
buffer-overflow.c  debug0.c              html.c          sort.c
bug.c              debug1.c              increment.c     static.c
buggy_sort0.c      debug2.c              leaky.c         strcpy.c
buggy_sort1.c      debug3.c              memory.c        string.c
buggy_sort2.c      debug4.c              mutex.c         swap.c
buggy_sort3.c      dictionary-example.c  okswap.c        toss.c
buggy_sort4.c      dontdothis.c          old_bug.c       trig.c
buserr.c           echoClient.c          oldbug.c        unsafe.c
checker.c          echoServer.c          oldcinfo.c
cinfo.c            efence.c
$

You can copy my bash files over in the same manner if you want my set up and aliases.

What follows copies my bash files over to the examples directory. I recommend that you do this and then cut and paste sections of my bash files (e.g., aliases, mygcc, etc.) into your own bash files in your home directory as you wish.



$ cp ~campbell/.bash* .
$ ls -a .bash*
.bash_history  .bash_profile   .bashrc   .bashrc~
.bash_logout   .bash_profile~  .bash_rc

OK now let’s look at increment.c



/*

  file: increment.c

  Description: Illustrate pre and post increment and decrement.

*/

#include <stdio.h>

int main() {

     int fred = 3, a=3;

     printf("Start; fred = %d and a = %d\n", fred, a);
     a = --fred;
     printf("a = --fred; fred = %d and a = %d\n", fred, a);
     a = ++fred;
     printf("a = ++fred; fred = %d and a = %d\n", fred, a);
     a = fred--;
     printf("a = fred--; fred = %d and a = %d\n", fred, a);
     a = fred++;
     printf("a = fred++; fred = %d and a = %d\n", fred, a);

     return 0;

}

Once we have the C code we have to compile it with gcc, using the compiler switches we discussed in Lecture 1. We use mygcc filename.c -o filename as the convention; the compiler produces an executable called filename.

You do not have to use chmod to make it an executable. The compiler takes care of that when it creates an executable with the correct permission for file filename.


[atc@Macintosh-8 l5] alias mygcc
alias mygcc='gcc -Wall -pedantic -std=c99'

$ mygcc increment.c -o increment
$ ls increment
increment
$ ls -l increment
-rwxr-xr-x   1 atc  admin  13344 Jan 14 21:51 increment
$ ./increment
Start; fred = 3 and a = 3
a = --fred; fred = 2 and a = 2
a = ++fred; fred = 3 and a = 3
a = fred--; fred = 2 and a = 3
a = fred++; fred = 3 and a = 2


Check it out: Save the file in your directory cs50/code/ Compile and run the code. Check the output.

Bitwise operators and masking

& (bitwise and), | (bitwise or), ^ (bitwise exclusive or), ~ (bitwise negation).
To check whether certain bits are on, mask the value: (fred & MASK), etc.
Shift operators << (shift left), >> (shift right).

Combined operators and assignment

a += 2; a -= 2;
a *= 2;
Each combines an arithmetic or bitwise operator with assignment: a += b; is equivalent to a = a + b;

Type coercion

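C permits assignments and parameter passing between variables of different types using type casts or coercion. A cast is always written explicitly, as in (double)x, and is used where some languages require a “transfer function”. Between arithmetic types C will also convert implicitly, for example when an int is assigned to a double.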

Precedence of operators

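Operators of equal precedence mostly group from left to right (the unary, conditional and assignment operators group from right to left), and the default precedence may be overridden with parentheses. From highest to lowest:

() (highest)
++ -- ! ~ unary - and coercion (type)
* / %
+ -
<< >>
< <= > >=
== !=
&
^
|
&&
||
? :
=
, (lowest)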

Variable names

Variable names (and type and function names as we shall see later) must commence with an alphabetic or the underscore character (A-Z a-z _) and be followed by zero or more alphabetic, underscore or digit characters (A-Z a-z _ 0-9). Most C compilers, such as gcc, accept variable, type and function names of up to 256 characters in length. Some older C compilers only supported variable names with up to 8 unique leading characters, and keeping to this limit may be preferred to maintain portable code. It is also preferred that you do not use variable names consisting entirely of uppercase characters; uppercase names are best reserved for #define-ed constants, such as MAXSIZE. Importantly, C variable names are case sensitive, and MYLIMIT, mylimit, Mylimit and MyLimit are four different variable names.

Base Datatypes

Variables are declared to be of a certain type. This type may be either a base type supported by the C language itself, or a user-defined type consisting of elements drawn from C’s set of base types. C’s base types and their representation on our lab’s Pentium PCs are:

bool an enumerated type, either true or false
char the character type, 8 bits long
short the short integer type, 16 bits long
int the standard integer type, 32 bits long
long the longer integer type, also 32 bits long
float the standard floating point (real) type, 32 bits long (about 7 decimal digits of precision)
double the extra precision floating point type, 64 bits long (about 15 to 17 decimal digits of precision)
enum the enumerated type, monotonically increasing from 0

Very shortly, we will see the emergence of Intel’s IA64 architecture where, like the Power-PC already, long integers occupy 64 bits.

We can determine the number of bytes required for datatypes with the sizeof operator. In contrast, Java defines how long each datatype may be. C’s only guarantee is that:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

Let’s write some C code to look at these base data types. We will use the sizeof operator and the printf function. We will also define variables of each of the base types and print the initialized values as part of the data-types.c code.

C code: data-types.c

The contents of data-types.c look like this:


/*

  file: data-types.c

  Description: Sets up variables for different base data types, initialises them
  and prints the data and the size of the base data types in number of bytes.

  Revised version of code pg. 96 (Program 2.10) (Bronson) "First Book on ANSI C"

*/

#include <stdio.h>

int main() {

  char ch = 'a';
  int in = 2;
  short sh = 3;
  long lo = 4;
  float fl = 1000.256734;
  double db = 11e+23;
  long double ld = 10e+30;

  printf("-------contents ------- sizeof()------\n\n");

  /* sizeof yields a size_t, which C99 prints with %zu */
  printf("contents of char is %c --- sizeof %zu bytes\n\n", ch, sizeof(char));
  printf("contents int is  %d --- sizeof  %zu bytes\n\n", in, sizeof(int));
  printf("contents short is  %d --- sizeof %zu bytes\n\n", sh, sizeof(short));
  printf("contents long is %ld --- sizeof %zu bytes\n\n", lo, sizeof(long));
  printf("contents float is %f --- sizeof %zu bytes\n\n",fl, sizeof(float));
  printf("contents double is  %e --- sizeof %zu bytes\n\n",db, sizeof(double));
  printf("contents long double is  %LG --- sizeof %zu bytes\n",ld, sizeof(long double));

  return 0;

}

Let’s compile and run the code.


[atc@Macintosh-8 l5]$ mygcc data-types.c -o data-types

[atc@Macintosh-8 l5]$ ./data-types

-------contents ------- sizeof()------

contents of char is a --- sizeof 1 bytes

contents int is  2 --- sizeof  4 bytes

contents short is  3 --- sizeof 2 bytes

contents long is 4 --- sizeof 4 bytes

contents float is 1000.256714 --- sizeof 4 bytes

contents double is  1.100000e+24 --- sizeof 10 bytes

contents long double is  1E+31 --- sizeof 20 bytes

Check it out: Save the file in your directory cs50/code/ Compile and run the code. Check the output.

Storage modifiers of variables

Base types may be preceded by one or more storage modifiers:

auto the variable is placed on the stack (default, deprecated)
extern the variable is defined outside of the current file
register request that the variable be placed in a register (ignored)
static the variable is placed in global storage with limited visibility
typedef introduce a user-defined type
unsigned storage and arithmetic are only of/on non-negative integers

Initialization of variables

All scalar auto and static variables may be initialized immediately after their definition, typically with constants or simple expressions that the compiler can evaluate at compile time. The C99 language defines that all uninitialized global variables, and all uninitialized static local variables will have the starting values resulting from their memory locations being filled with zeroes - conveniently the value of 0 for an integer, and 0.0 for a floating point number.

Scope rules of global variables

Scope is defined as the section (e.g., function, block) of the program where the variable is valid and known.

In Java, a variable is simply used as a name by which we refer to an object. A newly created object is given a name for later reference, and that name may be re-used to refer to another object later in the program. In C, a variable more strictly refers to a memory address (or a contiguous block of memory starting at that address), and the type of the variable declares how that memory’s contents should be interpreted and modified.

C only has two true lexical levels, global and function, though sub-blocks of variables and statements may be introduced in many places, seemingly creating new lexical levels. As such, variables are typically defined globally (at lexical level 0), or at the start of a statement block, where a function’s body is understood to be a statement block.

Variables defined globally in a file are visible from their point of definition until the end of that file. They need not be declared at the top of a file, but typically are. If a global variable has a storage modifier of static, it means that the variable is only available from within that file. If the static modifier is missing, that variable may be accessed from another file that is part of the same program, when the program is compiled and linked from multiple source files.

The extern modifier is used (within “our” file) to declare the existence of the indicated variable in another file. The variable may be declared as extern in all files, but must be defined (and not as a static!) in only a single file.

Scope rules of local variables

Variables may also be declared inside a statement block. C89 required all declarations to come at the top of the block, before any statements; C99, which we compile with, also allows them anywhere in the block. Such variables are visible from their declaration until the end of that block, typically the end of the current function. A local variable’s name may shadow that of a global variable, making that global variable inaccessible within the block. Blocks do not have names, and so shadowed variables cannot be named.

Local variables are implicitly preceded by the auto modifier: as control flow enters the block, memory for the variable is allocated on the run-time stack. The memory is automatically deallocated (or simply becomes inaccessible) as control flow leaves the block. The implicit auto modifier facilitates recursion in C: each entry to a new block allocates memory for new local variables, and these unique instances are accessible only while in that block.

If a local variable is preceded by the static modifier, its memory is not allocated on the run-time stack, but in the same memory as for global variables. When control flow leaves the block, the memory is not deallocated, and remains for the exclusive use by that local variable. The result is that a static local variable retains its value between entries to its block. Whereas the starting value of an auto local variable (sitting on the stack) cannot be assumed (or more correctly, should be considered to contain a totally random value), the starting value of a static local variable is as it was when the variable was last used.

Examples of global and local variables

Let’s look at some code snippets to reinforce the ideas of local and global variables and the issue of the scope of these variables in a section of code. The example comes from the book and illustrates the ideas nicely.

C code: scope.c


/*

  File: scope.c

  Description: Illustrates the use of global and local variables and
               global function prototypes.


   Revised code taken from pg. 330 (Program 7.1) (Bronson) "First Book on ANSI C"

*/

#include <stdio.h>

/* firstnum is a global variable not defined on the main() stack. It has full scope
   of all functions in the file scope.c. Any code in the file can read and write to it.
   Once main() terminates the variable is deallocated and no longer exists
*/

int firstnum; /* create a global variable named firstnum */

void valfun(); /* global function prototype */


int main()
{
  int secnum; /* create a local variable named secnum */
  firstnum = 10; /* store a value into the global variable */
  secnum = 20; /* store a value into the local variable */

  printf("\nFrom main(): firstnum = %d",firstnum);
  printf("\nFrom main(): secnum = %d\n",secnum);

  valfun(); /* call the function valfun */

  printf("\nFrom main() again: firstnum = %d",firstnum);
  printf("\nFrom main() again: secnum = %d\n",secnum);

  return 0;
}


void valfun() /* no values are passed to this function */
{

  /* secnum is a local variable created on the stack when valfun() executes.
     When valfun() exits the stack is deallocated and the variable no
     longer exists. It is local and its scope is valfun() */

  int secnum; /* create a second local variable named secnum */
  secnum = 30; /* this only affects this local variable’s value */


  printf("\nFrom valfun(): firstnum = %d",firstnum);
  printf("\nFrom valfun(): secnum = %d\n",secnum);
  firstnum = 40; /* this changes firstnum for both functions */

}


If we run the code the output is as follows:



$ mygcc scope.c -o scope
$ ./scope

From main(): firstnum = 10
From main(): secnum = 20

From valfun(): firstnum = 10
From valfun(): secnum = 30

From main() again: firstnum = 40
From main() again: secnum = 20

Study the output. Is it what you expected? Now read on.

The first thing to note about the source code is that it defines a global variable firstnum whose scope is the complete file, so it is accessible from both main() and valfun(). The prototype for valfun() is declared at the top of the file, giving it global scope in scope.c; we will talk about prototypes later. Both main() and valfun() update and print the value of firstnum, which lives at a fixed memory address (its space is not allocated on the stack, as it is for auto variables such as secnum). Note that main() and valfun() both have local variables called secnum. This causes no clash, because each of these two variables (which happen to have the same name) has scope only inside its own function. Their instances are private to main() and valfun(), respectively, and they have no association other than sharing a name. They are auto variables created on the stack, and they no longer exist when the function exits. For example, valfun() creates an integer variable secnum on its local stack when it executes, but when it returns control to main() that stack space is deallocated and the variable no longer exists. In contrast, the global variable firstnum and its current value are unaffected when valfun() exits.

Examples of auto and static local variables

That leads us to another storage modifier that is affected by scope, i.e., static. Here the variable is placed in global storage with limited visibility depending on where it is defined. Let’s look at two code snippets that illustrate the use of local auto and static variables. These represent two important cases in C.

First, let’s look at the case of auto local variables.

C code: auto.c


/*

  File: auto.c

  Description: Illustrates the auto local variables

  Code taken from pg. 336 (Program 7.2) (Bronson) "First Book on ANSI C"

*/


#include <stdio.h>

void testauto(); /* function prototype */

int main()
{
  int count; /* create the auto variable count */
  for(count = 1; count <= 3; count++)
    testauto();

  return 0;
}

void testauto()
{
  int num = 0; /* create the auto variable num */
               /* and initialize to zero */

  printf("The value of the automatic variable num is %d\n", num);
  num++;
}

If we run the code the output is as follows:


$ ./auto
The value of the automatic variable num is 0
The value of the automatic variable num is 0
The value of the automatic variable num is 0

Study the output. Is it what you expected? Now read on.

Now let’s look at the case when num is defined as static inside the scope of the function teststat(). Note that the value of num is now persistent across multiple invocations of the function. This is in direct contrast to the auto local variable of the last code snippet, auto.c. In essence, the static modifier allocates memory for the int variable outside the stack, just like the global variable firstnum in scope.c. The distinction is that a static local variable is not global: it is only accessible in the function teststat(). Hopefully that clarifies the issues of scope, local and global variables, and auto and static variables.

Here is the code for the static local variable case.

C code: static.c


/*

  File: static.c

  Description: Illustrates the use of static variable.

   Revised code taken from pg. 337 (Program 7.3) (Bronson) "First Book on ANSI C"

*/


#include <stdio.h>

void teststat(); /* function prototype */

int main()
{
  int count; /* count is a local auto variable */

  for(count = 1; count <= 3; count++)
    teststat();

  return 0;
}

/*  Note that the variable num in teststat() is only set to zero once. The local
    static variable num retains its value when teststat() returns.
*/

void teststat()
{
  static int num = 0; /* num is a local static variable */

  printf("The value of the static variable num is now %d\n", num);
  num++;

}

If we run the code the output is as follows:


$ ./static
The value of the static variable num is now 0
The value of the static variable num is now 1
The value of the static variable num is now 2

Is this what you expected?

Question: If I had defined “static int num;” at the top of static.c, how would that change the scope of the static variable? Is it different from “int num” defined as a global variable (like firstnum in scope.c)?

Flow of control in a C program

Control flow within C programs is almost identical to the equivalent constructs in Java. However, C provides no exception mechanism, and so C has no try, catch, and finally, constructs.

Conditional execution


  if ( expression )
       statement1;
  if ( expression ) {
       statement1;
       statement2;
       ......
  }

  if ( expression )
       statement
  else
       statement

Of significance, and a very common cause of errors in C programs, is that C before C99 has no Boolean datatype. Instead, any expression that evaluates to the integer value 0 is considered false, and any nonzero value true. A conditional statement’s controlling expression is evaluated and, if it is nonzero (i.e. true), the following statement is executed. Most errors are introduced when programmers (accidentally) use embedded assignment statements in conditional expressions:


  if (loop_index = MAXINDEX )
       statement;

   /* instead of ... */

   if (loop_index == MAXINDEX )
      statement;

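A good habit to get into is to place constants on the left of comparisons. If you then type = where you meant ==, the result is an assignment to a constant, which the compiler rejects:


  if (0 = value )     /* compile error: cannot assign to a constant */
       statement;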

When an assignment inside a condition is intentional and you compile with gcc -std=c99 -Wall -pedantic ..., the only way to “shut the compiler up” is to use extra parentheses:


  if ( ( loop_index = MAXINDEX ) )
       statement;

C’s other control flow statements are very unsurprising:


  while ( conditional-expression ) {
       statement1;
       statement2;
       ......
  }

  do {
       statement1;
       statement2;
       ......
  } while ( conditional-expression );


  for( initialization ; conditional-expression ; adjustment ) {
       statement1;
       statement2;
       ......
  }

Examples of code snippets:

Loops: using the for statement

#define ARRAY_LENGTH 100

long array[ARRAY_LENGTH];
int i;

for ( i = 0; i < ARRAY_LENGTH; i++) {

   array[i] = 4 * i;
   printf("Value of i is %d\n", i);

}

Any of the components may be missing. If the conditional-expression is missing, it is always true. Infinite loops may be requested in C with for( ; ; ) ... or with while(1) ...

The equivalence of for and while


  for ( expression1 ; expression2 ; expression3 ) {
         statement1;
  }


  expression1;
  while ( expression2 ) {
        statement1;
        expression3;
  }


Example of equivalence using the while statement

i = 0;

while (i < ARRAY_LENGTH) {

   array[i] = 4 * i;
   printf("Value of i is %d\n", i);
   i++;

}

The switch statement


  switch ( expression ) {
       case const1 : statement1; break;
       case const2 : statement2; break;
       case const3 :
       case const4 : statement4;
       default : statementN; break;
  }

As in Java, C permits control to drop down to the following case constructs unless there is an explicit break statement.

C code: operator.c

The contents of operator.c look like this:


/*

  File: operator.c

  Description: Implements basic operations (multiplication, division, addition, modulus).


  Revised version of code pg. 191 (Program 4.6) (Bronson) "First Book on ANSI C"

*/

#include <stdio.h>

int main() {

  int opselect;
  float fnum, snum;

  printf("Please type in two numbers: ");
  scanf("%f %f", &fnum, &snum);
  printf("Enter a select code:");
  printf("\n 1 for addition");
  printf("\n 2 for multiplication");
  printf("\n 3 for division : ");
  printf("\n 4 for modulus  : ");
  scanf("%d", &opselect);

  switch (opselect)  {

    case 1:
      printf("The sum of the numbers entered is %6.3f\n", fnum + snum);
      break;

    case 2:
      printf("The product of the numbers entered is %6.3f\n", fnum * snum);
      break;

    case 3:
      if (snum != 0.0)
        printf("The first number divided by the second is %6.3f\n",fnum / snum);
      else
        printf("Division by zero is not allowed\n");
      break;

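    case 4:
      /* % is defined only for integers, so convert each operand first */
      if ((int)snum != 0)
        printf("The modulus of the numbers entered is %d\n", (int)fnum % (int)snum);
      else
        printf("Modulus by zero is not allowed\n");
      break;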

    default:
      printf("Need to enter a number between 1-4\n\n");

  }

  return 0;

}

Introducing a software bug and the gnu debugger (gdb) to find it

In this class we will use the gdb command line debugger. However, Apple (sadly) replaced gdb in 2015 with its own command line debugger called lldb, which is very similar to gdb. If you want to develop code on your Mac, which I strongly recommend, then you will also have to learn lldb. If you don’t want that burden, use gdb but develop your code on a Linux machine in the lab. Here is a tutorial on the lldb command-line debugger and here are examples of gdb commands with their lldb counterparts. The example below uses gdb.

You can use printf statements to help with inspection of variables and control flow through your program. However, this is a very primitive way to debug your code. We will have a lecture on the art of debugging and how to use the gnu debugger (gdb) soon. But for now I would like to introduce gdb and use it to find a software bug.

You can check out the lecture notes on gdb, type man gdb at the command line, or google for more information.

First, let’s create a bug that will cause a segmentation fault (segfault) when we run operator.c, by changing the line

 scanf("%d", &opselect);

to

 scanf("%d", opselect);

Notice we have just removed the &.

First, let’s make sure our mygcc alias has the gdb flag, as shown below. If you compile with mygcc, the compiler will catch this error, as shown below, which is good: it would save you from debugging this problem. I will walk through the following sequence in class. We will use a number of basic gdb commands to find where the segfault occurs, but first we will use the debugger just to step through the code and inspect some of the variables. Then we will introduce our error, a simple one that is easily made. After that we will use the backtrace command to find where the problem is.

There are a couple of good places to go on the web for information on gdb other than the manual pages. For a detailed exposition, check out the GDB manual, or see a very short primer by Takashi Okumura that provides a little more detail for understanding some of the gdb commands below.

Like the shell commands, gdb commands can be terse and difficult to remember. Here is a very good quick reference to gdb commands - all you need to know in terms of command syntax is here.

This simple debugging example is a start but you will have to become accomplished in gdb to be a good hacker. DO NOT RELY on printf - it is not your real friend, gdb is!

We will use many of the basic gdb commands in the example below, such as break, run, next (use step if you want to step into a function, and next, n for short, to execute the function without stepping into it; a subtle difference), continue, display, printf, x (examine memory), backtrace (bt for short), frame (to check out stack frames; this is an important concept in C and very useful for debugging, poking around in your code and looking at variables) and list. These are most of the common commands.

I strongly recommend that you go through the sequence of steps below and use these debugging commands. Don’t worry, you can’t break anything. Google gdb or man gdb for more information on the commands above. Just like the shell commands, you’ll only need a subset of the complete set of gdb commands to become a very effective debugger. Again, printf is for dummies (or OK to get started) and not part of a hacker’s parlance or the necessary tools in your toolkit: gdb is!

OK. Let’s get started on this walk through.


1) Check we have the right flags notice -ggdb

$ alias mygcc
alias mygcc='gcc -Wall -pedantic -std=c99 -ggdb'

Note, if you debug on mac you will see a file (it’s actually a directory) in your current directory called filename.dSYM (where filename is the name of the compiled file). This directory stores the debug symbols for your app used by the debugger. So don’t remove it. For production code you would remove this directory before shipping your code. The mygcc flag -ggdb tells the compiler to produce the dSYM file – dSYM stands for Xcode’s Debug SYmbols file.


2) Let’s look at the man notes

$ man gdb

You can use GDB to debug programs written in C, C++, and Modula-2.  Fortran  support
will be added when a GNU Fortran compiler is ready.

GDB is invoked with the shell command gdb.  Once started, it reads commands from the
terminal until you tell it to exit with the GDB command quit.  You  can  get  online
help from gdb itself by using the command help.

You can run gdb with no arguments or options; but the most usual way to start GDB is
with one argument or two, specifying an executable program as the argument:

gdb program

You can also start with both an executable program and a core file specified:

gdb program core

Here are some of the most frequently needed GDB commands:

       break [file:]function
               Set a breakpoint at function (in file).

       run [arglist]
              Start your program (with arglist, if specified).

       bt     Backtrace: display the program stack.

       frame [args]
              The frame command allows you to move from one stack frame to another,
              and to print the stack frame you select. args may be either the address
              of the frame or the stack frame number. Without an argument, frame
              prints the current stack frame.

       print expr
               Display the value of an expression.

       c      Continue running your program (after stopping, e.g. at a breakpoint).

       next   Execute next program line (after stopping); step over any function  calls  in
              the line.

       edit [file:]function
              look at the program line where it is presently stopped.

       list [file:]function
              type  the  text  of  the  program  in  the  vicinity of where it is presently
              stopped.

       step   Execute next program line (after stopping); step into any function  calls  in
              the line.

       help [name]
              Show  information  about GDB command name, or general information about using
              GDB.

       quit   Exit from GDB.

3) Let’s compile our code and run a debug session on good code to use some of the commands.

$ mygcc operator.c -o operator
$ gdb ./operator
GNU gdb 6.3.50-20050815 (Apple version gdb-1510) (Wed Sep 22 02:45:02 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done

(gdb) list
10      */
11
12      #include <stdio.h>
13
14      int main() {
15
16        int opselect;
17        float fnum, snum;
18
19        printf("Please type in two numbers: ");
(gdb) break 19
Breakpoint 1 at 0x100000bcc: file operator.c, line 19.
(gdb) n
The program is not being run.
(gdb) run
Starting program: /Users/atc/teaching/cs50/notes/l5/operator
Reading symbols for shared libraries +. done

Breakpoint 1, main () at operator.c:19
19        printf("Please type in two numbers: ");
(gdb) n
20        scanf("%f %f", &fnum, &snum);
(gdb) n
Please type in two numbers: 5 5
21        printf("Enter a select code:");
(gdb) n
22        printf("\n 1 for addition");
(gdb) n
Enter a select code:
23        printf("\n 2 for multiplication");
(gdb) n
 1 for addition
24        printf("\n 3 for division : ");
(gdb) n
 2 for multiplication
25        printf("\n 4 for modulus  : ");
(gdb) n
 3 for division :
26        scanf("%d", &opselect);
(gdb) n
 4 for modulus  : 2
28        switch (opselect) {
(gdb) n
34          printf("The product of the numbers entered is %6.3f\n", fnum * snum);
(gdb) n
The product of the numbers entered is 25.000
35          break;
(gdb) display fnum
1: fnum = 5
(gdb) display snum
2: snum = 5
(gdb) display opselect
3: opselect = 6
(gdb)# you can use display to look at the address of a variable using &
(gdb) display &opselect
4: &opselect = (int *) 0x7fff5fbff5ec
(gdb)# and importantly you can use x (examine) to examine memory
(gdb)# in this case we look at what is in the contents of  0x7fff5fbff5ec
(gdb)# the contents are 6. Recall that 0x.... represents a hexadecimal number
(gdb)# and in this case an address on the stack frame of main()
(gdb) x 0x7fff5fbff5ec
0x7fff5fbff5ec: 0x00000006
(gdb)# x/uw means e(x)amine memory, shown as (u)nsigned decimal (w)ords; we could
(gdb)# have displayed bytes or floats. Check out the syntax of examine
(gdb) x/uw 0x7fff5fbff5ec
0x7fff5fbff5ec: 6
(gdb)# We could also use printf, which has almost the same syntax as C's printf
(gdb)# We use % format converters, e.g., %p for a pointer
(gdb) printf "print out the address of where variable opselect is stored %p\n", &opselect
print out the address of where variable opselect is stored 0x7fff5fbff5ec
(gdb) printf "%d\n", opselect
6
(gdb) # besides x (examine), you can also use printf or display to look at memory
(gdb) printf "%d\n", *(0x7fff5fbff5ec)
6
(gdb) continue
Continuing.

Program exited normally.
(gdb) quit

4) Now, let’s add an error to operator.c by removing the & in the scanf call, and compile using mygcc

$ mygcc operator.c -o operator
operator.c: In function main:
operator.c:26: warning: format %d expects type int *, but argument 2 has type int

5) Run emacs operator.c and go to line 26, where the bug is; use "ESC g g" to go to a line.
We will not fix our error yet.

$ gdb ./operator
GNU gdb 6.3.50-20050815 (Apple version gdb-1510) (Wed Sep 22 02:45:02 UTC 2010)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .. done

(gdb) run
Starting program: /Users/atc/teaching/cs50/notes/l5/operator
Reading symbols for shared libraries +. done
Please type in two numbers: 5 5
Enter a select code:
 1 for addition
 2 for multiplication
 3 for division :
 4 for modulus  : 2

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x00007fff82bed80e in __svfscanf_l ()
(gdb) traceback
Undefined command: "traceback".  Try "help".
(gdb) backtrace
#0  0x00007fff82bed80e in __svfscanf_l ()
#1  0x00007fff82c3ae5b in scanf ()
#2  0x0000000100000c5f in main () at operator.c:32
(gdb) list 32
27        // scanf("%d", &opselect);
28
29        // This line below as an intentional bug. It is commented out.
30        // it causes a segmentation fault (segfault for short)
31
32        scanf("%d", opselect);
33
34        switch (opselect) {
35
36          case 1:
(gdb) # This is a stack and reflects the call record in LIFO order:
(gdb) # at line 32 in my code we call scanf, which calls __svfscanf_l (),
(gdb) # which then raises a signal because it tries to access address 0,
(gdb) # which is a bad address. Because the trace disappears into library code we look
(gdb) # at our code at line 32. Something might have happened there.
(gdb) display opselect
No symbol "opselect" in current context.
(gdb) #  Oh, can’t do that, because gdb is currently in frame 0, which is the
(gdb) #  environment where the bad access happened, i.e., __svfscanf_l ().
(gdb) #  So to poke around in our main() code at line 32 we need to select
(gdb) #  the right frame for main(). We can see from above that
(gdb) #  it is frame 2. Once we execute the frame 2 command below, gdb switches to
(gdb) #  the main() function and we can then look at variables in our code.
(gdb) frame 2
#2  0x0000000100000c5f in main () at operator.c:32
32        scanf("%d", opselect);
(gdb) display opselect
1: opselect = 0
(gdb) #  so scanf uses the contents of opselect as an address, which in this
(gdb) #  case we can see is 0! That is our problem.
(gdb) quit
The program is running.  Exit anyway? (y or n) y
$

The break statement


  for ( expression1 ; expression2 ; expression3 ) {
       statement1 ;
       if( ... )
          break;
       statementN ;
  }

  while ( expression1 ) {
       statement1 ;
       if( ... )
          break;
       statementN ;
  }

  switch ( expression1 ) {
       case const1:
          statement 1;
          break;

       case const2:
          statement 2;
          break;

       case const3:
          statement 3;
          break;

       default:
          statement n;
          break;

   }

The continue statement


  for ( expression1 ; expression2 ; expression3 ) {
       statement1 ;
       if( ... )
          continue;
       statementN ;
  }

  while ( expression1 ) {
       statement1 ;
       if( ... )
          continue;
       statementN ;
  }