CS 50 Software Design and Implementation

Lecture 6

C Continued: Preprocessor, Functions, Data Structures, Arrays, and Strings

In this lecture we continue our introduction to the C language.

Goals

In today’s lecture we plan to cover the C preprocessor, functions, data structures (arrays and structures), and character arrays and strings.

OK. Let’s get started.

The C preprocessor

You will notice that a few lines, typically near the beginning of a C program, begin with the hash or pound sign, #. These lines are termed C preprocessor directives and are actually instructions (directives) to a special program called the C preprocessor (located in /lib/cpp). As its name suggests, the C preprocessor processes the text of a C program before the C compiler sees it. The preprocessor directives (all beginning with #) should begin in column 1 (the 1st column) of any source line on which they appear.

The C preprocessor is easily able to locate these lines and then examine the characters following the #. These characters usually form a special word in the C preprocessor’s syntax, which typically causes the preprocessor to modify the C program before it is sent to the C compiler itself. Although there are about 20 different preprocessor directives, we’ll only discuss the most common ones here and then a few others as we need them.

Header file inclusion

The #include directive, pronounced hash include, typically appears at the beginning of a C program. It is used to textually include the entire contents of another file at the point of the #include directive. A common #include directive, seen at the beginning of most C files, is:


#include <stdio.h>

This directive indicates that the contents of the file named stdio.h should be included at this point (the directive is replaced with the contents). There is no limit to the number of lines that may be included with this directive and, in fact, the contents of the included file may have further #include directives which are handled in the same way. We say that the inclusions are nested and, of course, care should be taken to avoid recursive nestings!

The example using <stdio.h>, above, demonstrates two important points. The filename itself appears between the characters < ... >. The use of these characters indicates that the enclosed filename should be found in the standard include directory, /usr/include. The required file is then /usr/include/stdio.h.

The standard include files are used to consistently provide system-wide data structures or declarations that are required in many different files. By having the standard include files centrally located and globally available, all C programmers are guaranteed to be using the same data structures and declarations. The C standard itself defines only a small set of operating-system-independent header files: 15 in C89, rising to 24 in C99.

Have a (recursive) look in the /usr/include directory yourself and you will see that there are over 2000 standard include files available under Linux!

Importantly, it is the use of the < ... > characters which signifies that the /usr/include directory name should be prepended to the filename to locate the required file. Alternatively, the “ ... ” characters may also be used, as in the following example:


#include "mystructures.h"

to include the contents of the file mystructures.h at the correct point in the C program. Because the “ ... ” characters are used, the file is sought first in the directory of the source file containing the #include (typically the present working directory), that is ./mystructures.h. By using the “ ... ” characters we can specify our own include files which are located in the same directory as the C source programs themselves.

In both of the above examples the indicated filename had the “extension” of .h. Whereas we have previously said that the extension of .c is expected by the C compiler, the use of .h is only a convention within UNIX. The .h indicates that the file is a header file, because they generally contain information required at the head (beginning) of a C program. Header files typically (and should) contain only declarations of C constructs, like data structures and constants used throughout the C program. In particular, they should not contain any executable code, variable definitions, nor C statements.

Defining textual constants

Another frequently used C preprocessor directive is the #define directive, pronounced hash define. The #define directive is used to introduce a textual value, or textual constant, which when recognized by the C preprocessor will be textually substituted by its definition. Traditionally #define directives were the only method available to C programmers, using old K&R (Brian Kernighan and Dennis Ritchie) C, of introducing constants in C programs. For example, frequently used #define-ed constants are:


#define FRESHMAN 1
#define SOPHOMORE 2
#define JUNIOR 3
#define SENIOR 4

After these definitions, each time the C preprocessor locates the sequence JUNIOR as a complete word within the C program, it will be replaced by the character sequence 3. Although the ANSI C standard introduced a formal const construct for supporting constants, the #define directive is still the preferred method of defining some forms of constants. For example, when defining an array of integers (described in greater detail later) we use a #define directive to define the maximum size of the array.

Thereafter we use the #define-ed constant in the array definition:


#define MAXSIZE 100
int myarray[MAXSIZE];

If necessary, a preprocessor token may be undefined when it is no longer required:


#undef MAXSIZE

Textual, inline functions

The #define directive may also be used to define some inline functions, more correctly termed macros, within your C programs. An often cited example is:


#define sqr(x) x * x

C does not have a standard function for calculating the square of, say, an integer value, but using the inline macro defined above, we can now write:


result = sqr(i);

where i is an integer variable. Notice that the macro substitution was performed with the macro’s argument being i. In a manner akin to actual and formal parameter naming in Java (and C), the actual parameter i is represented in the macro as the formal parameter x without problems. Each time x appears as a unique “word” in the right-hand side of the definition, it will be replaced in the C code by i.

Notice that this textual substitution may also be used for calculating (in this example) the square of an integer constant. For example:


result = sqr(3);

is expanded in an identical way. Our definition of sqr is not really rigorous enough to provide correct results in all cases. For example, consider the “call” sqr(x+1), which expands to x+1 * x+1 and therefore evaluates to 2x+1! A more correct definition would be:


#define sqr(x) ((x) * (x))

Conditional compilation

Another often used feature of the C preprocessor is conditional compilation. The C compiler predefines a few constants to “tell” the program the operating system in use, the filename being compiled, and so on:


  #if defined(linux)
     /* compile code specific to LINUX */
     ......
  #elif defined(WIN32)
     /* compile code specific to Windows */
     ......
  #elif defined(sun)
     /* compile code specific to Sun's Solaris */
     ......
  #endif

Functions

Java supports constructors and methods which allocate instances of, and interrogate and modify the state of, their own (implicit) objects. Constructors and methods are typically directed by their parameters. C is a procedural programming language, meaning that its primary synchronous control flow mechanism is the function call. Strictly speaking, C has no procedures, but instead has functions, all of which return a single instance of a base or user-defined type. C’s functions access and modify the global memory, and (possibly) their parameters. Although we may hope that a function can only modify memory that it can “see” (through C’s scoping rules) or has been provided (through its parameter list), this is untrue.

By stating that there are only functions, we are noting that all functions must return a value. While this is nearly true, C also has a void type, difficult to describe, and often used as a placeholder (to keep the compiler happy!). We may think of a procedure in C as a function that returns void; that is to say, nothing is returned. With a similar thought, we will often invoke a function but have no use for its return value. For example, a function such as printf() will return an integer as its result, but we rarely need to use this integer. We can “cast its value” to void, effectively throwing away the value.


(void) printf( .... );

The default return datatype of a function is int: if a function’s datatype is omitted, the compiler assumes it to be an int. This has the unpleasant result that if the prototype of an external or yet-to-be-defined function is omitted, the compiler will often silently assume an int return result. This is a frequent cause of problems, particularly when dealing with functions returning floating point values, as in C’s mathematics library. The use of gcc’s -Wall and -pedantic switches allows us to trap such errors.

Every complete C program has an entry point named main, at which it appears the operating system calls the program. Function main is of type int; this int is returned as the result of execution of the whole program, with 0 indicating a successful execution and anything non-zero otherwise. C’s functions may receive zero or more parameters. All parameters to C’s functions are passed by value.

Other than within a single file, the datatype of function parameters between the function’s definition and invocation is not checked, i.e. C provides no link-time cross-file type checking. Perhaps surprisingly, C also permits functions to receive a variable number of parameters. At run-time it is the function’s responsibility to deal with the data types received, and the compiler cannot perform any type checking on these parameters.

Function parameters are implicitly promoted to “higher” datatypes by the compiler when no prototype is in scope, or when they are passed through a variable-length parameter list: chars are promoted to ints, and floats are promoted to doubles.

The following example code uses functions. The code toss.c asks the user to enter the number of fair tosses of a coin and then computes the number of heads and tails. A random number generator is used.

C code: toss.c

The contents of toss.c look like this:


/*
  File: toss.c

  Description:  The user enters the numbers of fair tosses of a coin and the
  program computes the number of heads and tails and prints them out.

  Input: User enters the number of tosses

  Output: Displays the number of fair tosses of heads and tails.

  Revised from program 6.9 in Bronson page 313

*/

/* preprocessor include files found in /usr/include/ directory */

#include <stdio.h>   /* needed to call printf() and scanf() */
#include <stdlib.h>  /* needed to call srand() and rand() */
#include <time.h>    /* needed to call the time() function */

/*
   tossCoin tosses a coin a number of times (which is the input) and returns
   the number of heads, and therefore the number of tails can be computed
*/

/* prototype declarations */

int tossCoin(int );
void printStats(int , int);

int main() {

  int numTosses, numHeads;

  /*
     Generate the first seed value. A NULL argument forces
     time() to read the computer's internal time in seconds.
     srand() then uses this value to initialize rand()
  */

  srand(time(NULL));

  printf("How many tosses of a fair coin shall we do? Please enter: ");

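  /* stop if the input is not a non-negative whole number */
  if (scanf("%d", &numTosses) != 1 || numTosses < 0) {
    printf("Please enter a non-negative whole number\n");
    return EXIT_FAILURE;
  }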

  printf("Ok, you entered %d tosses, so we'd expect %d heads and %d tails\n",
          numTosses, numTosses/2, numTosses/2);

  numHeads = tossCoin(numTosses);

  printStats(numHeads, numTosses);

  return EXIT_SUCCESS;
}

/* printStats function */

void printStats(int heads, int tosses) {

  printf("The number of heads %d, tails %d\n", heads, (tosses-heads));

}


/* tossCoin function */

int tossCoin(int tosses){

  int i;
  int heads = 0;

  for (i=1; i <= tosses; i++) {

    if ((1 + (rand()%100)) <= 50) {

      heads++;

    }

  }

  return(heads);

}

Data structures

C has no equivalent construct to the Java class. Instead, C provides two aggregate data structures: arrays and structures.

Arrays in C are not objects, nor strictly single variables. Instead, an array’s name refers to the first memory address of a contiguous block of memory of the requested length. Arrays may be declared or defined wherever scalar variables are declared or defined, and may be arrays of either C’s base types or user-defined types.

There is no array keyword in C, and no bounds checking at run-time. C array subscripts commence at 0, the highest valid subscript of int a[N] thus being N-1.

One-dimensional arrays are defined with, for example, int score[20]; which reads as “declare score as array of 20 int”:


  int score[20];   /* assumed to be filled in elsewhere */
  int i, total;

  total = 0;
  for(i=0 ; i<20 ; i++)
     total = total + score[i];

Multi-dimensional arrays

Strictly speaking, C does not support multi-dimensional arrays. However, if all (one-dimensional) arrays in C are considered as vectors, then multi-dimensional arrays are simply understood as “vectors of vectors”.

For example, char str[10][20]; reads as “declare str as array of 10 array of 20 char”, that is, ten vectors of 20 characters each.

The number of elements of an array can be determined with:


  #define NELEMENTS (sizeof(score) / sizeof(score[0]))

  for(i=0 ; i<NELEMENTS ; i++)
      total = total + score[i];

User-defined C Structures

Structures in C are aggregate datatypes consisting of fields or members of base types, or other user-defined types. C structures may not include executable code, unlike methods in Java classes.


  struct person {
     char name[20];
     char addr[80];
     int age;
  };

  struct person p1, p2;
  int ages;

  ages = p1.age + p2.age;   /* the sum of their ages */

  if(strcmp(p1.name, p2.name) == 0) ... /* do they have the same name? */

Character arrays and strings

C provides no base type that is a string, though the C compiler accepts the use of double-quoted character string literals and does the obvious thing. A string in C is a sequence of characters (bytes) in contiguous memory locations. The string is terminated by the sentinel value of the null character, written '\0' (a zero byte, not to be confused with the NULL pointer). When a C compiler detects a string literal in a program, it will allocate enough contiguous global (read-only) memory to hold the characters of the string (including the null byte at the end).

C does not record the length of a string anywhere (as does Java). Instead, by convention, the length of a string is defined as the number of characters from the beginning of the string (its starting address) up to, but not including, the null byte. The length of “hello” is 5.

Arrays of characters are typically used to store character strings. Notice that the parameter to the following function does not indicate any expected (maximum) size, or “length”, of the array.


  int my_strlen(char str[]) {

     int len = 0;

     /* count characters up to, but not including, the terminating '\0' */
     while( str[len] != '\0' ) {
         len++;
     }
     return(len);
  }

The code below compares two strings. String literals are stored as character codes (ASCII on most systems), and the string comparison function strcmp compares the strings character by character using these codes. If s1 < s2 the return value is < 0, if s1 > s2 the return value is > 0, and if s1 = s2 the return value is 0. Only the sign of a non-zero result is specified; the exact value (such as the -3 in the output below) depends on the C library.

C code: string.c

The contents of string.c look like this:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){

  char s1[] = "Beb", s2[] = "Bee", s3[] = "Bee";
  int cmp;

  // First test s1 - s2

  cmp = strcmp(s1, s2);
  printf("s1 (%s) < s2 (%s) returned cmp is %d\n", s1, s2, cmp);

  // Second test s2 - s1

  cmp = strcmp(s2, s1);
  printf("s2 (%s) > s1 (%s) returned cmp is %d\n", s2, s1, cmp);

  // Third test, s2 = s3

  cmp = strcmp(s2, s3);
  printf("s2 (%s) = s3 (%s) returned cmp is %d\n", s2, s3, cmp);

  return EXIT_SUCCESS;

}

[atc@Macintosh-10 l6]$ mygcc -o string string.c
[atc@Macintosh-10 l6]$ ./string
s1 (Beb) < s2 (Bee) returned cmp is -3
s2 (Bee) > s1 (Beb) returned cmp is 3
s2 (Bee) = s3 (Bee) returned cmp is 0
[atc@Macintosh-10 l6]$

C code: array.c

The contents of array.c look like this:


/*

   File: array.c

   Description: This code sets up a two dimensional array and
                prints out the contents of each location.


   Revised from program 8.7 in Bronson page 405

*/

#include <stdio.h>
#include <stdlib.h>

#define NUMROWS 3
#define NUMCOLS 4

int main()
{
  int i, j;

  int val[NUMROWS][NUMCOLS] = { {  8, 16,  9, 52 },
                                {  3, 15, 27,  6 },
                                { 14, 25,  2, 10 } };

  /* explicitly print out each element of the array */

  printf("\nDisplay of val array by explicit element");
  printf("\n%2d %2d %2d %2d",
         val[0][0],val[0][1],val[0][2],val[0][3]);
  printf("\n%2d %2d %2d %2d",
         val[1][0],val[1][1],val[1][2],val[1][3]);
  printf("\n%2d %2d %2d %2d",
         val[2][0],val[2][1],val[2][2],val[2][3]);

  /* loop through and print out the array */

  printf("\n\nDisplay of val array using a nested for loop");
  for (i = 0; i < NUMROWS; i++)
  {
    printf("\n"); /* start a new line for each row */
    for (j = 0; j < NUMCOLS; j++)
      printf("%2d ", val[i][j]);
  }
  printf("\n");

  return EXIT_SUCCESS;
}


If we run array we get the following output:


$ ./array

Display of val array by explicit element
 8 16  9 52
 3 15 27  6
14 25  2 10

Display of val array using a nested for loop
 8 16  9 52
 3 15 27  6
14 25  2 10

The next code snippet shows the address of an array and some of its elements. Importantly, it shows the equivalence of two common notations for dealing with addresses and arrays. Run the code below. Note that the & operator is not used before the array name: because an array name, used in an expression like this, is a pointer constant equivalent to the address of the first storage location reserved for the array, the expressions numbers and &numbers[0] are equivalent. As an aside, if you wanted to pass the address of an array in a function call you could replace &numbers[0] with simply numbers.

C code: array-address.c

The contents of array-address.c look like this:



/*

   File: array-address.c

   Description: This code prints out various address related items
                in an array. It also shows the equivalence between
                the plain name of the array, in this case "numbers",
                and the notation "&(numbers[0])"


   Revised from program 8.10 in Bronson page 411

*/

#include <stdio.h>
#include <stdlib.h>

#define NUMELS 20

int main() {

  int numbers[NUMELS];

  printf("The starting address of the numbers array using notation &numbers[0] is: %p\n",
         (void *)&numbers[0]);
  printf("The storage size of each array element is: %zu\n",
                                                   sizeof(int));
  printf("The address of element numbers[5] is : %p\n", (void *)&numbers[5]);

  printf("The starting address of the array,\n");

  printf("  using the notation numbers, is: %p\n", (void *)numbers);

  printf("Therefore, the notation &numbers[0] and numbers are equivalent!\n");

  return EXIT_SUCCESS;

}


Let’s look at the output from array-address:



$ ./array-address
The starting address of the numbers array using notation &numbers[0] is: bffff880
The storage size of each array element is: 4
The address of element numbers[5] is : bffff894
The starting address of the array,
  using the notation numbers, is: bffff880
Therefore, the notation &numbers[0] and numbers are equivalent!