In this lecture, we will go over the basics of C programming.

Goals

Our goals today are to understand:

  • Operators
  • Precedence
  • Base data types
  • Storage modifiers
  • Scope rules of global and local variables
  • Flow of control in C programs

Activity

Our in-class activity is to compile and modify guessprime4.c.

Basic Operators

Nearly all operators in C are identical to those of Java. However, the role of C in system programming exposes us to much more use of the shift and bit-wise operators than in Java. Here are the basic operators:

  • Assignment =
  • Arithmetic +, -, *, /, %, and unary + and -
  • Priorities may be overridden with ( )’s.
  • Relational (all of these have the same precedence) >, >=, <, <=
  • Equality == , !=
  • Logical && (and), || (or), ! (not)

Pre- and post- decrement and increment operators

Any (integer, character or pointer) variable may be either incremented or decremented before or after its value is used in an expression.

Take a look at this example increment.c:

/*
 * increment.c - illustrate pre and post increment and decrement.
 *
 * CS 50, Fall 2022
 */ 

#include <stdio.h>

int main() 
{
     int fred = 3, a=3;

     printf("Start; fred = %d and a = %d\n", fred, a);
     a = --fred;
     printf("a = --fred; fred = %d and a = %d\n", fred, a);
     a = ++fred;
     printf("a = ++fred; fred = %d and a = %d\n", fred, a);
     a = fred--;
     printf("a = fred--; fred = %d and a = %d\n", fred, a);
     a = fred++;
     printf("a = fred++; fred = %d and a = %d\n", fred, a);

     return 0;
}

--fred will decrement fred before its value is used

++fred will increment fred before its value is used

fred-- will get (old) value and then decrement fred

fred++ will get (old) value and then increment fred

Let’s compile it with mygcc (defined in .bashrc and .bash_profile on day 2) and run the generated executable:

$ alias mygcc
alias mygcc='gcc -Wall -pedantic -std=c11 -ggdb'
$ mygcc -o increment increment.c
$ ls -l increment
-rwxr-xr-x 1 d84xxxx thayerusers 17280 Sep 17 16:43 increment*
$ ./increment 
Start; fred = 3 and a = 3
a = --fred; fred = 2 and a = 2
a = ++fred; fred = 3 and a = 3
a = fred--; fred = 2 and a = 3
a = fred++; fred = 3 and a = 2

The compiler produces an executable file named increment (specified by the -o option). We do not need chmod to make it executable: the compiler creates the file increment with execute permission already set.

Bitwise operators and masking

Ultimately, every integer variable is just a bit string, so C has operators to allow you to manipulate an integer as a sequence of bits:

& (bitwise and), | (bitwise or), ^ (bitwise exclusive or), ~ (bitwise negation).

You can use these to check whether certain bits are on, as in (nextchar & 0x30), which computes the bitwise ‘and’ of the variable nextchar and the hexadecimal constant 0x30.

You can also shift bits to the left or right:

Shift operators << (shift left), >> (shift right)

Note: results may vary based upon whether the type of the variable being shifted is “signed” or “unsigned”. See H&S pp.231-233.

Combined operators and assignment

When the result of an operation is assigned to the same variable that the operator is operating on, then the operator and the assignment can be combined. For example a = a + 3 can be written as a += 3. See this simple code combined.c for more examples.

Precedence of operators

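Operators with higher precedence bind more tightly. Operators of equal precedence are usually grouped from left to right, but the unary operators, ?:, and the assignment operators group from right to left. Precedence and grouping do not fix the order in which operands are evaluated: apart from &&, ||, ?: and the comma operator, C does not specify that order. The default precedence may be overridden with parentheses.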

Operator                              Precedence
( ) [ ]                               highest
++ -- ! ~ unary - (type) sizeof
* / %
+ -
<< >>
< <= > >=
== !=
&
^
|
&&
||
?:
= += -= *= /= %= etc.
,                                     lowest

Variables

Variable names (and type and function names, as we shall see later) must begin with a letter or an underscore (A-Z a-z _) and be followed by zero or more letters, underscores, or digits (A-Z a-z _ 0-9). Modern C compilers, such as gcc, accept very long names and treat every character as significant. (Some older C compilers only distinguished names by their first 8 characters, and keeping to this limit may be preferred to maintain portable code.) It is also preferred that you do not use variable names consisting entirely of uppercase characters. All-uppercase names are typically reserved for constants (such as MAXBUFSIZE, AVOGADROS_NUMBER, MAXUSERS).

Importantly, C variable names are case sensitive, so MYLIMIT, mylimit, Mylimit and MyLimit are four different variable names.

There are some specific variable/function naming styles that you may encounter. The major ones are

  • camelCase: writing compound words with the first letter of each word capitalized, except for the first word’s first letter, which is not capitalized.

  • PascalCase: writing compound words just as in camelCase, with the first letter of the first word also capitalized. (In Java it is common to use this case for class names, but camelCase for member names.)

  • snake_case: writing compound words with an underscore between each word with little, if any, capitalization.

Any programming project, including all of your assignments, should pick a variable/function naming style and stick with it.

In C, every constant, variable, or function name is declared to be of a certain type. This type may be either a base type supported by the C language itself, or a user-defined type consisting of elements drawn from C’s set of base types.

Basic types

Here are C’s basic types:

type          printf format   description
void          (none)          the void type
char          %c              the character type
short         %hd             the short integer type (sometimes shorter than int)
int           %d              the standard integer type
long          %ld             the longer integer type (sometimes longer than int)
bool          %d              the Boolean type, representing true/false
float         %f              the standard floating-point (real) type
double        %f              the extra precision floating-point type
long double   %Lf             the super precision floating-point type

To use bool one must #include <stdbool.h>.

The table above lists C’s basic types, along with the printf format specifier to use when printing a variable of each type. For example

int a = 5;
printf("a = %d\n",a);

We can determine the number of bytes required for a datatype (and other things, as we will see later) with the sizeof operator. Java defines exactly how large each datatype is; in C, the sizes vary from machine to machine, with the details managed by the compiler. C guarantees only certain minimum sizes (for example, a char is at least 8 bits and an int at least 16 bits) and the ordering:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

Examine data-types.c, which defines variables of each of the base types and prints the sizeof each one.

$ mygcc data-types.c -o data-types
$ ./data-types 
-------contents ------- sizeof()------

contents of char is a --- sizeof 1 bytes

contents int is  2 --- sizeof  4 bytes

contents short is  3 --- sizeof 2 bytes

contents long is 4 --- sizeof 8 bytes

contents bool is 1 --- sizeof 1 bytes

contents float is 1000.256714 --- sizeof 4 bytes

contents double is  1.100000e+24 --- sizeof 8 bytes

contents long double is  1E+31 --- sizeof 16 bytes
$ 

The above ran on plank.thayer.dartmouth.edu. Although these sizes are common for Linux machines today (2022), it is possible they may change in the future; code should not be dependent on specific sizes.

void

The void type is different from all of the others: it is not possible to define and use a void variable, and so it has no size. It is used for two purposes: (1) to indicate that a function returns no value, or (2) to indicate (with void*) a “pointer to anything”. We’ll see examples of both.

Type coercion

C permits assignments and parameter passing between variables of different types using type casts or coercion.

For example,

int a = 4;
float b = 0.0;

b = a;  // implicitly converts from int to float
a = b;  // implicitly converts from float to int
a = (int) b; // explicit "cast" from float to int 

Some casts in C must be made explicit, and are used where some languages require a ‘transfer function’. We will see examples of C’s cast operator later in the course.

Unsigned integers

Normal integers (int, short, long) can represent both negative and positive numbers. For example, we saw above that a short is 2 bytes in size, thus 16 bits; it can represent -32,768 .. +32,767 (-2^15 .. 2^15-1). But an unsigned short, still 16 bits in size, represents only non-negative integers, thus from 0 .. 65,535 (2^16-1).

Programs use unsigned integers when they never need to represent a negative value and want the extra range: the largest unsigned value is roughly double the largest signed value of the same size.

Constant variables

Although a constant variable sounds like an oxymoron, and it is, C allows us to define just such a thing. Some examples:

const float pi = 3.14159265;
const int maxNameLength = 50;

int main(const int argc, const char* argv[]);

The const storage modifier tells the compiler that this named value cannot be changed; it is useful for constants (like the first two examples) or for declaring that a function will not change its parameter values. In CS 50 we urge the use of const wherever appropriate, that is, wherever the programmer expects that named value to never change… and if the code accidentally tries to change the value, the compiler will report an error. Helps avoid tragic mistakes!

User-defined types (typedef)

It is often helpful to define a new type, in effect, an ‘alias’ for an existing type. Doing so can make the code more readable, or help to abstract the particularities of an implementation from other code that uses the type.

For example, before C supported a Boolean type, it was common to implement it this way:

typedef short boolean;   // defines a new type named 'boolean'
const boolean TRUE = 1;  // defines a new constant of type 'boolean'
const boolean FALSE = 0; // defines a new constant of type 'boolean'

Then one could write code with Boolean variables, e.g.,

boolean success = FALSE; // defines a new variable of type 'boolean'

In CS 50 we’ll see typedef used more often for compound types. Those come later.

extern

Variables, constants, and functions can be declared extern, meaning that their implementation occurs in a different C source file. It is common to see extern declarations in a header (.h) file, which when included in a source (.c) file allows the compiler to learn about the existence of something whose definition lies elsewhere. The compiler makes a note of it, leaving it for the linker to resolve later. For example, we see the following declaration in readline.h:

extern bool readLine(char* buf, const int len);

and then in readline.c we see the definition (implementation) of that function:

bool 
readLine(char* buf, const int len) 
{
...
//implementation here
...
}

static

The confusingly-named static modifier has two uses. For global names (outside of any function), it tells the compiler that the name should be visible only within the current source (.c) file. You can think of such a global name as being “private” to this file.

In CS 50, we most commonly see static used to “hide” functions within a source file representing a module; as a result they are not visible to other source files, and they are not part of the “interface” to the module. In this regard, it is like the difference between “private” and “public” methods in Java. We’ll come back to this idea in more detail, later.

The static modifier has another use, though you will not use it in CS 50. Inside a function, the static modifier on a local variable indicates that the value of this variable should persist even after the function returns. This feature should be used in only special circumstances and with great care.

Scope rules

As shown above, in C you can name constants, variables, functions, and even types (using typedef). Each name has a scope: the section of the source file where the name is ‘visible’, that is, valid for use in the code.

C has two main kinds of scope, global and local, though in principle every block and sub-block (as defined by {} braces) can define a new subsidiary scope.

Once a name is defined, it is valid from that point in the source file down to the end of its scope.

Global names are typically defined near the top of the source file and are thus valid through the rest of the source file, within every function and the blocks within them.

Local names occur within a function; these include the function parameters, and other names defined within the function body. As a matter of style and convention, local names are typically declared at the top of a function body, although there are instances where it is more readable to declare them further below.

Other, narrower scopes can occur inside a statement block within a function body; the most common example is loop variables, names valid only within the loop.

Let’s look at an example of each of these three types: global, local, and loop scopes. But first, a quick note about initialization.

Variable initialization

Constant variables can be given a value only once, and that must happen where they are defined: C does not allow any later assignment to a const variable, so initialize it in its definition.

In CS 50 we advocate for initializing every variable right there in the definition, even if only to some default value. Here’s why.

Global variables that are not explicitly initialized are set to zero, arranged by the compiler and linker.

Local variables (whether at the function or block level) are not initialized; thus, their initial value is undefined — whatever happens to be in memory, aka, garbage. It is dangerous to use the value of an uninitialized variable, especially a pointer, because it leads to undetermined results - or crashes your program.

Global names

Global names (variables, functions, types, etc.) are visible from the point where they are declared until the end of that file. Thus, they are typically declared near the top of the file.

This rule (from the declaration to the end of the file) is the main reason why we tend to declare all of a file’s functions near the top of the file: so they can be called from any point within the file. For example, look at guess6.c, which declares three functions near the top:

/* function prototype *declaration*; the function is *defined* below */
int askGuess(const int low, const int high);
int pickAnswer(const int high);
bool str2int(const char string[], int* number);

and then uses them in the main() function.

In CS 50 we never use global variables; they are risky, and thus bad style. So in CS 50, global names are primarily functions, constants, and types.

static vs extern

As discussed above, if a global name is declared or defined with a static modifier, that name is only available from within that file.

If a global name is declared with an extern modifier, it is not necessarily defined (implemented) within this file; the linker will later need to find its definition in another file, and link its use in this file with the implementation in that other file. The name may be declared extern in any number of files, but must be defined (and not as static!) in exactly one file.

Look again at guess6.c; because it includes readline.h near the top, the contents of that header are incorporated into guess6.c right at that point, including the line

extern bool readLine(char* buf, const int len);

Later, in a separate run of the compiler, readline.c also includes readline.h near the top of its file, declaring function readLine(). In this case, the compiler also finds the definition of function readLine(). That’s fine; the function is declared in several files, but defined in only one.

If neither the static nor the extern modifier is applied, that name may be accessed from another source file… though that file would need its own declaration of the name, typically labeled with the extern modifier. Look again at guess6.c: the three functions it declares and defines are marked neither static nor extern, so they are potentially visible to other code modules linked with this program. In such a small program this issue is not important, but in a larger, more sophisticated program we should mark those functions static.

Local names

Local names (typically local variables and constants) are accessible from the point of their definition until the end of the block in which they are defined.

When the function returns, the variable’s memory is automatically deallocated and, if the function is called again, is reallocated… but with undetermined content. As above, we recommend initializing all local variables.

A local name may shadow that of a global variable, making that global variable inaccessible. Blocks do not have names, and so shadowed variables cannot be named. For example,

int x, y, z;  // global variables

int foo(int x, int y) // shadows the globals named 'x' and 'y'
{
    int z = 99; // shadows the global named 'z'
    int sum = x + y + z;  // uses the local variables only
    return sum;
}

Loop variables and statement blocks

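Variables may also be declared inside any statement block. Under the C11 standard we compile with (-std=c11), a declaration may appear anywhere in a block, not only at the top; older (C89) code required declarations at the start of the block. Such variables are visible from their declaration until the end of that block.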

The most common use of this capability is for loop variables, specifically, for loops. For example, we diagram the core of sqrt.c below.

[Figure: labeled diagram of the sqrt.c code]

The function main() has three local names: two parameters (argc, argv, both defined with the const modifier and thus they act as local constants) and one local variable (exit_status).

Its for loop is a nice example of the use of a loop variable i to index the iterations over an array, and two statement-block variables to hold information only needed within that statement block (the body of the for loop).

Examples of global and local variables

Let’s look at some additional code snippets to reinforce the ideas of local and global variables and the issue of the scope of these variables in a section of code.

Example scope.c

/*
 * scope.c - illustrates the use of global and local variables and 
 * global function prototypes.
 *
 * Revised code taken from pg. 330 (Program 7.1) (Bronson) "First 
 * Book on ANSI C"
 * 
 * CS 50, Fall 2022
 */

#include <stdio.h>

/* firstnum is a global variable; it is not stored on main()'s stack.
 * Its scope is every function in the file scope.c: any code in the file
 * can read and write it. Once main() terminates, the variable is
 * deallocated and no longer exists. */
int firstnum; // create a global variable named firstnum

void valfun(); // global function prototype

int main()
{
  int secnum; //create a local variable named secnum
  firstnum = 10; //store a value into the global variable
  secnum = 20; // store a value into the local variable

  printf("\nFrom main(): firstnum = %d",firstnum);
  printf("\nFrom main(): secnum = %d\n",secnum);

  valfun(); // call the function valfun

  printf("\nFrom main() again: firstnum = %d",firstnum);
  printf("\nFrom main() again: secnum = %d\n",secnum);

  return 0;
}

void valfun() // no values are passed to this function
{
  /* secnum is a local variable created on the stack when valfun() executes.
   * When valfun() exits, its stack space is deallocated and the variable no
   * longer exists. It is local, and its scope is valfun(). */
  int secnum; // create a second local variable named secnum 
  secnum = 30; // this only affects this local variable's value

  printf("\nFrom valfun(): firstnum = %d",firstnum);
  printf("\nFrom valfun(): secnum = %d\n",secnum);
  firstnum = 40; // this changes firstnum for both functions
}

If we run the code the output is as follows:

$ mygcc -o scope scope.c
$ ./scope

From main(): firstnum = 10
From main(): secnum = 20

From valfun(): firstnum = 10
From valfun(): secnum = 30

From main() again: firstnum = 40
From main() again: secnum = 20

Study the output. Is it what you expected?

The first thing to note about the source code is that it defines a global variable firstnum whose scope is the complete file; it is therefore accessible from both main() and valfun().

Note that scope.c has a main() and a valfun() function. The prototype for valfun() is declared at the top of the file, giving it global scope in the file scope.c. Both main() and valfun() update and print the value of firstnum, a global variable stored at a fixed memory address (space is not allocated on the stack, as it is for local variables such as secnum). Note that main() and valfun() both have local variables named secnum. This name collision causes no clash: the two variables merely happen to share a name, and each has local scope only inside its own function, main() and valfun() respectively. Each instance is private to its function, and they have no association other than their names. They are local variables created on the stack, and they no longer exist when their function exits. For example, valfun() creates an int variable secnum on its stack when it executes, but when it returns control to main() that stack space is deallocated and the variable no longer exists. In contrast, the global variable firstnum, and the value valfun() stored in it, persist after valfun() exits.

Examples of local and static local variables

Consider another storage modifier that is impacted by scope: static. Here the variable is placed in global storage with limited visibility depending on where it is defined. Let’s look at two code snippets that illustrate the use of local and static variables. These represent two important cases in C.

First, let’s look at the case of local variables.

Example: auto.c

/*
 * auto.c - illustrates the auto local variables.
 *
 * Code taken from pg. 336 (Program 7.2) (Bronson)
 *  "First Book on ANSI C"
 *
 * CS 50, Fall 2022
 */
#include <stdio.h>

void testauto(); // function prototype

int main()
{
  int count; // create the auto variable count
  for (count = 1; count <= 3; count++ )
    testauto();
  return 0;
}

void testauto()
{
  int num = 0; // create the auto variable num, initialized to zero
  
  printf("The value of the automatic variable num is %d\n", num);
  num++;
}

If we run the code the output is as follows:

$ ./auto
The value of the automatic variable num is 0
The value of the automatic variable num is 0
The value of the automatic variable num is 0

Study the output. Is it what you expected?

Now let’s look at the case when num is defined as static inside the function testauto(). See the example static.c.

/*
 * static.c - illustrates the use of a static local variable.
 *
 * Code taken from pg. 336 (Program 7.2) (Bronson)
 *  "First Book on ANSI C"
 *
 * CS 50, Fall 2022
 */
#include <stdio.h>

void testauto(); // function prototype

int main()
{
  int count; // create the auto variable count
  for (count = 1; count <= 3; count++ )
    testauto();
  return 0;
}

/* The variable num in testauto() is set to zero only once. Because num is a
 * static local variable, it retains its value when testauto() returns. */
void testauto()
{
  static int num = 0; // num is a local static variable
  
  printf("The value of the static variable num is now %d\n", num);
  num++;
}

Note that we have changed num in the testauto function to be static. If we run static.c, the output is as follows:

$ ./static
The value of the static variable num is now 0
The value of the static variable num is now 1
The value of the static variable num is now 2

Note that the value of num now persists across multiple invocations of the function. This is in direct contrast to the local variable in the previous code snippet, auto.c. In essence, the static modifier allocates memory for the int variable outside the stack, just like the global variable firstnum in scope.c. The distinction, however, is that num is not global: it is accessible only within the function testauto().

Question: If I defined static int num; at the top of static.c, how would that change the scope of the static variable? Is it different from int num; defined as a global variable (like firstnum in scope.c)? Yes: if a global variable is defined as static, it is not visible outside the file in which it is defined. The variable is essentially ‘private’ to that file.

Math

Although the standard C library includes many useful functions, there is an entirely separate library with a plethora of mathematics functions.

For details, study the man page for the desired function, such as

  • man 3 sqrt - square root
  • man 3 pow - raise a number to a power
  • man 3 cos - cosine

Here we give an argument 3 to man to ask it to look in “section 3 of the manual”, because library functions are described in section 3 (whereas commands like ls and bash are in section 1).

To use any of these functions, you must include the header file in your .c file:

#include <math.h>

and you must ask the linker to link with the math library:

$ mygcc sqrt.c -lm -o sqrt

The -l option indicates you want to link with one of the standard libraries, and the library called m is the math library.

Look at the example sqrt.c.

/*
 * sqrt - demonstrate use of the math library
 * 
 * usage:
 *    sqrt [number]...
 * where 'number' is an integer or floating-point number
 * 
 * To compile, you must link with the math library:
 *    mygcc sqrt.c -lm -o sqrt
 *
 * CS 50, Fall 2022
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>   // declares all functions from the math library

int
main(const int argc, const char *argv[])
{
  int exit_status = 0;
  
  for (int i = 1; i < argc; i++) {
    float number; // the numeric equivalent to the argument
    char extra;   // any extraneous characters after parsing number

    if (sscanf(argv[i], "%f%c", &number, &extra) == 1) {
      printf("sqrt(%f) = %f\n", number, sqrt(number));
    } else {
      printf("%s: invalid number\n", argv[i]);
      exit_status++;
    }
  }

  return exit_status;
}

The following is an example run:

$ mygcc sqrt.c -lm -o sqrt
$ ./sqrt
$ ./sqrt 4 
sqrt(4.000000) = 2.000000
$ ./sqrt 4 10 100 x 99.y 
sqrt(4.000000) = 2.000000
sqrt(10.000000) = 3.162278
sqrt(100.000000) = 10.000000
x: invalid number
99.y: invalid number
$ echo $?
2
$

This program demonstrates

  • floating-point variables and the %f specifier for printf() and scanf(),
  • use of the math function sqrt() from the math library,
  • a defensive way to parse an argument string into a number,
  • definition of variables (here, i, number, and extra) that are only defined within the scope of a loop (here, the for loop),
  • use of a variable to track the future exit status of the program — which should be 0 on success, non-zero for any failures,
  • the bash special variable $? that holds the exit status of the most recent command.

Flow of control in a C program

Control flow within C programs is almost identical to the equivalent constructs in Java. However, C provides no exception mechanism, and so C has no constructs like try, catch, and finally.

Conditional execution

      if ( expression )
           statement1;

      if ( expression ) {
           statement1;
           statement2;
           ......
      }

      if ( expression ) {
           statement;
      } else {
           statement;
      }

Although the braces { } are not required when the ‘then’ or ‘else’ clauses are a single statement, in CS 50 we insist on including the braces every time. Doing so helps to avoid mistakes that can occur if the indentation makes it look like a statement is part of a clause when it actually is not. For example:

      if ( expression )
           statement1;
           statement2;
      statement3;

will always execute statement2 and statement3, even though it looks like statement2 would only be executed when expression is true. The compiler ignores indentation and treats the above as if it were written

      if ( expression )
           statement1;
      statement2;
      statement3;

If that was not the programmer’s intent, they should have written

      if ( expression ) {
           statement1;
           statement2;
      }
      statement3;

Thus: always use braces, they’ll avoid surprises.

while and do-while loops

      while ( conditional-expression ) {
           statement1;
           statement2;
           ......
      }

      do {
           statement1;
           statement2;
           ......
      } while ( conditional-expression );

As for the if statement, the braces { } are not required when the loop body is a single statement, but in CS 50 we insist on including the braces every time.

for loops

      for( initialization ; conditional-expression ; adjustment ) {
           statement1;
           statement2;
           ......
      }

Any of the three expressions in a for statement may be omitted; if the conditional-expression is missing, it is treated as always true. Infinite loops may be coded in C with for( ; ; ) … or with while (true), which requires #include <stdbool.h>.

As for the while loop, the braces { } are not required when the loop body is a single statement, but in CS 50 we insist on including the braces every time.

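Note the above for loop is equivalent to the following, with one exception: a continue statement in the for loop still executes the adjustment, whereas in this while version a continue would skip it.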

      initialization;
      while ( conditional-expression ) {
            statement1;
            statement2;
            ......
            adjustment;
      }

The switch statement

The switch statement allows you to execute a different statement depending on the value of an expression, as follows:

      switch ( expression ) {
           case const1 : statement1; break;
           case const2 : statement2; break;
           case const3 :
           case const4 : statement4;
           default : statementN; break;
      }

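As in Java, control in C “falls through” (drops down) to the following case unless there is an explicit break statement; above, a value matching const3 executes statement4 and then statementN. One difference from Java is that C’s case labels must be integer constant expressions (such as 42 or 'x'), so you cannot switch on a string.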

The break statement

The break statement causes control to break out of the innermost enclosing loop (for, while, or do-while) or switch statement.

      for ( expression1 ; expression2 ; expression3 ) {
           statement1 ;
           if( ... )
              break;
           statementN ;
      }
      // after 'break', execution continues here

      while ( expression1 ) {
           statement1 ;
           if( ... )
              break;
           statementN ;
      }
      // after 'break', execution continues here

      switch ( expression1 ) {
           case const1:
              statement 1;
              break;

           case const2:
              statement 2;
              break;

           case const3:
              statement 3;
              break;

           default:
              statement n;
              break;
      }
      // after 'break', execution continues here

The continue statement

The continue statement causes control to jump to the end of the body of the innermost enclosing loop (for, while, or do-while), and continue looping from there; in a for loop, the adjustment expression runs next.

      for ( expression1 ; expression2 ; expression3 ) {
           statement1 ;
           if( ... )
              continue;
           statementN ;
           // after 'continue', execution continues here
      }

      while ( expression1 ) {
           statement1 ;
           if( ... )
              continue;
           statementN ;
           // after 'continue', execution continues here
      }