C Basics
In this lecture, we will go over the basics of C programming.
Our goals today are to understand:
- Operators
- Precedence
- Base data types
- Storage modifiers
- Scope rules of global and local variables
- Flow of control in C programs
Our in-class activity is to compile and modify guessprime4.c.
Basic Operators
Nearly all operators in C are identical to those of Java. However, the role of C in system programming exposes us to much more use of the shift and bit-wise operators than in Java. Here are the basic operators:
- Assignment: =
- Arithmetic: +, -, *, /, %, unary -
- Priorities may be overridden with ( )’s.
- Relational (all of these have the same precedence): >, >=, <, <=
- Equality: ==, !=
- Logical: &&, ||, !
Pre- and post- decrement and increment operators
Any (integer, character or pointer) variable may be either incremented or decremented before or after its value is used in an expression.
Take a look at this example increment.c:
/*
 * increment.c - illustrate pre and post increment and decrement.
 *
 * CS 50, Fall 2022
 */

#include <stdio.h>

int main()
{
  int fred = 3, a = 3;
  printf("Start; fred = %d and a = %d\n", fred, a);

  a = --fred;
  printf("a = --fred; fred = %d and a = %d\n", fred, a);

  a = ++fred;
  printf("a = ++fred; fred = %d and a = %d\n", fred, a);

  a = fred--;
  printf("a = fred--; fred = %d and a = %d\n", fred, a);

  a = fred++;
  printf("a = fred++; fred = %d and a = %d\n", fred, a);

  return 0;
}
In this program,
- a = --fred; will decrement fred before its value is used
- a = ++fred; will increment fred before its value is used
- a = fred--; will get (old) value and then decrement fred
- a = fred++; will get (old) value and then increment fred
Let’s compile it with mygcc (defined in .bashrc and .bash_profile from day 2) and run the generated executable:
$ alias mygcc
alias mygcc='gcc -Wall -pedantic -std=c11 -ggdb'
$ mygcc -o increment increment.c
$ ls -l increment
-rwxr-xr-x 1 d84xxxx thayerusers 17280 Sep 17 16:43 increment*
$ ./increment
Start; fred = 3 and a = 3
a = --fred; fred = 2 and a = 2
a = ++fred; fred = 3 and a = 3
a = fred--; fred = 2 and a = 3
a = fred++; fred = 3 and a = 2
The compiler produces an executable file named increment (specified by the -o option). We do not need chmod to make it an executable; the compiler takes care of that when it creates the file increment with the correct permissions.
Bitwise operators and masking
Ultimately, every integer variable is just a bit string, so C has operators to allow you to manipulate an integer as a sequence of bits:
& (bitwise and), | (bitwise or), ~ (bitwise negation).
You can use these to check if certain bits are on, as in (nextchar & 0x30), which computes a bitwise ‘and’ operation between the variable nextchar and the hexadecimal constant number 0x30.
You can also shift bits to the left or right:
Shift operators << (shift left), >> (shift right).
Note: results may vary based upon whether the type of the variable being shifted is “signed” or “unsigned”. See H&S pp.231-233.
Combined operators and assignment
When the result of an operation is assigned to the same variable that the operator is operating on, then the operator and the assignment can be combined. For example a = a + 3
can be written as a += 3
. See this simple code combined.c for more examples.
Precedence of operators
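Operators higher in the table below bind more tightly than those lower down. Operators of equal precedence are grouped according to their associativity: left-to-right for most binary operators, right-to-left for the unary, conditional, and assignment operators. The default precedence may be overridden with parentheses.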
Operator | Precedence
( ) [ ] | highest
++ -- ! ~ |
* / % |
+ - |
<< >> |
< <= > >= |
== != |
& (bitwise and) |
| (bitwise or) |
&& |
|| |
?: |
= |
, | lowest
Variable names (and type and function names as we shall see later) must commence with an alphabetic or the underscore character (A-Z a-z _) and be followed by zero or more alphabetic, underscore or digit characters (A-Z a-z _ 0-9).
Most C compilers, such as gcc, accept and support variable, type, and function names of up to 256 characters in length.
(Some older C compilers only supported variable names with up to 8 unique leading characters and keeping to this limit may be preferred to maintain portable code.) It is also preferred that you do not use variable names consisting entirely of uppercase characters.
All-uppercase variable names are typically reserved for constants (such as MAXBUFSIZE).
Importantly, C variable names are case sensitive, so MYLIMIT, mylimit, Mylimit and MyLimit are four different variable names.
There are some specific variable/function naming styles that you may encounter. The major ones are
camelCase: writing compound words with the first letter of each word capitalized, except for the first word’s first letter, which is not capitalized.
PascalCase: writing compound words just as in camelCase, with the first letter of the first word also capitalized. (In Java it is common to use this case for class names, but camelCase for member names.)
snake_case: writing compound words with an underscore between each word with little, if any, capitalization.
Any programming project, including all of your assignments, should pick a variable/function naming style and stick with it.
In C, every constant, variable, or function name is declared to be of a certain type. This type may be either a base type supported by the C language itself, or a user-defined type consisting of elements drawn from C’s set of base types.
Basic types
Here are C’s basic types:
type | format | description
void | | the void type
char | %c | the character type
short | %hd | the short integer type (sometimes shorter than int)
int | %d | the standard integer type
long | %ld | the longer integer type (sometimes longer than int)
bool | %d | the Boolean type, representing true/false
float | %f | the standard floating-point (real) type
double | %f | the extra precision floating-point type
long double | %Lf | the super precision floating-point type
To use bool one must #include <stdbool.h>.
The table above lists some of the basic types available to C programmers, along with the format specifier to use when printing a value of that type with printf. For example
int a = 5;
printf("a = %d\n",a);
We can determine the number of bytes required for datatypes (and other things, as we will see later) with the sizeof operator.
In contrast, Java defines how long each datatype may be.
In C, the sizes vary from machine to machine, with the details managed by the compiler.
C’s only guarantee is that:
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
Examine data-types.c, which defines variables of each of the base types and prints the sizeof
each one.
$ mygcc data-types.c -o data-types
$ ./data-types
-------contents ------- sizeof()------
contents of char is a --- sizeof 1 bytes
contents int is 2 --- sizeof 4 bytes
contents short is 3 --- sizeof 2 bytes
contents long is 4 --- sizeof 8 bytes
contents bool is 1 --- sizeof 1 bytes
contents float is 1000.256714 --- sizeof 4 bytes
contents double is 1.100000e+24 --- sizeof 8 bytes
contents long double is 1E+31 --- sizeof 16 bytes
The above ran on plank.thayer.dartmouth.edu.
Although these sizes are common for Linux machines today (2022), it is possible they may change in the future; code should not be dependent on specific sizes.
The void type is different than all of the others, in that it is not possible to define and use a void variable.
As such, it has no size.
It is used for two purposes:
(1) to indicate that a function returns no value, or
(2) to indicate (with void*) a “pointer to anything”.
We’ll see examples of both.
Type coercion
C permits assignments and parameter passing between variables of different types using type casts or coercion.
For example,
int a = 4;
float b = 0.0;
b = a; // implicitly converts from int to float
a = b; // implicitly converts from float to int
a = (int) b; // explicit "cast" from float to int
Some casts in C must be made explicit, and are used where some languages require a ‘transfer function’. We will see examples of C’s cast operator later in the course.
Unsigned integers
Normal integers (int, short, long) can represent both negative and positive numbers.
For example, we saw above that a short is 2 bytes in size, thus 16 bits; it can represent -32,768 .. +32,767 (that is, -2^15 .. 2^15-1).
But an unsigned short, still 16 bits in size, represents only non-negative integers, thus from 0 .. 65,535 (2^16-1).
Programs use unsigned integers when they need double the positive range and never need to represent a negative value.
Constant variables
Although a constant variable sounds like an oxymoron, and it is, C allows us to define just such a thing. Some examples:
const float pi = 3.14159265;
const int maxNameLength = 50;
int main(const int argc, const char* argv[]);
The const storage modifier tells the compiler that this named value cannot be changed; it is useful for constants (like the first two examples) or for declaring that a function will not change its parameter values.
In CS 50 we urge the use of const wherever appropriate, that is, wherever the programmer expects that named value to never change; if the code accidentally tries to change the value, the compiler will report an error.
It helps avoid tragic mistakes!
User-defined types (typedef)
It is often helpful to define a new type, in effect, an ‘alias’ for an existing type. Doing so can make the code more readable, or help to abstract the particularities of an implementation from other code that uses the type.
For example, before C supported a Boolean type, it was common to implement it this way:
typedef short boolean; // defines a new type named 'boolean'
const boolean TRUE = 1; // defines a new constant of type 'boolean'
const boolean FALSE = 0; // defines a new constant of type 'boolean'
Then one could write code with Boolean variables, e.g.,
boolean success = FALSE; // defines a new variable of type 'boolean'
In CS 50 we’ll see typedef
used more often for compound types.
Those come later.
Variables, constants, and functions can be declared extern, meaning that their implementation occurs in a different C source file.
It is common to see extern declarations in a header (.h) file, which when included in a source (.c) file allows the compiler to learn about the existence of something whose definition lies elsewhere.
The compiler makes a note of it, leaving it for the linker to resolve later.
For example, we see the following declaration in readline.h:
extern bool readLine(char* buf, const int len);
and then in readline.c we see the definition (implementation) of that function:
bool
readLine(char* buf, const int len)
{
  // implementation here
}
The confusingly-named static modifier has two uses.
For global names (outside of any function), it indicates to the compiler that this name should only be visible within the current source (.c) file. You can think of the global variable as being “private” to this file.
In CS 50, we most commonly see static used to “hide” functions within a source file representing a module; as a result they are not visible to other source files, and they are not part of the “interface” to the module.
In this regard, it is like the difference between “private” and “public” methods in Java.
We’ll come back to this idea in more detail, later.
The static modifier has another use, though you will not use it in CS 50. Inside a function, the static modifier on a local variable indicates that the value of this variable should persist even after the function returns. This feature should be used in only special circumstances and with great care.
Scope rules
As shown above, in C you can name constants, variables, functions, and even types (using typedef). Each name has a scope, that is, the section of the source file where the variable is ‘visible’, that is, valid for use in the code.
C has two main types of scope, global and local, though in principle every block and sub-block (as defined by {} braces) can define a new subsidiary scope.
Once a name is defined, it is valid from that point in the source file down to the end of its scope.
Global names are typically defined near the top of the source file and are thus valid through the rest of the source file, within every function and the blocks within them.
Local names occur within a function; these include the function parameters, and other names defined within the function body. As a matter of style and convention, local names are typically declared at the top of a function body, although there are instances where it is more readable to declare them further below.
Other, narrower scopes can occur inside a statement block within a function body; the most common example are loop variables – names valid only within the loop body.
Let’s look at an example of each of these three types: global, local, and loop scopes. But first, a quick note about initialization.
Variable initialization
Constant variables must be initialized when they are defined; they cannot be assigned a value thereafter.
In CS 50 we advocate for initializing every variable right there in the definition, even if only to some default value. Here’s why.
Global variables are initialized to zero, by the compiler and linker.
Local variables (whether at the function or block level) are not initialized; thus, their initial value is undefined — whatever happens to be in memory, aka, garbage. It is dangerous to use the value of an uninitialized variable, especially a pointer, because it leads to undetermined results - or crashes your program.
Global names
Global names (variables, functions, types, etc.) are visible from the point where they are declared until the end of that file. Thus, they are typically declared near the top of the file.
This rule (from the declaration to the end of the file) is the main reason why we tend to declare all a file’s functions near the top of the file: so they can be called from any point within the file. For example, look at guess6.c, which declares three functions near the top:
/* function prototype *declaration*; the function is *defined* below */
int askGuess(const int low, const int high);
int pickAnswer(const int high);
bool str2int(const char string[], int* number);
and then uses them in the main() function.
In CS 50 we never use global variables. They are risky, and thus bad style. Thus, in CS 50, global names are primarily functions, constants, and types.
static vs extern
As discussed above, if a global name is declared or defined with a static
modifier, it means that the variable is only available from within that file.
If a global name is declared with an extern
modifier, it means it is not necessarily defined (implemented) within this file; the linker will need to later find its definition in another file, and link its use in this file with the implementation in that other file.
The variable may be declared as extern
in all files, but must be defined (and not as a static
!) in exactly one file.
Look again at guess6.c; because it includes readline.h near the top, the code from that file is incorporated into guess6
right at that point, including the line
extern bool readLine(char* buf, const int len);
Later, in a separate run of the compiler, readline.c also includes readline.h near the top of its file, declaring function readLine().
In this case, the compiler also finds the definition of function readLine().
That’s fine; the function is declared in several files, but defined in only one.
If neither modifier is applied, that name may be accessed from another source file… though in that file, it would need to be labeled with the extern modifier. Look again at guess6.c; the three functions it declares and defines are not marked static; they would thus be potentially visible to other code modules linked with this program. In such a small program, this issue is not important, but in a larger, more sophisticated program, we should mark those functions static.
Local names
Local names (typically local variables and constants) are accessible from the point of their definition until the end of the block in which they are defined.
When the function returns, the variable’s memory is automatically deallocated and, if the function is called again, is reallocated… but with undetermined content. As above, we recommend initializing all local variables.
A local name may shadow that of a global variable, making that global variable inaccessible. Blocks do not have names, and so shadowed variables cannot be named. For example,
int x, y, z;              // global variables

int foo(int x, int y)     // shadows the globals named 'x' and 'y'
{
  int z = 99;             // shadows the global named 'z'
  int sum = x + y + z;    // uses the local variables only
  return sum;
}
Loop variables and statement blocks
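Variables may also be declared inside a statement block. Older versions of C (C89) required such declarations to appear at the top of the block; since C99, and thus in the C11 standard selected by mygcc, a declaration may appear anywhere in the block. Either way, such variables are visible from their declaration until the end of that block.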
The most common use of this capability is for loop variables, specifically, for loops. For example, we diagram the core of sqrt.c below.
The function main() has three local names: two parameters (argc, argv, both defined with the const modifier and thus they act as local constants) and one local variable (exit_status).
Its for loop is a nice example of the use of a loop variable i to index the iterations over an array, and two statement-block variables to hold information only needed within that statement block (the body of the for loop).
Examples of global and local variables
Let’s look at some additional code snippets to reinforce the ideas of local and global variables and the issue of the scope of these variables in a section of code.
Example scope.c
/*
 * scope.c - illustrates the use of global and local variables and
 * global function prototypes.
 *
 * Revised code taken from pg. 330 (Program 7.1) (Bronson) "First
 * Book on ANSI C"
 *
 * CS 50, Fall 2022
 */

#include <stdio.h>

/* firstnum is a global variable not defined on the main() stack. It has
 * full scope of all functions in the file scope.c. Any code in the file
 * can read and write to it. Once main() terminates the variable is
 * deallocated and no longer exists. */
int firstnum;    // create a global variable named firstnum

void valfun();   // global function prototype

int main()
{
  int secnum;    // create a local variable named secnum

  firstnum = 10; // store a value into the global variable
  secnum = 20;   // store a value into the local variable
  printf("\nFrom main(): firstnum = %d",firstnum);
  printf("\nFrom main(): secnum = %d\n",secnum);

  valfun();      // call the function valfun

  printf("\nFrom main() again: firstnum = %d",firstnum);
  printf("\nFrom main() again: secnum = %d\n",secnum);
  return 0;
}

void valfun()    // no values are passed to this function
{
  /* secnum is a local variable created on the stack when valfun()
   * executes. When valfun() exits the stack is deallocated and the
   * variable no longer exists. It is local and its scope is valfun(). */
  int secnum;    // create a second local variable named secnum

  secnum = 30;   // this only affects this local variable's value
  printf("\nFrom valfun(): firstnum = %d",firstnum);
  printf("\nFrom valfun(): secnum = %d\n",secnum);
  firstnum = 40; // this changes firstnum for both functions
}
If we run the code the output is as follows:
$ mygcc -o scope scope.c
$ ./scope
From main(): firstnum = 10
From main(): secnum = 20
From valfun(): firstnum = 10
From valfun(): secnum = 30
From main() again: firstnum = 40
From main() again: secnum = 20
Study the output. Is it what you expected?
The first thing to note about the source code is that it defines a global variable firstnum whose scope is the complete file and which is therefore accessible from both main() and valfun().
Note that scope.c has a main() and a valfun().
The prototype for valfun() is declared at the top of the file, giving it global scope in the file scope.c.
Both main() and valfun() update and print the value of firstnum, which represents a variable with a memory address (space is not allocated on the stack as in the case of local variables such as secnum).
Note that main() and valfun() both have local variables named secnum.
This name collision is not a problem, because each of these two different local variables (that happen to have the same name) has scope only inside its own function, main() or valfun(), respectively.
Their instances are private to main() and valfun(), respectively.
They have no association other than having the same names.
They are local variables created on the stack and no longer exist when the function exits.
For example, valfun() creates a variable for secnum of integer type on its local stack when it executes, but when it returns control to main() the stack is deallocated and the variable no longer exists.
In contrast, the global variable firstnum and its current value are not affected when valfun() returns.
Examples of local and static local variables
Consider another storage modifier that is impacted by scope: static.
Here the variable is placed in global storage with limited visibility depending on where it is defined.
Let’s look at two code snippets that illustrate the use of local and static variables.
These represent two important cases in C.
First, let’s look at the case of local variables.
Example: auto.c
/*
 * auto.c - illustrates the auto local variables.
 *
 * Code taken from pg. 336 (Program 7.2) (Bronson)
 * "First Book on ANSI C"
 *
 * CS 50, Fall 2022
 */

#include <stdio.h>

void testauto();   // function prototype

int main()
{
  int count;       // create the auto variable count

  for (count = 1; count <= 3; count++) {
    testauto();
  }
  return 0;
}

void testauto()
{
  int num = 0;     // create the auto variable num, initialized to zero

  printf("The value of the automatic variable num is %d\n", num);
  num++;
}
If we run the code the output is as follows:
$ ./auto
The value of the automatic variable num is 0
The value of the automatic variable num is 0
The value of the automatic variable num is 0
Study the output. Is it what you expected?
Now let’s look at the case when num is defined as static inside the scope of the function testauto(). See the example static.c.
/*
 * static.c - illustrates the use of auto variables, with `static`.
 *
 * Code taken from pg. 336 (Program 7.2) (Bronson)
 * "First Book on ANSI C"
 *
 * CS 50, Fall 2022
 */

#include <stdio.h>

void testauto();   // function prototype

int main()
{
  int count;       // create the auto variable count

  for (count = 1; count <= 3; count++) {
    testauto();
  }
  return 0;
}

/* The variable num in testauto() is only set to zero once. The local
 * static variable num retains its value when testauto() returns. */
void testauto()
{
  static int num = 0;   // num is a local static variable

  printf("The value of the static variable num is now %d\n", num);
  num++;
}
Note that we have changed num in the testauto function to be static.
If we run static.c, the output is as follows:
$ ./static
The value of the static variable num is now 0
The value of the static variable num is now 1
The value of the static variable num is now 2
Note that the value of num is now persistent across multiple invocations of the function.
This is in direct contrast to the local variable of the last code snippet, i.e., auto.c.
In essence, the static modifier allocates memory for the variable of type int outside the stack, just like a global variable in scope.c, i.e., firstnum.
However, the distinction here is that a static local variable is not global.
It is only accessible in the function testauto().
Question: If I had defined static int num; at the top of static.c, how would that change the scope of the static variable?
Is it different from int num; defined as a global variable (like firstnum in scope.c)? Yes, if a global variable is defined as static, the variable is not visible outside the file in which it was declared. The variable is essentially ‘private’ to that file.
Although the standard C library includes many useful functions, there is an entirely separate library with a plethora of mathematics functions.
For details, study the man page for the desired function, such as
- man 3 sqrt - square root
- man 3 pow - raise a number to a power
- man 3 cos - cosine
Here we give an argument 3 to man to ask it to look in “section 3 of the manual”, because library functions are described in section 3 (whereas commands like ls and bash are in section 1).
To use any of these functions, you must include the header file in your .c file:
#include <math.h>
and you must ask the linker to link with the math library:
$ mygcc sqrt.c -lm -o sqrt
The -l
option indicates you want to link with one of the standard libraries, and the library called m
is the math library.
Look at the example sqrt.c.
/*
 * sqrt - demonstrate use of the math library
 *
 * usage:
 *   sqrt [number]...
 * where 'number' is an integer or floating-point number
 *
 * To compile, you must link with the math library:
 *   mygcc sqrt.c -lm -o sqrt
 *
 * CS 50, Fall 2022
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>    // declares all functions from the math library

int
main(const int argc, const char *argv[])
{
  int exit_status = 0;

  for (int i = 1; i < argc; i++) {
    float number;    // the numeric equivalent to the argument
    char extra;      // any extraneous characters after parsing number

    if (sscanf(argv[i], "%f%c", &number, &extra) == 1) {
      printf("sqrt(%f) = %f\n", number, sqrt(number));
    } else {
      printf("%s: invalid number\n", argv[i]);
      exit_status++;   // note the failure so the program exits non-zero
    }
  }
  return exit_status;
}
The following is an example run:
$ mygcc sqrt.c -lm -o sqrt
$ ./sqrt
$ ./sqrt 4
sqrt(4.000000) = 2.000000
$ ./sqrt 4 10 100 x 99.y
sqrt(4.000000) = 2.000000
sqrt(10.000000) = 3.162278
sqrt(100.000000) = 10.000000
x: invalid number
99.y: invalid number
$ echo $?
This program demonstrates
- floating-point variables and the %f specifier for printf(),
- use of the math function sqrt() from the math library,
- a defensive way to parse an argument string into a number,
- definition of variables (here, i, number, and extra) that are only defined within the scope of a loop (here, the for loop),
- use of a variable to track the future exit status of the program, which should be 0 on success, non-zero for any failures,
- the bash special variable $? that holds the exit status of the most recent command.
Flow of control in a C program
Control flow within C programs is almost identical to the equivalent constructs in Java.
However, C provides no exception mechanism, and so C has no constructs like try, catch, and finally.
Conditional execution
if ( expression )
    statement1;

if ( expression ) {
    statement1;
    statement2;
}

if ( expression ) {
    statement1;
} else {
    statement2;
}
Although the braces { }
are not required when the ‘then’ or ‘else’ clauses are a single statement, in CS 50 we insist on including the braces every time.
Doing so helps to avoid mistakes that can occur if the indentation makes it look like a statement is part of a clause when it actually is not.
For example,
if ( expression )
    statement1;
    statement2;
statement3;
will always execute statement2 and statement3, even though it looks like statement2 would only be executed when expression is true.
The compiler ignores indentation and treats the above as if it were written
if ( expression )
    statement1;
statement2;
statement3;
If that was not the programmer’s intent, they should have written
if ( expression ) {
    statement1;
    statement2;
}
statement3;
Thus: always use braces; they’ll avoid surprises.
while and do-while loops
while ( conditional-expression ) {
    statement1;
    statement2;
}

do {
    statement1;
    statement2;
} while ( conditional-expression );
As for the if statement, the braces { } are not required when the loop body is a single statement, but in CS 50 we insist on including the braces every time.
for loops
for( initialization ; conditional-expression ; adjustment ) {
    statement1;
    statement2;
}
Any of the components of the for statement’s for-expressions may be missing; if the conditional-expression is missing, it is always true.
Infinite loops may be coded in C with for( ; ; ) … or with while(true) (the latter requires stdbool.h).
As for the while loop, the braces { } are not required when the loop body is a single statement, but in CS 50 we insist on including the braces every time.
Note the above for loop is equivalent to the following (provided the loop body contains no continue statement, because in a for loop continue still executes the adjustment):
initialization;
while ( conditional-expression ) {
    statement1;
    statement2;
    adjustment;
}
The switch statement
The switch statement allows you to execute a different statement depending on the value of an expression, as follows:
switch ( expression ) {
    case const1 : statement1; break;
    case const2 : statement2; break;
    case const3 :
    case const4 : statement4;
    default : statementN; break;
}
As in Java’s traditional switch statement, C permits control to “drop down” to following case constructs, unless there is an explicit break statement; above, const3 and const4 both execute statement4 and then continue into the default clause.
The break statement
The break statement causes control to break out of the innermost loop (for or while) or switch statement.
for ( expression1 ; expression2 ; expression3 ) {
    statement1 ;
    if( ... ) {
        break;
    }
    statementN ;
}
// after 'break', execution continues here

while ( expression1 ) {
    statement1 ;
    if( ... ) {
        break;
    }
    statementN ;
}
// after 'break', execution continues here

switch ( expression1 ) {
    case const1:
        statement 1;
        break;
    case const2:
        statement 2;
        break;
    case const3:
        statement 3;
        break;
    default:
        statement n;
        break;
}
// after 'break', execution continues here
The continue statement
The continue statement causes control to jump to the bottom of the innermost enclosing loop (for or while), and continue looping from there.
for ( expression1 ; expression2 ; expression3 ) {
    statement1 ;
    if( ... ) {
        continue;
    }
    statementN ;
    // after 'continue', execution continues here (then expression3 is evaluated)
}

while ( expression1 ) {
    statement1 ;
    if( ... ) {
        continue;
    }
    statementN ;
    // after 'continue', execution continues here
}