The first four lectures have been a crash course in the shell and shell programming. Now we move to the C language. We will spend the rest of the course developing our C and systems programming skill set by first understanding the basics of the language and then through examples study good code and write our own.
This lecture will serve as an introduction to the C language. Why is C an important language 30 years after its development?
We plan to learn the following from today’s lecture:
OK. Let’s get started.
We intend to use the text book more as a reference than working through the book in a stepwise fashion - no time for that. We will relate to sections and use example code from the book from time to time. I would suggest that students start reading the book - it is very readable. Please read chapters 2 and 3 of the text. If you already have some knowledge of C you can skip this reading assignment. This type of reading is different from assigned reading of articles which we typically will discuss in class. Reading from the course book is more of a back up to what we discuss or for you to fill in your knowledge of things we don’t have time to dive deeply into in class. Read as much as time permits.
C can be correctly described as a successful, general purpose programming language, a description also given to Java and C++. C is a procedural programming language, not an object-oriented language like Java or C++. Programs written in C can of course be described as “good” programs if they are written clearly, make use of high level programming practices, and are well documented with sufficient comments and meaningful variable names. Of course all of these properties are independent of C and are provided through many high level languages.
C has the high level programming features provided by most procedural programming languages - strongly typed variables, constants, standard (or base) datatypes, enumerated types, a mechanism for defining your own types, aggregate structures, control structures, recursion and program modularization.
C does not support sets of data, Java’s concept of a class or objects, nested functions, nor subrange types and their use as array subscripts, and has only recently added a Boolean datatype.
C does have, however, separate compilation, conditional compilation, bitwise operators, pointer arithmetic and language independent input and output. The decision about whether C, C++, or Java is the best general purpose programming language (if that can or needs to be decided), is not going to be an easy one.
C is the programming language of choice for most systems-level, engineering, and scientific programming. The world’s popular operating systems - Linux, Windows and Mac OS-X, their interfaces and file-systems, are written in C; the infrastructure of the Internet, including most of its networking protocols, web servers, and email systems, is written in C; software libraries providing graphical interfaces and tools, and efficient numerical, statistical, encryption, and compression algorithms, are written in C; and the software for most embedded devices, including those in cars, aircraft, robots, smart appliances, sensors, mobile phones, and game consoles, is written in C.
Nearly all operators in C are identical to those of Java. However, the role of C in system programming exposes us to much more use of the shift and bit-wise operators than in Java.
Assignment
=
Arithmetic
+, -, *, /, %, unary -
Priorities may be overridden with ( )s.
Relational
>, >=, <, <= (all have same precedence)
== (equality) and != (inequality)
Logical
&& (and), || (or), ! (not)
Pre- and post- decrement and increment
Any (integer, character or pointer) variable may be either incremented or decremented before or after its value is used in an expression.
For example :
--fred will decrement fred before value used.
++fred will increment fred before value used.
fred-- will get (old) value and then decrement.
fred++ will get (old) value and then increment.
Let’s write some C code to look at pre and post increment and decrement.
C code: increment.c
Where to get the C source code examples from: Note, all the source code examples used in these C programming notes can be copied to your local machine from my public_html/cs50 directory. The snippet below creates a local directory called examples and copies the C source code files over. After that you can compile and execute the code alongside the notes; make sure you do this for all C source code examples in the notes.
You can copy my bash files over in the same manner if you want my set up and aliases.
What follows below copies my bash files over to the examples directory. I recommend that you do this and then cut and paste sections of my bash files (e.g., aliases, mygcc, etc.) to your own bash files in your home directory as you wish.
OK now let’s look at increment.c
Once we have the C code we have to compile it with gcc with the various compiler switches we discussed in Lecture 1. Let’s compile the code using:
We use mygcc filename.c -o filename as the convention. The compiler produces an executable called filename.
You do not have to use chmod to make it an executable. The compiler takes care of that when it creates an executable with the correct permission for file filename.
Check it out: Save the file in your directory cs50/code/ Compile and run the code. Check the output.
Bitwise operators and masking
& (bitwise and), | (bitwise or), ^ (bitwise exclusive or), ~ (bitwise negation).
To check if certain bits are on (fred & MASK), etc.
Shift operators << (shift left), >> (shift right).
Combined operators and assignment
a += 2; a -= 2;
a *= 2
A combined operator is shorthand: a += b; is equivalent to a = a + b;
Type coercion
C permits assignments and parameter passing between variables of different types, either through implicit conversion (coercion) or through explicit type casts. A cast is always written explicitly, as (type)expression, and is used where some languages require a “transfer function”.
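Most binary operators associate from left to right; the unary operators, the conditional operator and assignment associate from right to left. The default precedence may be overridden with brackets. From highest to lowest precedence: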
() coercion (highest)
++ -- !
* / %
+ -
<< >>
< <= > >=
!= ==
&
^
|
&&
||
? :
=
, (lowest)
Variable names (and type and function names, as we shall see later) must commence with an alphabetic or underscore character (A-Z a-z _) and be followed by zero or more alphabetic, underscore or digit characters (A-Z a-z _ 0-9). Most C compilers, such as gcc, accept variable, type and function names of up to 256 characters in length. Some older C compilers only supported variable names with up to 8 unique leading characters, and keeping to this limit may be preferred to maintain portable code. It is also preferred that you do not use variable names consisting entirely of uppercase characters; uppercase names are best reserved for #define-ed constants, as in MAXSIZE above. Importantly, C variable names are case sensitive: MYLIMIT, mylimit, Mylimit and MyLimit are four different variable names.
Variables are declared to be of a certain type. This type may be either a base type supported by the C language itself, or a user-defined type consisting of elements drawn from C’s set of base types. C’s base types and their representation on our lab’s Pentium PCs are:
bool an enumerated type, either true or false
char the character type, 8 bits long
short the short integer type, 16 bits long
int the standard integer type, 32 bits long
long the longer integer type, also 32 bits long
float the standard floating point (real) type, 32 bits long (about 7 decimal digits of precision)
double the extra precision floating point type, 64 bits long (about 15 to 17 decimal digits of precision)
enum the enumerated type, monotonically increasing from 0
Very shortly, we will see the emergence of Intel’s IA64 architecture where, like the Power-PC already, long integers occupy 64 bits.
We can determine the number of bytes required for datatypes with the sizeof operator. In contrast, Java defines how long each datatype may be. C’s only guarantee is that:
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
Let’s write some C code to look at these base data types. We will use the sizeof operator and the printf function. We will also define variables of each of the base types and print the initialized values as part of the data-types.c code.
C code: data-types.c
The contents of data-types.c look like this:
Let’s compile and run the code.
Check it out: Save the file in your directory cs50/code/ Compile and run the code. Check the output.
Base types may be preceded with one or more storage modifiers:
auto the variable is placed on the stack (default, deprecated)
extern the variable is defined outside of the current file
register request that the variable be placed in a register (ignored)
static the variable is placed in global storage with limited visibility
typedef introduce a user-defined type
unsigned storage and arithmetic is only of/on non-negative integers
All scalar auto and static variables may be initialized immediately after their definition, typically with constants or simple expressions that the compiler can evaluate at compile time. The C99 language defines that all uninitialized global variables, and all uninitialized static local variables will have the starting values resulting from their memory locations being filled with zeroes - conveniently the value of 0 for an integer, and 0.0 for a floating point number.
Scope is defined as the section (e.g., function, block) of the program where the variable is valid and known.
In Java, a variable is simply used as a name by which we refer to an object. A newly created object is given a name for later reference, and that name may be re-used to refer to another object later in the program. In C, a variable more strictly refers to a memory address (or a run of contiguous memory addresses starting from the indicated point) and the type of the variable declares how that memory’s contents should be interpreted and modified.
C only has two true lexical levels, global and function, though sub-blocks of variables and statements may be introduced in many places, seemingly creating new lexical levels. As such, variables are typically defined globally (at lexical level 0), or at the start of a statement block, where a function’s body is understood to be a statement block.
Variables defined globally in a file, are visible until the end of that file. They need not be declared at the top of a file, but typically are. If a global variable has a storage modifier of static, it means that the variable is only available from within that file. If the static modifier is missing, that variable may be accessed from another file if part of a program compiled and linked from multiple source files.
The extern modifier is used (within “our” file) to declare the existence of the indicated variable in another file. The variable may be declared as extern in all files, but must be defined (and not as a static!) in only a single file.
Variables may also be declared at the beginning of a statement block. In C89 they may not be declared anywhere other than the top of the block; C99 relaxes this and allows declarations to be mixed with statements. Such variables are visible until the end of that block, typically until the end of the current function. A local variable’s name may shadow that of a global variable, making that global variable inaccessible. Blocks do not have names, and so shadowed variables cannot be named. Local variables are accessible until the end of the block in which they are defined.
Local variables are implicitly preceded by the auto modifier: as control flow enters the block, memory for the variable is allocated on the run-time stack. The memory is automatically deallocated (or simply becomes inaccessible) as control flow leaves the block. The implicit auto modifier facilitates recursion in C: each entry to a new block allocates memory for new local variables, and these unique instances are accessible only while in that block.
If a local variable is preceded by the static modifier, its memory is not allocated on the run-time stack, but in the same memory as for global variables. When control flow leaves the block, the memory is not deallocated, and remains for the exclusive use by that local variable. The result is that a static local variable retains its value between entries to its block. Whereas the starting value of an auto local variable (sitting on the stack) cannot be assumed (or more correctly, should be considered to contain a totally random value), the starting value of a static local variable is as it was when the variable was last used.
Let’s look at some code snippets to reinforce the ideas of local and global variables and the issue of the scope of these variables in a section of code. The example comes from the book and illustrates the ideas nicely.
C code: scope.c
If we run the code the output is as follows:
Study the output. Is it what you expected? Now read on.
The first thing to note about the source code is that it defines a global variable firstnum whose scope is the complete file; it is therefore accessible from both main() and valfun(). Note, scope.c has a main() and a valfun() function. The prototype for valfun() is declared at the top of the file, giving it global scope in the file scope.c. We will talk about prototypes later. Both main() and valfun() update and print the value of firstnum, which is a variable with a fixed memory address (space is not allocated on the stack as in the case of auto variables such as secnum). Note that main() and valfun() both have local variables called secnum. This is not a problem and causes no clash because these two different local variables (that happen to have the same name) only have local scope inside main() and valfun(), respectively. Their instances are private to main() and valfun(), respectively. They have no association other than having the same name. They are auto variables created on the stack and no longer exist when the function exits. For example, valfun() creates a variable secnum of integer type on its local stack when it executes, but when it returns control to main() the stack is deallocated and the variable no longer exists. In contrast, the global variable firstnum and its current value are not changed when valfun() exits.
That leads us to another storage modifier that is impacted by scope, i.e., static. Here the variable is placed in global storage with limited visibility depending on where it is defined. Let’s look at two code snippets that illustrate the use of local auto and static variables. These represent two important cases in C.
First, let’s look at the case of auto local variables.
C code: auto.c
If we run the code the output is as follows:
Study the output. Is it what you expected? Now read on.
Now let’s look at the case when num is defined as static inside the scope of the function teststat(). Note that the value of num is now persistent across multiple invocations of the function. This is in direct contrast to the auto local variable of the last code snippet, i.e., auto.c. In essence, the static modifier allocates memory for the variable of type int outside the stack, just like a global variable such as firstnum in scope.c. However, the distinction here is that a static local variable is not global: it is only accessible in the function teststat(). Hope that clarifies the issue of scope, local and global variables, and the issue of auto variables and static variables.
Now, let’s look at the case of static local variables.
C code: static.c
If we run the code the output is as follows:
Is this what you expected?
Question: If I had defined “static int num;” at the top of static.c how would that change the scope of the static variable? Is it different from “int num” defined as a global variable (like firstnum in scope.c)?
Control flow within C programs is almost identical to the equivalent constructs in Java. However, C provides no exception mechanism, and so C has no try, catch, and finally constructs.
Conditional execution
Of significance, and a very common cause of errors in C programs, is that C before C99 has no Boolean datatype. Instead, any expression that evaluates to the integer value 0 is considered false, and any non-zero value true. A conditional statement’s controlling expression is evaluated and, if non-zero (i.e. true), the following statement is executed. Most errors are introduced when programmers (accidentally) use embedded assignment statements in conditional expressions:
A good habit to get into is to place constants on the left of (potential) assignments:
When compiling with gcc -std=c99 -Wall -pedantic ... and the embedded assignment is intentional, the only way to “shut the compiler up” is to use extra parentheses:
C’s other control flow statements are very unsurprising:
Any of the components of a for statement may be missing. If the conditional-expression is missing, it is always true. Infinite loops may be requested in C with for( ; ; ) ... or with while(1) ...
One of the few differences here between C and Java is that C permits control to drop down to following case constructs, unless there is an explicit break statement.
C code: operator.c
The contents of operator.c look like this:
In this class we will use the gdb command line debugger. However, Apple replaced gdb (sadly) in 2015 with its own command line debugger called lldb, which is very similar to gdb. If you want to develop code on your Mac, which I strongly recommend, then you will also have to learn lldb. If you don’t want that taxation then use gdb but develop your code on a Linux machine in the lab. Here is a tutorial on the lldb command-line debugger and here are examples of gdb commands with their lldb counterparts. The example below uses gdb.
You can use printf statements to help with inspection of variables and control flow through your program. However, this is a very primitive way to debug your code. We will have a lecture on the art of debugging and how to use the gnu debugger (gdb) soon. But for now I would like to introduce gdb and use it to find a software bug.
You can check out the lecture notes on gdb, type man gdb at the command line, or google for more information.
First, let’s create a bug that will cause a segmentation fault (segfault) when we run operator.c by changing the line
scanf("%d", &opselect);
to
scanf("%d", opselect);
Notice that we have just removed the &.
First, let’s make sure our mygcc alias has the gdb flag as shown below. If you use mygcc it will catch this error as shown below, which is good, right? It would save you debugging this problem. I will walk through the following sequence in class. We will use a number of basic gdb commands to find where the segfault occurs but first we will use the debugger just to step through the code and inspect some of the variables. Then we will introduce our error - this is a simple error easily made. After that we will use the backtrace command to find where the problem is.
A couple of good places to go on the web for information on gdb other than the manual pages: for a detailed exposé check out the GDB manual, or a very short primer by Takashi Okumura that will provide a little more detail when understanding some of the gdb commands below.
Like the shell commands, gdb commands can be terse and difficult to remember. Here is a very good quick reference to gdb commands - all you need to know in terms of command syntax is here.
This simple debugging example is a start but you will have to become accomplished in gdb to be a good hacker. DO NOT RELY on printf - it is not your real friend, gdb is!
We will use a lot of the basic gdb commands in the example below, such as break, run, next (use step if you want to step into a function, and next, or n for short, to execute the function without stepping into it; a subtle difference there), continue, display, printf, x (examine memory), backtrace (bt for short), frame (check out stack frames; this is an important concept in C and very useful for debugging, poking around in your code and looking at variables) and list. These are most of the common commands.
I strongly recommend that you go through the sequence of steps below and use these debugging commands. Don’t worry, you can’t break anything. Google gdb or man gdb for more information on the commands above. Just like the shell commands, you’ll only need a subset of the complete set of gdb commands to become a very effective debugger. Again, printf is for dummies (or OK to get started) and not part of hacker’s parlance or the necessary tools in your toolkit: gdb is!
OK. Let’s get started on this walk through.
Note, if you debug on a Mac you will see a file (it’s actually a directory) in your current directory called filename.dSYM (where filename is the name of the compiled file). This directory stores the debug symbols for your app used by the debugger, so don’t remove it. For production code you would remove this directory before shipping your code. The mygcc flag -ggdb tells the compiler to produce the dSYM file; dSYM stands for Xcode’s Debug SYmbols file.