Introduction to C
The first few lectures have been a crash course in the shell and shell programming. Now we move to the C language. We will spend the rest of the course developing our C and systems programming skill set by first understanding the basics of the language and then, through examples, studying good code and writing our own.
This lecture will serve as an introduction to the C language. In the next lecture, we will cover more of the basics of C.
Goals
We will learn the following from today’s lecture:
- Structure of a C program
- Compiling and running a C program
- Input and output with stdin, stdout
- Random numbers
- Functions
Reading
We elected not to require a specific textbook on C. There are many to choose from, including a good text online (see the Resources page). The Harbison and Steele book is very highly recommended and is, as I’ve said in class, an excellent reference and learning resource. But it is not a tutorial. When I do reference it, I will use the shorthand “H&S”.
If you feel you need more of a C textbook, either use the one online or see the professor for recommendations of good ones. For example, past incarnations of this course used the text by Bronson, A First Book of ANSI C. It’s not bad, and there are lots of others, including some that you can get electronically, like Prinz’s C in a Nutshell (O’Reilly).
Manuals
C programming depends on a suite of standard libraries for input/output, strings, math, memory allocation, and so forth.
Most or all of these functions are documented in man pages, just like shell commands.
Try man strcpy, for example.
For some C functions there are shell commands with identical names; if you type man printf, for example, you’ll see the man page for the bash printf command and not the C function printf(). You can ask man to look only for library functions (section 3 of the manual) with man 3 printf.
Activity
Our in-class activity is to compile and test guessprime4.c, and write a shell script to test it.
C
C can be correctly described as a successful, general-purpose programming language, a description also given to Java and C++. C is a procedural programming language, not an object-oriented language like Java or C++. Programs written in C can be described as ‘good’ programs if they are written clearly, make use of high-level programming practices, give the expected result, run efficiently, and are well-documented with sufficient comments and meaningful variable names. Of course, all of these properties are independent of C and are traits of good programming in any high-level language.
C has the high-level programming features provided by most procedural programming languages - strongly typed variables, constants, structured types, enumerated types, a mechanism for defining your own types, aggregate structures, control structures, recursion and program modularization.
C does not support sets of data, Java’s concept of a class or objects, nested functions, nor subrange types and their use as array subscripts, and has only recently added a Boolean data type.
C does have, however, separate compilation, conditional compilation, bitwise operators, pointer arithmetic and language independent input and output. The decision about whether C, C++, or Java is the best general-purpose programming language (if that question can be decided, or even needs to be decided), is not going to be an easy one.
C is the programming language of choice for most systems-level, engineering, and scientific programming. The world’s popular operating systems - Linux, Windows, and Mac OS X, their interfaces and file systems, are written in C; the infrastructure of the Internet, including most of its networking protocols, web servers, and email systems, are written in C; software libraries providing graphical interfaces and tools, and efficient numerical, statistical, encryption, and compression algorithms, are written in C; and the software for most embedded devices, including those in cars, aircraft, robots, smart appliances, sensors, and game consoles, is written in C. Recent mobile devices such as the iPhone, iPad, and some Microsoft products use languages derived from C, such as Objective C, C++, and C#.
The TIOBE Programming Community index is another indicator of the popularity of programming languages. Updated monthly, it provides a great historical look at this topic. See the latest results here.
C’s overall philosophy is “get out of the programmer’s way.” C is often criticized for allowing the programmer to do pretty much whatever they want.
C compilation process
As you begin C programming we ask you to use several command-line flags that cause the C compiler to be especially careful. Specifically, you should compile every program like this:
gcc -Wall -pedantic -std=c11 -ggdb program.c -o program
- -Wall turns on “all” possible warnings (-W), to help warn you of possible mistakes;
- -pedantic asks the compiler to be extra picky about syntax, again, to help avoid mistakes;
- -std=c11 insists that your code follow the ‘c11’ version of the C language standard; and
- -ggdb enables the resulting program to be debugged by the gdb debugger (more on that later).
To make this easier for you, our customized bash configuration defines an “alias”:
alias mygcc='gcc -Wall -pedantic -std=c11 -ggdb'
which creates a new bash command mygcc, so the above can just be typed
mygcc program.c -o program
Here you ask gcc to compile program.c and to output (-o) to file program.
A word of warning: whether you use gcc -o hello hello.c or mygcc -o hello hello.c, you must take care to get the order of the files right around the -o switch, which tells the compiler that the file name immediately following it will be the name of the executable. One student compiled the correct way, mygcc -o hello hello.c (producing an executable hello correctly), and then recompiled but got the order wrong: mygcc -o hello.c hello. What the gcc compiler did wasn’t pleasant. It took the executable hello as the source file and hello.c as the name of the executable to be created. The result was that the real source file hello.c disappeared! Well, it didn’t actually disappear; it was erased by the compiler as it got ready to produce a new output file of that same name. So please be careful: the -o tells the compiler that the executable it creates should be given the name that follows the -o.
Because mygcc is a bash alias, it is only available at the bash command line.
If you want to compile your program elsewhere, e.g., from within emacs, you’ll need to type out the full command line above.
The compilation of a C program actually requires several steps: preprocessing, compiling, assembling, and linking, as shown below. We will return to this diagram as we learn more about C, and are better able to understand the purpose of each step.
In this diagram, I envision compiling a program names.c and linking it with the module readlinep.c.
When we run the compiler like this:
$ mygcc names.c readlinep.c -o names
it is actually running a series of commands, creating (and later removing) various intermediate files:
- run the C preprocessor cpp on names.c to produce names.i: still C source code, but with comments removed and with #include files incorporated.
- run the C compiler cc on names.i to produce assembly language names.s. This is still a text file, but no longer in C.
- run the assembler as on names.s to produce the object code names.o. This is now a binary file containing machine instructions.
- repeat those steps for readlinep.c, resulting in readlinep.o.
- run the linker ld on names.o and readlinep.o, linking them together and with common libraries like the C standard library (libc), to produce an executable binary program names.
We use gcc, the GNU C compiler, which may actually use GNU versions of the above programs (like gcc instead of cc and gas instead of as) for some steps.
C examples
In class we iteratively build a C version of our friend from last class, guessprime.sh, and then enhance it.
- guessprime1.c: (simple replacement for the bash program)
- guessprime2.c: (add readGuess() function)
- guessprime3.c: (move readGuess() function to bottom, declare function prototype)
- guessprime4.c: (add isPrime() function; check bounds of input; args in prototypes)
- guessprime5.c: (pick a random answer)
We may not have time to cover:
- guessprime6.c: (support command-line arguments)
- guessprime7.c: (handle non-numeric input from user)
- guessprime8.c: (smarter ‘readline’ function, to be introduced tomorrow)
- guessprime9.c: (interpret words like ‘quit’ on input; usage of stderr)
I encourage you to read and experiment with them on your own.
You can compare one version with another using diff (here, I also use the [] filename expansion in bash):
tjp@plank:~/cs50/examples$ diff guessprime[12].c
2c2
< * guessprime1.c - a C version of our simple bash demo program guessprime.sh
---
> * guessprime2.c - a C version of our simple bash demo program guessprime.sh
12,14d11
< // This is a one-line comment; modern syntax
< /* This is another one-line comment; traditional syntax */
<
18,19c15,16
< int main() {
< const int answer = 31;
---
> // Ask for and read a guess
> int readGuess() {
25c22,31
< // compare guess number to answer
---
> return guess;
> }
>
>
> // Main function - ask for a guess, quit when it matches the answer and keep asking otherwise
> int main() {
> const int answer = 31;
> int guess;
>
> guess = readGuess();
28,29c34
< printf("Enter a prime between 1-100: ");
< scanf("%d", &guess);
---
> guess = readGuess();