In this lecture we continue our introduction to the C language.
We plan to learn the following from today’s lecture:
OK. Let’s get started.
You will notice that a few lines, typically near the beginning of a C program, begin with the hash or pound sign, #. These lines are termed C preprocessor directives and are actually instructions (directives) to a special program called the C preprocessor (located in /lib/cpp). As its name suggests, the C preprocessor processes the text of a C program before the C compiler sees it. The preprocessor directives (all beginning with #) should begin in column 1 (the 1st column) of any source line on which they appear.
The C preprocessor is easily able to locate these lines and then examine the characters following the #. These characters usually form a special word in the C preprocessor’s syntax, which typically causes the preprocessor to modify the C program before it is sent to the C compiler itself. Although there are about 20 different preprocessor directives, we’ll only discuss the most common ones here and then a few others as we need them.
The #include directive, pronounced hash include, typically appears at the beginning of a C program. It is used to textually include the entire contents of another file at the point of the #include directive. A common #include directive, seen at the beginning of most C files, is
#include <stdio.h>
This directive indicates that the contents of the file named stdio.h should be included at this point (the directive is replaced with the contents). There is no limit to the number of lines that may be included with this directive and, in fact, the contents of the included file may have further #include directives which are handled in the same way. We say that the inclusions are nested and, of course, care should be taken to avoid recursive nestings!
The example using <stdio.h>, above, demonstrates two important points. The filename itself appears between the characters < ... >. The use of these characters indicates that the enclosed filename should be found in the standard include directory, /usr/include. The required file is then /usr/include/stdio.h.
The standard include files are used to consistently provide system-wide data structures or declarations that are required in many different files. By having the standard include files centrally located and globally available, all C programmers are guaranteed to be using the same data structures and declarations that they all require. The original ANSI C standard (C89) defines only 15 operating-system-independent header files; C99 extends this to 24.
Have a (recursive) look in the /usr/include directory yourself and you will see that there are over 2000 standard include files available under Linux!
Importantly, it is the use of the < ... > characters that signifies that the /usr/include directory name should be prepended to the filename to locate the required file. Alternatively, the " ... " characters may also be used, as in the following example:
#include "mystructures.h"
to include the contents of the file mystructures.h at the correct point in the C program. Because the " ... " characters are used, the file is sought in the present working directory, that is ./mystructures.h. By using the " ... " characters we can specify our own include files, which are located in the same directory as the C source programs themselves.
In both of the above examples the indicated filename had the “extension” of .h. Whereas we have previously said that the extension of .c is expected by the C compiler, the use of .h is only a convention within UNIX. The .h indicates that the file is a header file, so called because such files generally contain information required at the head (beginning) of a C program. Header files typically contain (and should contain) only declarations of C constructs, like data structures and constants used throughout the C program. In particular, they should not contain any executable code, variable definitions, or C statements.
Another frequently used C preprocessor directive is the #define directive, pronounced hash define. The #define directive is used to introduce a textual value, or textual constant, which when recognized by the C preprocessor will be textually substituted by its definition. Traditionally #define directives were the only method available to C programmers, using old K&R (Brian Kernighan and Dennis Ritchie) C, of introducing constants in C programs. For example, frequently used #define-ed constants are:
After these definitions, each time the C preprocessor locates the sequence JUNIOR as a complete word within the C program, it will be replaced by the character sequence 3. Although the ANSI-C standard introduced a formal const construct for supporting constants, the #define directive is still the preferred method of defining some forms of constants. For example, when defining an array of integers (described in greater detail later) we use a #define directive to define the maximum size of the array.
Thereafter we use the #define-ed constant in the array definition:
If necessary, a preprocessor token may be undefined when it is no longer required:
The #define directive may also be used to define some inline functions, more correctly termed macros, within your C programs. An often cited example is:
C does not have a standard function for calculating the square of, say, an integer value, but using the inline macro defined above, we can now write:
where i is an integer variable. Notice that the macro substitution was performed with the macro’s argument being i. In a manner akin to actual and formal parameter naming in Java (and C), the actual parameter i is represented in the macro as the formal parameter x without problems. Each time x appears as a unique “word” on the right-hand side of the definition, it will be replaced in the C code by i.
Notice that this textual substitution may also be used for calculating (in this example) the square of an integer constant. For example:
is expanded in an identical way. Our definition of sqr is not really rigorous enough to provide correct results in all cases. For example, consider the “call” to sqr(x+1), which would evaluate to 2x+1! A more correct definition would be:
Another often-used feature of the C preprocessor is conditional compilation. The C compiler predefines a few constants to “tell” the program the operating system in use, the filename being compiled, and so on:
Java supports constructors and methods which allocate instances of, and interrogate and modify the state of, their own (implicit) objects. Constructors and methods are typically directed by their parameters. C is a procedural programming language, meaning that its primary synchronous control flow mechanism is the function call.

Strictly speaking, C has no procedures, but instead has functions, all of which return a single instance of a base or user-defined type. C’s functions access and modify the global memory, and (possibly) their parameters. Although we may hope that a function can only modify memory that it can “see” (through C’s scoping rules) or has been provided (through its parameter list), this is untrue.

By stating that there are only functions, we are noting that all functions must return a value. While nearly true, C also has a void type, difficult to describe, and often used as a placeholder (to keep the compiler happy!). We may think of a procedure in C as a function that returns void; that is to say, nothing is returned. With a similar thought, we will often invoke a function but have no use for its return value. For example, a function such as printf() will return an integer as its result, but we rarely need to use this integer. We can “cast its value” to void, effectively throwing away the value.
The default return datatype of a function is int: if a function’s datatype is omitted, the compiler assumes it to be an int. This has the unpleasant result that, if the prototype of an external or yet-to-be-defined function is omitted, the compiler will often silently assume an int return result. This is a frequent cause of problems, particularly when dealing with functions returning floating point values, as in C’s mathematics library. The use of gcc’s -Wall and -pedantic switches allows us to trap such errors. (The C99 standard removed these implicit int rules, so newer compilers warn about such code or reject it.)
Every complete C program has an entry point named main, at which the operating system appears to call the program. Function main is of type int; this int is returned as the result of execution of the whole program, with 0 indicating a successful execution and anything non-zero indicating otherwise. C’s functions may receive zero or more parameters. All parameters to C’s functions are passed by value.
Other than within a single file, the datatypes of function parameters are not checked between the function’s definition and its invocation; i.e., C provides no link-time, cross-file type checking. Perhaps surprisingly, C also permits functions to receive a variable number of parameters. At run-time it is the function’s responsibility to deal with the data types received, and the compiler cannot perform any type checking on these parameters.
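Function parameters are implicitly promoted to “higher” datatypes by the compiler: chars are promoted to ints, and floats are promoted to doubles. These promotions apply when no prototype is in scope and to the variable part of a variadic function’s argument list.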
The following example code uses functions. The code toss.c asks the user to enter the number of fair tosses of a coin and then computes the number of heads and tails. Random number generators are used.
C code: toss.c
The contents of toss.c look like this:
C has no equivalent construct to the Java class. Instead, C provides two aggregate data structures: arrays and structures.
Arrays in C are not objects, nor strictly single variables. Instead, an array’s name refers to the first memory address of a contiguous block of memory of the requested length. Arrays may be declared or defined wherever scalar variables are declared or defined. They may be arrays of either C’s base types or user-defined types.
There is no array keyword in C, and no bounds checking at run-time. C array subscripts commence at 0, the highest valid subscript of int a[N] thus being N-1.
One-dimensional arrays are defined with, for example, int score[20]; which declares score as an array of 20 ints.
Multi-dimensional arrays?
Strictly speaking, C does not support multi-dimensional arrays. However, if all (one-dimensional) arrays in C are considered as vectors, then multi-dimensional arrays are simply understood as “vectors of vectors”.
For example, char str[10][20]; declares str as an array of 10 arrays of 20 chars.
The number of elements of an array can be determined with:
Structures in C are aggregate datatypes consisting of fields or members of base types, or other user-defined types. C structures may not include executable code, unlike methods in Java classes.
C provides no base type that is a string, though the C compiler accepts the use of double-quoted character string literals and does the obvious thing. A string in C is a sequence of characters (bytes) in contiguous memory locations. The string is terminated by the sentinel value of the null character, '\0' (a zero byte, not to be confused with the NULL pointer). When a C compiler detects a string literal in a program, it will allocate enough contiguous global (read-only) memory to hold the characters of the string (including the null byte at the end).
C does not record the length of a string anywhere (as Java does). Instead, by convention, the length of a string is defined as the number of characters from the beginning of the string (its starting address) up to, but not including, the null byte. The length of "hello" is 5.
Arrays of characters are typically used to store character strings. Notice that the parameter to the following function does not indicate any expected (maximum) size, or “length”, of the array.
The snippet of code below includes two strings that are compared. Literal strings are stored as ASCII codes, and the string comparison function strcmp compares each character’s ASCII code in making the comparison. If s1 < s2 the return value is < 0; if s1 > s2 the return value is > 0; if s1 = s2 the return value is 0.
C code: string.c
The contents of string.c look like this:
C code: array.c
The contents of array.c look like this:
If we run array we get the following output
The next code snippet shows the address of an array and some of its elements. Importantly, it shows the equivalence of two common notations for dealing with addresses and arrays. Run the code below. Note that the & operator is not used before the array name. Because an array’s name is, in most expressions, converted to a constant pointer to the first storage location reserved for the array, the expressions numbers and &numbers[0] are equivalent. As an aside, if you wanted to pass the address of an array in a function call you could replace &numbers[0] with simply numbers.
C code: array-address.c
The contents of array-address.c look like this:
Let’s look at the output from array-address