In this lecture, we carry on our introduction to the C language.
We plan to learn the following from today’s lecture:
The C programming language has a very powerful feature (and if used incorrectly a very dangerous feature), which allows a program at run-time to access its own memory. This ability is well supported in the language through the use of pointers. There is much written about the power and expressiveness of C’s pointers, and much (more recently) written about Java’s lack of pointers. More precisely, Java does have pointers, termed references, but the references to Java’s objects are so consistently and carefully constrained at both compile and run-time, that very little can go wrong. Not C.
C has both “standard” variables and structures, and pointers to these variables and structures (Java only has references to objects, and it is only possible to manipulate the computer’s memory used to hold the objects by using references). C’s drawback is that while pointers allow us to easily refer to scalar variables and aggregate structures, C has very little support to prevent us from accidentally accessing anything else at run-time. All speed advantages provided by the availability of pointers can be trivially consumed by the time taken to debug a program that uses pointers incorrectly.
Pointers allow us to refer to the address of a variable rather than its value. If this were all that were possible, we may be able to get away without using pointers at all. Unfortunately, parameters to functions may only be passed by value, and so a rudimentary understanding of C’s pointers is needed to use “pass-by-reference” parameter passing in C.
Consider the following example, which tries to interchange the values of two integer variables:
C code: swap.c
before a=3, b=5
after a=3, b=5
Not what we expected. What went wrong?
We need to pass a “reference” to the two integers to be interchanged, so that the function swap() is actually dealing with the original variables, rather than new copies of their values (passed on the run-time stack).
C code: okswap.c
before a=3, b=5
after a=5, b=3
Now it works.
Here we have introduced a bit more syntax (and, as is typical in C, it uses punctuation characters).
The address operator, &, is used to determine the run-time memory address of a variable. Here we require the memory addresses of the two variables to be swapped (a and b in the output above) before passing these addresses to the swap() function. Notice that we are still using pass-by-value parameter passing, but that we are passing addresses on the run-time stack.
The two asterisks * in swap()’s formal definition (e.g., *ip) indicate that the variables ip and jp are pointers, or pointer variables, rather than just “simple” variables. It is typical in C programs to append “p” or “ptr” to a variable’s name to indicate that it is a pointer.
The asterisks placed in front of ip and jp in the body of swap() indicate that we wish to dereference these variables. Instead of using the contents of these variables (which are “meaningless” memory addresses), we wish to use the values pointed to by these variables. Notice that we may dereference variables on “both sides” of an assignment expression.
One often confusing point in C is the synonymous use of arrays, character strings, and pointers. The name of an array in C, when used in most expressions, evaluates to the memory address of the array’s first element. Thus the following two assignment statements are the same, and the first is the most commonly used:
-> explain char *ptr
declare ptr as pointer to char
If we also remember that C’s character strings are simply a contiguous series of characters which, by convention, are terminated by a null character ('\0'), then we can consider strings to be arrays too, and strings may be accessed through pointers (you may wish to consider a string’s first character as being stored at the memory address of the array of characters). We can thus write:
We will often see character pointers (used to point to strings) and character arrays (with assumed terminating null characters) used interchangeably:
In the code snippet we look at the contents of two ints, a and b (defined as int a, b;), and a pointer p (int *p;). We move various values around and look at &p, p, and *p.
We set the pointer itself to 1 (p = 1). Not a good idea. Run the code with and without the line p = 1. What happens and why? The title of the code is self-explanatory!
C code: buserr.c
Many people get a little confused about the * operator in C. The asterisk is used in two ways in C code, and you should remember this so as not to get confused. We use * either to declare a pointer to a type, for example int *ptr, or to dereference a pointer, where * is the indirection operator that accesses the data by means of the pointer ptr, for example c = *ptr, where c was previously declared, e.g., char c;. Here the indirection operator indicates that the contents of what the pointer is pointing to are written into the variable c.
Let’s look at the code below and discuss some of the examples to clear this point up. Also we will look at the similarities and differences between arrays and pointers.
C code: pointer-examples.c
Another confusing facility in C is the use of pointer arithmetic, with which we may advance a pointer to point to successive memory locations at run-time. It would make little sense to be able to “point anywhere” into memory, and so C automatically adjusts pointers (forwards and backwards) by values that are multiples of the size of the base types (or user-defined structures) to which the pointer points. We specify pointer arithmetic in the same way we specify numeric arithmetic, using +, -, and the pre- and post-increment and decrement operators (multiplication and division make little sense). We may thus traverse an array with pointer arithmetic:
Notice that we are simply “moving the pointer along”; we are not modifying what it points to, simply accessing adjacent memory locations until we reach one containing the null character ('\0'). This example is a little simple, because the character pointer will only be advanced one memory location (one byte) at a time, as a character is one byte long. Alternatively, consider the five equivalent examples:
Unfortunately, we frequently see an excessive use of pointer arithmetic in C, with programmers trying to be too smart to speed up their programs. For example, consider my_strcpy() below. Does the function copy the null character too?
C code: strcpy.c
The contents of strcpy.c look like this:
If you compile and run pointers then you get the following. Look closely at the pointer values and the address of the people array of structs and the various sizes of data types including a pointer and the size of the person struct.
With code such as this, in which we are trying to copy all characters from src to dest until we reach the null character, we always have in the back of our minds the concern as to whether the null character is in fact copied from the end of src to dest, and thus legally terminates dest.
NOTE: there is a bug in strcpy.c. You will need to get used to debugging code. Study the output and the arrays. The output is wrong. Can you find the error? It is related to pointers. Think about it first by studying the code: read each line and execute the program in your head with pen and paper. Here your head is the computer (instruction execution) and the pen and paper are the memory. Can you find it?
Once you have spent a little time studying the original code, take a look at the fixed code:
C code: fixed-strcpy.c
Now compile the new code and look at the output. Compare the output of fixed-strcpy.c and strcpy.c.
“Desk checking code” is a very valuable and efficient way to find bugs. I think it is a little smarter to do that than just hacking on the computer and using printf. When desk checking code for errors, you are the computer, and printf is you looking at the values of variables that you updated on paper as you execute each line of the code in your head. Also, many bugs like this one are so-called “boundary bugs”. We will talk about debugging soon in class. You need a suite of techniques. I rate desk checking way up there; conversely, blind hacking to fix bugs, way down there. Be smart as a coder, use your head.
The correct output is below. Note that the code now prints the correct last element of the src array ('\0', which is 0) and the character at the same index in the dst array (a white space):
A special case that comes in handy is the pointer to void. For example, malloc returns a pointer to void that can be converted to a pointer to any data type: int *, char *, struct *, etc.
void * is a generic pointer that can be assigned to a pointer of any object type. Very cool!
Some examples below: