CS 50 Software Design and Implementation

Lecture 8

Pesky Pointers

In this lecture, we carry on our introduction to the C language.

Goals

We plan to learn the following from today’s lecture:

Pointers

The C programming language has a very powerful feature (and if used incorrectly a very dangerous feature), which allows a program at run-time to access its own memory. This ability is well supported in the language through the use of pointers. There is much written about the power and expressiveness of C’s pointers, and much (more recently) written about Java’s lack of pointers. More precisely, Java does have pointers, termed references, but the references to Java’s objects are so consistently and carefully constrained at both compile and run-time, that very little can go wrong. Not C.

C has both “standard” variables and structures, and pointers to these variables and structures (Java only has references to objects, and it is only possible to manipulate the computer’s memory used to hold the objects by using references). C’s drawback is that while pointers allow us to easily refer to scalar variables and aggregate structures, C has very little support to prevent us from (accidentally) accessing anything else at run-time. All the speed advantages that pointers provide can be trivially consumed by the time taken to debug a program that uses pointers incorrectly.

Pointers allow us to refer to the address of a variable rather than its value. If this were all that were possible, we may be able to get away without using pointers at all. Unfortunately, parameters to functions may only be passed by value, and so a rudimentary understanding of C’s pointers is needed to use “pass-by-reference” parameter passing in C.

Consider the following example trying to interchange the value of two integer variables:


  #include <stdio.h>

  void swap(int i, int j) {

      int temp;
      temp = i;
      i = j;
      j = temp;

  }

  int main(int argc, char *argv[]) {

      int a=3, b=5;
      printf("before a= %d, b= %d\n",a,b);
      swap(a,b);
      printf("after a= %d, b= %d\n",a,b);
      return(0);

  }

C code: swap.c

before a=3, b=5

after a=3, b=5

Not what we expected. What went wrong?

Pass by reference using pointers

We need to pass a “reference” to the two integers to be interchanged, so that the function swap() is actually dealing with the original variables, rather than new copies of their values (passed on the run-time stack).


  #include <stdio.h>
  void swap(int *ip, int *jp) {

      int temp;
      temp = *ip;
      *ip = *jp;
      *jp = temp;

  }

  int main(int argc, char *argv[]) {

    int a=3, b=5;
    printf("before a= %d, b= %d\n",a,b);
    swap(&a, &b);
    printf("after a= %d, b= %d\n",a,b);
    return(0);

  }

C code: okswap.c

before a=3, b=5

after a=5, b=3

Now it works.

Here we have introduced a bit more syntax (and, typically, it uses punctuation characters).

The address operator, &, is used to determine the run-time memory address of a variable. Here we require the memory addresses of the variables a and b before passing these addresses to the swap() function. Notice that we are still using pass-by-value parameter passing, but the values we pass on the run-time stack are addresses.

The two asterisks * in swap()’s formal parameter list (e.g., int *ip) indicate that ip and jp are pointers, or pointer variables, rather than just “simple” variables. It is typical in C programs to append “p” or “ptr” to a variable’s name to indicate that it’s a pointer.

The asterisks placed in front of ip and jp in the body of swap() indicate that we wish to dereference these variables. Instead of using the contents of these variables (memory addresses, which are “meaningless” to us) we wish to use the values pointed to by these variables. Notice that we may dereference variables on “both sides” of an assignment expression: *ip = *jp reads the int that jp points to and writes it into the int that ip points to.

Pointers to arrays and character strings

One often confusing point in C is the synonymous use of arrays, character strings, and pointers. In most expressions, the name of an array in C evaluates to the memory address of the array’s first element. Thus the following two assignment statements are the same, and the first is the most commonly used:


  char buffer[BUFSIZ], *ptr;
  ptr = buffer;
  ptr = &buffer[0];

The declaration char *ptr reads as: declare ptr as a pointer to char.

If we also remember that C’s character strings are simply a contiguous series of characters which, by convention, are terminated by a NULL character ('\0', the character with value 0), then we can consider strings to be arrays too, and strings may be accessed through pointers (you may wish to think of a string’s first character as being stored at the memory address of the array of characters). We can thus write:


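  int n = 11;                                  /* any value from 0 to 15 */
  const char *hex_values = "0123456789abcdef"; /* a pointer to a string   */
  char digit = hex_values[n];                  /* 'b': index it like an array */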

We will often see character pointers (pointing to strings) and character arrays (with an assumed terminating NULL character) used interchangeably:


  int my_strlen(char *str) {

      int len = 0;

      while( str[len] /* != '\0' */ )
          ++len;

      return(len);

  }

Pointers can be like driving too fast: dangerous

In the code snippet we look at the contents of two ints, a and b (int a, b;), and a pointer p (int *p;). We move various values around and look at &p, p, and *p.

We then set the contents of the pointer p to 1 (p = 1). Not a good idea. Run the code with and without the line p = 1. What happens and why? The title of the code is self-explanatory!

C code: buserr.c

Declaration versus Indirection: the different uses of the asterisk *

Many people get a little confused about the * operator in C. The asterisk is used in two ways in C code, and you should remember this so as not to get confused. We have used * either to declare a pointer to a type, for example int *ptr, or to dereference a pointer, where * is the indirection operator used to access the data by means of pointer ptr, for example c = *ptr, where c was previously declared (e.g., char c;). Here the indirection operator indicates that the contents of what the pointer is pointing to are written into the variable c.

Let’s look at the code below and discuss some of the examples to clear this point up. Also we will look at the similarities and differences between arrays and pointers.

C code: pointer-examples.c



 #include<stdio.h>

int main(int argc, char *argv[]) {

  int a[]={1,2,3,4,5,6,7};

  int i;

  int *p = a; //  equivalent to p = a;

  // lots of ways to print the address of the array.

  // array names such as a or variables such as i have addresses in
  // memory &a and &i, respectively. These are called address constants.
  // They can’t be changed.

  // Note, with pointer or array arithmetic, what is added in terms of
  // address (e.g., for integer 1) depends on the type. Array a is an
  // array of ints, so adding 1 adds sizeof(int) (typically 4) to the
  // address. If it were a char array then only 1 would be added in the
  // address computation.

  printf("addresses: &a = %p a+1 = %p p+1 = %p\n", (void *)&a,
          (void *)(a+1), (void *)(p+1));

  // because a is an array name you can just use a instead of &a
  // to print the address (&a has the same value but a different type:
  // pointer to the whole array of 7 ints)

  printf("addresses: a = %p a+1 = %p p+1 = %p\n", (void *)a,
          (void *)(a+1), (void *)(p+1));

  // Now lets see the various ways we can look at the content of variables
  // using array and pointer arithmetic. Note, the indirection operator *
  // is used to access data, i.e., the contents of what the pointer is
  // pointing to.

  printf("values: a[1] = %d *(a+1) = %d *(p+1) = %d p[1] = %d\n", a[1],
        *(a+1), *(p+1), p[1]);

  // we can use a variable such as i with arrays of course but also for
  // pointer arithmetic

  i=2;

  printf("values: a[i] = %d *(a+i) = %d *(p+i) = %d p[i] = %d\n", a[i],
         *(a+i), *(p+i), p[i]);

  // Here we make p a NULL pointer. NULL pointers have significance in C:
  // a NULL pointer never compares equal to the address of any object, so
  // it always represents an invalid address. Functions can return
  // pointers and indicate failure by returning a NULL pointer.

  p=NULL;

  // Sometimes people get confused by the * operator. It has two meanings:

  // When * is used in declaring a pointer, char *ptr means "pointer to"
  // a character, e.g.,

  const char *ptr = "andrew";
  char  c;

  // but when * is not used in a declaration (e.g., *ptr), it is the
  // indirection operator, used to access the data by means of pointer
  // ptr. Therefore, c = *ptr copies the object ptr points to ('a') into c.

  c=*ptr;

  printf("ptr = %p, *ptr = %c, c = %c\n", (void *)ptr, *ptr, c);

  return 0;

}

Pointer Arithmetic

Another confusing facility in C is the use of pointer arithmetic, with which we may advance a pointer to point to successive memory locations at run-time. It would make little sense to be able to “point anywhere” into memory, and so C automatically adjusts pointers (forwards and backwards) by values that are multiples of the size of the base types (or user-defined structures) to which the pointer points. We specify pointer arithmetic in the same way we specify numeric arithmetic, using +, -, and the pre- and post-increment and decrement operators (multiplication and division make little sense, and C does not allow them on pointers). We may thus traverse an array with pointer arithmetic:


  int my_strlen(char *str) {

      int len = 0;

      while( *str /* != '\0' */ ) {
          ++len;
          ++str;
      }
      return(len);

  }

Notice that we are simply “moving the pointer along”: we are not modifying what it points to, simply accessing adjacent memory locations until we reach one containing the NULL character. This example is a little simple, because the character pointer is only advanced one memory location (one byte) at a time, as a char is one byte long. By contrast, consider these five equivalent ways to sum an array of ints, where each step moves a pointer by sizeof(int) bytes:


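  int sum_array(int *values, int n) {

      int i, *ip;
      int sum = 0;

      /* Each loop recomputes the same sum from scratch; any one of
         them alone would do. */
      for(sum=0, i=0 ; i<n ; i++)
          sum += values[i];

      for(sum=0, i=0 ; i<n ; i++)
          sum += *(values+i);

      for(sum=0, ip=values; ip<&values[n] ; ip++)
          sum += *ip;

      /* The last two advance a pointer. Advancing values itself is legal
         (it is a local copy), but then any later loop would start past
         the end of the array, so we advance a copy, ip, instead. */
      for(sum=0, i=0, ip=values ; i<n ; i++) {
          sum += *ip;
          ++ip;
      }

      for(sum=0, i=0, ip=values ; i<n ; i++)
          sum += *ip++;

      return(sum);

  }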

Unfortunately, we frequently see excessive use of pointer arithmetic in C, with programmers trying to be too smart to speed up their programs; my_strcpy() below is an example. Does the function copy the NULL character too?

C code: strcpy.c

The contents of strcpy.c look like this:


/*
  File: strcpy.c

  Description: Copies a source string to a destination. Keeps copying
  until it finds the NULL char in the source char string

  Input: char pointers for source (s2) and destination (s1)

  Output: returns the pointer to the destination (s1)
*/

#include<stdio.h>
#include<stdlib.h>

char *my_strcpy(char * , const char * );

int main()
{

  char src[] = "cs23!";
  char dst[]="Hello hello";
  char *curdst;
  int len=0;

  printf("src address %p and first char %c \n", (void *)&src, src[0]);
  printf("dst address %p and first char %c \n", (void *)&dst, dst[0]);

  // compute where NULL character is '\0' ASCII 0

  while(src[len++]);

  // print out the char arrays and various addresses.

  printf("src array %s and last element %d\n", src, atoi(&src[len]));
  printf("dst array %s and last element %c\n", dst, dst[len]);

  // do the copy

  curdst= my_strcpy(dst, src);

  // check to see if the NULL char is copied too.

  printf("dst array %s and last element %d\n", dst, atoi(&dst[len]));

  return 0;

}

char *my_strcpy(char *s1, const char *s2) {

  register char *d = s1;

  // print the pointer variables address and their contents, and first char

  printf("s2 address %p, its contents is a pointer %p to "
         "first char %c \n", (void *)&s2, (void *)s2, *s2);
  printf("s1 address %p, its contents is a pointer %p to first "
         "char %c \n", (void *)&s1, (void *)s1, *s1);

  while ((*d++ = *s2++));
  return(s1);

}

If you compile and run strcpy.c then you get the following. Look closely at the pointer values: the addresses of the src and dst arrays, and the addresses and contents of the pointer parameters s1 and s2 (s2 holds the address of src, and s1 holds the address of dst).



./strcpy
src address 0x7fff5fbff650 and first char c
dst address 0x7fff5fbff660 and first char H
src array cs23! and last element 0
dst array Hello hello and last element h
s2 address 0x7fff5fbff620, its contents is a pointer 0x7fff5fbff650 to first char c
s1 address 0x7fff5fbff628, its contents is a pointer 0x7fff5fbff660 to first char H
dst array cs23! and last element 0

With code such as this, in which we are trying to copy all characters from src to dst until we reach the NULL character, we always have in the back of our minds the concern as to whether the NULL character is in fact copied from the end of src to dst, and thus legally terminates dst.

NOTE, there is a bug in C code: strcpy.c. You will need to get used to debugging code. Study the output and the arrays: the output is wrong. Can you find the error? It is related to pointers. Think about it first by studying the code: read each line and execute the program in your head with a pen and paper. Here your head is the computer (instruction execution) and the pen and paper are memory. Can you find it?

Once you have spent a little time studying the original code then take a look at the fixed code:

C code: fixed-strcpy.c

Now compile the new code and look at the output. Compare the output of fixed-strcpy.c and strcpy.c.

“Desk checking code” is a very valuable and efficient way to find bugs. I think it is a little smarter to do that than just hacking on the computer and using printf. When desk checking code for errors you are the computer, and printf is you looking at the values of variables that you updated on paper as you execute each line of the code in your head. Also, many bugs like this one are so-called “boundary bugs”. We will talk about debugging soon in class. You need a suite of techniques. I rate desk checking way up there; conversely, blind hacking to fix bugs is way down there. Be smart as a coder: use your head.

The correct output is below. Note that the code now prints the correct last element of the src array ('\0', which is 0) and the character at the same index in the dst array (a space):


./strcpy
src address 0x7fff5fbff650 and first char c
dst address 0x7fff5fbff660 and first char H
src array cs23! and last element 0
dst array Hello hello and last element
s2 address 0x7fff5fbff620, its contents is a pointer 0x7fff5fbff650 to first char c
s1 address 0x7fff5fbff628, its contents is a pointer 0x7fff5fbff660 to first char H
dst array cs23! and last element 0

Casting


int a, *ip;

char *cp;

long l, *lp;

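// Will not work, why?

a = cp;

// But this will - casting; but probably not a great idea to cast the
// pointer to int: on many 64-bit systems int is 32 bits and a pointer is
// 64 bits, so the cast can lose part of the address.

a = (int)cp;

// Will not work: int * and long * are incompatible pointer types

ip = lp;

// But this will - casting. It compiles, but reading *ip afterwards
// accesses a long through an int pointer, which is undefined behavior.

ip = (int *)lp;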

void *

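Special case that comes in handy. For example, malloc returns a pointer to void that can be assigned to a pointer of any object type: int *, char *, struct *, etc.

void * is a generic pointer that can be assigned to and from any object pointer type without a cast. Very cool! The price is that a void * cannot be dereferenced until it is converted to a specific pointer type.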

Some examples below:

Casting


void *vp;

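// Will not work: char * and int * are incompatible pointer types

cp = ip;

// Needs a cast

cp = (char *)ip;

// But this works a treat: a void * converts to any object pointer type
// without a cast

cp = vp;

// and this

ip = vp;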