CS 50 Software Design and Implementation

Lecture 7

Standard IO Lib and C/OS Interface

In this lecture, we carry on our introduction to the C language.

Goals

We plan to learn the following from today’s lecture:

The standard I/O library

Unlike Java, the C language itself does not define any file- or character-based input or output routines (nor any windowing routines). Instead, any program may provide its own. Clearly this is a daunting task, and so the standard C library provides a collection of functions to perform file-based input and output. The standard I/O library functions provide efficient, buffered I/O to and from both terminals and files.

C programs requiring standard I/O should include the line:


  #include <stdio.h>

All transactions through the standard I/O functions require a file pointer:


  FILE *fp;

  fp = fopen("file.dat", "r");
  /* ... read from or write to the file through fp ... */
  fclose(fp);

Although we are strictly dealing with a C pointer, we simply pass this pointer to functions in the standard C library. Some texts refer to this pointer as a file stream (and C++ confuses this even more), but these should not be confused with, nor described as akin to, Java's streams. A number of predicate functions (often implemented as macros) are provided to check the status of file operations on a given file pointer:


  feof(fp)      /* checks for end-of-file */
  ferror(fp)    /* checks for an error on a file */

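Standard I/O functions report problems through their return values: functions that return a pointer, such as fopen() and fgets(), return NULL, and most functions that return an int return a negative value (usually EOF, which is typically -1) when an error is detected or, for input functions, at end-of-file. For example: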


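  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char *argv[]) {

    FILE *fp;

    /* an ordinary user normally cannot open /etc/passwd for writing */
    if ((fp = fopen("/etc/passwd", "w")) == NULL) {
      perror("/etc/passwd");      /* prints why, e.g. "Permission denied" */
      return EXIT_FAILURE;
    } else {

      /* process the file */
      /* ... */
      fclose(fp);
    }
    return EXIT_SUCCESS;
  }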

The most frequently used functions in the C standard I/O library perform output of formatted data. They are also the most common example of C's support for functions that take a variable number of arguments:


  int fprintf(FILE *fp, const char *format, ...);   /* the actual prototype */

  e.g. int res = 42;
       char *name = "Chris";

       fprintf(fp, "res= %d name= %20s\n", res, name);

Many standard I/O functions accept a format specifier: a string indicating how the following arguments are to be displayed. This mechanism contrasts with Java's toString facility, in which each object knows how to output/display itself as a String object. There are many possible conversion specifiers, the most common being %c for character values, %d for decimal integers, %f for floating-point values, and %s for character strings. Between the % and the conversion character we may place modifiers that further specify the data type (for example, %ld for a long) or the width of the output in characters (for example, %20s right-justifies a string in a field 20 characters wide). As a special case, we may use a more concise version of fprintf(), printf(), which writes to the FILE pointer of the operating system's standard output device, stdout (typically the screen). Thus, the following two statements are equivalent:


  fprintf(stdout, "res= %d name= %20s\n", res, name);
  printf("res= %d name= %20s\n", res, name);

We mentioned before that the C standard I/O library provides efficient buffering. This means that although it appears that the output has gone to the FILE pointer, it may still be held in an internal character buffer in the library (and hence not yet be on disk or on the screen). We often need to flush our output with fflush() to ensure that it is handed to the operating system immediately. FILE pointers are also flushed automatically when a file is closed with fclose() or when the process exits normally (by returning from main() or calling exit()):


  /* ... format some output ...*/
  fflush(fp);

As well as outputting to FILE pointers, we may also perform formatted output to a character array (a string), with a very similar series of functions:


  int res = 42;
  char *name = "Chris";
  char buffer[BUFSIZ];                 /* BUFSIZ is defined in <stdio.h> */
  sprintf(buffer, "res= %d name= %s\n", res, name);

C's standard I/O library may also be used to input values from FILE pointers and character arrays using fscanf() and sscanf(). Because we want the standard I/O functions to modify the contents of our variables, we need to pass the addresses of those variables:


  fscanf(fp, format, &arg1, &arg2, ...);

  e.g.,

  int i, res;
  char buffer[BUFSIZ] = "12 34";
  fscanf(fp, "%d %d", &i, &res);       /* read two ints from the file */
  sscanf(buffer, "%d %d", &i, &res);   /* read two ints from the string */

We also frequently need to read all lines from a file, or perhaps to sum all the integer values in a file. We must be careful here with the particular return values of the C standard I/O functions: fgets() returns NULL, and fscanf() returns EOF, at end-of-file or on an error, while fscanf() otherwise returns the number of items it successfully converted, which may be fewer than we asked for. We must be careful when we check these values:


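  #define MAXLINE 80

  int i, sum;
  char line[MAXLINE];

  /* Test the return value of fgets() itself. Calling fgets() and then
     testing feof(), as a for(;;) loop with a break would, skips a last
     line that has no '\n' and loops forever on a read error. */
  while (fgets(line, sizeof(line), fp) != NULL) {

    /* ... process the line just read ...*/

  }
  if (ferror(fp))
    perror("fgets");          /* the loop stopped because of an error */
  fclose(fp);

  An example of reading in integers from a file using fscanf():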

  sum = 0;
  while(fscanf(fp, "%d", &i) == 1)
     sum += i;
  fclose(fp);

  A note on fgets()

  Assuming there is enough room in the buffer, fgets() reads the data including
  the newline \n into the buffer and null \0 terminates the line. If there is not enough
  room in the buffer before coming across the newline, fgets() copies what
  it can (the length of the buffer minus one byte)
  and null terminates the string.

  Note, fgets() stops reading when it reads a newline character \n,
  but the newline is considered a valid character and is included
  in the returned string.

  If you want to remove it, you’ll need to trim it yourself:

  length = strlen(str);

  if (length > 0 && str[length - 1] == '\n')   /* check length first: str may be empty */
    str[length - 1] = '\0';

  Here str is the string into which you read the data from the file, and length is a variable of type size_t (strlen() is declared in <string.h>).

  Let’s delve into this a bit more in the files.c example below

Here is a code snippet that uses fopen(), assert(), fgets(), strlen(), printf(), sscanf(), and fclose(). The code itself is not that interesting. It reads what is in the file input, saving the string (45 , ignore_this % C read_in_this**) from that file into an array, and then uses sscanf() to parse it into various variables. Note that because the line in the input file ends with a newline character \n, fgets() stores that newline at the end of the array; the code replaces it with the null character \0. Some formatting is shown.

C code: files.c

Danger: why not to use gets(), an example of bad IO

There is a saying: you learn from your mistakes, so make lots of them. There is another one: don't make the same mistake twice. Using the standard IO function gets() is a mistake. Make that mistake only once, by running the following program.

Let's look at the following buffer-overflow.c, aka really-bad-c.c:

C code: buffer-overflow.c

Apart from the use of gets(), this is a nice little string manipulation program that uses several other C library calls of interest.

The contents of buffer-overflow.c looks like this:


/*

   File:  buffer-overflow.c

   Description: This is a bad program! But it's fun. The basic idea of
                the program, to input and manipulate strings using
                arrays of chars, is fun. However, there is a serious flaw
                in the program. The book uses the function gets(). This
                is a seriously dangerous function call. DON'T USE IT.

   Revised code taken from pg. 457 (Program 9.5) (Bronson) "First Book on ANSI C"

*/

#include  <stdio.h>
#include <string.h> /* required for the string function library */

#define MAXELS 50

int main(void)
{

  char string1[MAXELS] = "Hello";
  char string2[MAXELS] = "Hello there";
  int n;

  n = strcmp(string1, string2);

  if (n < 0)
    printf("%s is less than - %s\n\n", string1, string2);
  else if (n == 0)
    printf("%s is equal to - %s\n\n", string1, string2);
  else
    printf("%s is greater than -  %s\n\n", string1, string2);

  /* strlen() returns a size_t, which is printed with %zu, not %d */
  printf("The length of string1 is %zu characters\n", strlen(string1));
  printf("The length of string2 is %zu characters\n\n", strlen(string2));

  strcat(string1," there World!");

  printf("After concatenation, string1 contains the string value\n");
  printf("%s\n", string1);
  printf("The length of this string is %zu characters\n\n",
                                                   strlen(string1));

  printf("Please enter a line of text for string2, max %zu characters: ", sizeof(string2));

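  /* In the code below comment and uncomment the gets() code */

  /* gets() was removed from the C language in C11; modern compilers warn
     about it or may refuse to compile it. That is the point: never use it. */
  gets(string2);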

  /* In the code below comment and uncomment the fgets() code segment */

  /*  fgets(string2, sizeof(string2), stdin); */

  printf ("Thanks for entering %s\n", string2);

  strcpy(string1, string2);

  printf("After copying string2 to string1");
  printf(" the string value in string1 is:\n");
  printf("%s\n", string1);
  printf("The length of string1 is %zu characters\n\n",
                                                 strlen(string1));
  /* %p is the conversion for printing a pointer (cast to void *) */
  printf("\nThe starting address of the string1 string is: %p\n",
                                                 (void *)string1);
  printf("\nThe starting address of the string2 string is: %p\n",
                                                 (void *)string2);
  return 0;
}

Let’s look at the output when running the program first with gets() and then with the safer fgets().



If we run the code with gets() and enter 50 characters (51 bytes including the terminating '\0', one more than string2 can hold), we get a segmentation fault:

$ ./buffer-overflow
Hello is less than - Hello there

The length of string1 is 5 characters
The length of string2 is 11 characters

After concatenation, string1 contains the string value
Hello there World!
The length of this string is 18 characters

warning: this program uses gets(), which is unsafe.
Please enter a line of text for string2, max 50 characters: 01234678901234567890123456789012345678901234567890
Thanks for entering 01234678901234567890123456789012345678901234567890
Segmentation fault


If we comment out gets() and uncomment fgets() we are safe:

$ ./buffer-overflow
Hello is less than - Hello there

The length of string1 is 5 characters
The length of string2 is 11 characters

After concatenation, string1 contains the string value
Hello there World!
The length of this string is 18 characters

Please enter a line of text for string2, max 50 characters:  01234678901234567890123456789012345678901234567890
Thanks for entering  012346789012345678901234567890123456789012345678
After copying string2 to string1 the string value in string1 is:
 012346789012345678901234567890123456789012345678
The length of string1 is 49 characters


The starting address of the string1 string is: bffff87a

The starting address of the string2 string is: bffff848


The program above defines two buffers, string1 and string2, each 50 chars long. The characters the user types at the keyboard are written into string2 (and later copied into string1).

The argument passed to gets() is the name of the array, which is converted to a pointer to its first element (more on pointers later). The function does not know how long the array is! It is impossible to determine the size of string2 from a pointer alone.

If we run the program and type in up to 49 characters, all is safe: string2 holds 50 chars, and one of them is needed for the terminating '\0'. But if we type 50 or 60 or 100, etc. chars we overrun or “overflow” the buffer. We end up writing past the end of the array! BTW, you can easily do this without calling an unsafe function such as gets(), for example with a loop that runs one index too far, so it is an important lesson to learn (or a mistake to make only once). Bugs happen at boundary conditions, and one important boundary is the end of an array. If that is the one thing you learn from this lecture, then that is a good one. If we write past the end of string2, we may write into string1: in the output above, string1 (bffff87a) starts exactly 50 bytes after string2 (bffff848). Recall that, by convention, C strings are terminated by '\0' (the null character, ASCII 0). If this terminator is overwritten, then a piece of code operating on the array will keep on trucking until it finds a '\0'.

If we run this code and type in more than 49 chars (as we did above), anything can happen; for example: 1) the code could work with no visible effect of the bug; 2) an immediate segfault; 3) a segfault later in the code stream; 4) a failure in a later, apparently unrelated call, e.g., the strcpy() in our code above.

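BTW, a buffer overflow (in a program that used gets()) was one of the holes exploited by the 1988 Internet worm, which brought down many of the machines on the Internet at the time. Its author, a very clever fellow, is now a Professor of Computer Science at MIT.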

The reason I highlight this is that the book uses gets() and promotes its use. Don't use it. Rather, use the safe fgets(): because it is told the size of the buffer, fgets() never writes past its end. Its prototype is:

char *fgets(char *s, int size, FILE *stream);

Example:

fgets(buf, sizeof(buf), stdin);

The fgets() function reads bytes from stream into the array pointed to by s, until size-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte. fgets() returns s on success, and NULL if end-of-file is reached before any bytes are read or if a read error occurs.

We replace gets() with fgets() in the above code and now we are safe.

We have learnt a few lessons here. We got insight into IO from stdin to a char array. As the course progresses we will use dynamic allocation of memory for data structures and arrays, so this problem will largely become moot, as long as we keep track of how large each buffer is.

If you want to read in just characters from the keyboard, as in the case of the Lab3 prs game, you can use getchar(). But remember that if you enter a character on the keyboard and then hit return, the “newline” character is also in the stream. Take a look at this code and run it. The code skips whitespace characters in the stream, such as newlines and tabs.

C code: getchar.c

Also, make sure you understand ASCII. This little program converts a string to a number using atoi() (ASCII to int) and then prints various control characters and finally the (unextended) ASCII table.

C code: ascii.c

The C/Operating system interface

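Operating systems, such as UNIX, Linux, Mac OS X, and Windows XP, will call C programs with two parameters:

an integer argument count (argc), and

an array of pointers to character strings (argv).

Many systems also pass a third parameter, an array of pointers to the environment strings (commonly declared as char *envp[]), although this third parameter is not part of standard C.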

Notice that in many previous examples we've provided a main() without any parameters at all. Remember that C does not check the number and types of arguments passed to functions it does not know about, i.e., ones that have not been prototyped. In addition, main() gets little special treatment from the C compiler; it is the linker (together with the C startup code) that requires main() as the apparent starting point of any program. Most C programs you see will only have the first two parameters.


int main(int argc, char *argv[])

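Here char *argv[] declares argv as an array of pointers to char: each of argv[0] .. argv[argc-1] points to the first character of a null-terminated string, and argv[argc] is a null pointer.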

The following program prints out the command line. Note that argv[0] is the program name, followed by the input items to the program, argv[1] .. argv[argc-1].

Let's look at the following snippet:

C code: arguments.c

The contents of arguments.c looks like this:


#include <stdio.h>

int main(int argc, char *argv[]) {

  int i;

  printf("%d items were input on the command line\n", argc);
  for (i = 0; i < argc; i++)
    printf("arguments %d is %s\n", i, argv[i]);

  return 0;
}

[atc@Macintosh-10 l7]$ mygcc arguments arguments.c
[atc@Macintosh-10 l7]$ ./arguments did you catch me sneaking onto moose Sunday
9 items were input on the command line
arguments 0 is ./arguments
arguments 1 is did
arguments 2 is you
arguments 3 is catch
arguments 4 is me
arguments 5 is sneaking
arguments 6 is onto
arguments 7 is moose
arguments 8 is Sunday

In short, char *argv[] declares argv as an array of pointers to char.

Another, more interesting snippet of code shows that the command line is stored in memory as a set of string arguments, and that the address of the first character of each string argument is stored in the argv[] array.

Let's look at the following snippet:

C code: command.c

The contents of command.c looks like this:


#include <stdio.h>

int main(int argc, char *argv[])
{
  int i;

  printf("\nThe number of items on the command line is %d\n\n",argc);
  for (i = 0; i < argc; i++)
  {
    printf("argument %d is %s\n", i, argv[i]);
    /* %p prints a pointer; casting it to unsigned int can lose bits */
    printf("The address stored in argv[%d] is %p\n", i, (void *)argv[i]);
    printf("The character pointed to is %c\n", *argv[i]);
  }

  return 0;
}

If you run the command, this is the output. Note that the hexadecimal
address of the first character of each argument is printed out too (the exact addresses will differ from machine to machine).

[atc@dhcp-212-172 l7] ./command hello cs50 ready to go bolling!

The number of items on the command line is 7

argument 0 is ./command
The address stored in argv[0] is bffff96c
The character pointed to is .
argument 1 is hello
The address stored in argv[1] is bffff976
The character pointed to is h
argument 2 is cs50
The address stored in argv[2] is bffff97c
The character pointed to is c
argument 3 is ready
The address stored in argv[3] is bffff981
The character pointed to is r
argument 4 is to
The address stored in argv[4] is bffff987
The character pointed to is t
argument 5 is go
The address stored in argv[5] is bffff98a
The character pointed to is g
argument 6 is bolling!
The address stored in argv[6] is bffff98d
The character pointed to is b

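The characters stored in memory for this command line (with the '\0' terminators inserted) are as follows:

./command\0hello\0cs50\0ready\0to\0go\0bolling!\0

Each argument is terminated by the null character '\0' (ASCII 0), which is what makes it a C string that a char * can point to. Don't confuse '\0' with NULL, which is a null pointer. The addresses above agree with this layout: argv[1] (bffff976) is 10 bytes after argv[0] (bffff96c) because "./command" is 9 characters plus its '\0'.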

A common activity at the start of a C program is to search the argument list for command-line switches commencing with a dash character. Remaining command-line parameters are often assumed to be filenames.

The program below parses the command line of a sort command. It will process:

sort -n
sort -r
sort -u
sort -r -u -n

Any other ordering or combination of these separate switches (e.g., sort -n -r) is also supported.

but not:

sort -run

An example of defensive programming: if the user enters a bad option, then the user is informed with a usage message:


[atc@dhcp-210-161 l7] ./sort -y
Usage: bad option -y

C code: sort.c

The contents of sort.c (no sorting code is included, only command-line parsing) look like this:


// The program parses the input switches to sort
// supports command lines such as sort -r -u -n
// but not sort -run which you will need for
// the Lab3

#include<stdio.h>

int main(int argc, char *argv[]) {

  int unique = 0, reverse = 0, numsort = 0;   // all switches start off
  char *progname;

  progname = argv[0];

  // run through the input commands looking
  // for switches

  while ((argc > 1) && (argv[1][0] == '-')) {

    // argv[1][1] is the actual option

    switch (argv[1][1]) {

    case 'r':
        printf("Switch is %c\n", argv[1][1]);
        reverse = 1;
        break;
    case 'u':
        printf("Switch is %c\n", argv[1][1]);
        unique = 1;
        break;
    case 'n':
        printf("Switch is %c\n", argv[1][1]);
        numsort = 1;
        break;

    default:
      // defensive programming: report the bad option and stop
      printf("Usage: bad option %s\n", argv[1]);
      return 1;
    }

    // decrement the number of arguments left
    // increment the argv pointer to the next argument

      argc--; argv++;
  }

  // other processing

  return(0);
}