In this lecture, we carry on our introduction to the C language.
We plan to learn the following from today’s lecture:
Unlike Java, the C language itself does not define any particular file or character-based input or output routines (nor any windowing routines). Instead, any program may provide its own. Clearly this is a daunting task, and so the standard C library provides a collection of functions to perform file-based input and output. The standard I/O library functions provide efficient, buffered I/O to and from both terminals and files.
C programs requiring standard I/O should include the line:
All transactions through the standard I/O functions require a file pointer:
Although we are strictly dealing with a C pointer, we simply pass this pointer to functions in the standard C library. Some texts will refer to this pointer as a file stream (and C++ confuses this even more), but these should not be confused with, nor be described as akin to, Java’s streams. A number of predicate macros are provided to check the status of file operations on a given file pointer:
The standard I/O functions return NULL or EOF (a negative value, typically -1), as appropriate, when an error is detected. For example:
The most frequently used functions in the C standard I/O library perform output of formatted data. We also see here the most frequent use of C’s acceptance of functions receiving a variable number of arguments:
Many standard I/O functions accept a format specifier, a string indicating how the following arguments are to be displayed. This mechanism is in contrast to Java’s toString facility, in which each object knows how to output/display itself as a String object. There are many possible format specifiers, the most common ones being “%c” for character values, “%d” for decimal values, “%f” for floating point values, and “%s” for character strings. Format modifiers may be placed between the “%” and the letter to further specify the data type and to indicate the width of the required output (in characters). As a special case, we may use printf(), a more concise version of fprintf() in which the FILE pointer of the operating system’s standard output device (typically, the screen) is used. Thus, the following two statements are equivalent:
We mentioned before that the C standard I/O library provides efficient buffering. This means that although it appears that the output has gone to the FILE pointer, it may still be held within an internal character buffer in the library (and will hence not yet be on disk or on the screen). We often need to flush our output to ensure that it is more quickly written to disk or the screen. FILE pointers are automatically flushed when a file is closed or the process exits:
As well as outputting to FILE pointers, we may also perform formatted output to a character array (a string), with a very similar series of functions:
C’s standard I/O library may also be used to input values from FILE pointers and character arrays using fscanf() and sscanf(). Because we want the contents of C’s variables to be modified by the standard I/O functions, we need to pass the addresses of the variables:
We also frequently need to read all lines from a file, or (perhaps) to sum all integer values from a file. We must be careful here with the particular return values of the C standard I/O functions. The functions return a NULL pointer, or a value of EOF (typically -1), at the end of a file or on an error condition, and we must take care when we check these values:
Here is a code snippet that uses fopen(), assert(), fgets(), strlen(), printf(), sscanf(), and fclose(). The code itself is not that interesting. It reads the contents of the file input: it saves the string (45 , ignore_this % C read_in_this**) from the file into an array and then uses sscanf() to read it into various variables. Note that because the line in the input file ends with a newline character \n, fgets() stores that newline at the end of the string in the array. The code replaces it with a NUL (\0) character. Some formatting is shown.
C code: files.c
There is a saying: you learn from your mistakes, so make lots of them. There is another one: don’t make the same mistake twice. The use of the standard I/O function gets() is a mistake. Only make the mistake of using it once, in running the following program.
Let’s look at the following buffer-overflow.c aka really-bad-c.c:
C code: buffer-overflow.c
Apart from the use of gets(), this is a nice little string manipulation program that uses a number of other C library calls that are of interest.
The contents of buffer-overflow.c look like this:
Let’s look at the output when running the program first with gets() and then with the safer fgets().
This is a bad program! But it’s fun. The basic idea of the program, to input and manipulate strings using arrays of chars, is fun. However, there is a serious flaw in the program. The book uses the function gets(). This is a seriously dangerous function call. DON’T USE IT.
The program below defines a buffer of 50 chars in length. The user types in characters from the keyboard and they are written to the buffer i.e., string1 and string2.
The input parameter to gets() is the name of the array (which is a pointer - more on pointers later). The function does not know how long the array is! It is impossible to determine the length of string1 and string2 from a pointer alone.
If we run the program below and type in up to 49 characters, all is safe: the 50th element holds the terminating ’\0’. But if we type 50 or 60 or 100, etc. chars we overrun or “overflow” the buffer. We end up writing past the end of the array! BTW, you can do this easily without calling an unsafe function such as gets(), so it’s an important lesson to learn, or mistake to make. Bugs happen at boundary conditions, and one important boundary is the end of the array. If that is the one thing you learn from this lecture then that is a good one. If we overwrite string1 we might overwrite into string2. Recall that, by convention, C strings are terminated by ’\0’ (NUL), the escape sequence for ASCII code 0. If this is overwritten then a piece of code operating on the array will keep on trucking until it finds a ’\0’.
If we run this code and type in 50 or more chars (as we did above), anything can happen; for example: 1) the code could work with no visible effect of the bug; 2) immediate segfault; 3) segfault later in the code stream; 4) failure in an unrelated function or system call, e.g., strcat() in our code below.
BTW, a buffer overflow was one of the backdoors used to bring down machines across the Internet by a very clever fellow (the 1988 Morris worm exploited a gets() call in the fingerd daemon). He is now a Professor of Computer Science at MIT.
The reason I highlight this is the book uses gets() and promotes its use. Don’t use it. Rather, use the safe fgets(); fgets() is a buffer safe function. Its prototype is:
char *fgets(char *s, int size, FILE *stream);
Example:
fgets(buf, sizeof(buf), stdin);
The fgets() function reads bytes from stream into the array pointed to by s until size-1 bytes are read, or a newline is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.
We replace gets() with fgets() in the above code and now we are safe.
We have learnt a few lessons here. We got insight into I/O from stdin to a char array. As the course progresses we will use dynamic allocation of memory for data structures and arrays, so this problem will be moot.
If you want to read in just characters from the keyboard, as in the case of the Lab3 prs game, you can use getchar(). But remember that if you enter a character on the keyboard and then hit return, the “newline” character is also in the stream. Take a look at this code and run it. The code removes any whitespace characters from the stream, such as newlines, tabs, etc.
C code: getchar.c
Also, make sure you understand ASCII. This little program converts a string to a number using atoi() (ASCII to int), then prints various control characters and finally the (unextended) ASCII table.
C code: ascii.c
Operating systems, such as UNIX, LINUX, Mac-OSX, and Windows-XP, will call C programs with two parameters:
an integer argument count (argc), and
an array of pointers to character strings (argv).
Notice that in many previous examples we’ve provided a main() without any parameters at all. Remember that C does not check the length and types of parameter lists of functions which it does not know about, that is, ones that have not been prototyped. In addition, the function main() has no special significance to the C compiler. Only the linker requires main() as the apparent starting point of any program. Most C programs you see will declare main() with just these two parameters.
-> explain char *argv[]
The following program prints out the command line. Note that argv[0] is the program name, followed by the input items to the program, argv[1] .. argv[N].
Let’s look at the following snippet:
C code: arguments.c
The contents of arguments.c look like this:
Read char *argv[] as: declare argv as array of pointer to char.
Another, more interesting snippet of code shows that the command line is stored as a set of string arguments in memory and that the address of the first character of each string argument is stored in the argv[] array.
Let’s look at the following snippet:
C code: command.c
The contents of command.c look like this:
The characters (with ASCII NUL shown as \0) for the command line are as follows:
./command.hello\0cs23ready\0to\0go\0bolling!\0
NUL (\0) terminates each argument, which is a “char *” string.
A common activity at the start of a C program is to search the argument list for command-line switches commencing with a dash character. Remaining command-line parameters are often assumed to be filenames.
The program below parses the command line of a sort command. It will process:
sort -n
sort -r
sort -u
sort -r -u -n
Any variation of the above is also supported.
but not:
sort -run
An example of defensive programming: if the user enters a bad option then the user is informed with a usage message:
C code: sort.c
The contents of sort.c (no sorting code is included, only command-line parsing) look like this: