CS 50 Software Design and Implementation

Lecture 8

C Functions for CS50 Hackers

In this lecture, we discuss a number of C functions that you need to know to program the TinySearch Engine. You may not need all of them, but many of them will be helpful. Think of learning these functions as adding new vocabulary to your C language skills.

We group functions under: file access and input/output, file positioning, string functions, character conversion and classification, print formatted output and conversion of formatted input. In each case, we give a brief description of the function and its return values. If you click on the name of the function you will see example code. We have taken most of these examples from www.cplusplus.com. Please look at the cplusplus site or the course book for a more detailed description of the functions.

The TinySearch Engine will require many of the string, file and IO functions to write clean modular code. Here’s what to do: take a couple of hours to study the short code examples; first read the code, then run the example and look at the output. Did you fully understand the function? If so, go to the next example; otherwise go back and examine the code.

We are about to go rapid fire through a bunch of important vocabulary. If you want to know the nitty-gritty definition of any of these individual library functions then use the man pages; for example, type “man memcpy” at the mighty command line and read.

Functions

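There are many, many functions in the C libraries but these will help you converse; the functions are grouped under:
 File access and input/output
 File positioning
 String functions
 Character conversion and classification
 Print formatted output
 Functions to convert formatted input
 Functions to set and copy memory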

First let’s discuss some streams and files, and standard input, output and error.

Streams and Files

C supports reading and writing to streams. A stream is an IO abstraction represented by files (both text and binary) or some other source or consumer of data such as the physical terminal (stdout), keyboard (stdin) or error (stderr). We will mostly consider reading and writing formatted data to text files in the TinySearch Engine set of labs.

As we discussed in Lecture 7, when we interact with streams or files we use a pointer to a FILE data structure (FILE *fp). Here fp points to a FILE data structure that holds the various control information associated with the stream; for example, the current position within the stream and an indication that end of file (EOF) has occurred. From the keyboard you signal end of file in C/Linux with control^d; that is, you hold down the control key and hit d. The ascii value for CTRL-D is 0x04 (EOT) as shown in this ascii table . Note that EOF itself is not a character stored in the file: it is a special negative int value that the read functions return when there is no more data. Typically a text file will have text and a bunch of whitespace (e.g., blanks, tabs, newline characters), and a read past the last byte reports EOF. Don’t confuse this with a memory string, which is always terminated with the NUL character, which is ascii 0x00 in this ascii table .

When we open a file using fopen() the returned file pointer is used as the ID in all other stream processing functions – many discussed in this lecture below.

Standard input, output and error

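Three file pointers to three streams exist when you run a C program that includes the <stdio.h> header file:
 stdin : standard input, by default the keyboard
 stdout : standard output, by default the terminal
 stderr : standard error, by default the terminal

These three special file pointers (whose underlying file descriptors are 0, 1 and 2) can be redirected using >, < or piped.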

File access and input/output

In many of the code snippet examples that follow we use the file myfile.txt to read and write to. From the shell you can type “od -t c myfile.txt” to see the ascii representation of the text and any whitespaces (including new lines \n).

Open file.
FILE *fopen(const char *filename, const char *mode )
Returns: If the file is successfully opened, the function returns a pointer to a FILE object that can be used to identify the stream on future operations. Otherwise, a null pointer is returned. On most library implementations, the errno variable is also set to a system-specific error code on failure.

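The following modes are associated with opening a file:
 "r" : open an existing file for reading
 "w" : create a file for writing; an existing file is truncated to zero length
 "a" : open a file for appending; the file is created if it does not exist
 "r+" : open an existing file for reading and writing
 "w+" : create a file for reading and writing; an existing file is truncated to zero length
 "a+" : open a file for reading and appending; the file is created if it does not exist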

Print error message.
void perror(const char *str)
Interprets the value of errno as an error message, and prints it to stderr (the standard error output stream, usually the console), optionally preceding it with the custom message specified in str.
Returns: none.
See for a list of errno values
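
The fopen() and perror() descriptions above fit together naturally: check the pointer that fopen() returns and, if it is null, let perror() explain why. The following is a minimal sketch; the helper name open_or_report is made up for illustration and is not part of the C library or the course code.

```c
#include <stdio.h>

/* Hypothetical helper: open a file, or print the reason it could not
 * be opened. Returns the FILE pointer on success, NULL on failure. */
FILE *open_or_report(const char *filename, const char *mode)
{
    FILE *fp = fopen(filename, mode);
    if (fp == NULL) {
        /* perror writes "<filename>: <error text for errno>" to stderr */
        perror(filename);
    }
    return fp;
}
```

A caller would typically write FILE *fp = open_or_report("myfile.txt", "r"); and, when fp is not NULL, finish with fclose(fp).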

Check end-of-file indicator.
int feof(FILE *stream )
Returns: A non-zero value is returned in the case that the end-of-file indicator associated with the stream is set. Otherwise, zero is returned.

Close file.
int fclose(FILE *stream )
Returns: If the stream is successfully closed, a zero value is returned. On failure, EOF is returned.

Read block of data from stream.
size_t fread(void *ptr, size_t size, size_t count, FILE *stream )
Reads an array of count elements, each one with a size of size bytes, from the stream and stores them in the block of memory specified by ptr.
Returns: The total number of elements successfully read is returned. If this number differs from the count parameter, either a reading error occurred or the end-of-file was reached while reading. In both cases, the proper indicator is set, which can be checked with ferror and feof, respectively.

Hint: fread.c is very helpful for some techniques needed in crawler.c.
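
To make the return-value rule for fread() concrete, here is a small sketch that reads a block of bytes and uses ferror() to tell a read error apart from a plain end-of-file. The helper name read_block and its hit_error parameter are assumptions for illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: read up to max bytes from fp into buf.
 * Returns the number of bytes actually read. *hit_error is set to 1
 * if a short count was caused by a read error, 0 otherwise. */
size_t read_block(FILE *fp, char *buf, size_t max, int *hit_error)
{
    /* element size is 1, so the returned count is a byte count */
    size_t n = fread(buf, 1, max, fp);

    *hit_error = 0;
    if (n < max && ferror(fp)) {
        *hit_error = 1;
    }
    return n;
}
```

Note that the buffer is not NUL-terminated by fread(); if you want to treat it as a string, reserve one extra byte and write '\0' at index n yourself.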

Write block of data to stream.
size_t fwrite(const void * ptr, size_t size, size_t count, FILE *stream)
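Writes an array of count elements, each one with a size of size bytes, from the block of memory pointed to by ptr to the current position in the stream.
Returns: The total number of elements successfully written is returned. If this number differs from the count parameter, a writing error occurred and the error indicator (ferror) is set.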
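
As a sketch of how fwrite() is usually checked, the helper below writes an array of ints in binary form and compares the returned element count with the requested count. The name write_ints is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write count ints from values to fp in binary form.
 * Returns 1 if every element was written, 0 otherwise. */
int write_ints(FILE *fp, const int *values, size_t count)
{
    size_t written = fwrite(values, sizeof values[0], count, fp);
    return written == count;
}
```

The data can be read back with a matching fread() call that uses the same element size and count.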

Write string to stream
int fputs (const char *str, FILE * stream )
The function begins copying from the address specified (str) until it reaches the terminating null character (’\0’). This terminating null-character is not copied to the stream.
Returns: On success, a non-negative value is returned. On error, the function returns EOF and sets the error indicator (ferror).

Get string from stream
char * fgets ( char * str, int num, FILE * stream )
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first.
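Returns: On success, the function returns str. If the end-of-file is encountered while attempting to read a character, the eof indicator is set (feof); if this happens before any characters could be read, a null pointer is returned and the contents of str remain unchanged. If a read error occurs, the error indicator (ferror) is set and a null pointer is returned.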
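
A common pattern with fgets() is to read one line at a time and strip the newline that fgets() keeps in the buffer. The sketch below assumes a hypothetical helper named read_line; it also uses strcspn() from <string.h>, a standard function not otherwise covered in this lecture, to find the newline.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from fp into buf (capacity size)
 * and remove the trailing newline, if there is one.
 * Returns 1 if a line was read, 0 at end-of-file or on error. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL) {
        return 0;
    }
    /* strcspn gives the index of the first '\n', or the string length
     * if there is none, so this is safe in both cases */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

Calling read_line() in a while loop until it returns 0 walks through a text file line by line; lines longer than size-1 characters arrive in several pieces.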

Write character to stream.
int fputc(int character, FILE * stream )
Writes character to the stream and advances the position indicator.
Returns: On success, the character written is returned. If a writing error occurs, EOF is returned and the error indicator (ferror) is set.

Get character from stream.
int fgetc(FILE * stream )
Returns the character currently pointed by the internal file position indicator of the specified stream. The internal file position indicator is then advanced to the next character. On success, the character read is returned (promoted to an int value).
Returns: The return type is int to accommodate for the special value EOF, which indicates failure: If the position indicator was at the end-of-file, the function returns EOF and sets the eof indicator (feof) of stream.
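
The reason fgetc() returns an int shows up in the standard read loop: the result must be stored in an int so that EOF can be distinguished from every real character. Here is a minimal sketch; count_newlines is a made-up helper name.

```c
#include <stdio.h>

/* Hypothetical helper: count the newline characters in fp.
 * Returns the count, or -1 if a read error occurred. */
long count_newlines(FILE *fp)
{
    long count = 0;
    int c;  /* int, not char, so that EOF is representable */

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            count++;
        }
    }
    /* the loop ends on EOF for two reasons: end-of-file or error */
    return ferror(fp) ? -1 : count;
}
```

After the loop, feof(fp) is non-zero when the whole file was consumed, which is the usual way to use feof(): after a read has failed, not before.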

File positioning

There are a number of useful file position functions. For example, crawler requires you to work out the length of a file. The functions below, combined, can do just that.

Reposition stream position indicator.
int fseek(FILE * stream, long int offset, int origin )
Returns: If successful, the function returns zero. Otherwise, it returns a non-zero value.

Get current position in stream.
long int ftell(FILE * stream )
Returns: On success, the current value of the position indicator of the stream. On failure, -1L is returned.

Set position of stream to the beginning.
void rewind(FILE * stream)
Returns: void.

Note, in the rewind.c example we show how you read characters from a file into a character array (i.e., a memory string) and then terminate the memory string with a NUL character, which is '\0' and has the value 0. BTW, we use NUL (one L) to refer to the character that terminates a string, and NULL for pointers, where NULL refers to a null pointer. See the ascii table to understand the ascii numbers that represent characters. Also, see the discussion of the NULL character and its meaning.
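
Putting fseek(), ftell() and rewind() together gives one way to work out the length of a file and then read it into a NUL-terminated memory string. This is a sketch under assumptions: the helper names file_length and read_all are invented here, and treating the ftell() value as a byte count relies on behavior you get on Linux and other POSIX systems rather than a guarantee of the C standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the length of fp in bytes, or -1 on error.
 * The position indicator is left at the beginning of the file. */
long file_length(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0) {
        return -1;
    }
    long length = ftell(fp);   /* position at the end == size in bytes */
    rewind(fp);
    return length;
}

/* Hypothetical helper: read the whole file into a newly allocated,
 * NUL-terminated string. The caller must free() the result.
 * Returns NULL on failure. */
char *read_all(FILE *fp)
{
    long length = file_length(fp);
    if (length < 0) {
        return NULL;
    }
    char *text = malloc((size_t)length + 1);   /* +1 for the NUL */
    if (text == NULL) {
        return NULL;
    }
    size_t n = fread(text, 1, (size_t)length, fp);
    text[n] = '\0';            /* terminate the memory string */
    return text;
}
```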

String functions

We will be manipulating strings while programming the TinySearch Engine. A memory string, as you recall, is a sequence of characters (including whitespace) terminated by the NUL (\0) character, which is ascii 0x00 in this ascii table .

In what follows, we discuss a set of string handling functions that are very useful for dealing with strings. Each example includes a simple code snippet.

Copy string.
char *strcpy(char *s1, const char *s2)
Copies the string s2 into the character array s1.
Returns: The value of s1 is returned.

Copy characters from string.
char *strncpy(char *s1, const char *s2, size_t n)
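Copies at most n characters of the string s2 into the character array s1. If s2 has n or more characters, s1 is not NUL-terminated; if s2 is shorter than n, the remainder of s1 is padded with '\0'.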
Returns: The value of s1 is returned.

Concatenate strings.
char *strcat(char *s1, const char *s2)
Appends the string s2 to the end of character array s1. The first character from s2 overwrites the ’\0’ of s1.
Returns: The value of s1 is returned.

Append characters from string.
char *strncat(char *s1, const char *s2, size_t n)
Appends at most n characters of the string s2 to the end of character array s1. The first character from s2 overwrites the ’\0’ of s1.
Returns: The value of s1 is returned.
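
The copy and append functions above do not check the size of the destination array, so the caller has to. The sketch below builds a path with strncpy() and strncat() while keeping track of the remaining space; join_path is a hypothetical helper, not something from the course code.

```c
#include <string.h>

/* Hypothetical helper: build "<dir>/<name>" in out (capacity size).
 * The result is always NUL-terminated and is truncated if too long. */
void join_path(char *out, size_t size, const char *dir, const char *name)
{
    if (size == 0) {
        return;
    }
    strncpy(out, dir, size - 1);
    out[size - 1] = '\0';   /* strncpy does not terminate on truncation */

    /* strncat always appends a NUL, so leave one byte free for it */
    strncat(out, "/", size - strlen(out) - 1);
    strncat(out, name, size - strlen(out) - 1);
}
```

With a 32-byte buffer, join_path(buf, sizeof buf, "data", "1.html") leaves "data/1.html" in buf.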

Locate first occurrence of character in string.
char *strchr(const char *s,  int c)
Returns: a pointer to the first instance of c in s. Returns a NULL pointer if c is not encountered in the string.

Locate last occurrence of character in string.
char *strrchr(const char *s,  int c)
Returns: returns a pointer to the last instance of c in s. Returns a NULL pointer if c is not encountered in the string.
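
Because strchr() and strrchr() return pointers into the original string, the result can be used directly as a substring or turned into an index by pointer subtraction. A small sketch, with invented helper names file_extension and scheme_length:

```c
#include <string.h>

/* Hypothetical helper: return the text after the last '.' in path,
 * or NULL if path contains no '.'. */
const char *file_extension(const char *path)
{
    const char *dot = strrchr(path, '.');
    return dot != NULL ? dot + 1 : NULL;
}

/* Hypothetical helper: return the number of characters before the
 * first ':' in url, or -1 if there is no ':'. */
long scheme_length(const char *url)
{
    const char *colon = strchr(url, ':');
    return colon != NULL ? (long)(colon - url) : -1;
}
```

Always test the returned pointer against NULL before using it; dereferencing it when the character is absent is a classic cause of segmentation faults.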

Compare two strings.
int strcmp(const char *s1, const char *s2)
Compares the string s1 to the string s2.
Returns: The function returns 0 if they are the same, a number < 0 if s1 < s2, a number > 0 if s1 > s2.

Compare characters of two strings.
int strncmp(const char *s1, const char *s2, size_t n)
Compares up to n characters of the string s1 to the string s2.
Returns: The function returns 0 if they are the same, a number < 0 if s1 < s2, a number > 0 if s1 > s2.
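
A frequent slip is to forget that strcmp() returns 0, which C treats as false, when the strings match. The sketch below wraps both comparison functions in helpers that return 1 for a match; the names same_string and has_prefix are assumptions for illustration.

```c
#include <string.h>

/* Hypothetical helper: 1 if a and b hold the same characters, else 0. */
int same_string(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: 1 if s begins with prefix, else 0. */
int has_prefix(const char *s, const char *prefix)
{
    /* compare only the first strlen(prefix) characters */
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

For example, has_prefix(url, "http://") is a simple way to check that a string at least looks like a URL before trying to fetch it.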

Get string length.
size_t strlen(const char *s)
Determines the length of the string s.
Returns: the number of characters in the string before the \0.

Locate characters in string
char *strpbrk(const char *s1,  const char *s2)
Returns: a pointer to the first instance in s1 of any character found in s2. Returns a NULL pointer if no characters from s2 are encountered in s1.
BTW, strpbrk means string pointer break - clear as mud ;-)

Locate a substring.
char *strstr(const char *s1,  const char *s2)
Returns: a pointer to the first instance of string s2 in s1. Returns a NULL pointer if s2 is not encountered in s1.
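
Both strstr() and strpbrk() return a pointer into s1, so they can be called repeatedly to scan through a string. The following sketch counts how often a word occurs and finds the first punctuation mark; the helper names and the punctuation set ".,;!?" are chosen here purely as an example.

```c
#include <string.h>

/* Hypothetical helper: count non-overlapping occurrences of word in text. */
int count_occurrences(const char *text, const char *word)
{
    size_t len = strlen(word);
    int count = 0;

    if (len == 0) {
        return 0;               /* avoid looping forever on "" */
    }
    const char *p = text;
    while ((p = strstr(p, word)) != NULL) {
        count++;
        p += len;               /* continue searching after this match */
    }
    return count;
}

/* Hypothetical helper: pointer to the first of . , ; ! ? in text,
 * or NULL if none of them appears. */
const char *first_punct(const char *text)
{
    return strpbrk(text, ".,;!?");
}
```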

Character conversion and classification

Convert string to integer.
int atoi (const char * str)
Parses the C-string str interpreting its content as an integral number, which is returned as a value of type int.
Returns: On success, the function returns the converted integral number as an int value. If no valid conversion could be performed, zero is returned.

If you want to convert an integer to a string you could use itoa() but it is a non-standard function (i.e., not part of the ANSI-C standard). Better to use sprintf() as discussed below.
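
Here is a sketch of both directions of the conversion. It uses snprintf(), the size-bounded C99 sibling of sprintf(), so that the output buffer cannot overflow. The helper names parse_depth and int_to_string are invented for this example, and clamping negative values to 0 is just one possible policy.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a non-negative depth from text.
 * atoi returns 0 for text that is not a number, so "abc" and "0"
 * are indistinguishable here; negative values are clamped to 0. */
int parse_depth(const char *arg)
{
    int depth = atoi(arg);
    return depth < 0 ? 0 : depth;
}

/* Hypothetical helper: write value as decimal text into out (capacity
 * size). Returns 1 if it fit completely, 0 otherwise. */
int int_to_string(int value, char *out, size_t size)
{
    int n = snprintf(out, size, "%d", value);
    return n >= 0 && (size_t)n < size;
}
```

If you need to detect invalid input rather than silently get 0, strtol() from <stdlib.h> reports where parsing stopped.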

Check if character is a white-space.
int isspace ( int c )
Checks whether c is a white-space character.
Whitespace characters include space, tab, newline (LF), carriage return (CR), vertical tab and form feed.
Returns: A value different from zero (i.e., true) if indeed c is a white-space character. Zero (i.e., false) otherwise.

Check if character is alphabetic.
int isalpha(int c )
Returns: A value different from zero (i.e., true) if indeed c is an alphabetic letter. Zero (i.e., false) otherwise.

Check if character is alphanumeric.
int isalnum(int c )
Checks whether c is either a decimal digit or an uppercase or lowercase letter.
Returns: A value different from zero (i.e., true) if indeed c is either a digit or a letter. Zero (i.e., false) otherwise.

Convert uppercase letter to lowercase.
int tolower(int c)
Returns: The lowercase equivalent to c, if such value exists, or c (unchanged) otherwise. The value is returned as an int value that can be implicitly converted to char.

Convert lowercase letter to uppercase.
int toupper(int c)
Converts c to its uppercase equivalent if c is a lowercase letter and has an uppercase equivalent. If no such conversion is possible, the value returned is c unchanged.
Returns: The uppercase equivalent to c, if such value exists, or c (unchanged) otherwise. The value is returned as an int value that can be implicitly converted to char.
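
The classification and conversion functions are often combined when cleaning up words before storing them. The sketch below keeps only the letters of a word and lowercases them in place; normalize_word is a hypothetical helper, and whether your own code should drop digits and punctuation this way is a design decision, not a requirement stated in this lecture.

```c
#include <ctype.h>

/* Hypothetical helper: lowercase word in place and drop every
 * character that is not a letter. */
void normalize_word(char *word)
{
    char *out = word;

    for (const char *in = word; *in != '\0'; in++) {
        /* the ctype functions expect an unsigned char value (or EOF) */
        unsigned char c = (unsigned char)*in;
        if (isalpha(c)) {
            *out++ = (char)tolower(c);
        }
    }
    *out = '\0';   /* terminate the (possibly shorter) result */
}
```

Applied to the array "Hello, World!" this leaves "helloworld".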

Print formatted output

Many string operations use embedded format specifiers for reading or writing text to a stream. Format specifiers are replaced by the values specified in subsequent additional arguments.

For example:

fprintf(pFile, "Name %d [%-10.10s]\n", n+1, name);

Two format tags are used:
 %d : Signed decimal integer
 %-10.10s : left-justified (-), minimum of ten characters (10), 
 maximum of ten characters (.10), string (s).

Gives:
        1234567890
Name 1 [John      ]

For details see: %[flags][width][.precision][length]specifier

Write formatted data to stream.
int fprintf(FILE * stream, const char * format, ... )
Returns: On success, the total number of characters written is returned. On failure, a negative number is returned.

Write formatted data to string
int sprintf(char * str, const char * format, ... )
Composes a string with the same text that would be printed if format was used on printf, but instead of being printed, the content is stored as a C string in the buffer pointed by str.
Returns: On success, the total number of characters written is returned. This count does not include the additional null-character automatically appended at the end of the string.
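
The format string from the example above can be tried out without a file by formatting into a character array. This sketch uses snprintf(), the size-bounded C99 variant of sprintf(), and a made-up helper name format_name; the format string itself is the one shown earlier in this section.

```c
#include <stdio.h>

/* Hypothetical helper: format entry number n (zero-based) and name
 * into out (capacity size). Returns what snprintf returns: the number
 * of characters the full output needs, not counting the NUL. */
int format_name(char *out, size_t size, int n, const char *name)
{
    /* %-10.10s : left-justified, padded to at least 10 characters,
     * and cut off after at most 10 characters */
    return snprintf(out, size, "Name %d [%-10.10s]\n", n + 1, name);
}
```

With n equal to 0 and the name "John", the buffer holds "Name 1 [John      ]" followed by a newline, and the return value is 20. A longer name such as "Christopherson" is cut to "Christophe" by the .10 precision.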

Functions to convert formatted input

Read formatted data from stdin.
int scanf(const char * format, ... )
Returns: On success, the function returns the number of items of the argument list successfully filled. This count can match the expected number of items or be less (even zero) due to a matching failure, a reading error, or the reach of the end-of-file.

Read formatted data from string.
int sscanf(const char * s, const char * format, ...)
Returns: On success, the function returns the number of items in the argument list successfully filled. This count can match the expected number of items or be less (even zero) in the case of a matching failure.

Read formatted data from stream.
int fscanf(FILE * stream, const char * format, ... )
Returns: On success, the function returns the number of items of the argument list successfully filled. This count can match the expected number of items or be less (even zero) due to a matching failure, a reading error, or the reach of the end-of-file.
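
The return value of the scanf family is the main tool for validating input: compare it with the number of items you asked for. A minimal sketch using sscanf() follows; the helper name parse_pair and the idea of a line holding two integers are assumptions made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: parse a line of the form "<id> <count>".
 * Returns 1 if both integers were read, 0 otherwise. */
int parse_pair(const char *line, int *id, int *count)
{
    /* sscanf returns the number of items filled in, or EOF if the
     * input ran out before the first conversion */
    return sscanf(line, "%d %d", id, count) == 2;
}
```

The same check works with fscanf() on a file and scanf() on stdin; only the source of the characters changes.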

Functions to set and copy memory

Fill a byte string with a byte value.
void *memset(void *b, int c, size_t len);
The memset() function writes len bytes of value c (converted to an unsigned char) to the block of memory pointed to by b.
Returns: The memset() function returns its first argument.

Copy a block of memory.
void *memcpy(void *restrict dst, const void *restrict src, size_t n);
The memcpy() function copies n bytes from memory area src to memory area dst. The two areas must not overlap.
Returns: The memcpy() function returns the original value of dst.
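
Two typical uses are zeroing a structure before filling it in and copying one structure to another. The sketch below assumes a made-up structure type page_t and helper names page_init and page_copy; none of these come from the course code.

```c
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    int  id;
    char url[32];
} page_t;

/* Hypothetical helper: set every byte of *p to zero, so id is 0 and
 * url is an empty string. */
void page_init(page_t *p)
{
    memset(p, 0, sizeof *p);
}

/* Hypothetical helper: copy *src to *dst byte for byte. The two
 * objects must not overlap. */
void page_copy(page_t *dst, const page_t *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

Using sizeof *p rather than a hand-written number keeps the calls correct if fields are later added to the structure.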

Miscellaneous

Cool emacs shortcuts.

See the ascii table.

See the discussion of the NULL character and its meaning.