CS 50 Software Design and Implementation
Lecture 9
Pointers and Dynamic Memory Allocation
In this lecture, we carr on our introduction to the C language.
Goals
We plan to learn the following from today’s lecture:
- Pesky pointers continued
- Dynamic memory allocation
- Useful header file and macros for TinySearch Engine
- Useful example of dynamic allocation of crawler data structures
Pointer Arithmetic on Structs
When you incremet and decrement a pointer it adjust based on the data type pointed to by the pointer. In
the previous examples the data type pointed to was a char - which is a byte (8 bits) in terms of strorage.
So p++ (where char *p) increments by obe byte of 8 bits. But what happens if the pointer p points to a
structure?
C code: pointers.c
The contents of pointers.c looks like this:
/*
File: pointers.c
Description: Creates an array of struct types. Initialises and computes
on an array of structs. The code is written to study the address of
pointers and to look at what happens when pointers are incremented
and decremented.
Input: None
Output: Prints various contents of pointers to the display
*/
#include <stdio.h>
#define NUMPEOPLE 100
// Note the struct array in a little different from one defined in
// Lecture 6. Here the array definitions for name and address are
// pointers to characters. Note, we use typedef to define the structure
// type. Saves use using the name struct when defining instances of
// this type.
typedef struct _person {
char *name;
char *addr;
int age;
} person;
// How big is person struct?
person people[NUMPEOPLE];
person *p;
int age;
char *myname = "Andrew T. Campbell";
char *myadr = "People’s Republic of Norwich";
int main() {
// intiatlise p to the address of the people array
// of struct person (which is the tag)
p = people;
int i;
// how much storage is needed for a pointer?
// how much storage is needed for a struct of person?
printf("pointer needs %ld bytes, struct people 0x%x (HEX) bytes\n", sizeof(p), (unsigned int)sizeof(person));
// how much storage for the array of people?
// what is the address in memory of the p and people variables?
printf("The address of p is %p, it’s contes are (i.e., it points to) %p\n",(void *)&p,(void *)p);
printf("The address of people is %p\n", (void *)people);
// increment to next person in the table
p++;
printf("after incrementing p its value is %p\n", (void *)p);
// decrement to the previous person
p--;
printf("after decrementing p its value is %p\n", (void *)p);
// Let’s initialise our array
for (i=0; i < NUMPEOPLE; i++) {
people[i].age = 21;
people[i].name = myname;
people[i].addr = myadr;
}
// Let’s reset p (even though it points to people already)
p = people;
// Let’s compute the total age for people.
// NOTE, that we use the pointer p and incremet p to move through
// the array of structs. Importantly we use the "->" symbol to
// index elements in the struct.
for (i=0; i < NUMPEOPLE; i++) {
age += p->age; // note the -> not . is used
p++;
}
p = people;
printf("%s is %d years old (again) and lives in %s\n", p->name, p->age, p->addr);
printf("The accumulative age of all people is %d years\n", age);
return 0;
}
If you compile and run pointers then you get the following. Look closely at the pointer values and the
address of the people array of structs and the various sizes of data types including a pointer and the size of
the person struct.
[atc@dhcp-210-161 l8] ./pointers
pointer needs 4 bytes, struct people 0xc (HEX) bytes
The address of p is 0x2070, it’s contes are (i.e., it points to) 0x2080
The address of people is 0x2080
after incrementing p its value is 0x208c
after decrementing p its value is 0x2080
Andrew T. Campbell is 21 years old (again) and lives in People’s Republic of Norwich
The accumulative age of all people is 2100 years
Sorting an array of values
A frequently required operation is to sort an array of, say, integers or characters. The standard C library
provides a generic function named qsort() to help with this, but we must write a pointer-based function to
perform the comparison of the arrays elements:
#include <stdlib.h>
#define N 100
int compare(const int *ip, const int *jp) {
return(*i - *j);
}
int main(int argc, char *argv[]) {
int i;
int values[N];
srandom( getpid() );
for(i=0 ; i<N ; i++)
values[i] = random();
qsort((void *)values, (size_t)N,sizeof(values[0]), compare);
....
return(0);
}
Dynamic memory allocation
The function malloc() returns a requested number of bytes from the operating systems heap. If insufficient
memory is available malloc returns NULL. When we are finished using the space returned by malloc(), our
program should be returned to the heap with a call to free(). If a process continues to malloc() memory
and fails to deallocate it using free(), the process will quickly run out of memory and terminate
ungracefully.
The dynamic memory management functions are supported by the standard C library and their detailed
function protypes are found in stdlib.h. Memor allocation functions allow programmers to dynamically
allocate memory from the heap for variables, arrays, and structures; rather that statically allocating space
on the stack. Many times it is not possible to know in advance of run time of a program what the memory
demands for the program are. People that use static allocation somewhat have to second guess what the
worst case memory needs are and statically allocation that at compile time. This is not a good
programming principle. Better to make your program general and capable of dealing with a
wide variety of demands - the message, but the smarts into the program and don’t second
guess.
The following memory management prototypes are supported:
- calloc() which dynamically allocates memory - “c” stands for contiguous allocation. calloc
is typically used to allocate contiguous space for arrays. The memory is with all bits set to
zero/ char arrays to NULL.
- malloc() which dynamically allocates memory - “m” stands for memory. Similar to calloc
but more general purpose allocation function. Does not initialize memory (user has to do that
if necessary).
- realloc() which dynamically reallocates allocates memory - “r” stands for reallocate memory.
This function allows the programmer to change the size of previously allocated memory to a
new size.
- free() which deallocates memory releasing the block of memory back to the heap. If memory
is continuously allocated and not released two things may happen: memory leak or the
requesting process could be killed. Take care to free memory.
Note:Unlike Java, C has no garbage collection of heap objects, and so programs must be very careful
about deallocating memory that is no longer required.
The memory management functions use “pointer to void” (void *) which allows the programmer to assign
the returned pointer to memory to any struct, array, variable without casting. This is a very good example
of the use of pointer to void.
Consider the following example which allocates space for a new copy of a given string. This is very similar
to the standard function named strdup():
// malloc returns a pointer to void for the number of bytes requested
void *malloc(unsigned int nbytes);
char *newstr(const char *s) {
char *p;
if( (p=malloc(strlen(s)+1)) == NULL ) {
fprintf(stderr,"out of memory!\n");
exit(1);
}
strcpy(p,s);
return(p);
}
malloc() is also frequently used to allocate memory for structures.
#define NEW(t) malloc(sizeof(t))
struct l {
char *line;
struct l *next;
};
struct l *hd = malloc( sizeof(struct l) ), *p;
fgets(buf, MAX, fp);
while( !feof(fp) ) {
p =NEW(struct l);
p->line = newstr(buf);
p->next = hd;
hd = p;
fgets(buf,MAX,fp);
}
The program below uses malloc to dynamically create an array of integers based on the user input n. The
program allocates the memory, fills it with random integers scaled between 18 and -9, displays the array
and frees the memory. It does this forever.
C code: memory.c
The contents of memory.c looks like this:
Compilie the code and run it.
/*
File: memory.c
Description: Uses malloc to dynamically create an array of n ints. The
user enters n. The program allocates the memory, fills it with random integers
scaled between 18 and -9, displays the array and frees the memory. It does
this forever
Input: User enters size of the array.
Output: Displays the contents of the full array and sums up all the elements.
This problem is adapted from A Book on C (Kelly, Pohl) pg. 261.
*/
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
#include<string.h>
// local header file for memory with various useful macros.
// We use MALLOC_CHECK(), BZERO(), MY_ASSERT(), LOG()
#include "header.h"
// prototypes
void initArray(int *, int );
int sumArray(int *, int );
void displayArray(int *, int );
int main (void) {
int *ip, n;
srand(time(NULL));
printf("\n%s\n",
"This program does the following repeatedly:\n"
"\n"
" 1) Create space for an array of size n\n"
" 2) Fill the array with randomly distributed digits\n"
" 3) Prints the array and the sum of its elements\n"
" 4) Frees the allocated memory\n");
for ( ; ; ) { // do forever
printf("Input the size of array you want created. n: ");
if (scanf("%d", &n) !=1 || n < 1) {
LOG("Usage: n has to be > 0\n\n");
continue; // try again
}
// User enters size of array. We dynamically allocate memory using the
// MALLOC_CHECK and display the array. Note, that malloc() allocate
// unused space for an object whose size in bytes is specified by n
// and whose value is unspecified. Because of that it is a good idea
//to zeroise the returned memory.
putchar(’\n’);
// allocate memory - n bytes; that is n*sizeof(int) bytes
ip = malloc(n*sizeof(int));
MALLOC_CHECK(ip);
// display malloc’d array, recall memory not specified could be any value
displayArray(ip, n);
// display BZERO’d array
BZERO(ip, n*sizeof(int));
displayArray(ip, n);
// display initialed array and sum
initArray(ip, n);
displayArray(ip, n);
printf("Sum = %d\n\n", sumArray(ip, n));
// don’t forget to free the memory or we will have memory leaks
free(ip);
}
printf("\nLater!\n\n");
return 0;
}
void initArray(int *a, int n) {
int i;
for (i = 0; i < n; i++)
a[i] = rand()%19 - 9; // scales between 18 and - 9
}
int sumArray(int *a, int n) {
int i, sum =0;
for (i = 0; i < n; i++)
sum += a[i];
return(sum);
}
void displayArray(int *a, int n) {
int i;
printf("a = [");
for (i = 0; i < n; i++)
printf("%d%s", a[i], ((i < n - 1) ? ", " : "]\n"));
}
Use macros and header files
Here is the header file header.h. It defines a number of useful macros that you can use in the TinySearch
Engine project.
C code: header.h
The contents of header.c looks like this:
// Filename: header.h
//
// Description: Some utilites for the TinySearchEngine engine project - MACROs for
// general memory allocation and initialization and some
// program exceptions processing
//
// Note, the header check below makes sure you do not include a header file twice. Use it.
#ifndef _HEADER_H_
#define _HEADER_H_
#define min(x,y) ((x)>(y))?(y):(x)
// Print s together with the source file name and the current line number.
#define LOG(s) printf("[%s:%d]%s\n", __FILE__, __LINE__, s)
// malloc a new data structure t
#define NEW(t) malloc(sizeof(t))
// Check whether s is NULL or not. Quit this program if it is NULL.
#define MYASSERT(s) if (!(s)) { \
printf("General Assert Error at %s:line%d\n", __FILE__, __LINE__); \
exit(-1); \
}
// Check whether s is NULL or not on a memory allocation. Quit this program if it is NULL.
#define MALLOC_CHECK(s) if ((s) == NULL) { \
printf("No enough memory at %s:line%d ", __FILE__, __LINE__); \
perror(":"); \
exit(-1); \
}
// Set memory space starts at pointer \a n of size \a m to zero.
#define BZERO(n,m) memset(n, 0, m)
#endif
Examples of mallocing crawler data structures
We have discuss pointers and we have looked at simple allocation of arrays using dynamic
memory. Note that malloc (which we will use through the course) returns a “pointer to void”
(void *) allowing the returning pointer to be “cast” to any data structure. In what follows,
we use sample code that is relevant to the crawler and allocate memory for some important
data structures, including, the dictionary, DNODE and URLNODE. Don’t worry that you do
not know the meaning of these structures at this point. You will become very familiar with
them.
So let’s write some code to dynamically create the dictionary. After that we will allocate a URLNODE
and DNODE and link them into the structure. This is similar to what you need to do in the real crawler
code - so take a close look.
C code: dictionary-example.c
The contents of header.c looks like this:
// Filename: dictionary-example.c
// Descriptions: The main goal of this code is to show malloc in use - dynamically allocating memory
// to dat structures.
//
// Some example code that shows how malloc and return from void/ casting pointers
// can be used. The code below is drawn from the crawler implementation.
// While it is imcomplete code it illustrates how important data structures
// such as the DICTIONARY, DNODE and URLNODE dynamically created using malloc.
//
// A note on the dictionary:
// A dictionary provides general purpose data structure based on a hash table
// where data elements (in this case URLNODEs) are assigned with a key and stored in DNODE.
// The full blown dictionary functions (not presented below) provide the code for
// adding/removing/searching data elements (URLNODEs) inside the dictionary
// at super fast speed.
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
//#include "dictionary.h"
#include "header.h"
// Unlikely to have more than an a URL longer that 1000 chars.
// The KEY is the same as a URL. The term KEY is just a
// general Dictionary/hash function term
#define KEY_LENGTH 1000
// Maximum number of hash slot in the hash table within dictionary
#define MAX_HASH_SLOT 10000
// These structures are defined in the header file dictionary.h but I put them here for ease of
// discussion.
typedef struct _DNODE {
struct _DNODE* next;
struct _DNODE* prev;
void* data; // actual data, in this case pointer to a URLNODE
char key[KEY_LENGTH]; // the actual key, in this case the URL
} __DNODE;
typedef struct _DNODE DNODE;
// The DICTIONARY holds the hash table and the start and end pointers into a double
// link list. This is a unordered list with the exception that DNODES with the same key (URL)
// are clusters along the list. So you hash into the list. Check for uniqueness of the URL.
// If not found add to the end of the cluster associated wit the same URL. You will have
// to write an addElement function.
typedef struct _DICTIONARY {
DNODE* hash[MAX_HASH_SLOT]; // the hash table of slots, each slot points to a DNODE
DNODE* start; // start of double link list of DNODES terminated by NULL pointer
DNODE* end; // points to the last DNODE on this list
} __DICTIONARY;
typedef struct _DICTIONARY DICTIONARY;
// The max length of each URL path.
#define MAX_URL_LENGTH 1000
// store the information of each URL.
typedef struct _URL{
char url[MAX_URL_LENGTH]; //!< e.g., www.cs.dartmouth.edu
int depth; //!< depth associated with this URL.
int visited; //!< crawled or not, marked true(1), otherwise false(0)
} __URL;
typedef struct _URL URLNODE;
// These dummy routine just return a fixed value to keep the main code happy
unsigned long hash=20;
int make_hash(char *c)
{
return(hash);
}
int main(void){
// Set up a dummy URL and depth
char *url="http://www.cs.dartmouth.edu/~campbell/";
int depth = 1;
// Initialise the dictionary data structure
DICTIONARY *dict = (DICTIONARY *)malloc(sizeof(DICTIONARY));
MALLOC_CHECK(dict);
BZERO(dict, sizeof(DICTIONARY));
// Set up a URLNODE
URLNODE *n = (URLNODE *)malloc(sizeof(URLNODE));
MALLOC_CHECK(n);
n->depth = depth;
n->visited = 0;
BZERO(n->url, sizeof(n->url));
strncpy(n->url, url, strlen(url));
// At this point we would call a function that would add the URLNODE to
// the dictionary if it did not already exist; for example, DAdd(dict, n, url);
// Here is a little of the code in that function. It is of course incomplete.
int h = make_hash(url);
if (dict->start == NULL) {
MYASSERT(dict->end == NULL);
DNODE *d = (DNODE *)malloc(sizeof(DNODE));
MALLOC_CHECK(d);
BZERO(d, sizeof(DNODE));
dict->hash[h] = d;
d->next = d->prev = NULL;
d->data = n;
// you would need to copy the key over here
dict->start = dict->end = d;
printf("The DICTIONARY structure dict:\n");
printf("dict %p\n",(void *)dict);
printf("hash %lu\n", hash);
printf("dict->hash[hash] %p\n",(void *)dict->hash[hash]);
printf("dict->start %p\n",(void *)dict->start);
printf("dict->end %p\n",(void *)dict->end);
printf("The DNODE structure d:\n");
printf("d %p\n",(void *)d);
printf("d->next %p\n",(void *)d->next);
printf("d->prev %p\n",(void *)d->prev);
printf("d->data %p\n",(void *)d->data);
printf("d->key is not initialized in this example");
printf("The URLNODE structure:\n");
printf("n %p\n",(void *)n);
printf("n->url %s\n",n->url);
printf("n->visited %d\n",n->depth);
printf("n->visited %d\n",n->visited);
} else {
// need to check if the DNODE already exists
// If it does not exist then initial DNODE and
// link in URLNODE
}
return 0;
}