CS 50 | Software Design and Implementation

valgrind is a terrific tool to help you locate and fix memory-related issues.

Goals

Use valgrind to track down and fix memory-related issues.

Activity

At the end of the day gdb and valgrind are tools that you can choose to use or not (though your grader will likely use both to explore the correctness of your programs so you should too!). Reading about great tools is one thing, but really learning how to use a tool comes from… using a tool! In today’s activity we use gdb and valgrind to explore and fix logic and memory bugs in code.

Valgrind: a memory management profiling tool

In this unit we explore a useful tool, valgrind, which can help you find bugs in programs that involve illegal memory access and memory leaks.

We recommend reading this excellent (and brief!) tutorial from Stanford’s CS107 class: Guide to Valgrind (some of the notes below are adapted from this guide).

Running a program under valgrind results in extensive checking of memory allocations and accesses and, when the program exits, provides a report with detailed information about the context and circumstances of each error. The output report can be quite verbose and a little difficult to use if you don’t know what you are looking for; so let’s look at a couple of examples and get a handle on how to read and interpret valgrind output.

The Goal: A clean report from valgrind that indicates “no heap memory errors and no heap memory leaks.”

When using valgrind, there are two general types of feedback you will get regarding your program’s usage of memory:

memory errors
memory leaks

Memory errors

The really obvious and bad memory errors will crash your program outright (e.g., accessing memory that is outside of your program’s allocated memory). The not-so-obvious memory-related errors may “get lucky” most of the time (i.e., touch valid memory), but every once in a while the luck runs out and your program, somewhat mysteriously, fails. Running valgrind on your program can give you insightful information on these sorts of errors.

When valgrind detects an error, its output includes a description of the error, the offending line of source code (provided the program was compiled with debugging information, e.g., gcc -g), and some information about the memory involved and what may be going wrong. You may see a few types of memory errors, such as:

Invalid read/write of size X
Use of uninitialised value, or Conditional jump or move depends on uninitialised value(s)
Source and destination overlap in memcpy()
Invalid free()
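As an illustration, here is a small sketch of three patterns behind the messages above, written in their corrected form; the comments describe the buggy variants valgrind would typically flag. The helper names copy_string, sum_ints, and shift_left are hypothetical and do not come from the course code.

```c
#include <stdlib.h>
#include <string.h>

// Return a heap-allocated copy of s, or NULL if malloc fails.
// Buggy variant: malloc(strlen(s)) leaves no room for the '\0', so
// strcpy writes one byte past the block ("Invalid write of size 1").
char* copy_string(const char* s)
{
  char* copy = malloc(strlen(s) + 1);   // +1 for the terminating '\0'
  if (copy != NULL) {
    strcpy(copy, s);
  }
  return copy;
}

// Return the sum of the first n values in a[].
// Buggy variant: declaring `int sum;` without initializing it; once the
// result is tested or printed, valgrind may report "Conditional jump or
// move depends on uninitialised value(s)".
int sum_ints(const int a[], int n)
{
  int sum = 0;                          // must start at a known value
  for (int i = 0; i < n; i++) {
    sum += a[i];
  }
  return sum;
}

// Remove the first character of s in place.
// Buggy variant: memcpy(s, s + 1, ...) copies between overlapping
// regions, which valgrind may report as "Source and destination overlap
// in memcpy()". memmove is defined for overlapping regions.
void shift_left(char* s)
{
  if (s[0] != '\0') {
    memmove(s, s + 1, strlen(s));       // moves the rest plus the '\0'
  }
}
```

Running the buggy variants under myvalgrind, and then these fixed versions, is a quick way to see how each message maps back to a line of source code.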

Memory “leaks”

When you allocate memory (e.g., malloc) but fail to properly free that memory when it is no longer needed, this is called a memory leak.

As we’ve seen in class, memory leaks in small, short-lived programs that exit quickly don’t cause any noticeable issues. In larger projects that operate on lots of data, or that are intended to run for a long time (e.g., web servers), memory leaks can add up quickly and cause your program to become very slow or to fail.

Valgrind allows you to check your programs for memory leaks; to get the best feedback you’ll want to specify some additional flags, here for an example program called prog:

$ valgrind --leak-check=full --show-leak-kinds=all ./prog [ARGS]

For convenience, we’ve defined a myvalgrind alias with exactly these flags in the CS50 bash_profile file; to see its definition, run:

$ alias myvalgrind

Thus, you can simply run:

$ myvalgrind ./prog [ARGS]

Note that bash aliases are not available within a Makefile or bash script, so myvalgrind will not be recognized in either context. There is a similar approach in each case, however; see “Valgrind in scripts” below.

The easiest way to determine whether there is a memory leak is to check the alloc/free counts in the HEAP SUMMARY of valgrind’s output. Ideally, the counts match. If they don’t, you’ll get a “LEAK SUMMARY” at the end of the report as well as a little bit of information from valgrind about each detected memory leak (e.g., how many bytes were leaked, where in the code the allocation happened).

When profiling your program, valgrind attempts to classify each memory leak into one of four categories:

definitely lost: a block allocated from the heap that was never freed, and to which no pointer remains;
indirectly lost: a block that was lost because the only pointer to it lived inside another heap block that was itself “lost”;
possibly lost: a block that was never freed, and for which valgrind found only a pointer into the middle of the block rather than to its start, so it cannot be sure whether the program can still reach it;
still reachable: a block that was never freed, but the program still retains a pointer to it in some way.

Regardless of the category, these are all memory leaks and should be fixed!
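To see the four categories above side by side, the following sketch deliberately leaks one block of each kind. The struct holder type and the leak_demo function are hypothetical, written only for this illustration; calling leak_demo from a main() compiled without optimization and running it under myvalgrind should typically show each block in a different category of the LEAK SUMMARY.

```c
#include <stdlib.h>

// Hypothetical type for this sketch: a heap block that owns another heap block.
struct holder {
  char* name;
};

// valgrind scans global variables at exit, so blocks they point to
// are not "definitely lost".
static char* keep;     // points to the start of a block
static char* middle;   // points into the middle of a block

// Deliberately leak one block of each kind (for demonstration only!).
void leak_demo(void)
{
  // still reachable: never freed, but `keep` still points to its start
  keep = malloc(16);

  // possibly lost: never freed, and the only remaining pointer is an
  // interior pointer, one byte past the start of the block
  char* buf = malloc(32);
  if (buf != NULL) {
    middle = buf + 1;
  }

  // definitely lost: once this function returns, no pointer to h remains
  // indirectly lost: h->name is reachable only through the lost block h
  struct holder* h = malloc(sizeof(struct holder));
  if (h != NULL) {
    h->name = malloc(8);
  }
}
```

Fixing these is the usual discipline: free(h->name) before free(h), and free each global block (using the original pointer, not the interior one) before the program exits.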

Valgrind demo: sorter2.c

Consider an example program sorter2.c.

/* 
 * sorter2.c - sort the lines from stdin
 *   (derived from sorter0.c)
 *   (array of pointers; use of readlinep)
 *
 * usage: sorter < infile
 * stdin: lines of text
 * stdout: numbered lines of text, in original order.
 * 
 * David Kotz, April 2016, 2017, 2019, 2021
 * CS 50, Fall 2022
 */

#include <stdio.h>
#include <stdlib.h>
#include "readlinep.h"

/* ******************* main ************************************** */
int main()
{
  const int maxLines = 100;   // maximum number of lines
  char* lines[maxLines];      // array of lines, each a pointer to string
  int n = 0;                  // number of lines read

  // read the list of lines
  for (n = 0; n < maxLines && !feof(stdin); ) {
    char* line = readlinep();
    if (line != NULL) {
      lines[n] = line;
      n++; // only increment if no error
    }
  }

  printf("%d lines\n", n);
  // print the list of lines, and free as we go
  for (int i = 0; i < n; i++) {
    printf("%d: %s\n", i, lines[i]);
    free(lines[i]);
  }

  exit(0);
}

A trivial change to that program creates a memory leak: comment out the line that frees the memory.

  // print the list of lines, and free as we go
  for (int i = 0; i < n; i++) {
    printf("%d: %s\n", i, lines[i]);
    // free(lines[i]);
  }

Call that sorter2b.c. Here is a snippet of the output when running valgrind on sorter2b:

$ cp sorter2.c sorter2b.c
$ vi sorter2b.c  ## comment out the 'free' line
$ mygcc sorter2b.c readlinep.c -o sorter
$ valgrind ./sorter < beatles 
==9927== Memcheck, a memory error detector
==9927== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9927== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9927== Command: ./sorter
==9927== 
4 lines
0: John
1: Ringo
2: Paul
3: George
==9927== 
==9927== HEAP SUMMARY:
==9927==     in use at exit: 324 bytes in 4 blocks
==9927==   total heap usage: 7 allocs, 3 frees, 9,621 bytes allocated
==9927== 
==9927== LEAK SUMMARY:
==9927==    definitely lost: 0 bytes in 0 blocks
==9927==    indirectly lost: 0 bytes in 0 blocks
==9927==      possibly lost: 0 bytes in 0 blocks
==9927==    still reachable: 324 bytes in 4 blocks
==9927==         suppressed: 0 bytes in 0 blocks
==9927== Rerun with --leak-check=full to see details of leaked memory
==9927== 
==9927== For counts of detected and suppressed errors, rerun with: -v
==9927== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
$

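Valgrind reports 324 bytes in 4 blocks still in use at exit: those are the four names we forgot to free. Valgrind classifies them as “still reachable” because the lines[] array still holds pointers to them when the program exits. (This run used plain valgrind, which is why the report suggests rerunning with --leak-check=full; myvalgrind adds that flag.) For a clean run of valgrind, as expected in all your CS50 labs, the program must free every block of memory it allocates.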
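If several code paths need to release the array of lines, one option is to collect the cleanup in a small helper. The function free_lines below is a hypothetical sketch, not part of sorter2.c; it frees each string and clears its slot, so an accidental second free of the same slot becomes a harmless free(NULL).

```c
#include <stdlib.h>

// Free the first n strings in lines[] and set each slot to NULL.
void free_lines(char* lines[], int n)
{
  for (int i = 0; i < n; i++) {
    free(lines[i]);     // free(NULL) is a no-op, so empty slots are fine
    lines[i] = NULL;    // avoid a dangling pointer in the array
  }
}
```

In sorter2.c, calling free_lines(lines, n) after the print loop would have the same effect as the original free-as-you-go loop.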

Valgrind demo: sorter7.c

Let’s now look at a more complex program, sorter7.c. Here is the output when running valgrind on sorter7:

$ mygcc -o sorter sorter7.c readlinep.c
$ myvalgrind ./sorter < beatles 
==27528== Memcheck, a memory error detector
==27528== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27528== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==27528== Command: ./sorter
==27528== 
4 lines:
George
John
Paul
Ringo
==27528== 
==27528== HEAP SUMMARY:
==27528==     in use at exit: 0 bytes in 0 blocks
==27528==   total heap usage: 11 allocs, 11 frees, 9,685 bytes allocated
==27528== 
==27528== All heap blocks were freed -- no leaks are possible
==27528== 
==27528== For counts of detected and suppressed errors, rerun with: -v
==27528== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

All clean!

But let’s induce a bug by pretending we forgot to free the contents of each node when deleting list nodes:

  void listnode_delete(struct listnode* nodep)
  {
    if (nodep != NULL) {
-     if (nodep->line != NULL) {
-       free(nodep->line);
-     }
      free(nodep);
    }
  }

Call this sorter7b.c. This time, four blocks are lost (one for each line of input):

$ cp sorter7.c sorter7b.c
$ vi sorter7b.c  # comment out lines 151 to 153
$ mygcc sorter7b.c readlinep.c -o sorter
$ cat beatles | valgrind ./sorter 
==10193== Memcheck, a memory error detector
==10193== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==10193== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==10193== Command: ./sorter
==10193== 
4 lines:
George
John
Paul
Ringo
==10193== 
==10193== HEAP SUMMARY:
==10193==     in use at exit: 324 bytes in 4 blocks
==10193==   total heap usage: 11 allocs, 7 frees, 5,589 bytes allocated
==10193== 
==10193== LEAK SUMMARY:
==10193==    definitely lost: 324 bytes in 4 blocks
==10193==    indirectly lost: 0 bytes in 0 blocks
==10193==      possibly lost: 0 bytes in 0 blocks
==10193==    still reachable: 0 bytes in 0 blocks
==10193==         suppressed: 0 bytes in 0 blocks
==10193== Rerun with --leak-check=full to see details of leaked memory
==10193== 
==10193== For counts of detected and suppressed errors, rerun with: -v
==10193== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

This time the blocks are not even reachable; they are “definitely lost”. Each pointer referred to a string created by readlinep and stored in a list node, so when the list nodes were freed, the program lost its only pointers to those lines.
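The fix is for each node to free the string it owns before freeing itself. Below is a sketch of a corrected listnode_delete plus a whole-list cleanup; the struct layout (a line field and a next field) is an assumption, since sorter7.c’s full definition is not shown here, and list_push and list_delete are hypothetical helpers added for illustration.

```c
#include <stdlib.h>
#include <string.h>

// Assumed node layout for this sketch; sorter7.c's actual struct may differ.
struct listnode {
  char* line;                 // heap string owned by this node
  struct listnode* next;
};

// Free one node AND the string it owns.
void listnode_delete(struct listnode* nodep)
{
  if (nodep != NULL) {
    free(nodep->line);        // free(NULL) is a no-op, so no check needed
    free(nodep);
  }
}

// Hypothetical: free every node in the list.
void list_delete(struct listnode* head)
{
  while (head != NULL) {
    struct listnode* next = head->next;   // read `next` before freeing head
    listnode_delete(head);
    head = next;
  }
}

// Hypothetical: push a heap copy of line onto the front of the list.
// On allocation failure, returns the list unchanged.
struct listnode* list_push(struct listnode* head, const char* line)
{
  struct listnode* nodep = malloc(sizeof(struct listnode));
  if (nodep == NULL) {
    return head;
  }
  nodep->line = malloc(strlen(line) + 1);
  if (nodep->line == NULL) {
    free(nodep);
    return head;
  }
  strcpy(nodep->line, line);
  nodep->next = head;
  return nodep;
}
```

Saving next before calling listnode_delete matters: reading head->next after the node is freed is exactly the kind of “Invalid read” valgrind reports.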

The output from valgrind can be extensive. It is usually best to start with the first error, try to resolve it, then test again; a single mistake can have systematic effects that show up as many repeated errors later in the report.

Valgrind in scripts

As noted above, myvalgrind is a bash alias and thus not available within a Makefile or bash script. There is a similar approach in each case, however.

In a bash script, you can define a bash variable with the same contents as our alias, and then substitute that variable wherever you want to run valgrind:

myvalgrind='valgrind --leak-check=full --show-leak-kinds=all'
...
$myvalgrind ./program1 args...
$myvalgrind ./program2 args...

In a Makefile, you can define a Make macro (variable) with the same contents as our alias, and then substitute that variable wherever you want to run valgrind:

VALGRIND = valgrind --leak-check=full --show-leak-kinds=all
...
.PHONY: memtest
memtest: program
	$(VALGRIND) ./program args...