In this lecture, we will continue our discussion of the art of debugging.
The development of good code usually follows the design methodology we have discussed in class: first you understand the customer's needs and capture them in the requirements of the system; then the system design starts. The Design Spec and Implementation Spec follow, capturing the data flow, IO, the important data structures and algorithms, and finally the pseudo code for the system. After all that - which I typically think of as thinking about the problem and using abstraction (data structure design) - comes the coding phase. Once you have written code you need to test it. Yes, I know this is boring! But unit testing, sub-system testing, system testing, and finally integration testing are a really important part of the life cycle of good software development. There is a tension to consider when testing: you want a set of tools that help you quickly find the bugs, but you don't want testing to feel like some worthless tax imposed by the company you work for just to get code out of the door to the customer. So once you have coded your functions, units, modules, and sub-systems you need to test them and find the bugs - they will exist! And once you have a tested system and it still fails - maybe a bug not uncovered during testing - then you need to debug. In fact, solving problems found while testing also requires a set of debugging tools and strategies.
In the next lecture we will discuss testing; in this one we discuss the tools and strategies you need to debug and make your code bug free - or your money back. The great computer scientist Edsger Dijkstra once said that testing can demonstrate the presence of bugs but not their absence. We will come back to that in the next lecture. To some degree design, testing, and debugging - in my opinion - do not follow the stepwise formula professed by many software engineering books; they require experience, a nose for trouble, and detective work. There is indeed an art to design, testing, and debugging.
In this lecture on debugging we draw our material and coding examples from the chapter on debugging in "Linux Programming" by Matthew and Stones. References and pointers to these books can be found on the course webpage.
Finally, at the end of the notes we reiterate our approach to tracking down tricky bugs. Make sure you read that section closely. BTW, you can skip these notes and lecture if you write bug free code ;-)
We plan to learn the following from today’s lecture:
Up until now you have probably been relying on simple debugging skills: using printf. But as the number of lines in your code grows - and, at this point, the number of files in the systems you are coding - printf, while helpful, is not effective enough. For example, a printf placed in your code may execute, but the segmentation fault that occurs 50 lines after your printf can stop that "I AM HERE AT LINE 100" message from ever being written to the screen. So you naturally think that the bug must be before line 100, but it's at line 150. In this example printf is not your friend; rather, it's a red herring.
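To make the point concrete, here is a small hedged sketch (not from the course code; the file name and messages are made up) showing why a buffered printf can vanish when the program crashes shortly afterwards, and why printing to stderr or calling fflush(stdout) is more trustworthy:

/* printf_pitfall.c - a made-up example of the buffering problem described above */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char *p = NULL;                               /* deliberately bad pointer */

    /* stdout is buffered: if output is redirected to a file, this message may
     * still be sitting in the buffer when the crash below happens, so it never
     * appears even though the printf itself executed.
     */
    printf("I AM HERE AT LINE %d\n", __LINE__);

    /* stderr is unbuffered, so this message survives the crash; calling
     * fflush(stdout) after the printf above would also do the trick.
     */
    fprintf(stderr, "debug: about to copy\n");

    strcpy(p, "boom");                            /* NULL dereference: segfault here */

    return 0;
}

The lesson is not that printf is useless, but that an unflushed printf can mislead you about where the crash really is.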
Debugging code is an essential part of the design and implementation methodology we discussed earlier. Many problems can arise when coding. Some bugs are simple, such as array subscript errors; others are difficult to track down: the system runs for days and then fails, say, because of a memory leak or a buffer overflow. Programmers aim to understand the nature of the bug they are trying to swat: is it predictable, does it always fail under the same set of conditions, and so on. These are clues that help track down those pesky bugs in complex systems. To some degree, being a good debugger of C code comes with experience.
By now you are used to bus errors, segfaults, and seeing core dump files in your directory when you run your program and it fails. We will use the GNU debugger (gdb) to help solve these problems. Through a set of examples we will show how to debug problems in a systematic manner. But first let's discuss why bugs occur and what techniques other than running gdb can help.
The complexity of a program is related to the number of interacting components; for example, the crawler interacts with the external and distributed wget command. There is a line of thought that says, as a rule of thumb, the number of pesky bugs grows with the number of interactions. Reducing the complexity and the interactions enables us to focus in on the location of bugs in code.
Debugging problems range from easy and moderate through to downright super hard. Techniques that help reduce debugging time include:
We will discuss these techniques in this lecture. First, a word on C.
The C programming language can be dangerous to code with if you are not familiar with its dangers. For example, C does not check array subscripts, so it is easy to write and read at locations beyond the end of an array that you have defined. Beware of this when you access arrays using variables, particularly in loops (e.g., x = array[i] where the length of the array is 20 but i = 21). Pointers can be dangerous too - very dangerous. You know that already. For example, failing to initialize a pointer and then trying to access elements of a structure or memory pointed to by that uninitialized pointer is playing with fire. There are many other possible scenarios. There is no garbage collection in C, so if you malloc() without free() you will have a memory leak, which might cause your program to fail, or not, or fail only sometimes. There are tools for finding memory leaks, such as valgrind, that we will use.
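The short program below is a hedged sketch (the names are invented for illustration) that packs all three dangers into one place - an out-of-bounds subscript, an uninitialized pointer, and a malloc() with no matching free() - exactly the kinds of problems valgrind will flag later in these notes:

/* c_dangers.c - a deliberately buggy illustration of the hazards above */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct person {
    char name[20];
};

int main(void)
{
    /* 1. No subscript checking: the loop runs past the end of the array. */
    int array[20];
    for (int i = 0; i <= 21; i++) {
        array[i] = i;                    /* array[20] and array[21] are out of bounds */
    }

    /* 2. Uninitialized pointer: p points to no valid memory. */
    struct person *p;
    /* strcpy(p->name, "alice");           dereferencing p here is playing with fire */

    /* 3. No garbage collection: malloc() without free() is a memory leak. */
    char *buf = malloc(100);
    strcpy(buf, "this block is never freed");
    printf("%s\n", buf);

    return 0;                            /* buf leaks; valgrind will report it */
}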
Up until now you have been using mostly printf() to help you debug your code. Printf can only get you so far. Many different types of errors or bugs can exist in software. For example, you may have bug-free code but woeful system performance. How do you find performance errors in your code - could it be that the choice of data structure is too "slow", or is it the structure of your code? What happens if your code looks error free but you have memory leaks? Printf will not help. What if you have a "seg fault" - which you have all experienced - that occurs after your printf (which, let's say, is executed), but the message never gets printed out because the process seg faults, as discussed above? What happens if your system runs for hours and the code fails only under a certain set of system conditions? Working your way through 1000s of printf statements may not help. When a bug is buried deep in the execution of your software you need sophisticated tools to track it down. Printf will not help much. This lecture talks about tools to help with performance issues, memory leaks, and difficult bugs.
The major groups of errors found in system development are requirements spec errors, design spec errors, and coding errors. In what follows, we discuss these common types of errors found in systems.
Many times errors creep into system development because of some misunderstanding between what the customer wants and what the developers deliver. These "errors" in the requirements phase are very costly; typically some form of requirements analysis may catch them, but your very best programmer has little chance of discovering these types of errors later in the software life cycle. That's right: even your very best programmers will write the wrong code. Requirements reviews can also help here. But it is important to try to identify what the system requirements are asking the designers to do and whether that is indeed what the customer wants.
Assuming that the Requirements Spec is good, the next set of errors occurs in the design phase. In terms of our methodology these errors would be reflected in the Design Spec. Do not sit down and start coding once you understand the requirements; think, abstract, and write the Design Spec. You need to understand the data flow, IO, data structures, functional decomposition, pseudo code, and algorithms before you write a line of code. Errors can creep in at the design stage; for example, if the code is to search a large volume of data, data structures that look functionally correct may impede performance - maybe a hash table would be the better choice. This issue relates to performance errors. Many types of design errors can occur, and misunderstandings between software engineers in the design phase can be helped by writing good, clear specs that people read and discuss in detail in design reviews before a line of code is written.
The main error people associate with software is the coding error. We all make coding errors. Having a solid Design Spec to work from that has gone through a rigorous review helps. Translating that spec to an Implementation Spec, as we have discussed in this course, is the next step. Once code is written and unit tested (we will discuss this in the next lecture), "desk checking" your code is your best friend - more on code inspection below. Hacking at the terminal in the hope of digging the error out should not be the next step - first read your code and convince yourself it works. Make believe that you are the computer executing the code line by line - update the data structures just like your code would. Study the IO in your code as you execute it. You will find many errors this way. Even better, ask another person working on the project but not familiar with your code to desk check it. Sometimes errors can be staring you in the face and a fresh set of eyes can pick those pesky bugs out. The take-home is that you do not need a computer to find bugs.
Clearly computers do help, and tools can help enormously to track down difficult bugs. Simply running gcc with strict flags can help. In this class we have used "gcc -Wall -pedantic -std=c99", which is a good start. Never execute code that gives you warnings - fix the code.
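As a hedged illustration (the file and variable names are invented), here is the kind of code that may compile without complaint from a bare gcc but that "gcc -Wall -pedantic -std=c99" will warn about - and each warning really is a bug waiting to happen:

/* warnme.c - small mistakes that -Wall catches */
#include <stdio.h>

int main(void)
{
    int unused;                        /* -Wall: unused variable */
    char *name = "cs50";

    printf("name is %d\n", name);      /* -Wall: %d given a char *, not an int */

    if (name = NULL) {                 /* -Wall: assignment used as a condition */
        printf("no name\n");
    }

    return 0;
}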
When tracking down pesky bugs we can think of the following steps to find them:
Many times people rush and "hack" the debug phase, sitting at the terminal hoping to eventually track down that bug via trial and error. Most people do this as their first resort. You will find this approach can be successful, but it can be very time consuming - read: it takes longer than other techniques. One of the most effective debug tools is you! Stop and read your code. Pretend you are a computer and execute the code with pen and paper. Code inspection is very useful: programmers closely trace through their code in detail. Look for boundary problems in code; many times bugs exist at the boundaries - of structures, arrays, and code (e.g., for loops). Many difficult bugs require more power than just hacking and hoping. Once you have read your code and convinced yourself it works, then, assuming bugs still lurk, you need to instrument your code and start the detective work. Read on.
It is a smart idea to use conditional "#ifdef" blocks to include debug code. These can be switched on by defining a symbol on the command line of the compiler: the "-D" option of gcc allows us to define a named value (e.g., DEBUG below). Using conditional definitions to print out important data structures, or to report where in the code an assert happens, is very useful. In addition, a single switch such as -DDEBUG can "turn on" all the test code in a system, as shown below. This is better than defining a global variable to do the same thing because with conditional debug flags the test code can be turned off and is not even included in the binary. Make sure you turn off the test code before the code is shipped, e.g., make debug becomes make release.
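Here is a minimal sketch of the idea (the file and variable names are made up); the debug printf only exists in the binary when the program is compiled with -DDEBUG:

/* debug_flag.c - conditional debug code controlled by -DDEBUG */
#include <stdio.h>

int main(void)
{
    int important = 42;

#ifdef DEBUG
    /* compiled in only when built with -DDEBUG; absent from the release build */
    printf("DEBUG %s:%d important = %d\n", __FILE__, __LINE__, important);
#endif

    printf("result: %d\n", important);
    return 0;
}

Building with "gcc -Wall -pedantic -std=c99 -DDEBUG -o debug_flag debug_flag.c" turns the debug output on; dropping -DDEBUG gives a release build with no trace of the test code.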
Please go over the buggy sort function. It illustrates a number of approaches to debugging buggy_sort.c.
Note, in the code below it states: mygcc -g -o bug -DDEBUG buggy_sort1.c. Note that -g and -ggdb are mostly equivalent flags. There may be some differences between machines.
The cflow tool prints a function call tree showing which functions call which others. This is very helpful for seeing the structure of the code and how it operates. The tool essentially analyzes the source files and builds a call graph from the code.
Suppose you are given someone's source code and want to get a handle on its decomposition. What functions are used? What is the calling order? cflow is the perfect tool.
Consider the following code. The main function calls a function called whoami that prints out the user's name. It is very simple. We can run GNU cflow to analyze the source code; in fact cflow can analyze many source files at once. It prints a graph charting control flow within the program, and it can produce both direct and inverted flow graphs.
Let’s consider whoami.c .
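The original whoami.c is not reproduced in these notes, but a sketch consistent with the description above might look like this (getenv("USER") is one way to get the user's name; the real file may differ):

/* whoami.c - main calls whoami, which prints the user's name */
#include <stdio.h>
#include <stdlib.h>

void whoami(void)
{
    char *name = getenv("USER");   /* the login name from the environment */
    if (name != NULL) {
        printf("you are %s\n", name);
    } else {
        printf("who are you?\n");
    }
}

int main(void)
{
    whoami();
    return 0;
}

Running "cflow whoami.c" on a file like this prints the call tree rooted at main, showing that main calls whoami.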
By just running the cflow command on the source file, the output is a direct graph displaying caller-callee dependencies.
You can also create a reverse graph, charting callee-caller dependencies, by running cflow with the --reverse (-r) command line option as shown below. This time each function is listed along with all the places in the source that call it.
The gprof tool is used to profile the runtime performance of code. Many times you might be confident that the code is bug free because there are no functional problems. However, the code could be worthless if it does not meet its performance requirements. The gprof tool is an execution profiling tool that is useful when tracking down performance problems.
Warning: don't try to use gprof on your Mac; it has problems with the Intel architecture. Log on to a machine in the lab to try it out.
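If you want to experiment on a lab machine, here is a hedged toy example (the file and function names are invented): compile it with gcc's -pg flag, run it to produce gmon.out, and then run gprof on the binary; the profile should show almost all of the time being spent in the deliberately slow function.

/* profile_me.c - a toy program for trying out gprof
 *   gcc -Wall -pedantic -std=c99 -pg -o profile_me profile_me.c
 *   ./profile_me                 (writes gmon.out in the current directory)
 *   gprof profile_me gmon.out
 */
#include <stdio.h>

long slow_sum(long n)              /* deliberately wasteful inner loop */
{
    long total = 0;
    for (long i = 0; i < n; i++) {
        for (long j = 0; j < 1000; j++) {
            total += (i + j) % 7;
        }
    }
    return total;
}

long fast_sum(long n)
{
    long total = 0;
    for (long i = 0; i < n; i++) {
        total += i % 7;
    }
    return total;
}

int main(void)
{
    printf("slow: %ld\n", slow_sum(100000));
    printf("fast: %ld\n", fast_sum(100000));
    return 0;
}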
Anecdotal evidence. Many times programmers focus on getting the code to work functionally and only then think about speed up. This is not always the most productive approach to design or systems development. It is best to design for speed if it is needed (e.g., use a hash table instead of searching a long doubly linked list). I recall once working as a consultant on improving the performance of a radio router. The performance of the system, coded in Ada, was appalling and someone's head was about to roll. I spent probably two weeks just studying the code of a very large system - difficult to keep it all in your head. Profiling the code highlighted the cost of a system that had been designed and coded with an excessive number of tasks and rendezvous. The cost of interprocess communications was high. What did I do? It was not nice. I turned the system into one large task and replaced all the interprocess communications (IPC) (which represented system calls) with my own library that implemented the IPC API. I changed a couple of hundred lines of code in a system of 20,000 lines of code. The improvement was massive. Packets could be forwarded from one radio input to the output radio in under 100 msec, down from 1 second! I was king for the day, or week. I made my changes, tested them locally, desk checked the code closely - and it ran the first time! The guy who designed the system wanted me to fail - I could feel it. But when that router ran the first time - well, that is a moment I will always remember. My reward? I got to design the next system. The problem was essentially a performance bug. The change was simple once the problem was identified. I took a very radical approach that ran against OO design, but that is what was needed. A router that forwarded packets at 1-second intervals was not going to fly with the customer. And, yes, I saved the day and it was fun.
Note to instructor: use lab4 solution in cs50/l16/lab4 as example code
The output of the profile is below:
The valgrind tool is excellent for finding a number of problems, including illegal memory accesses and memory leaks.
In what follows, we look at some very buggy code called leaky.c.
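The actual leaky.c is not reproduced in these notes, but a sketch in the same spirit (invented for illustration) looks like this - it reads one byte past the end of a malloc'd block and never frees it, which is exactly the sort of thing valgrind reports below:

/* A leaky.c-style example: an invalid read and a memory leak */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(8);            /* 8 bytes */
    strcpy(buf, "leaky");             /* fits: 5 characters plus '\0' */

    char c = buf[8];                  /* invalid read of size 1: one byte past the block */
    printf("read byte %d\n", c);

    return 0;                         /* buf is never freed: valgrind reports a leak */
}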
First we compile the code and then run the valgrind tool on it. No recompilation is needed to run valgrind.
Note from the valgrind output below that it indicates various problems in the leaky.c code and tells the user which line a particular problem occurred on; for example:
==2189== Invalid read of size 1
==2189== at 0x804841C: main (leaky.c:20)
There is an invalid read at line 20.
How do I find a line number in my code?
You can use an emacs shortcut to go to a line, e.g., line 20, in your code. Hold down the ESC key, then hit the X key followed by the G key, then 20, then CR [carriage return]. You will be brought to line 20. This is handy when gcc or valgrind (or any tool for that matter) tells you there is a problem at a particular line.
The complete valout is here valout
In lecture 8 we went through the intentionally buggy buggy_sort.c code from "Linux Programming" by Matthew and Stones. We used gdb in a series of debugging moves that included setting breakpoints, using commands, and patching the running code to verify fixes to the three bugs we encountered. This is a simple example that builds on our use of gdb in the course.
If you have not gone through buggy_sort.c yet, you need to. It illustrates a number of approaches to debugging.
Crawler is your first real encounter with dynamic memory management for data structures and with the pointers that drive the management and operation of those data structures. Many tricky bugs lead to segfaults that can be easily debugged - a pointer that is NULL when you expect the address of a malloc'd data structure. But bugs may manifest in all sorts of edge cases, leading to unstable code, infinite loops, or fewer crawled files than expected. You might have a beautiful operational crawler at depth 2 that segfaults at depth 3; or you might crawl at depth 3, but if you change the size of the hash table (MAX_HASH_SLOTS) from 10000 entries (which means the crawler nearly always finds a NULL pointer in the hash table) to 10 entries (which means your code is executing collision processing nearly all the time) then you might (or will likely ;-) segfault - these are edge cases. By changing a system parameter (i.e., MAX_HASH_SLOTS) we radically change the operation of the system, forcing program control through code that is probably not executed often, as the sketch below illustrates. These edge-case tests force pesky bugs out into the light so we can swat them - splat!
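Here is a minimal sketch (not the crawler's actual code - the struct, hash function, and names are invented) of why shrinking MAX_HASH_SLOTS stresses the collision-handling path: with 10 slots and 20 URLs the pigeonhole principle guarantees collisions, so the chaining code runs constantly, while with 10000 slots it hardly ever runs.

/* hash_edge.c - exercising the collision path by shrinking the table */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HASH_SLOTS 10          /* try 10000 and then 10 and compare */

typedef struct node {
    char *key;
    struct node *next;
} node_t;

node_t *table[MAX_HASH_SLOTS];     /* all slots start out NULL */

unsigned long hash(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0') {
        h = h * 33 + (unsigned char)*s++;
    }
    return h % MAX_HASH_SLOTS;
}

void insert(const char *key)
{
    unsigned long slot = hash(key);
    if (table[slot] != NULL) {
        printf("collision in slot %lu for %s\n", slot, key);
    }
    node_t *n = malloc(sizeof(node_t));
    n->key = malloc(strlen(key) + 1);
    strcpy(n->key, key);
    n->next = table[slot];         /* chain onto whatever is already there */
    table[slot] = n;
}

int main(void)
{
    char url[32];
    for (int i = 0; i < 20; i++) {
        snprintf(url, sizeof(url), "page%d.html", i);
        insert(url);
    }
    return 0;                      /* a real program would also free the nodes */
}

If your collision code has a lurking bug, the 10-slot build will trip over it almost immediately, whereas the 10000-slot build might run clean for a long time.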
Once you have coded each of the steps above and used the debugger to verify that the code functionally works, you can go on to complete the crawler cycle, which is just a while loop over all URLs to visit. You will likely experience some of the bugs mentioned above - many associated with your code's interaction with memory, that is, its data structures.
What do you do if you have a really non-obvious bug – that is, a tricky one?
1) Run your program through valgrind and note the line numbers of any errors. Examine the code around these lines and see if you can figure out what the problem is.
In this lecture we covered valgrind. Run this command on the command line once you have a clean make - on a single command line (it looks like it's on two lines below but it has to be one):
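The exact command line from the original notes is not shown here; a typical invocation (the flags are standard valgrind options, but the program name, arguments, and output file name below are placeholders you should replace with your own) looks something like:

valgrind -v --leak-check=full --show-reachable=yes ./crawler http://www.example.com ./data 2 > valout 2>&1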
Here is a good place to start with valgrind:
2) If you can't figure out the problem simply by eyeballing your code (i.e., desk checking it), then start gdb and set breakpoints at some nearby lines - that is, before and/or after the source code line numbers given by the valgrind errors indicated in valout. Step through (potentially multiple times), examining variables (e.g., DNODEs) and the code until you understand "the problem" (always a good place to start) - and what valgrind is complaining about, such as memory leaks or writing/reading beyond malloc'd memory (see the sample gdb commands below).
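For reference, a typical gdb session for step 2 uses commands like the following (the file name, line number, and program arguments are placeholders; only the gdb commands themselves - break, run, print, next, step, continue - are the real ones):

gdb ./crawler
(gdb) break crawler.c:120          <- stop just before the line valgrind complained about
(gdb) run http://www.example.com ./data 2
(gdb) print dnode                  <- examine a variable, e.g., a DNODE pointer
(gdb) next                         <- execute one source line, stepping over calls
(gdb) step                         <- or step into the call instead
(gdb) continue                     <- run on to the next breakpoint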
Start by looking at the valout file - this is the output of valgrind (see the command line). Note, valgrind runs much more slowly than non-valgrind code. In addition, we set the verbose flag in the command. Ideally you should have no errors, but if you do (which you will, for sure) then clear all the illegal reads and writes and memory leaks.
I would not continue with any further code development until you clear all the valgrind errors; your CLEAN (no errors) valout file should look like this for example:
3) Of course, there could always be edge cases where something completely unrelated to the code is at fault, but these are pretty rare.
OK, we are done. Being an effective debugger requires you to have an approach: a set of techniques that allow you to work through finding easy and then progressively harder bugs. Tools can be a great help.
In the next lecture, we will talk more about testing your code with the goal of convincing yourself that your code has no bugs. However, as Dijkstra said, testing only demonstrates the presence of bugs, not their absence. We will talk more about where bugs lurk and how the art of testing can help discover and remove many of the common bugs.
We strongly recommend that you play with these various versions of buggy_sort, go through the notes, and reproduce what we did above in a step-by-step manner; then you can apply these simple techniques to nailing your pesky bugs.