Techniques for debugging your programs
This page is devoted to describing techniques that reduce the pain of tracking down and fixing bugs.
The Cardinal Rule
The most important rule of debugging is that one must understand a bug before trying to fix it. Simply changing the code until the problem disappears does not guarantee that something else wasn't broken in the process.
A bug that I fixed while overhauling Code Medic to produce version 2.0 provides a good example of the benefits of understanding the problem instead of merely fixing the symptoms. The Variables window was crashing while updating itself after the value of a variable was changed via gdb's "set variable" command. The stack trace (using Code Medic to debug itself!) showed that an object was being referenced after it had been deleted. Since the object was responsible for deleting itself, I initially thought of simply adding a direct notification of this occurrence so the pointer to the object would be cleared. However, the code clearly indicated that this should already be happening in an indirect way. A more careful analysis showed that there was a more fundamental problem. Understanding this problem also exposed a way to dramatically speed up the code by eliminating unnecessary queries to gdb. I would never have noticed this if I had merely implemented my original idea.
Automatic debugging
The best way to manage bugs is never to introduce them in the first place. This requires discipline and attention to detail. Every possibility, no matter how remote, must be accounted for in the design and dealt with in the code. If a function returns an error value, then the caller must remember to check it and take appropriate action. Hiding error values the way the standard C library does (e.g. returning -1 if there is an error and 0-255 when successful) is a bad idea because it makes it much harder for the caller to remember to handle the errors.
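To make the point about hidden error values concrete, here is a minimal sketch that contrasts the two styles. ByteSource and both of its methods are hypothetical names invented for this illustration; they are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory byte source, used only for illustration.
class ByteSource {
public:
    explicit ByteSource(const std::string& data) : data_(data), pos_(0) {}

    // Sentinel style: the error shares the return value with the data.
    // Returns -1 when no bytes remain, otherwise a value in 0-255.
    // Nothing stops the caller from using the result without checking it.
    int next_or_minus_one() {
        if (pos_ >= data_.size()) return -1;
        return static_cast<unsigned char>(data_[pos_++]);
    }

    // Explicit style: the status is the return value and the byte is an
    // output parameter, so the natural call site is an if or while test.
    bool next(unsigned char& out) {
        assert(pos_ <= data_.size());  // internal state must stay in range
        if (pos_ >= data_.size()) return false;
        out = static_cast<unsigned char>(data_[pos_++]);
        return true;
    }

private:
    std::string data_;
    std::size_t pos_;
};

// Counts the bytes in a source using the explicit interface.
std::size_t count_bytes(ByteSource& source) {
    std::size_t count = 0;
    unsigned char byte = 0;
    while (source.next(byte)) ++count;
    return count;
}
```

With the sentinel style, storing the result in an unsigned char silently turns -1 into 255, which is indistinguishable from a real byte. The explicit style cannot make that mistake, at the cost of a slightly wordier call.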
Unfortunately, we humans are prone to make mistakes regardless of how hard we try to avoid them. It is therefore a very good idea to use a copious number of assert() statements so that the computer can catch the problem right when it happens. This is far easier than digging through a core dump or analyzing page after page of printed output. (Remember to never do actual work in the argument to assert(), however, because defining NDEBUG erases the entire statement!) The use of assert() during initial program development is especially critical to the maintenance phase because it allows a programmer who is not intimately familiar with every line of code to make changes, confident in the knowledge that the program will complain if anything goes wrong, rather than silently failing. This eliminates the single largest source of difficulty in tracking down bugs, the Cause/Effect Chasm [3], where the root cause and the actual failure occur in completely different parts of the program.
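The warning about doing work inside assert() is easy to forget, so here is a small sketch of the pitfall and the usual fix. The helper pop_last() and the two drop_last functions are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: removes the last element and reports whether it did so.
bool pop_last(std::vector<int>& values) {
    if (values.empty()) return false;
    values.pop_back();
    return true;
}

// WRONG: when NDEBUG is defined, assert() expands to nothing, so
// pop_last() is never called and the vector is left unchanged.
void drop_last_wrong(std::vector<int>& values) {
    assert(pop_last(values));
}

// RIGHT: do the work unconditionally, then assert on the saved result.
void drop_last_right(std::vector<int>& values) {
    const bool popped = pop_last(values);
    assert(popped);
    (void)popped;  // avoids an unused-variable warning when NDEBUG is defined
}
```

Both functions behave the same in a debug build. Only the second one still removes the element in a release build compiled with NDEBUG.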
Using assert() is the idea behind Eiffel's concept of Design by Contract. Every function should verify:
- the values that are passed in
- its state at various points during execution
- the values that it returns
Eiffel has the cool feature that it understands assertions as part of the language and can optimize out the redundant ones. But assert(), or its moral equivalent, is available in all languages. Use it!
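The three kinds of checks listed above can be sketched with plain assert() in C++. The function lower_bound_index() is a hypothetical example chosen for illustration, and the particular conditions are one reasonable choice of contract, not the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element >= key, or sorted.size() if none.
std::size_t lower_bound_index(const std::vector<int>& sorted, int key) {
    // Values passed in: the input must already be sorted.
    assert(std::is_sorted(sorted.begin(), sorted.end()));

    std::size_t lo = 0;
    std::size_t hi = sorted.size();
    while (lo < hi) {
        // State during execution: the search window stays within bounds.
        assert(hi <= sorted.size());
        const std::size_t mid = lo + (hi - lo) / 2;
        if (sorted[mid] < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }

    // Values returned: everything before lo is < key,
    // and the element at lo, if there is one, is >= key.
    assert(lo == sorted.size() || sorted[lo] >= key);
    assert(lo == 0 || sorted[lo - 1] < key);
    return lo;
}
```

The precondition check is linear in the size of the input, which is acceptable in a debug build because defining NDEBUG removes it from a release build.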
Monkey wrench testing
Feeding random inputs into your program is a good way to check that it won't crash when clueless users get their grubby hands on it. (wink, wink) Unfortunately, as explained in Monte Carlo Debugging: A Brief Tutorial [1], random inputs are unlikely to test more than the first layer of your program's defenses, precisely because what is coming in is gibberish. What is really needed is a separate testing department staffed with programmers who can evolve a test suite that starts out generating random inputs and then slowly evolves to match the program's logic, thereby eventually testing every level.
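As a starting point, a random-input harness might look like the following sketch. The function under test, parse_uint(), and the helpers random_input() and monkey_test() are all hypothetical names invented for this example. The fixed seed is an assumption added so that a failing run can be repeated.

```cpp
#include <cstddef>
#include <random>
#include <string>

// Hypothetical function under test: accepts 1 to 9 decimal digits.
bool parse_uint(const std::string& text, unsigned long& value) {
    if (text.empty() || text.size() > 9) return false;
    unsigned long result = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c < '0' || c > '9') return false;  // first layer of defense
        result = result * 10 + static_cast<unsigned long>(c - '0');
    }
    value = result;
    return true;
}

// Builds a string of random bytes with a random length up to max_length.
std::string random_input(std::mt19937& rng, std::size_t max_length) {
    std::uniform_int_distribution<std::size_t> length_dist(0, max_length);
    std::uniform_int_distribution<int> byte_dist(0, 255);
    std::string text(length_dist(rng), '\0');
    for (std::size_t i = 0; i < text.size(); ++i) {
        text[i] = static_cast<char>(byte_dist(rng));
    }
    return text;
}

// Feeds random inputs to the parser and returns how many were accepted.
// Seeding the generator makes any crash reproducible from the seed alone.
std::size_t monkey_test(unsigned seed, std::size_t iterations) {
    std::mt19937 rng(seed);
    std::size_t accepted = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        unsigned long value = 0;
        if (parse_uint(random_input(rng, 16), value)) ++accepted;
    }
    return accepted;
}
```

Because only 10 of the 256 possible byte values are digits, nearly all of these inputs should be rejected by the first character check, which illustrates why purely random input rarely reaches the deeper layers. A generator that knows the input grammar would be the next step.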
C++ tricks
"pure virtual method called" is the bane of C++ programmers because g++ does not dump core on this error. (But it should!) It turns out that setting a breakpoint at __pure_virtual will catch this error when the program is run inside a debugger.
glibc tricks
The version of malloc() provided by glibc 2.x helps you catch memory errors. If you set the environment variable MALLOC_CHECK_ to 2, you will get a core dump if heap corruption is detected, e.g., if you try to free() invalid memory. Refer to the malloc man page for more details.
References
[1] Bell, Charles R. Monte Carlo Debugging: A Brief Tutorial. Communications of the ACM, February 1983, Vol. 26, No. 2, pp. 126-127.
[2] Weiser, Mark. Programmers Use Slices when Debugging. Communications of the ACM, July 1982, Vol. 25, No. 7, pp. 446-452.
[3] Eisenstadt, Marc. My Hairiest Bug War Stories. Communications of the ACM, April 1997, Vol. 40, No. 4, pp. 30-37.