
Q&A with Diomidis Spinellis on Effective Debugging

Key takeaways

  • Effective debugging depends on the mindful application of the right strategies, methods, practices, tools, and techniques.
  • Locate and reproduce bugs through detailed logging, failure reporting, defensive programming, and specialized tools.
  • After fixing one fault, find and fix similar ones and take steps to ensure they will not occur in the future.
  • Explaining a failing code snippet to a colleague can help you discover the error in it.
  • Pinpoint elusive bugs through static and dynamic program analysis tools.

The book Effective Debugging by Diomidis Spinellis describes 66 different approaches for effective debugging of applications and systems. It provides methods, strategies, techniques, and tools for finding and removing faults, and gives examples of using them in different settings.

InfoQ readers can download an excerpt from Effective Debugging.

InfoQ interviewed Spinellis about different approaches for debugging software, how using different compilers or execution platforms can help to debug software, reproducing failures, how to find and fix similar faults after solving a failure, why it's better to use a graphical user interface for debugging, how to involve colleagues when you want to find faults in your code, using static or dynamic program analysis tools to debug code, and what programmers can do to develop their debugging skills.

InfoQ: What made you decide to write a book on debugging?

Diomidis Spinellis: I love coding for the way it challenges my mind. A few years ago I recognized that debugging is an even more difficult and interesting task. Whereas coding is mainly the translation of requirements into program statements, debugging involves many different creative approaches. I also found that, as with code reading, debugging is rarely taught or written about.

At that point I started observing myself whenever I debugged something and kept notes regarding the approach I used. I was surprised by how many approaches found their way onto the list; the list grew and grew. I thus realized that many would benefit if I expanded the list into a book. The Effective Programming series perfectly matched the material I had put together, and I was privileged to have my proposal accepted as a project within the series.

InfoQ: For whom is this book intended?

Spinellis: I wrote the book for intermediate and advanced programmers: people who know how to code, but haven’t had significant formal training on how to debug their code. Nowadays code often ends up as a running service (or app) rather than a program that performs some processing and terminates. Therefore, I also aimed the book at DevOps staff and system administrators, who also face debugging tasks day in, day out.

A few items in the book, such as the use of breakpoints or a debugger’s stack traversal commands, are fairly basic. I included these because I’ve found that programmers often employ debugging approaches specific to the platform they use. Consequently, even some advanced programmers may not be familiar with debugging methods that others routinely use.

InfoQ: Which different kinds of approaches for debugging software are described in the book?

Spinellis: I start the book with three chapters containing items that are applicable to most debugging tasks. High-level strategies involve ways to approach the problem, such as narrowing down the differences between a working system and a failing one. General-purpose methods and practices help you become better at debugging problems. One important practice in this category is to automate your testing scenarios and tasks. Doing so not only increases your efficiency, but often opens up opportunities for expressing more complex debugging scenarios. Then come general-purpose tools and techniques, which you can apply to diverse debugging tasks. These include the effective use of Unix command-line tools, the editor, the revision control system, and tracing tools.

The next four chapters detail platform-specific methods: the use of a debugger, how to home in on bugs by tweaking the program’s code, as well as techniques you can apply when building your system and when you run it. Although the exact details of these methods depend on your platform, most platforms offer tools, such as static program analyzers or profilers, which help you pinpoint specific classes of bugs.

InfoQ: How can using different compilers or execution platforms help to debug software?

Spinellis: Manufacturers test new car models in all kinds of conditions: from the arctic tundra to the Sahara desert. This helps them uncover bugs they wouldn’t be able to find by driving around the car plant’s parking lot. A similar idea applies to software. Modern languages and APIs are large and complex; different compilers and runtime platforms stress software in diverse ways. For example, one compiler may happily accept code that causes your program to behave in an unspecified manner, whereas another will issue a warning when it encounters that code. The same goes for API implementations: one may mysteriously misbehave on an incorrect argument, while another may immediately raise a helpful exception. Even CPU diversity can help you: accessing a two-byte value at an odd memory address can generate a fault on some ARM CPUs, while on others it may result in non-atomic behavior.

An interesting case occurred in the 1980s, when Unix on the then-popular VAX architecture was written in a way that made accesses through a NULL pointer return zero. All sorts of bugs surfaced when Sun Microsystems changed that behavior in its systems to the modern practice, in which such an access results in a runtime exception.

InfoQ: What can you do to reproduce a failure?

Spinellis: Once you reliably and efficiently reproduce a failure, you’re at least halfway through the effort required to fix a bug. There are two main approaches for reproducing failures. One involves starting from a complex scenario that leads to a failure and gradually simplifying it. In the other approach you come up with a guess regarding the failure’s cause and build a scenario that reproduces it. Both approaches can lead to a small, tractable test case that causes the failure. Detailed logging, failure reporting, and defensive programming can often guide your actions.
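The first approach, gradually simplifying a failing scenario, can be partly automated. The following Python sketch is a simplified, greedy variant of delta debugging, not code from the book: it keeps deleting chunks of a failing input for as long as the failure persists. The `minimize` function, the `fails` predicate, and the `buggy_sum` program under test are hypothetical names chosen for illustration.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def minimize(items: List[T], fails: Callable[[List[T]], bool]) -> List[T]:
    """Greedily delete chunks of a failing input while the failure persists."""
    if not fails(items):
        raise ValueError("start from an input that reproduces the failure")
    chunk = len(items) // 2
    while chunk >= 1:
        removed_any = False
        start = 0
        while start < len(items):
            candidate = items[:start] + items[start + chunk:]
            if candidate and fails(candidate):
                items = candidate      # still fails: keep the smaller input
                removed_any = True
            else:
                start += chunk         # this chunk is needed: skip past it
        if not removed_any:
            chunk //= 2                # no progress: try smaller chunks
    return items


# Hypothetical program under test: fails only when both 3 and 7 are present.
def buggy_sum(xs: List[int]) -> int:
    if 3 in xs and 7 in xs:
        raise ValueError("boom")
    return sum(xs)


def fails(xs: List[int]) -> bool:
    """Return True if running the program on xs reproduces the failure."""
    try:
        buggy_sum(xs)
        return False
    except ValueError:
        return True


print(minimize(list(range(20)), fails))  # [3, 7]
```

Each call to `fails` reruns the scenario, so this kind of automation pays off most once a single run is fast.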

With a small test case at hand, you drastically cut down the work required to debug a problem. You step through tens of code lines rather than hundreds, you inspect a screenful of log data rather than a torrent, you zoom into a few key variables rather than the program’s global state, and you can run the test case in seconds rather than hours.

I was recently debugging some failures in the processing of data files close to a terabyte in size, whose processing took days. To solve them, I first added to the processing program many debugging options that generated detailed logs. I also added many internal consistency checks and corresponding reporting. I was thus able to narrow down the problem to about a gigabyte of data that would fail in less than a minute of processing. Then I wrote a small filter that would isolate, from the data leading to the failure, the records likely to be involved in it. This further reduced the suspect data to a few kilobytes, which I could manually inspect to pinpoint the problem. To ensure that my solution was correct and would stay so in the future, I also added a test case modeling the error and verified that the program handled it correctly.
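The combination of logging, consistency checks, and a record filter might look roughly like this in Python. This is a hypothetical sketch, not the program described in the interview: the record layout (`id`, `parent`, `size` fields), the two checks, and the `process` and `isolate` functions are invented for illustration.

```python
import logging
from typing import Any, Dict, Iterable, Iterator, List, Set

# Turn on detailed output with logging.basicConfig(level=logging.DEBUG).
log = logging.getLogger("processor")

# Hypothetical record layout: {"id": str, "parent": str or None, "size": int}
Record = Dict[str, Any]


def check_record(rec: Record, seen_ids: Set[str]) -> List[str]:
    """Internal consistency checks; returns a description of each problem.

    Assumes that parents precede their children in the input.
    """
    problems = []
    if rec["size"] < 0:
        problems.append(f"{rec['id']}: negative size {rec['size']}")
    if rec["parent"] is not None and rec["parent"] not in seen_ids:
        problems.append(f"{rec['id']}: unknown parent {rec['parent']}")
    return problems


def process(records: Iterable[Record]) -> Set[str]:
    """Process records, logging progress and collecting the ids of suspects."""
    seen: Set[str] = set()
    suspects: Set[str] = set()
    for rec in records:
        log.debug("processing %s", rec["id"])
        for problem in check_record(rec, seen):
            log.warning("consistency check failed: %s", problem)
            suspects.add(rec["id"])
        seen.add(rec["id"])
    return suspects


def isolate(records: Iterable[Record], suspects: Set[str]) -> Iterator[Record]:
    """Filter: keep only suspect records and records that refer to them."""
    for rec in records:
        if rec["id"] in suspects or rec["parent"] in suspects:
            yield rec


data = [
    {"id": "a", "parent": None, "size": 10},
    {"id": "b", "parent": "a", "size": 5},
    {"id": "c", "parent": "z", "size": 7},   # dangling parent reference
    {"id": "d", "parent": "c", "size": 1},
    {"id": "e", "parent": "b", "size": -2},  # impossible size
]
suspects = process(data)
print(sorted(suspects))                            # ['c', 'e']
print([r["id"] for r in isolate(data, suspects)])  # ['c', 'd', 'e']
```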

Let me end here by mentioning that a number of specialized tools and approaches can help you when you encounter bugs that are hard to reproduce. These include post-mortem debugging (debugging your code using the memory image of a crashed program), capture and replicate tools (they work wonders on non-deterministic bugs in multithreaded code), and back-in-time debugging (useful when you step over a rarely failing function call).
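Capture and replay tools record every source of nondeterminism during a run so that a failing run can later be reproduced exactly. The Python sketch below illustrates the idea for a single source of nondeterminism, random numbers; process-level tools such as rr on Linux apply it much more broadly, including thread scheduling and system calls. `RecordingRandom` and `flaky_step` are hypothetical names.

```python
import random
from typing import List, Optional


class RecordingRandom:
    """Draws random numbers and records them, or replays a recorded sequence."""

    def __init__(self, replay: Optional[List[float]] = None):
        self._rng = random.Random()  # seeded from OS entropy: truly varies per run
        self._replay = list(replay) if replay is not None else None
        self.log: List[float] = []

    def random(self) -> float:
        if self._replay is not None:
            value = self._replay.pop(0)  # replay mode: return the captured value
        else:
            value = self._rng.random()   # capture mode: draw a fresh value
        self.log.append(value)
        return value


# Hypothetical code under test that fails on roughly one run in ten.
def flaky_step(src: RecordingRandom) -> int:
    total = 0
    for _ in range(100):
        x = src.random()
        if x > 0.999:
            raise RuntimeError(f"rare failure at x={x!r}")
        total += int(x * 10)
    return total


# Capture: run until a failure shows up, keeping the recorded inputs.
captured: Optional[List[float]] = None
first_error: Optional[str] = None
for _ in range(1000):
    src = RecordingRandom()
    try:
        flaky_step(src)
    except RuntimeError as exc:
        captured, first_error = src.log, str(exc)
        break

# Replay: the same failure now reproduces deterministically.
replay_error: Optional[str] = None
try:
    flaky_step(RecordingRandom(replay=captured))
except RuntimeError as exc:
    replay_error = str(exc)
print(replay_error == first_error)
```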

InfoQ: After solving a failure, how can you find and fix similar faults?

Spinellis: You’re making an important point. It’s often the case that the same fault will occur in more than one place. This can happen through the haphazard copy-pasting of code, or because an API can be easily misused. Once you find a fault, it is a sign of professionalism to understand why it occurred, to find and fix similar ones, and to ensure that others like it do not crop up in the future.

You can often find similar faults by searching through the code with a regular expression that will match any suspect code. You can do this with your favorite editor or IDE, though for this purpose I prefer the Unix command-line tool grep. By specifying in a pipeline patterns that match the fault and patterns for reported matches I want to ignore (these I pass to an instance of grep -v), I can easily narrow down my search to the cases that matter. Fixing those cases saves me and my colleagues all the work involved in debugging further potential failures.
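For readers who want the same search inside a script, here is a Python sketch of the `grep PATTERN | grep -v IGNORE` idea. The example fault, `malloc(strlen(s))` with no room for the terminating NUL, as well as the regular expressions, the `*.c` glob, and the `find_suspects` name, are assumptions chosen for illustration. Unlike `grep -v` applied to `grep -rn` output, the ignore patterns here match only the line text, not the file name.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple


def find_suspects(root: Path, pattern: str, ignore: List[str],
                  glob: str = "*.c") -> Iterator[Tuple[Path, int, str]]:
    """Roughly `grep -rn PATTERN | grep -v IGNORE1 | grep -v IGNORE2 ...`."""
    suspect = re.compile(pattern)
    ignored = [re.compile(p) for p in ignore]
    for path in sorted(root.rglob(glob)):
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if suspect.search(line) and not any(i.search(line) for i in ignored):
                    yield path, lineno, line.rstrip("\n")


# Example: allocations that forget the byte for the terminating NUL.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "a.c").write_text(
        "char *p = malloc(strlen(s));\n"       # suspect
        "char *q = malloc(strlen(t) + 1);\n"   # already correct: ignored
    )
    hits = [(p.name, n) for p, n, _ in find_suspects(
        root, r"malloc\s*\(\s*strlen\s*\(", [r"\+\s*1\s*\)"])]
print(hits)  # [('a.c', 1)]
```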

Ensuring that similar faults do not occur in the future can be more challenging. If the fault is related to the misuse of an API function, one trick is to create a wrapper that will catch and report the error during testing. You can also see whether you can configure a code analysis tool to catch such errors during your continuous integration.
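As an example of such a wrapper, consider Python's `posixpath.join`, which silently discards every component that precedes an absolute one. The sketch below wraps it so that tests report the misuse instead. The `checked_join` function, the `ApiMisuseError` class, and the `CHECK_API_USE` environment switch are hypothetical.

```python
import os
import posixpath

# Hypothetical switch: keep the checks on in tests, optionally off in production.
CHECK_API_USE = os.environ.get("CHECK_API_USE", "1") == "1"


class ApiMisuseError(Exception):
    """Raised when a wrapped API is called in a known-dangerous way."""


def checked_join(base: str, *parts: str) -> str:
    """posixpath.join that reports absolute components instead of silently
    discarding everything before them."""
    if CHECK_API_USE:
        for part in parts:
            if posixpath.isabs(part):
                raise ApiMisuseError(f"absolute component {part!r} discards {base!r}")
    return posixpath.join(base, *parts)


print(posixpath.join("/srv/data", "/etc/passwd"))   # /etc/passwd
print(checked_join("/srv/data", "users", "a.txt"))  # /srv/data/users/a.txt
try:
    checked_join("/srv/data", "/etc/passwd")
except ApiMisuseError as exc:
    print("caught:", exc)
```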

InfoQ: Why is it better to use a graphical user interface for debugging?

Spinellis: Although I love command-line interfaces and I find myself to be very productive when I use them, debugging is one of the few tasks I believe is almost always better performed through a GUI. The reason for this is that debugging benefits from the simultaneous presentation of diverse data: source code, local variables, call stack, log messages, even CPU registers. A graphical interface allows you to have all these displayed in separate windows and updated concurrently. Also, pointing your mouse at a variable, a code line, or a call stack frame is typically much more efficient than specifying it by typing.

Sometimes I find myself debugging a system that lacks a GUI debugger. Then I improvise one through the setup of my desktop’s command-line and editor windows. One window may contain the relevant source code, another list test data, another display a continuous update of a log file, and another may offer a command-line prompt.

InfoQ: How can you involve colleagues if you want to find faults in your code?


Spinellis: The most effective way is probably the rubber duck technique. It involves explaining how your code works to someone else. Typically, halfway through your explanation, you'll exclaim “oh wait, how silly of me, that's the problem!”, and be done. When this happens, rest assured that this was not a silly mistake that you carelessly overlooked. By explaining the code to your colleague you engaged different parts of your brain, and these pinpointed the problem. In most cases your colleague plays a minimal role. This is how the technique gets its name: explaining the problem to a rubber duck could have been equally effective.

There are also other, more formal ways in which colleagues can help you catch bugs, such as pair programming and code reviews. In some cases you can even engage in role-play. For example, if you're debugging a communications protocol, you can take the role of one party, a colleague can take the role of the other, and you can then take turns attempting to break the protocol (or trying to make it work). Other areas where this can be effective are security (you get to play Bob and Alice), human-computer interaction, and workflows. Passing around physical objects, such as an “edit” token, can help you here.

InfoQ: Which advice do you have for using static program analysis tools?

Spinellis: Static program analysis tools analyze your program to find many types of problems, from formatting glitches (think of Pylint for Python code) to code constructs that can cause your program to crash or misbehave (Coverity and FindBugs are two well-known tools in this area). Another underappreciated tool in this category is your compiler or interpreter. A few command-line options (e.g. -Wall, -Wextra, and -Wshadow for many compilers) or statements (e.g. "use strict"; in JavaScript code) can trigger many useful warning messages.
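To show what a static check does without running the code, here is a minimal sketch of a single custom check built on Python's standard `ast` module. It flags mutable default arguments, a classic Python fault that Pylint also reports; the `mutable_defaults` function and its message text are hypothetical, and in practice enabling an existing tool's check is usually simpler than maintaining your own.

```python
import ast
from typing import List, Tuple

MUTABLE_NODES = (ast.List, ast.Dict, ast.Set, ast.ListComp, ast.DictComp, ast.SetComp)


def mutable_defaults(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Report functions whose default argument values are mutable literals.

    Such defaults are evaluated once, at definition time, and shared by all calls.
    """
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
            # Positional defaults plus keyword-only defaults (None means "no default").
            defaults = node.args.defaults + [d for d in node.args.kw_defaults if d is not None]
            for default in defaults:
                if isinstance(default, MUTABLE_NODES):
                    name = getattr(node, "name", "<lambda>")
                    findings.append((default.lineno, f"mutable default argument in {name}()"))
    return findings


code = '''
def append_item(item, bucket=[]):
    bucket.append(item)
    return bucket

def safe(item, bucket=None):
    return (bucket or []) + [item]
'''
print(mutable_defaults(code))  # [(2, 'mutable default argument in append_item()')]
```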

If you start from scratch in this area, my advice is to first configure the tools you’ll use to produce warnings that are consistent with the practices in your organization. For example, if for some reason you already have many methods that differ only in capitalization, you might want to disable the FindBugs “Confusing method names” message. If the level of warnings can be adjusted, choose the highest level that will not drown you in warnings about things you're unlikely to fix. Then methodically remove all other warnings. This may fix the fault you're looking for, and also make it easier to see other faults in the future. Finally, ensure that your build and continuous integration processes run the static program analysis tools, so that your code won’t regress in the future.

InfoQ: How can you use dynamic analysis tools to debug code?

Spinellis: Dynamic program analysis tools deliver the ultimate truth regarding your code, because they analyze it while it runs. You typically run your program under such a tool, and view the report it produces. For example, Valgrind can find memory leaks, illegal memory accesses, and the use of uninitialized memory. Execution profilers, such as those built into modern IDEs, as well as standalone tools (e.g. VisualVM, JProfiler, and Java Mission Control), help you locate performance hogs. Other tools, such as Intel Inspector, let you find deadlocks and race conditions.
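The profilers named above target Java; Python ships a comparable deterministic profiler, `cProfile`, in its standard library. The sketch below profiles two hypothetical lookup functions, one of which performs a linear list search inside a loop, and reports where the time goes. The function names and data sizes are assumptions, and `get_stats_profile()` requires Python 3.9 or later.

```python
import cProfile
import pstats


def slow_lookup(items, targets):
    found = []
    for t in targets:
        if t in items:      # list membership: a linear scan per target
            found.append(t)
    return found


def fast_lookup(items, targets):
    index = set(items)
    found = []
    for t in targets:
        if t in index:      # set membership: a hash lookup per target
            found.append(t)
    return found


def run():
    items = list(range(5000))
    targets = list(range(0, 10000, 7))
    assert slow_lookup(items, targets) == fast_lookup(items, targets)


profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()

stats = pstats.Stats(profiler)
summary = stats.get_stats_profile()  # per-function totals, keyed by function name
stats.sort_stats(pstats.SortKey.TIME).print_stats(5)  # top 5 by time spent in each function

for name in ("slow_lookup", "fast_lookup"):
    print(f"{name}: {summary.func_profiles[name].tottime:.3f} s")
```

Timings vary between machines; what matters is the relative ranking, which points straight at the linear search.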

One underappreciated category of dynamic analysis tools is execution tracers. These display all the program’s interactions with the operating system or the runtime system’s libraries. Examples of such tools include ltrace, strace, ktrace, truss, and SystemTap under various Unix flavors, as well as Process Monitor under Windows. The beauty of these tools is that you can apply them to an executable program without requiring its source code. They have saved me countless times by revealing why a particular program fails.
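Python has no built-in equivalent of strace, but runtime audit hooks (PEP 578, Python 3.8 and later) give an in-process view of similar interactions. The sketch below records file opens and deletions. Unlike strace, it must be installed inside the traced process and sees only the events the interpreter reports; the `WATCHED` selection of events and the `audit-demo.txt` file name are assumptions.

```python
import os
import sys
import tempfile

# Audit events that roughly correspond to what strace-like tools show.
WATCHED = {"open", "os.remove", "os.rename", "subprocess.Popen", "socket.connect"}
trace = []


def tracer(event, args):
    if event in WATCHED:
        trace.append((event, args[0]))  # keep the event name and its first argument


sys.addaudithook(tracer)  # note: audit hooks cannot be removed once added

# Hypothetical code under investigation: which files does it touch?
path = os.path.join(tempfile.gettempdir(), "audit-demo.txt")
with open(path, "w") as f:
    f.write("hello\n")
os.remove(path)

for event, target in trace:
    if target == path:
        print(event, target)
```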

InfoQ: What can programmers do to develop their debugging skills?

Spinellis: Airline pilots often report their experience and proficiency in terms of flight hours. Debugging (and programming) skills also depend a lot on the time you spend on these activities. However, mindful debugging, rather than randomly throwing darts at the bug, can make a big difference both in your effectiveness and in the development of your debugging skills. This includes carefully and deliberately choosing the best approach to attack the problem, investing in an environment that can make you productive, setting up and learning to apply specialized tools, and studying to deeply understand the language features and APIs used by your code.

About the Book Author

Diomidis Spinellis is a Professor in the Department of Management Science and Technology at the Athens University of Economics and Business. He has written two award-winning, widely translated books, “Code Reading” and “Code Quality: The Open Source Perspective”, as well as the 2016 book “Effective Debugging”. Spinellis served for a decade as a member of the IEEE Software editorial board, authoring the regular “Tools of the Trade” column. He has contributed code that ships with Apple’s OS X and BSD Unix and is the developer of UMLGraph, CScout, and other open-source software packages, libraries, and tools. He holds an MEng in Software Engineering and a PhD in Computer Science, both from Imperial College London. Spinellis is a senior member of the ACM and the IEEE. Since January 2015, he has been serving as Editor-in-Chief of IEEE Software. In a previous life he was four times winner of the International Obfuscated C Code Contest. Nowadays he tries to keep his code boring. Twitter: @CoolSWEng
