David Baron's Weblog

Why debug builds (and assertions) are important

Monday, 2013-07-29, 17:35 -0700

I want to talk about why DEBUG builds are important.

First, though, I'd like to clarify terminology. When developers talk about debug builds, they might be talking about one of three things:

  1. builds that have the DEBUG macro defined, so that #ifdef DEBUG blocks are used
  2. builds that have compiler-generated debugging information, so that they work better in a debugger
  3. builds that have optimization disabled, so that they work better in a debugger

I want to talk about the first of these, the #ifdef DEBUG blocks. The most important difference when DEBUG is defined is often that assertions (in their various forms) are checked. See, for example, the beginning of Element::BindToTree (which is called when an Element is attached to its parent). All of these NS_PRECONDITION macros are tested only in debug builds. They're not included in the release builds that we ship to users because they would slow things down. But in debug builds, they print a warning to the console, they cause any xpcshell test to fail, and they cause any reftest or mochitest (except browser-chrome mochitests) that isn't already marked as having known assertions (i.e., known failures) to fail. We also have fatal assertions, such as the MOZ_ASSERT macro, which cause any debug build to crash instantly, and so cause any test that hits them to fail.
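As a rough illustration of the non-fatal kind of check, here is a minimal sketch of a debug-only assertion macro. SKETCH_WARN_ASSERT, gAssertionCount, Node, and BindToParent are all hypothetical names made up for this example; this is not how NS_PRECONDITION or Element::BindToTree is actually written, only the general shape of the idea.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counter that a test harness could read after each test.
int gAssertionCount = 0;

#ifdef DEBUG
// Non-fatal, debug-only check: report the failure and keep running.
#define SKETCH_WARN_ASSERT(cond, msg)                                  \
  do {                                                                 \
    if (!(cond)) {                                                     \
      ++gAssertionCount;                                               \
      std::fprintf(stderr, "ASSERTION: %s: '%s', file %s, line %d\n",  \
                   msg, #cond, __FILE__, __LINE__);                    \
    }                                                                  \
  } while (0)
#else
// Release builds: the condition is never evaluated, so the check is free.
#define SKETCH_WARN_ASSERT(cond, msg) do { } while (0)
#endif

// Hypothetical stand-in for a tree node; not a real Gecko class.
struct Node {
  Node* parent = nullptr;
};

bool BindToParent(Node& child, Node* newParent) {
  SKETCH_WARN_ASSERT(newParent != nullptr, "need a parent to bind to");
  SKETCH_WARN_ASSERT(child.parent == nullptr, "child is already bound");
  if (!newParent || child.parent) {
    return false;  // Still handled safely when the checks are compiled out.
  }
  child.parent = newParent;
  // The standard assert stands in for a fatal assertion here. Note that it
  // is keyed on NDEBUG rather than on DEBUG.
  assert(child.parent == newParent);
  return true;
}
```

Built with -DDEBUG, binding an already-bound child prints a warning and bumps the counter while the function still returns false; built without it, the two checks vanish and only the ordinary error handling remains.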

To explain one reason this is important, I'd like to step away from software engineering for a minute and talk about airplanes. Airplanes are expected to be really safe, and the people who make them and fly them have developed ways of keeping them safe. One of these is the idea that it should be hard for an airplane to fail catastrophically with only one thing going wrong; most serious airplane crashes have required multiple failures. Consider, for example, the crash of American Airlines Flight 191 in 1979. In this incident, faulty maintenance procedures had damaged the rear engine mount, and when the mount failed, the left engine detached from the wing, swung around in front of the wing, and went over the top of it. This, alone, should have been survivable. However, the engine detaching from the wing damaged the hydraulic lines and cut power to some monitoring systems whose power source was connected only to that one engine. The damaged hydraulic lines led the leading edge slats on the wing to retract, raising the aerodynamic stall speed of that wing to above the aircraft's takeoff speed, and causing the aircraft to roll over and crash. The response to this crash was not only to forbid the maintenance procedures that damaged the engine mount. The response also included mandating slat relief valves so that the slats would not retract due to loss of hydraulic pressure.

So, back in the world of software, there are also times we want to make sure a single mistake can't lead to a serious failure. For example, when I see an exploitable security bug, I often want to fix that bug in two different ways, so that a single mistake, alone, couldn't lead to a similar bug happening again. (See, for example, followups to bug 607222 or the multiple fixes to bug 468645.) This is possible quite often, as many exploitable security bugs are the result of a sequence of things going wrong, just like airplane crashes.

But when bad things don't happen immediately when something goes wrong, how do we know that something is wrong in the first place? That's one thing all these assertions are for. They document our understanding of how the system is supposed to work. We generally shouldn't crash in an exploitable way when one thing goes wrong. But we generally should assert. The assertion might show that something we assumed would never go wrong has gone wrong, or that something has gone wrong which we know will produce bad results when interacting with another feature (even if it might not be interacting with that feature when the assertion fires). Running all of our test suites in debug builds and making the tests fail when new assertions fire allows us to prevent introducing these first-step-of-a-sequence problems in anything tested by our test suites.
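The "fail when new assertions fire" rule can be sketched as a comparison between the assertions a test run produced and the number that test is annotated as being known to produce. This is a minimal sketch under assumptions: TestRun, KnownAssertions, and RunCountsAsPass are hypothetical names, and the real harnesses may well apply the rule differently.

```cpp
#include <cassert>

// Hypothetical summary of one test run in a debug build.
struct TestRun {
  bool checksPassed;   // Did the test's own checks succeed?
  int assertionsSeen;  // Non-fatal assertions that fired during the run.
};

// Hypothetical annotation: how many assertions this test is known to hit.
struct KnownAssertions {
  int min;
  int max;
};

// A run passes only if its own checks passed AND the number of assertions
// falls inside the annotated range. An unannotated test uses {0, 0}, so any
// new assertion turns it into a failure.
bool RunCountsAsPass(const TestRun& run, const KnownAssertions& known) {
  assert(known.min <= known.max);
  return run.checksPassed &&
         run.assertionsSeen >= known.min &&
         run.assertionsSeen <= known.max;
}
```

In this sketch, a run that fires fewer assertions than annotated also fails, which would force the annotation to be tightened once a known problem is fixed; that is a design choice of the example rather than a claim about any particular harness.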

Also (and less related to airplane crashes), the process of debugging a software problem is often a process of working backwards in time: we observe a problem and figure out what led to that state. This leads to another case where it's useful to have assertions: to document things that we know will lead to a failure (such as a crash) in the near future, but which represent a much easier point to debug backwards from (for example, the assertions about callback reentry in pldhash). When good assertions are present, problems are often easier to debug, since there will be assertions documenting something that goes wrong earlier, which shortens the amount of working backwards that we have to do to find the problem.
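To make the "easier point to debug backwards from" idea concrete, here is a minimal sketch of a reentrancy assertion. SketchTable is a hypothetical container invented for this example; it is not pldhash and makes no claim about how pldhash implements its checks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical container that forbids mutation while it is being enumerated.
class SketchTable {
 public:
  void Add(int value) {
    // Without this check, adding during enumeration could reallocate the
    // vector and invalidate the iterators of the loop in Enumerate(); the
    // crash would show up later, far from the buggy callback. The assertion
    // fires right at the mistake, with the callback still on the stack.
    assert(!mEnumerating && "Add() called from inside an Enumerate() callback");
    mValues.push_back(value);
  }

  void Enumerate(const std::function<void(int)>& callback) {
    mEnumerating = true;
    for (int value : mValues) {
      callback(value);
    }
    mEnumerating = false;
  }

  std::size_t Size() const { return mValues.size(); }

 private:
  std::vector<int> mValues;
  bool mEnumerating = false;
};
```

A callback that only reads is fine, while one that calls Add() on the table it is enumerating trips the assertion immediately in a build where assert is enabled.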

Likewise, we can also use assertions so they fire in cases where we know something has gone wrong, but we might not notice it. For example, if we can write an assertion that will fire in a certain case where things would be drawn incorrectly, we'll detect any case in which that incorrect painting happens, even if it happens in the middle of a test that was designed to test something else and isn't actually looking at what appears on the screen.

So assertions, the key component of debug builds, help us in multiple ways. They prevent the introduction of problems that are one step of a sequence that could lead to more serious problems, they help us debug failures that have been observed in other ways, and they allow us to increase our test coverage.

Update (23:33): Robert O'Callahan points out that I missed another important reason for assertions: they serve as documentation. And they're more reliable than prose documentation, too, because they're tested to be correct.