David Baron's Weblog

Correlating crashes with binary extensions or plugins

Tuesday, 2009-09-22, 23:54 -0700

Update (2009-09-27): I've moved this page to a permanent location and I have added new data there. Read that page instead of this one.

Firefox ships with an automated crash reporting system that can send reports of crashes back to Mozilla so that we can make Firefox crash less often. Anybody can search the reports to help find and fix problems. We categorize these crashes by their signature, which is the function, library, or address where we were executing code when Firefox crashed. However, it's sometimes hard to tell why a particular crash happens. If it's specific to a particular action or website, users will often mention that in the comments they submit, or we can sometimes tell based on the code that was running at the time of the crash. But sometimes we still can't reproduce the crash ourselves based on what users said, or figure out a possible fix based only on the data in the crash report. Some crashes are also specific to particular plugins or extensions that users might have installed (or that might have been installed without the user's knowledge). If we know this, it gets us closer to fixing the crash, whether it's a problem in the extension or plugin, or a problem in our code triggered by it.

One of the pieces of data in a crash report is the list of shared libraries (modules containing code that is executed) that were loaded at the time of the crash (and the memory addresses at which they were loaded). These libraries could be parts of the operating system, parts of Firefox, or parts of extensions or plugins that are loaded. This list is essential for making sense of the crash report, since to know what code was executing when Firefox crashed, we need to be able to map the memory address of the code that was executing to the library that contains it. (Those addresses are themselves figured out after the crash report is submitted, using debugging information that we save after compiling Firefox, but don't ship to users, in order to reconstruct the execution stack at the time of the crash.)
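To make the address-to-library step concrete, here is a minimal sketch in Python. It is not the crash reporting system's actual code: the `Module` type, the `find_module` helper, and the load addresses are all made up for illustration. It assumes only that each loaded library occupies one contiguous, non-overlapping address range.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Module(NamedTuple):
    """Hypothetical record for one loaded library."""
    name: str
    base: int  # address at which the library was loaded
    size: int  # length of the mapped region in bytes


def find_module(modules: Sequence[Module], address: int) -> Optional[Module]:
    """Return the module whose address range contains `address`, or None.

    `modules` must be sorted by base address and must not overlap.
    """
    bases = [m.base for m in modules]
    # Index of the last module whose base is <= address.
    i = bisect_right(bases, address) - 1
    if i < 0:
        return None
    candidate = modules[i]
    # The range is half-open: [base, base + size).
    if address < candidate.base + candidate.size:
        return candidate
    return None


# Made-up load addresses, for illustration only.
MODULES = sorted(
    [
        Module("firefox.exe", 0x00400000, 0x00010000),
        Module("xul.dll", 0x10000000, 0x00C00000),
        Module("FFComm.dll", 0x20000000, 0x00040000),
    ],
    key=lambda m: m.base,
)
```

With this table, `find_module(MODULES, 0x10001234)` returns the `xul.dll` entry, and an address that falls between two libraries returns `None`.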

However, the list of shared libraries that are loaded also tells us what plugins or extensions are running, which we can use to figure out which crashes might be related to particular extensions. In the past we've done this by looking at the list of modules by hand. But today, Mike Morgan and Aravind Gottipati helped me get a random sample of just under 10000 crash reports from a recent 24-hour period, all from Firefox 3.5.3. I wrote a script that processes those reports (which are essentially a JSON version of the data that show up in a crash report on the Web interface) and lists modules that might be related to causing the crash.

The output of my script is available for data based on:

This output looks like this:

  nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*) (97 crashes)
     65% (63/97) vs.   2% (158/7100) FFComm.dll
     63% (61/97) vs.   2% (134/7100) bdGUICtl.dll
     63% (61/97) vs.   2% (134/7100) BDUtils.dll

What this output is saying is that the data have 97 occurrences of the crash with signature nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*). Of these 97 crashes, in 63 the library FFComm.dll was loaded (65% of the time). However, out of all 7100 crashes, FFComm.dll was only loaded at the time of 158 of the crashes (2% of the time). This suggests that the presence of this library may be related to the cause of the crash.
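The comparison described above can be sketched in a few lines of Python. This is a rough reconstruction under stated assumptions, not the script I actually ran: it assumes each report is a dict with a `"signature"` string and a `"modules"` list of file names (hypothetical field names), and the `min_crashes` and `min_gap` thresholds are arbitrary placeholders.

```python
from collections import Counter, defaultdict


def correlate(reports, min_crashes=10, min_gap=0.25):
    """Find modules over-represented in crashes with a given signature.

    Returns {signature: [(module, in_sig, sig_total, in_all, total), ...]},
    keeping a module only when its share among crashes with that signature
    exceeds its share among all crashes by at least `min_gap`.
    """
    total = len(reports)
    overall = Counter()            # module -> crashes in which it was loaded
    by_sig = defaultdict(Counter)  # signature -> module -> crash count
    sig_counts = Counter()         # signature -> crash count

    for report in reports:
        modules = set(report["modules"])  # count each module once per crash
        signature = report["signature"]
        sig_counts[signature] += 1
        overall.update(modules)
        by_sig[signature].update(modules)

    results = {}
    for signature, n in sig_counts.items():
        if n < min_crashes:
            continue  # too few crashes to say anything useful
        rows = [
            (module, in_sig, n, overall[module], total)
            for module, in_sig in by_sig[signature].items()
            if in_sig / n - overall[module] / total >= min_gap
        ]
        # Largest gap between the two percentages first.
        rows.sort(key=lambda r: r[1] / r[2] - r[3] / r[4], reverse=True)
        if rows:
            results[signature] = rows
    return results


def format_row(module, in_sig, sig_total, in_all, total):
    """Format one row in the style of the sample output above."""
    return "%3d%% (%d/%d) vs. %3d%% (%d/%d) %s" % (
        round(100 * in_sig / sig_total), in_sig, sig_total,
        round(100 * in_all / total), in_all, total, module,
    )
```

A library that is loaded in nearly every crash (part of Firefox itself, say) has about the same share in both groups, so it drops out. Only modules whose share jumps for a particular signature are listed.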

There's a lot of data here to look through, so I'm posting it so everybody can look, and add data to existing bug reports.

I intentionally set the thresholds low as a starting point, so there's a good bit of noise in these reports. Future enhancements might include looking at the versions of the modules (in case a crash is only present in older versions of a plugin), using better thresholds, and perhaps also showing modules that appear to fix a crash in addition to those that appear to cause it. And perhaps even nicer-looking output.