Correlating crashes with binary extensions or plugins

Firefox ships with an automated crash reporting system that can send reports of crashes back to Mozilla so that we can make Firefox crash less often. Anybody can search the reports to help find and fix problems. We categorize these crashes by their signature, which is the function, library, or address where we're executing code when Firefox crashes.

However, it's sometimes hard to tell why a particular crash happens. If it's specific to a particular action or website, users will often mention that in the comments they submit, or we can sometimes tell based on the code that was running at the time of the crash. But sometimes we still can't get the browser to crash for ourselves based on what users said, or figure out a possible fix based only on the data in the crash report. Some crashes are also specific to particular plugins or extensions that users might have installed (or that might have been installed without the user's knowledge). If we know this, it gets us closer to fixing the crash, whether it's a problem in the extension or plugin, or a problem in our code triggered by it.

One of the pieces of data in a crash report is the list of shared libraries (modules containing code that is executed) that were loaded at the time of the crash, along with the memory addresses at which they were loaded. These libraries could be parts of the operating system, parts of Firefox, or parts of extensions or plugins that are loaded. This list is essential for making sense of the crash report: to know what code was executing when Firefox crashed, we need to be able to map the memory address of the code that was executing to the library that contains it. (Those addresses are themselves figured out after the crash report is submitted, using debugging information that we save after compiling Firefox, but don't ship to users, in order to reconstruct the execution stack at the time of the crash.)
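As a rough illustration of that address-to-library mapping, here is a minimal Python sketch. It is not the code the crash report processor uses. The function name find_module and the (base address, size, name) tuple layout are assumptions for this example, and the addresses and sizes are made up.

```python
from bisect import bisect_right


def find_module(modules, address):
    """Return the name of the module whose address range contains `address`.

    `modules` is a list of (base_address, size, name) tuples sorted by
    base_address. Returns None if no loaded module covers the address.
    """
    bases = [base for base, _size, _name in modules]
    # Index of the last module whose base address is <= address.
    i = bisect_right(bases, address) - 1
    if i < 0:
        return None
    base, size, name = modules[i]
    # The module covers the half-open range [base, base + size).
    if address < base + size:
        return name
    return None


# Hypothetical module list; the addresses and sizes are invented.
modules = [
    (0x00400000, 0x00100000, "firefox.exe"),
    (0x10000000, 0x00A00000, "xul.dll"),
    (0x7C800000, 0x000F6000, "kernel32.dll"),
]

print(find_module(modules, 0x10001234))
```

An address that falls in a gap between modules, or below the first one, maps to no library at all, which is why the function can return None.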

However, the list of loaded shared libraries also tells us what plugins or extensions are running, which we can use to figure out which crashes might be related to particular extensions. In the past we've done this by looking at the list of modules by hand. On September 22, 2009, I wrote a script that processes those reports (which are essentially a JSON version of the data that show up in a crash report on the Web interface) and lists the modules that might be related to causing each crash.

Firefox 3.5.3 data (September 22, 2009)

Mike Morgan and Aravind Gottipati helped me get a random sample of just under 10000 crash reports from a recent 24 hour period, all from Firefox 3.5.3. Running my script over this sample gives output that looks like this:

  nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*) (97 crashes)
     65% (63/97) vs.   2% (158/7100) FFComm.dll
     63% (61/97) vs.   2% (134/7100) bdGUICtl.dll
     63% (61/97) vs.   2% (134/7100) BDUtils.dll

What this output is saying is that the data have 97 occurrences of the crash with signature nsGlobalWindow::cycleCollection::UnmarkPurple(nsISupports*). Of these 97 crashes, in 63 the library FFComm.dll was loaded (65% of the time). However, out of all 7100 crashes, FFComm.dll was only loaded at the time of 158 of the crashes (2% of the time). This suggests that the presence of this library may be related to the cause of the crash.
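The comparison described above can be sketched in a few lines of Python. This is an illustration of the idea and not the script itself. The input shape (pairs of signature and module set), the names module_correlations and format_row, and the two threshold values are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def module_correlations(crashes, min_crashes=10, min_diff=0.10):
    """Find modules over-represented in crashes with a given signature.

    `crashes` is an iterable of (signature, module_names) pairs.
    The thresholds are hypothetical, not the ones the real script uses.
    Returns {signature: [(module, k, n, overall_k, total), ...]}.
    """
    total = 0
    overall = Counter()                 # module -> crashes it was loaded in
    sig_total = Counter()               # signature -> number of crashes
    sig_modules = defaultdict(Counter)  # signature -> module -> count
    for signature, module_names in crashes:
        total += 1
        sig_total[signature] += 1
        for module in set(module_names):
            overall[module] += 1
            sig_modules[signature][module] += 1

    results = {}
    for signature, n in sig_total.items():
        if n < min_crashes:
            continue
        rows = []
        for module, k in sig_modules[signature].items():
            # Fraction within this signature vs. fraction in all crashes.
            if k / n - overall[module] / total >= min_diff:
                rows.append((module, k, n, overall[module], total))
        rows.sort(key=lambda r: r[1] / r[2] - r[3] / r[4], reverse=True)
        if rows:
            results[signature] = rows
    return results


def format_row(module, k, n, overall_k, total):
    """Format one result row in the style of the report output."""
    return "%3d%% (%d/%d) vs. %3d%% (%d/%d) %s" % (
        round(100 * k / n), k, n,
        round(100 * overall_k / total), overall_k, total, module)
```

Feeding format_row the FFComm.dll numbers from the example (63 of 97, against 158 of 7100) reproduces the percentages shown in the first row of the sample output.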

There's a lot of data here to look through, so I'm posting it so everybody can look, and add data to existing bug reports.

I intentionally set the thresholds low as a starting point, so there's a good bit of noise in these reports. Future enhancements might include looking at the versions of the modules (in case a crash is only present in older versions of a plugin), using better thresholds, and perhaps also showing modules that appear to fix a crash in addition to those that appear to cause it. And perhaps even nicer-looking output.

Correlation to number of CPU cores

New data, September 30, based on the same dataset: a report on the correlation of crash signatures to CPU cores.

Correlation to installed addons

New data, October 5, based on the same dataset:

New crash dataset from September 29, 2009

New, October 10, 2009: all of the previous reports, regenerated using a significantly larger dataset containing all Firefox 3.5.3 crashes on September 29, 2009. (Each report contains Linux crashes, then Mac crashes, then Windows crashes.)

Nightly reports

Aravind set up these reports so they are run nightly on multiple products:

Note (2009-10-24): I noticed that crashes that show up disproportionately (e.g., 75%+) as happening on uniprocessor systems (in the core counts report) seem likely to be out-of-memory crashes (based on user comments of out-of-memory dialogs and analysis in bugs).
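One way to pull such signatures out of the data is sketched below in Python. This is only an illustration of the heuristic in the note. The function name, the (signature, core count) input shape, and the minimum-crash cutoff are assumptions, and the 75% threshold is just the example figure mentioned above.

```python
from collections import Counter


def uniprocessor_heavy_signatures(crashes, threshold=0.75, min_crashes=10):
    """List signatures whose crashes mostly come from single-core systems.

    `crashes` is an iterable of (signature, cpu_core_count) pairs.
    Returns (signature, single_core_crashes, total_crashes) tuples,
    sorted with the highest single-core fraction first.
    """
    totals = Counter()
    single = Counter()
    for signature, cores in crashes:
        totals[signature] += 1
        if cores == 1:
            single[signature] += 1

    flagged = []
    for signature, n in totals.items():
        # Skip rare signatures, where the fraction is mostly noise.
        if n >= min_crashes and single[signature] / n >= threshold:
            flagged.append((signature, single[signature], n))
    return sorted(flagged, key=lambda r: r[1] / r[2], reverse=True)
```

Signatures flagged this way are only candidates for out-of-memory crashes; the user comments and bug analysis are still what confirm it.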