
What is statistical debugging?


What is statistical debugging? I haven't found a clear, concise explanation yet, but the term certainly sounds impressive.

Is it just a research topic, or is it being used somewhere, for actual development? In other words: Will it help me find bugs in my program?


Solution

    I created statistical debugging, along with various wonderful collaborators across the years. I wish I'd noticed your question months ago! But if you are still curious, perhaps this late answer will be better than nothing.

    At a very high level, statistical debugging is the idea of using statistical models of program success/failure to track down bugs. These statistical models expose relationships between specific program behaviors and eventual success or failure of a run. For example, suppose you notice that there's a particular branch in the program that sometimes goes left, sometimes right. And you also notice that runs where the branch goes left are fine, but runs where the branch goes right are 75% more likely to crash. So there's a statistical correlation here which may be worth investigating more closely. Statistical debugging formalizes and automates that process of finding program (mis)behaviors that correlate with failure, thereby guiding developers to the root causes of bugs.
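To make the branch example concrete, here is a minimal Python sketch of ranking program behaviors ("predicates") by how strongly they correlate with failure. It assumes each run reports, for every instrumented predicate it reached, whether that predicate was ever true, plus whether the run failed; the predicate names `went_left` and `went_right` and the function `rank_predicates` are hypothetical. The score (failure rate when the predicate is true, minus failure rate when its code is merely reached) is similar in spirit to the "Increase" score described in CBI papers, but treat it as an illustrative heuristic, not the exact model used by CBI or Holmes.

```python
from collections import defaultdict


def rank_predicates(runs):
    """Rank predicates by how much more likely failure is when they are true.

    `runs` is a list of (observations, failed) pairs. `observations` maps each
    predicate that was reached in the run to True (observed true at least once)
    or False (reached, but never true). This format is an assumption.
    """
    true_ok = defaultdict(int)    # successful runs where P was true
    true_fail = defaultdict(int)  # failing runs where P was true
    seen_ok = defaultdict(int)    # successful runs where P was reached
    seen_fail = defaultdict(int)  # failing runs where P was reached

    for observations, failed in runs:
        for pred, value in observations.items():
            if failed:
                seen_fail[pred] += 1
                true_fail[pred] += value  # bool counts as 0 or 1
            else:
                seen_ok[pred] += 1
                true_ok[pred] += value

    scores = {}
    for pred in set(seen_ok) | set(seen_fail):
        n_true = true_ok[pred] + true_fail[pred]
        if n_true == 0:
            continue  # never observed true, so no evidence either way
        # Failure rate when P is true, minus failure rate when P is merely reached.
        failure = true_fail[pred] / n_true
        context = seen_fail[pred] / (seen_ok[pred] + seen_fail[pred])
        scores[pred] = failure - context

    # Highest score first: the most failure-predictive predicates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for the branch example: four runs, one of which crashed.
runs = [
    ({"went_left": True, "went_right": False}, False),
    ({"went_left": True, "went_right": False}, False),
    ({"went_left": False, "went_right": True}, True),
    ({"went_left": False, "went_right": True}, False),
]
print(rank_predicates(runs))
# [('went_right', 0.25), ('went_left', -0.25)]
```

Subtracting the "merely reached" rate is a deliberate design choice: code that only executes in crashing scenarios would otherwise look suspicious even when the direction of the branch there is irrelevant.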

    Getting back to your original question:

    Is it just a research topic, or is it being used somewhere, for actual development?

    It is mostly a research topic, but it is out there in the "real" world in two ways:

    1. The public deployment of the Cooperative Bug Isolation (CBI) Project hunts for bugs in various Open Source programs running under Fedora Linux. You can download pre-instrumented packages and every time you use them you're feeding us data to help us find bugs.

    2. Microsoft has released Holmes, an implementation of statistical debugging for .NET. It's nicely integrated into Visual Studio and should be a very easy way for you to use statistical debugging to help find your own bugs in your own code. I've worked closely with Microsoft Research on Holmes, and these are good smart people who know how to put out high-quality tools.

    One warning to keep in mind: statistical debugging needs ample raw data to build good statistical models. In CBI's public deployment, that raw data comes from real end users. With Holmes, I think Microsoft assumes that the raw data will come from in-house automated unit tests and manual tests. What won't work is code with no runs at all, or with only failing runs but no successful counterexamples. Statistical debugging works off of the contrast between good and bad runs, so you need to feed it both. If you want bug-hunting tools that work without any runs, then you'll need some sort of static analysis. I do research on that too, but that's not statistical debugging. :-)
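To see why both kinds of runs are needed, here is a minimal Python sketch (the function `failure_rates` and the predicate names are hypothetical) that computes, for each predicate, the failure rate among runs where it was true and among runs where it was merely reached. It assumes each run is reported as a mapping from reached predicates to whether they were ever true, plus a failed/succeeded flag. With only failing runs, every predicate scores 1.0 on both measures, so nothing stands out; with no runs, there is nothing to score at all.

```python
def failure_rates(runs):
    """Map each predicate to (failure rate when true, failure rate when reached).

    `runs` is a list of (observations, failed) pairs; `observations` maps each
    predicate reached in the run to whether it was ever true (assumed format).
    """
    # pred -> [runs true, failing runs true, runs reached, failing runs reached]
    counts = {}
    for observations, failed in runs:
        for pred, value in observations.items():
            c = counts.setdefault(pred, [0, 0, 0, 0])
            c[2] += 1
            c[3] += failed  # bool counts as 0 or 1
            if value:
                c[0] += 1
                c[1] += failed
    return {
        pred: (true_fail / n_true if n_true else None, seen_fail / n_seen)
        for pred, (n_true, true_fail, n_seen, seen_fail) in counts.items()
    }


# Only failing runs: both branch directions look equally "guilty".
only_failures = [
    ({"went_left": True, "went_right": False}, True),
    ({"went_left": False, "went_right": True}, True),
]
print(failure_rates(only_failures))
# {'went_left': (1.0, 1.0), 'went_right': (1.0, 1.0)}

# Add a successful run, and the branch direction separates from the baseline.
mixed = only_failures[1:] + [({"went_left": True, "went_right": False}, False)]
print(failure_rates(mixed))
# {'went_left': (0.0, 0.5), 'went_right': (1.0, 0.5)}
```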

    I hope this helped and was not too long. I'm happy to answer any follow-up questions. Happy bug-hunting!