Tags: .net, debugging, windbg, crash-dumps

How to debug BSOD crash caused indirectly by .NET application


We have a .NET application that transfers files from a client workstation to a database on a central server in a LAN. A few of our customers run this on wireless Windows XP workstations that use a commercial, 3rd-party WiFi encryption mechanism (for various reasons they don't use standard WiFi encryption such as WPA). Fairly consistently, these workstations blue-screen when our application runs.

Our application does not directly call any unmanaged code, but apparently something our program does is indirectly causing a problem in the underlying network stack. I got a mini kernel dump file from one of the affected machines and loaded it in WinDbg, which told me that the crash was probably caused by a .sys file that is one of the driver files for the encryption software (which I already suspected). However, the debugger didn't tell me much else that was helpful.

My question is this: is there any way for me to get a stack trace from the point of the crash all the way up to our .NET application? Do I need a complete memory dump? I have the source for our application, but I am hindered by the fact that a) I don't have the source or symbols for the driver in question; and b) I am not experienced with low-level Windows debugging. I don't mind modifying our application to avoid the problematic calls if necessary, but I'd need to know which calls to avoid.


Solution

  • As the comments have pointed out, a user-mode program cannot cause a bluescreen. Only a kernel-level component can cause a BSOD. What is most likely happening is that your program sends data in a way that the network driver can't handle, and that is causing the BSOD. This is not your program's fault. All kernel drivers are supposed to use defensive programming techniques, so if a BSOD occurs, it's the driver's fault. That's one of the major features of the kernel/user-mode separation: user-mode code is not supposed to be able to do anything that can BSOD the box.

    I realize that the above isn't always helpful when you're just trying to fix an issue. So the best thing to do is open the dump in WinDbg and run !analyze -v. This will give you a reasonable stack trace (for unmanaged code), and you can see which driver is causing the issue.
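
    As a rough sketch, a first pass over the kernel dump might look like the session below. The commands are standard WinDbg commands, but `suspectdrv` is only a placeholder for whatever module name !analyze -v reports, and without symbols for the third-party driver its frames will show as raw offsets rather than function names.

    ```
    $$ Point the debugger at the Microsoft public symbol server and reload
    .symfix
    .reload
    $$ Automated analysis: bug check code, probable faulting module, stack
    !analyze -v
    $$ Kernel-mode stack of the thread that was running at the crash
    kv
    $$ Path, timestamp and version of the suspected driver (placeholder name)
    lmvm suspectdrv
    ```

    Keep in mind that a small (mini) kernel dump holds no user-mode memory, so this stack stops at the kernel boundary and cannot reach the .NET frames.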

    If you want to see which thread caused the issue, I'm afraid you're SOL. Basically, it's not possible to know for certain which thread caused the issue, since most likely the packet was put on a queue and then processed later. By the time the box BSOD'd, the thread that put the packet on the queue had already gone off and done other things.

    But if you're super lucky and the stars all align, the thread might still be sitting at the point where it was putting the packet on the queue, and you can see it using WinDbg with the SOS DLL.
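
    If you do get a complete memory dump, something along these lines might show what the application's threads were doing. Treat it as a sketch under assumptions: `OurApp.exe` and `<EPROCESS>` are placeholders, `mscorwks` assumes the CLR 2.0 runtime (CLR 4 uses `clr` instead), and SOS is built for user-mode debugging, so the managed commands may not work against a kernel dump at all.

    ```
    $$ Find the application's process object (placeholder image name)
    !process 0 0 OurApp.exe
    $$ Switch to that process context and reload user-mode symbols
    .process /r /p <EPROCESS>
    $$ List the process's threads with their native stacks
    !process <EPROCESS> 7
    $$ In a user-mode session or user-mode dump of the app:
    $$ load SOS next to the runtime and dump every managed stack
    .loadby sos mscorwks
    !threads
    ~*e !clrstack
    ```

    Even when the managed frames can't be resolved, the native stacks from !process can at least show whether one of the application's threads was inside a socket call when the box went down.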

    A reasonable starting point for managed debugging with WinDbg is here: http://blogs.msdn.com/b/alejacma/archive/2009/07/07/managed-debugging-with-windbg-introduction-and-index.aspx

    This won't answer all your questions, but it's a decent start and googling will get you most of the rest of the way.