Tags: .net, multithreading, task-parallel-library, access-violation, dispatch

How to approach debugging an AccessViolationException in a .NET application on XP


I have a .NET application that I developed on a Windows 8.1 machine using Visual Studio Express 2008, compiled for .NET 4.0.

It runs fine on the Windows 8.1 machine, but on a (very) old single-core XP machine it occasionally throws an AccessViolationException, and I cannot figure out why.

Running it inside Visual Studio in debug mode gives me nothing helpful.

The program is heavily parallel and uses the TPL.

The Event Log shows this (which means nothing to me):

Stack:
    at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG ByRef)
    at System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
    at System.Windows.Forms.Application+ThreadContext.RunMessageLoopInner(Int32, System.Windows.Forms.ApplicationContext)
    at System.Windows.Forms.Application+ThreadContext.RunMessageLoop(Int32, System.Windows.Forms.ApplicationContext)
    at Microsoft.VisualBasic.ApplicationServices.WindowsFormsApplicationBase.OnRun()
    at Microsoft.VisualBasic.ApplicationServices.WindowsFormsApplicationBase.DoApplicationModel()
    at Microsoft.VisualBasic.ApplicationServices.WindowsFormsApplicationBase.Run(System.String[])

The only libraries I'm using outside the standard .NET Framework are System.Data.SQLite and Newtonsoft.Json.

The application uses the JSON library to access an RPC API over HTTP POST.

Any ideas which part of my code might be causing this? As I said, it only happens on the old XP machine, but it could be a race condition that only shows up there because that machine is much slower. I don't even know where to start!


Solution

  • I'll noodle about this problem for a bit, but it is important to realize that you cannot get an answer from the information you posted. I can only talk about what you need to do to discover more about this crash.

    The most important detail is that the crash did not occur in the DispatchMessageW() method itself. There are many more stack frames on top of the trace you posted, but you cannot see them: they belong to unmanaged code, and the CLR only records trace information for managed code. DispatchMessage() is a workhorse winapi function that does many different things in different cases; its primary job is to call the window procedure of a window, which is the code that handles a specific message for that window.

    What is clear from the trace is that the crash was not caused by any .NET code. That is expected; .NET is very good at avoiding AccessViolationExceptions. A few controls that you might use on your form could be responsible. At the top of that list are ActiveX controls, WebBrowser, and the shell dialogs such as OpenFileDialog: all controls that are implemented in native code, with a very thin .NET wrapper to make them usable in a .NET project. They are normally pretty well behaved. But this is an old machine that has been subjected to who-knows-what. Such machines tend to be badly infected with all kinds of "helpful" software that injects itself into every process and probably hasn't been maintained in a long time.
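    One cheap way to test the injected-software theory without a debugger is to have the application log which DLLs are loaded into its own process on the XP machine. The following is a minimal VB.NET sketch, assuming .NET 4.0; ModuleAudit, OutsideDirectory and LogForeignModules are hypothetical names, and treating everything outside the Windows folder as suspect is only a rough heuristic (injected DLLs can also live in system32).

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Diagnostics
    Imports System.IO

    ' Hypothetical diagnostic module, not part of the original application.
    Public Module ModuleAudit

        ' Returns the paths that do not live under the given root folder.
        ' Kept separate from the process query so it can be checked in isolation.
        Public Function OutsideDirectory(paths As IEnumerable(Of String), root As String) As List(Of String)
            ' Trailing backslash so "C:\WindowsApps" does not match "C:\Windows".
            Dim prefix As String = root.TrimEnd("\"c) & "\"
            Dim result As New List(Of String)()
            For Each p As String In paths
                If Not p.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) Then
                    result.Add(p)
                End If
            Next
            Return result
        End Function

        ' Writes every module loaded into this process that is not under the
        ' Windows folder to logPath. Your own DLLs (for example the native part
        ' of System.Data.SQLite) will appear too; anything else deserves a look.
        Public Sub LogForeignModules(logPath As String)
            Dim paths As New List(Of String)()
            For Each m As ProcessModule In Process.GetCurrentProcess().Modules
                paths.Add(m.FileName)
            Next
            Dim windowsDir As String = Environment.GetFolderPath(Environment.SpecialFolder.Windows)
            File.WriteAllLines(logPath, OutsideDirectory(paths, windowsDir).ToArray())
        End Sub

    End Module
    ```

    Calling LogForeignModules once at startup and again after the main form is shown may catch hooks that only load once the process starts receiving window messages.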

    You mention "very parallel", which tends to be a red flag. It is not a strong signal here, since the crash occurs on the UI thread of the program, not on a worker thread. But that doesn't exclude it: code running on a worker could be doing something illegal with a window and destabilize it, causing a later crash. If you've been faithfully using the debugger without intentionally suppressing InvalidOperationException (the exception WinForms raises for illegal cross-thread control access), and you don't create any windows on a worker thread, then this isn't a strong lead.
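    Note that WinForms only performs that cross-thread check by default when a debugger is attached, so on the XP machine illegal accesses go unchecked. A minimal VB.NET sketch of forcing the check on and marshaling TPL results back to the UI thread, assuming .NET 4.0; MainForm, resultLabel and SlowComputation are hypothetical names standing in for the application's own form and work.

    ```vbnet
    Imports System
    Imports System.Threading
    Imports System.Threading.Tasks
    Imports System.Windows.Forms

    ' Hypothetical form; SlowComputation stands in for the app's real work.
    Public Class MainForm
        Inherits Form

        Private ReadOnly resultLabel As New Label() With {.Dock = DockStyle.Top}

        Public Sub New()
            ' This Shared property defaults to Debugger.IsAttached, so the check
            ' is off on the XP machine. Forcing it on (process-wide) turns many
            ' illegal cross-thread accesses into an InvalidOperationException.
            Control.CheckForIllegalCrossThreadCalls = True
            Controls.Add(resultLabel)
        End Sub

        Protected Overrides Sub OnLoad(e As EventArgs)
            MyBase.OnLoad(e)
            ' Captured on the UI thread, so continuations scheduled on it run there.
            Dim uiScheduler As TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()

            Dim work As Task(Of String) = Task.Factory.StartNew(Of String)(AddressOf SlowComputation)
            work.ContinueWith(Sub(t)
                                  ' Controls are touched only here, on the UI thread.
                                  ' Reading t.Exception also marks a fault as observed.
                                  If t.IsFaulted Then
                                      resultLabel.Text = "Failed: " & t.Exception.InnerException.Message
                                  Else
                                      resultLabel.Text = t.Result
                                  End If
                              End Sub, uiScheduler)
        End Sub

        ' Runs on a thread-pool thread; it must not touch any control.
        Private Shared Function SlowComputation() As String
            Thread.Sleep(1000)
            Return "done"
        End Function
    End Class
    ```

    Control.Invoke or BeginInvoke achieve the same marshaling; the scheduler approach simply keeps all UI updates in TPL continuations.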

    To get down to the root cause, you need an unmanaged debugger so you can see exactly where the crash occurs. That tends to be rough in more than one way, such as not being able to get to the machine when it bombs. In that case you need to ask the user to create a minidump of the crashed process. XP makes this painful as well: it isn't a built-in feature, and you'll have a hard time using the minidump if your machine doesn't boot XP. Sysinternals' ProcDump utility is useful for recording one.

    Once you receive one from the customer, you'll need to open it in a debugger and inspect it to find the reason for the crash. That's going to be rough if you can't make sense of the stack trace you see now, so be sure to ask for help from team members who know more about Windows internals. Google "how to debug a minidump" to learn more; the minimal MSDN how-to page is here.

    All in all, do not expect miracles here: this is going to take at least a month out of your life, climbing several steep learning curves with no guarantee of success. That suggests a secondary approach: if your app is stable on every modern Windows version but not on one or two XP machines, then this arguably stops being your problem. Time for the user to update his machines. Good luck with it.