c# debugging internet-explorer crash bho

Next steps debugging crash in customer environment


Part of our product is an IE plugin (BHO), which is running happily in lots of different environments across multiple OS versions/IE versions.

However, in a trial setup for one customer, running XP SP3 machines via Citrix XenDesktop, IE 7 crashes when both of the following conditions are met:

  • Our plugin is loaded
  • The Shockwave Flash Object add-on is loaded (latest version - Flash11e.ocx)

Some extra info:

  • The crash happens when we try to show a dialog to the user, or shortly afterwards. However, the crash doesn't happen in our code, which is all written in C#. It happens in various places, often in ole32.dll.
  • Our dialogs are HTML pages rendered in a WebBrowser control, shown in a Form via form.ShowDialog(ownerWindow) in the BHO.
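
For readers unfamiliar with this pattern, here is a minimal sketch of what "an HTML page in a WebBrowser control, shown via form.ShowDialog(ownerWindow)" can look like. The class names (HtmlDialog, OwnerWindow, DialogHelper) and the ieHwnd parameter are hypothetical and only illustrate the pattern; this is not the product's actual code.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical dialog: a Form whose only content is a WebBrowser control.
sealed class HtmlDialog : Form
{
    public HtmlDialog(string url)
    {
        var browser = new WebBrowser { Dock = DockStyle.Fill };
        Controls.Add(browser);
        browser.Navigate(url);
    }
}

// Wraps a raw window handle so it can be passed as the dialog owner.
sealed class OwnerWindow : IWin32Window
{
    private readonly IntPtr handle;
    public OwnerWindow(IntPtr handle) { this.handle = handle; }
    public IntPtr Handle { get { return handle; } }
}

static class DialogHelper
{
    // ieHwnd is assumed to be an IE window handle obtained by the BHO.
    public static DialogResult Show(IntPtr ieHwnd, string url)
    {
        using (var form = new HtmlDialog(url))
        {
            // ShowDialog blocks and runs a modal message loop on the calling thread.
            return form.ShowDialog(new OwnerWindow(ieHwnd));
        }
    }
}
```

The relevant detail is that ShowDialog pumps window messages on the thread that calls it until the form closes.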

Either plugin seems to work fine on its own. Disabling Flash, or avoiding sites that use Flash, prevents the crash.

The customer is reasonably accommodating, and I was able to run IE under the MS Debugging Tools to capture a few dumps at the time of the crash. I'm now having some trouble interpreting the dumps. Suspecting heap corruption, I ran the debugging tools with full pageheap enabled, but that did not trigger a breakpoint.

The analysis from the Debugging tools is as follows:

In iexplore_PID_5064_Date_12_20_2011__Time_11_19_26AM_161_Second_Chance_Exception_C0000005.dmp the assembly instruction at ole32!HandleIncomingCall+e2 in C:\WINDOWS\system32\ole32.dll from Microsoft Corporation has caused an access violation exception (0xC0000005) when trying to read from memory location 0x03ce4ff8 on thread

The stack trace at the point of crash is:

Thread 7 - System ID 1140
Entry point   ieframe!CTabWindow::_TabWindowThreadProc 
Create time   20/12/2011 19:18:08 
Time spent in user mode   0 Days 0:0:19.828 
Time spent in kernel mode   0 Days 0:0:10.468 


Full Call Stack


Function                                Arg 1     Arg 2     Arg 3     Arg 4   Source 
ole32!HandleIncomingCall+e2                 0f9aafbc     00000034     00000001     07e8ab6c    
ole32!STAInvoke+24                          17444f80     00000001     0781efc0     077e8f10    
ole32!AppInvoke+7e                          17444f28     077e8f10     0781efc0     07e8ab6c    
ole32!ComInvokeWithLockAndIPID+2c2          17444f28     077ec420     00000000     17444f28    
ole32!ComInvoke+60                          17444f28     00000400     0774ee30     07bcfe48    
ole32!ThreadDispatch+23                     17444f28     07bcfeb0     7752b096     00000000    
ole32!ThreadWndProc+fe                      005d0594     078b6ee0     0000babe     17444f2c    
user32!InternalCallWinProc+28               7752b096     005d0594     00000400     0000babe    
user32!UserCallWinProcCheckWow+150          00000000     7752b096     005d0594     00000400    
user32!DispatchMessageWorker+306            7bcff64     00000000     07bcffb4     3e25e69b    
user32!DispatchMessageW+f                   07bcff64     0013e490     0013e5b8     07868ff0    
ieframe!CTabWindow::_TabWindowThreadProc+189 07e03e30     0013e490     0013e5b8     07868ff0    
kernel32!BaseThreadStart+37                 3e25e464     07868ff0     00000000     00000000    

I'm going to see what else I can get from this dump file, but I'm hoping someone here will have a great idea. I'd like to test a lot more stuff at the customer site, but we only have so many chances with them, so I need to use any time I get there very wisely.

For me a couple of next steps seem to be:

  • If the problem is Flash interfering with the way we show dialogs, I'd like to test a completely stripped-down BHO that just shows dialogs, to show that the problem does not lie with our code.
  • There are a lot of other plugins installed on the machine. It would be nice to start with a stripped-down image and build up from there, to see when the problem starts triggering.
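
As a rough idea of what the stripped-down test BHO in the first point could look like, here is a sketch that does nothing except show one empty modal Form after the first page finishes loading. It assumes a project reference to the SHDocVw interop assembly; the class name DialogOnlyBho, the HwndOwner wrapper and the all-zero GUID are placeholders (generate a real GUID before use), and the COM and "Browser Helper Objects" registry registration is omitted.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

// Standard COM interface through which IE hands a BHO its browser site.
[ComImport]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
[Guid("FC4801A3-2BA9-11CF-A229-00AA003D7352")]
public interface IObjectWithSite
{
    [PreserveSig] int SetSite([MarshalAs(UnmanagedType.IUnknown)] object site);
    [PreserveSig] int GetSite(ref Guid riid, out IntPtr ppvSite);
}

// Hypothetical helper: lets a raw HWND act as the dialog owner.
sealed class HwndOwner : IWin32Window
{
    private readonly IntPtr handle;
    public HwndOwner(IntPtr handle) { this.handle = handle; }
    public IntPtr Handle { get { return handle; } }
}

[ComVisible(true)]
[Guid("00000000-0000-0000-0000-000000000000")] // placeholder, replace with your own GUID
[ClassInterface(ClassInterfaceType.None)]
public class DialogOnlyBho : IObjectWithSite
{
    private const int S_OK = 0;
    private const int E_FAIL = unchecked((int)0x80004005);

    private SHDocVw.WebBrowser browser;
    private bool shown;

    public int SetSite(object site)
    {
        if (browser != null)
        {
            browser.DocumentComplete -= OnDocumentComplete;
            browser = null;
        }
        if (site != null)
        {
            browser = (SHDocVw.WebBrowser)site;
            browser.DocumentComplete += OnDocumentComplete;
        }
        return S_OK;
    }

    public int GetSite(ref Guid riid, out IntPtr ppvSite)
    {
        ppvSite = IntPtr.Zero;
        if (browser == null) return E_FAIL;
        IntPtr unknown = Marshal.GetIUnknownForObject(browser);
        try { return Marshal.QueryInterface(unknown, ref riid, out ppvSite); }
        finally { Marshal.Release(unknown); }
    }

    private void OnDocumentComplete(object pDisp, ref object url)
    {
        if (shown) return; // show the test dialog only once per tab
        shown = true;
        using (var form = new Form { Text = "BHO dialog test" })
        {
            form.ShowDialog(new HwndOwner(new IntPtr(browser.HWND)));
        }
    }
}
```

If IE still crashes with only this BHO and Flash loaded, that points away from the product code and towards the combination of a WinForms dialog and Flash in that environment.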

Sometimes the crash happens in pseudoserverinproc.dll, which is part of HDX MediaStream, the Citrix component that runs Flash content locally rather than on the server.

== update

I've had quite a bit of success using WinDbg to analyse the dumps that I have. I think it makes sense to use gflags/WinDbg on the desktop that is having the trouble and debug it live.

That would be my recommended next step to anyone in a similar position at the moment. I will know more about how good this advice is in a week's time, when I've had a chance to apply it.


Solution

  • We solved the problem in the end (well, worked around it). If anyone is interested, this is how we did it.

    By analysing the crash dumps with WinDbg (which is a great tool), we isolated the problem to showing WinForms in iexplore.exe after Flash had loaded in XenDesktop deployments. Knowing this, we were able to work around the problem.

    The key was getting good crash dumps, working out a minimal reproduction scenario and having a good customer that let us test our theory!