I've got an application that uses performance counters, that has worked for months. Now, on my dev machine and another developers machine, it has started hanging when I call PerformanceCounterCategory.Exists. As far as I can tell, it hangs indefinitely. It does not matter which category I use as input, and other applications using the API exhibits the same behaviour.
Debugging (using MS Symbol Servers) has shown that it is a call to Microsoft.Win32.RegistryKey that hangs. Further investigation shows that it is this line that hangs:
while (Win32Native.ERROR_MORE_DATA == (r = Win32Native.RegQueryValueEx(hkey, name, null, ref type, blob, ref sizeInput))) {
This is basically a loop that tries to allocate enough memory for the performance counter data. It starts at size = 65000
and does a few iterations. In the 4th call, when size = 520000
, Win32Native.RegQueryValueEx
hangs.
Furthermore, rather worringly, I found this comment in the reference source for PerformanceCounterLib.GetData:
// Win32 RegQueryValueEx for perf data could deadlock (for a Mutex) up to 2mins in some
// scenarios before they detect it and exit gracefully. In the mean time, ERROR_BUSY,
// ERROR_NOT_READY etc can be seen by other concurrent calls (which is the reason for the
// wait loop and switch case below). We want to wait most certainly more than a 2min window.
// The curent wait time of up to 10mins takes care of the known stress deadlock issues. In most
// cases we wouldn't wait for more than 2mins anyways but in worst cases how much ever time
// we wait may not be sufficient if the Win32 code keeps running into this deadlock again
// and again. A condition very rare but possible in theory. We would get back to the user
// in this case with InvalidOperationException after the wait time expires.
Has anyone seen this behaviour before ? What can I do to resolve this ?
This issue is now fixed, and since there has been no answers here, I will add an answer here in case the question is found in future searches.
I ultimately fixed this error by stopping the print spooler service (as a temporary measure).
It looks like the reading of Performance counters actually needs to enumerate the printers on the system (confirmed by a WinDbg dump of a hanging process, where I can see in the stack trace that winspool is enumerating printers, and is stuck in a network call). This was what was actually failing on the system (and sure enough, opening the "Devices and printers" window also hung). It baffles me that a printer/network issue can actually make the performance counters go down. One would think that there was some sort of fail-safe built in for such a case.
What I am guessing, is that this is cause by a bad printer/driver on the network. I haven't re-enabled printing on the affected systems yet, since we are hunting for the bad printer.