.net debugging ms-access-2007 cpu jet

CPU Stuck at 100% on Customer PC, any debugging suggestions?


I've hit a dead end with one of the clients using my software. Of the roughly 40 copies of our product sold (an application written in VB.NET 2005 on .NET 2.0), about 2 become unresponsive, with one core of the dual-core CPU stuck at 100% (the program uses only one core).

The most logical guess is an infinite loop, but there are thousands of lines of code with many, many loops. That is all the information I've got. How do you suggest I approach debugging this problem?

EDIT: Basically, the software is responsible for calculating the amount of credit spent using other devices, such as PCs. It is a cybercafe management program, and it fails intermittently, i.e. it is subtracting credit when it fails. It also does other things in the background, such as checking whether it is time to create a database backup.

EDIT: Solved. It was the most unlikely problem. The Access Database Engine, which I use as the DBMS, turned out to be the problematic part of my application. It has difficulty working with a row (JUST ONE FRIGGIN ROW) in one of the tables. I can't delete it or add a record related to that row in any other table; even MS Access 2007 drives the CPU to 100% when I try to work with that row!

A simple "Compact and Repair" command fixed everything. I guess I'll issue that command every time my application starts up. That would prevent this from happening again.

Thanks to WinDbg, I could find where the problem was. I recommend that everyone learn how to use it, because it's a real time saver.


Solution

  • Install WinDbg (the Windows debugger) on the target machine. Start the debugger, attach it to the suspect process, let the program run, and wait until the problem occurs. When it does, enter the following command at the debugger command line:

    !runaway

    This shows how much CPU time each thread has consumed. Then capture several call stacks from the thread that is consuming the most CPU time.

    Here is an example:

    0:015> !runaway
    

     User Mode Time
      Thread       Time
       0:1074      0 days 0:00:21.637
      11:137c      0 days 0:00:02.792
       4:12c8      0 days 0:00:00.530
       9:1374      0 days 0:00:00.046
      15:13d0      0 days 0:00:00.000
      14:1204      0 days 0:00:00.000
      13:154c      0 days 0:00:00.000
      12:144c      0 days 0:00:00.000
      10:1378      0 days 0:00:00.000
       8:1340      0 days 0:00:00.000
       7:12f0      0 days 0:00:00.000
       6:12d4      0 days 0:00:00.000
       5:12d0      0 days 0:00:00.000
       3:12c4      0 days 0:00:00.000
       2:12c0      0 days 0:00:00.000
       1:12b4      0 days 0:00:00.000

    Now suppose we want a call stack for the second thread in the list, thread 11. First switch to that thread by entering ~11s:

    0:015> ~11s
    

    eax=03fbb270 ebx=ffffffff ecx=00000002 edx=00000060 esi=00000000 edi=00000000
    eip=77475e74 esp=0572f60c ebp=0572f67c iopl=0         nv up ei pl zr na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
    ntdll!KiFastSystemCallRet:
    77475e74 c3              ret

    Now get a call stack for this thread by executing kp:

    0:011> kp
    ChildEBP RetAddr  
    0572f608 77475620 ntdll!KiFastSystemCallRet
    0572f60c 75b09884 ntdll!NtWaitForSingleObject+0xc
    0572f67c 75b097f2 kernel32!WaitForSingleObjectEx+0xbe
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Mozilla Firefox 3.1 Beta 1\nspr4.dll - 
    0572f690 10019a0b kernel32!WaitForSingleObject+0x12
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0572f6ac 10015979 nspr4!PR_MD_WAIT_CV+0x8b
    0572f6c4 10015763 nspr4!PR_GetPrimordialCPU+0x79
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Program Files\Mozilla Firefox 3.1 Beta 1\xul.dll - 
    0572f6e0 64d44d6a nspr4!PR_Wait+0x33
    0572f708 64dbe67e xul!NS_CycleCollectorForget2_P+0x698a
    0572f72c 10019b3f xul!gfxWindowsPlatform::FontEnumProc+0xfd4e
    0572f734 10015d32 nspr4!PR_MD_UNLOCK+0x1f
    0572f738 1001624b nspr4!PR_Unlock+0x22
    0572f754 1001838d nspr4!PRP_TryLock+0x4cb
    00000000 00000000 nspr4!PR_Now+0x109d
    

    The kp command prints the call stack together with the parameters of each function (this requires full symbol information). Local variables can be printed with dv.

    Alternatively, you can use Process Explorer from Sysinternals, which shows per-thread CPU usage and call stacks on the Threads tab of a process's properties.

    If none of this is possible because the problem occurs on a remote client machine, install userdump, which creates a dump file that the customer can send to you for further analysis. You can create a batch file that invokes userdump with the correct parameters for the customer. Userdump is a tool from Microsoft that can be downloaded from their website.
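
Because !runaway reports the time each thread has accumulated since it started, a thread that was only busy during startup can still top the list. One way to see which thread is burning CPU right now is to save two !runaway listings taken a few seconds apart and compare them. The Python sketch below does that comparison. The helper names parse_runaway and busiest_threads are made up for illustration, and the parser assumes the row layout shown in the example above (index:id followed by "N days H:MM:SS.mmm"); treat it as a starting point rather than a finished tool.

```python
import re

# One row of !runaway output, e.g. "  11:137c      0 days 0:00:02.792".
# Assumption: rows follow the layout shown in the example listing.
ROW = re.compile(
    r"(?P<index>\d+):(?P<tid>[0-9a-fA-F]+)\s+"
    r"(?P<days>\d+)\s+days\s+(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}\.\d+)"
)


def parse_runaway(text):
    """Return {(thread index, thread id in hex): user-mode seconds}."""
    times = {}
    for match in ROW.finditer(text):
        seconds = (
            int(match["days"]) * 86400
            + int(match["h"]) * 3600
            + int(match["m"]) * 60
            + float(match["s"])
        )
        times[(int(match["index"]), match["tid"].lower())] = seconds
    return times


def busiest_threads(before, after):
    """Rank threads by CPU time gained between two snapshots, largest first.

    Threads that appear only in the second snapshot are treated as
    having started from zero.
    """
    gained = {key: after[key] - before.get(key, 0.0) for key in after}
    return sorted(gained.items(), key=lambda item: item[1], reverse=True)
```

Passing the text of the two listings through parse_runaway and then calling busiest_threads(first, second) returns a list of ((index, id), seconds gained) pairs; the first entry is the thread to switch to with ~Ns before taking call stacks.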