I have a number of 24-core Windows Server 2008 R2 machines that run the same process, but each instance of the process has a different data set. Usually 2-4 instances of the process run on each server. The processes are compiled for x64, have a GUI, and use Workstation GC.
Every second, the process outputs the GC counts to a log file on local disk. The log is used for many other things as well. Once in a while, I find that one of these processes pauses execution for 5 or more seconds, and nothing is written to the log during that time. Every time this happens, it coincides with the number of Gen2 GCs increasing by 1.
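For readers who want to reproduce the measurement, here is a minimal sketch of that kind of once-per-second logging. It is not my production code: the log path, the 5-second threshold, and the class name are placeholders chosen for illustration. `GC.CollectionCount` and `Stopwatch` are standard .NET APIs.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

static class GcCountLogger
{
    // Hypothetical values, chosen only for illustration.
    const string LogPath = "gc-counts.log";
    static readonly TimeSpan GapThreshold = TimeSpan.FromSeconds(5);

    static void Main()
    {
        var clock = Stopwatch.StartNew();
        TimeSpan last = clock.Elapsed;

        while (true)
        {
            Thread.Sleep(1000);

            // Measure how long it really was since the previous log line.
            TimeSpan now = clock.Elapsed;
            TimeSpan gap = now - last;
            last = now;

            // Cumulative collection counts per generation since process start.
            string line = string.Format(
                "{0:o} gen0={1} gen1={2} gen2={3}{4}",
                DateTime.UtcNow,
                GC.CollectionCount(0),
                GC.CollectionCount(1),
                GC.CollectionCount(2),
                gap > GapThreshold
                    ? " GAP=" + gap.TotalSeconds.ToString("F1") + "s"
                    : "");

            File.AppendAllText(LogPath, line + Environment.NewLine);
        }
    }
}
```

Because a blocking GC suspends all managed threads, the logging thread stalls along with everything else, so a long pause shows up as a missing stretch of log lines followed by one line flagged with the measured gap.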
This is a rare event. This happens maybe once every 10000 Gen2 GCs across all processes.
Each machine has more than enough RAM to keep all processes in RAM.
This morning I had a 9 second pause in one of the processes and this time I captured Performance counters for the affected process and the entire machine. None of the other processes running at the time were affected. Analysis of the Performance Counters shows the following:
Comparing after the pause with before the pause:
Can anyone confirm that this activity can be attributed to swapping? Given that the machines have more than enough RAM, are there any suggestions for fixing these pauses?
Update #1 (3/5/2012):
Experienced a 6.5 second pause in one of the processes today. The .NET CLR Memory performance counters show that the size of the LOH did not change, but the Gen 2 heap size, the size of all heaps, and total committed bytes each dropped by 700 MB. Total reserved bytes dropped by 250 MB. So it seems that a lot of garbage in Gen2 was reclaimed on this particular GC.
Update #2 (3/6/2012):
Experienced a 7 second pause in one of the processes today. The following dropped:

- Gen 2 Heap Size (.NET CLR Memory) by 900 MB
- Num Bytes in all Heaps (.NET CLR Memory) by 900 MB
- Num Total Committed Bytes (.NET CLR Memory) by 800 MB
- Num Total Reserved Bytes (.NET CLR Memory) by 540 MB
- Virtual Bytes (Process) by 550 MB
- Working Set (Process) by 800 MB
- Working Set - Private (Process)
- Page File Bytes (Process) by 800 MB
- Private Bytes (Process) by 800 MB

The LOH stayed the same.
It appears that a bona fide Gen2 GC takes a couple of seconds on a process several GB in size.
So why do some Gen2 GCs take 5 seconds while others take almost no time? Because I have Concurrent/Background GC enabled, and it appears that the Gen2 GC counter is also incremented whenever a Concurrent GC completes. I think this is misleading.
With Concurrent GC disabled, the Gen2 GC counts drop substantially and every Gen2 GC takes a few seconds.
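For anyone who wants to repeat the comparison, one documented way to turn concurrent GC off is the `gcConcurrent` element in the application's configuration file. The fragment below is a minimal sketch and assumes the setting is applied through `app.config` (deployed as `YourApp.exe.config`, a placeholder name); merge it into any existing `<runtime>` section rather than replacing the file.

```xml
<configuration>
  <runtime>
    <!-- Disable concurrent (background) garbage collection for this process -->
    <gcConcurrent enabled="false"/>
  </runtime>
</configuration>
```

The setting is read when the runtime starts, so the process has to be restarted for it to take effect.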