I can't explain most of the memory used by a C# process. The process uses 10 GB in total, but reachable and unreachable objects together account for only 2.5 GB. What could the remaining 7.5 GB be?
I'm looking for the most likely explanations or a method to find out what this memory can be.
Here is the precise situation. The process runs on .NET 4.5.1. It downloads pages from the internet and processes them with machine learning. VMMap shows that the memory is almost entirely in the managed heap, which seems to rule out an unmanaged memory leak.
The process has been running for days and its memory has grown slowly. At some point, the memory is at 11 GB. I stop everything running in the process and run garbage collections, including large object heap compaction, several times at one-minute intervals:
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
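For reference, here is a slightly fuller sketch of such a forced collection, in case the two lines above are too terse. The class and method names (`HeapCompaction`, `CollectAndCompact`) are made up for illustration, and the extra finalizer pass is an assumption about what a thorough cleanup would look like, not necessarily what was run here:

```csharp
using System;
using System.Runtime;

// Hypothetical helper, for illustration only.
public static class HeapCompaction
{
    // Runs two blocking full collections with LOH compaction requested for each,
    // and returns the number of bytes the GC reports as allocated afterwards.
    public static long CollectAndCompact()
    {
        // CompactOnce applies to the next blocking gen 2 collection only,
        // then the setting falls back to Default.
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true);

        // Let finalizers run, then collect whatever they released.
        GC.WaitForPendingFinalizers();
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true);

        return GC.GetTotalMemory(false);
    }
}
```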
The memory goes down to 10 GB. Then I create the dump:
procdump -ma psid
The dump is 10 GB as expected.
I open the dump with .NET Memory Profiler (version 5.6). It shows a total of 2.2 GB of reachable objects and 0.3 GB of unreachable objects. What could explain the remaining 7.5 GB?
Possible explanations I've been thinking of:
After investigation, the problem turned out to be heap fragmentation caused by pinned buffers. I'll explain how to investigate and what pinned buffers are.
All the profilers I used agreed that most of the heap is free, so the next step was to look at fragmentation. This can be done with WinDbg (with the SOS extension loaded), for example:
!dumpheap -stat
Then I looked at the "Fragmented blocks larger than..." section of the output. WinDbg reports the objects that sit between the free blocks and prevent compaction. Next I checked what is holding these objects and whether they are pinned, for example the object at address 0000000bfaf93b80:
!gcroot 0000000bfaf93b80
It displays the reference graph:
00000004082945e0 (async pinned handle)
-> 0000000535b3a3e0 System.Threading.OverlappedData
-> 00000006f5266d38 System.Threading.IOCompletionCallback
-> 0000000b35402220 System.Net.Sockets.SocketAsyncEventArgs
-> 0000000bf578c850 System.Net.Sockets.Socket
-> 0000000bf578c900 System.Net.SocketAddress
-> 0000000bfaf93b80 System.Byte[]
00000004082e2148 (pinned handle)
-> 0000000bfaf93b80 System.Byte[]
The last two lines show a (pinned handle) root pointing directly at the byte array, which tells you the object is pinned.
Pinned objects are buffers that can't be moved because their address is shared with unmanaged code. Here you can guess that it is the system TCP layer. When managed code needs to pass the address of a buffer to external code, it must "pin" the buffer so that the address remains valid: the GC cannot move it.
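To make pinning concrete, here is a minimal sketch that pins a byte array explicitly with `GCHandle`. The socket API does its pinning internally, so this is not the code path from the dump above, only an illustration of the same mechanism. The class and method names are made up for the example:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class, for illustration only.
public static class PinningDemo
{
    // Pins a buffer, forces a collection, and reports whether the
    // buffer's address was the same before and after.
    public static bool AddressIsStableWhilePinned()
    {
        byte[] buffer = new byte[4096];

        // From here until Free(), the GC is not allowed to relocate 'buffer'.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr before = handle.AddrOfPinnedObject();

            // Even if this collection compacts the heap, the pinned array must stay put.
            GC.Collect();

            IntPtr after = handle.AddrOfPinnedObject();
            return before == after;
        }
        finally
        {
            // Forgetting this call is what turns a pinned buffer into a
            // permanent obstacle for compaction.
            handle.Free();
        }
    }
}
```

A handle that is never freed keeps its array both alive and immovable, which matches the (pinned handle) root shown by !gcroot.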
These buffers, while being a very small part of the memory, make compaction impossible and thus cause a large memory "leak". It is not exactly a leak, more a fragmentation problem. This can happen on the LOH or on the generational heaps just the same. The question then becomes what causes these pinned objects to live forever: finding the root cause of the leak that causes the fragmentation.
You can read similar questions here:
https://ayende.com/blog/181761-C/the-curse-of-memory-fragmentation
.NET deletes pinned allocated buffer (good explanation of pinned objects in the answer)
Note: the root cause was in a third-party library, AerospikeClient, which uses the .NET async Socket API, known for pinning the buffers passed to it. AerospikeClient properly used a buffer pool, but the pool was re-created every time their client was re-created. Since we re-created their client every hour instead of creating one and keeping it forever, the number of pinned buffers kept growing, which in turn caused unbounded fragmentation. What remains unclear is why old buffers are never unpinned when transmission is over, or at least when their client is disposed.
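One common way to keep pinning from fragmenting the heap is to allocate a single large array once for the lifetime of the process and hand out slices of it, so that however many socket operations are in flight, only one object is ever pinned. The sketch below illustrates that idea. `SlabBufferPool` and `SharedBuffers` are hypothetical names, the sizes are arbitrary, and this is neither AerospikeClient's implementation nor the fix applied here:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical pool, for illustration only: one backing array, many slices.
public sealed class SlabBufferPool
{
    private readonly byte[] _slab;
    private readonly int _bufferSize;
    private readonly ConcurrentBag<int> _freeOffsets = new ConcurrentBag<int>();

    public SlabBufferPool(int bufferSize, int bufferCount)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException("bufferSize");
        if (bufferCount <= 0) throw new ArgumentOutOfRangeException("bufferCount");

        _bufferSize = bufferSize;
        // A single allocation: pinning any slice pins only this one array.
        _slab = new byte[checked(bufferSize * bufferCount)];
        for (int i = 0; i < bufferCount; i++)
        {
            _freeOffsets.Add(i * bufferSize);
        }
    }

    // Hands out a free slice, or returns false when the pool is exhausted.
    public bool TryRent(out ArraySegment<byte> segment)
    {
        int offset;
        if (_freeOffsets.TryTake(out offset))
        {
            segment = new ArraySegment<byte>(_slab, offset, _bufferSize);
            return true;
        }
        segment = default(ArraySegment<byte>);
        return false;
    }

    // Makes a previously rented slice available again.
    public void Return(ArraySegment<byte> segment)
    {
        if (!ReferenceEquals(segment.Array, _slab))
        {
            throw new ArgumentException("Segment does not belong to this pool.", "segment");
        }
        _freeOffsets.Add(segment.Offset);
    }
}

public static class SharedBuffers
{
    // Created once per process and never re-created (sizes are arbitrary).
    public static readonly SlabBufferPool Pool = new SlabBufferPool(8192, 128);
}
```

A rented slice can be passed to the async socket API with `args.SetBuffer(segment.Array, segment.Offset, segment.Count)` on a `SocketAsyncEventArgs`. The important part is the lifetime: the pool has to outlive any client that is re-created, otherwise the problem described in the note comes back.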