Occasionally our customers observe an out-of-memory exception in our application. Since we log their actions, we can roughly reproduce what they did; but when I do this and profile the application with dotMemory, I cannot reproduce the exception, and the memory in use (around 100 MB managed + 500 MB unmanaged) is far below the limit (2 GB, since it is a 32-bit application). Moreover, at the point where the exception is caught, we query the current memory usage via Process.GetCurrentProcess().WorkingSet64, which reports between 500 and 900 MB. I know this number is not very reliable, but it is another indication that there should be enough memory available.
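The check looks roughly like this (a simplified sketch; the class and method names are illustrative, only the WorkingSet64 call is taken from our code):

    using System;
    using System.Diagnostics;

    static class MemoryDiagnostics
    {
        // Called from the catch block of our OutOfMemoryException handler.
        public static string DescribeWorkingSet()
        {
            long bytes = Process.GetCurrentProcess().WorkingSet64;
            return string.Format("Working set: {0:N0} MB", bytes / (1024 * 1024));
        }
    }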
A relevant property of the application is that it deals with time series of measurements (pairs of DateTime and double stored in an array). These objects can be large enough to be stored on the large object heap (LOH). So heap fragmentation does occur, but while profiling it did not seem to be a big deal: the size of the LOH, including the holes, was less than 100 MB.
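To give an idea of the sizes involved: the CLR places arrays of 85,000 bytes or more on the LOH, so at 16 bytes per DateTime/double pair a series of about 5,313 samples already qualifies. The struct below illustrates the layout; it is not our exact type:

    using System;

    // Illustrative sample type: one measurement point of a time series.
    struct Sample
    {
        public DateTime Timestamp; // 8 bytes
        public double Value;       // 8 bytes
    }

    class LohDemo
    {
        static void Main()
        {
            // Arrays of >= 85,000 bytes go to the large object heap;
            // at 16 bytes per element that is 5,313 elements or more.
            Sample[] series = new Sample[6000]; // ~96,000 bytes -> LOH
            Console.WriteLine("Allocated {0} samples.", series.Length);
        }
    }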
Is it possible that the garbage collector (GC) is only invoked after an out-of-memory exception has already been thrown? I would expect that, for an unsatisfiable allocation request, the exception is thrown only if the GC fails to free enough memory. But maybe this works differently for memory allocated on the LOH than for memory allocated on the generation 0 heap?
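As an experiment (a sketch, not code from our application), one could catch the exception, force a full blocking collection, and retry the allocation; if the retry also fails, that points to fragmentation of the address space rather than a missed collection (on .NET 4.0 the LOH is never compacted, so the holes remain):

    using System;

    static class RetryAlloc
    {
        // Retry a large allocation once after a forced full GC.
        public static T[] AllocateWithRetry<T>(int length)
        {
            try
            {
                return new T[length];
            }
            catch (OutOfMemoryException)
            {
                GC.Collect();
                GC.WaitForPendingFinalizers();
                GC.Collect();
                return new T[length]; // may still throw if no contiguous block exists
            }
        }
    }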
Does anyone have an idea how we could tackle this problem?
We are using VS 2010 SP1 and .NET 4.0. The issue might be related to the questions raised here, here, and here, but I did not find a satisfying answer in any of them.
Update: Added an example stack trace and a chart of the heap fragmentation.
There is no single place where out-of-memory exceptions are triggered, but since it was requested, here is a stack trace:
Exception of type 'System.OutOfMemoryException' was thrown.
mscorlib
at System.Runtime.Serialization.ObjectIDGenerator.Rehash()
at System.Runtime.Serialization.ObjectIDGenerator.GetId(Object obj, Boolean& firstTime)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.InternalGetId(Object obj, Boolean assignUniqueIdToValueType, Type type, Boolean& isNew)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Schedule(Object obj, Boolean assignUniqueIdToValueType, Type type, WriteObjectInfo objectInfo)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.WriteMembers(NameInfo memberNameInfo, NameInfo memberTypeNameInfo, Object memberData, WriteObjectInfo objectInfo, NameInfo typeNameInfo, WriteObjectInfo memberObjectInfo)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.WriteMemberSetup(WriteObjectInfo objectInfo, NameInfo memberNameInfo, NameInfo typeNameInfo, String memberName, Type memberType, Object memberData, WriteObjectInfo memberObjectInfo)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Write(WriteObjectInfo objectInfo, NameInfo memberNameInfo, NameInfo typeNameInfo, String[] memberNames, Type[] memberTypes, Object[] memberData, WriteObjectInfo[] memberObjectInfos)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Write(WriteObjectInfo objectInfo, NameInfo memberNameInfo, NameInfo typeNameInfo)
at System.Runtime.Serialization.Formatters.Binary.ObjectWriter.Serialize(Object graph, Header[] inHeaders, __BinaryWriter serWriter, Boolean fCheck)
at System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Serialize(Stream serializationStream, Object graph, Header[] headers, Boolean fCheck)
... <methods from our application follow>
The following chart from dotMemory depicts the LOH fragmentation after working for about an hour with the tool:
Using the tool vmmap I found the reason for the problem: the memory actually available to the managed heap is much less than the 2 GB limit. Several shared libraries are loaded for interaction with MS Office tools (~400 MB). There are also native code DLLs (~300 MB) which additionally allocate unmanaged heap (~300 MB). Together with a lot of other stuff, in the end only around 700 MB remain for the managed heap.
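vmmap shows this directly, but a rough picture of how much address space the loaded modules consume can also be obtained from inside the process (diagnostic sketch):

    using System;
    using System.Diagnostics;

    class ModuleFootprint
    {
        static void Main()
        {
            long total = 0;
            foreach (ProcessModule module in Process.GetCurrentProcess().Modules)
            {
                total += module.ModuleMemorySize;
                Console.WriteLine("{0,8:N0} KB  {1}",
                    module.ModuleMemorySize / 1024, module.ModuleName);
            }
            Console.WriteLine("Total size of loaded modules: {0:N0} MB",
                total / (1024 * 1024));
        }
    }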
Since much less memory is available than I originally thought, the LOH fragmentation may have more impact than I suspected, and indeed vmmap shows that the largest free block in that memory area becomes smaller over time, even though the total available memory remains the same. I think this proves that fragmentation is the cause of the problem. The exception is often triggered by the binary serialization we sometimes use for deep copying objects; it seems to cause a peak in memory usage.
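For context, the deep copy is essentially the standard BinaryFormatter round trip (simplified sketch; our helper differs in details). Note that the MemoryStream's backing buffer grows by doubling and must be a single contiguous block, which would explain the peak:

    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    static class Cloner
    {
        public static T DeepCopy<T>(T source)
        {
            var formatter = new BinaryFormatter();
            using (var stream = new MemoryStream())
            {
                // Serializing the whole graph can briefly require a contiguous
                // buffer several times the size of the data itself.
                formatter.Serialize(stream, source);
                stream.Position = 0;
                return (T)formatter.Deserialize(stream);
            }
        }
    }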
So what to do about it? I am considering the following options: