We have an ASP.NET 4.5 MVC Web API application with about 100 app domains, each containing an extension.
From time to time the API hangs. Not a single route responds; even a status endpoint that only returns a string does not reply.
When it hangs, the site has about 120 threads (which is quite normal) and uses about 12 GB of RAM (which is unusually high).
When we take a memory dump, we can see that the site is always in the middle of a garbage collection.
Most of the time, most threads are stuck in a stack that handles the serialization between the app domains and are waiting for the GC. We also do a lot of serialization, both for the app domain communication and in combination with some Redis caches.
Even after waiting about 5 minutes, the hang does not end. Are there any known garbage collection issues related to many app domains?
As the site is hosted in IIS, the background GC should always be active.
When I look at the "% Time in GC" performance counter, I can see that the GC is nearly always running.
When the site hangs, it constantly spends about 40% of its time in GC.
When the site is in this state, I can also see that memory usage keeps increasing slightly.
Any hints on what to test or try to improve?
Would upgrading the runtime to 4.5.2 be likely to help?
The stack of most of the hanging threads looks like this:
ntdll!NtWaitForSingleObject+a
KERNELBASE!WaitForSingleObjectEx+94
clr!CLREventWaitHelper2+38
clr!CLREventWaitHelper+1f
clr!CLREventBase::WaitEx+70
clr!SVR::gc_heap::wait_for_gc_done+55
clr!SVR::WaitLonger+9e
clr!SVR::GCHeap::Alloc+224
clr!JIT_New+142
[[HelperMethodFrame]]
mscorlib_ni!System.Runtime.Serialization.ObjectManager.RegisterFixup(System.Runtime.Serialization.FixupHolder, Int64, Int64)+d1
mscorlib_ni!System.Runtime.Serialization.Formatters.Binary.__BinaryParser.Run()+128
mscorlib_ni!System.Runtime.Serialization.Formatters.Binary.ObjectReader.Deserialize(System.Runtime.Remoting.Messaging.HeaderHandler, System.Runtime.Serialization.Formatters.Binary.__BinaryParser, Boolean, Boolean, System.Runtime.Remoting.Messaging.IMethodCallMessage)+db
mscorlib_ni!System.Runtime.Serialization.Formatters.Binary.BinaryFormatter.Deserialize(System.IO.Stream, System.Runtime.Remoting.Messaging.HeaderHandler, Boolean, Boolean, System.Runtime.Remoting.Messaging.IMethodCallMessage)+1bf
mscorlib_ni!System.Runtime.Remoting.Channels.CrossAppDomainSerializer.DeserializeObject(System.IO.MemoryStream)+f8
mscorlib_ni!System.Runtime.Remoting.Messaging.SmuggledMethodCallMessage.FixupForNewAppDomain()+de8a4e
mscorlib_ni!System.Runtime.Remoting.Channels.CrossAppDomainSink.DoDispatch(Byte[], System.Runtime.Remoting.Messaging.SmuggledMethodCallMessage, System.Runtime.Remoting.Messaging.SmuggledMethodReturnMessage ByRef)+33
mscorlib_ni!System.Runtime.Remoting.Channels.CrossAppDomainSink.DoTransitionDispatchCallback(System.Object[])+92
clr!CallDescrWorkerInternal+83
clr!CallDescrWorkerWithHandler+4a
clr!DispatchCallDebuggerWrapper+1f
clr!DispatchCallSimple+88
clr!ThreadNative::InternalCrossContextCallback+2ea
[[ContextTransitionFrame]]
[[HelperMethodFrame_PROTECTOBJ] (System.Threading.Thread.InternalCrossContextCallback)] System.Threading.Thread.InternalCrossContextCallback(System.Runtime.Remoting.Contexts.Context, IntPtr, Int32, System.Threading.InternalCrossContextDelegate, System.Object[])
mscorlib_ni!System.Runtime.Remoting.Channels.CrossAppDomainSink.DoTransitionDispatch(Byte[], System.Runtime.Remoting.Messaging.SmuggledMethodCallMessage, System.Runtime.Remoting.Messaging.SmuggledMethodReturnMessage ByRef)+a0
mscorlib_ni!System.Runtime.Remoting.Channels.CrossAppDomainSink.SyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage)+15d
mscorlib_ni!System.Runtime.Remoting.Proxies.RemotingProxy.CallProcessMessage(System.Runtime.Remoting.Messaging.IMessageSink, System.Runtime.Remoting.Messaging.IMessage, System.Runtime.Remoting.Contexts.ArrayWithSize, System.Threading.Thread, System.Runtime.Remoting.Contexts.Context, Boolean)+8c
mscorlib_ni!System.Runtime.Remoting.Proxies.RemotingProxy.InternalInvoke(System.Runtime.Remoting.Messaging.IMethodCallMessage, Boolean, Int32)+22c
mscorlib_ni!System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(System.Runtime.Remoting.Proxies.MessageData ByRef, Int32)+1f4
clr!CTPMethodTable__CallTargetHelper3+12
clr!CallTargetWorker2+74
clr!CTPMethodTable::OnCall+1fb
clr!TransparentProxyStub_CrossContextPatchLabel+a
[[TPMethodFrame] (SR.BusPortal.Providers.Contract.Common.IAdapterSearcher.SearchAsync)] SR.BusPortal.Providers.Contract.Common.IAdapterSearcher.SearchAsync(SR.BusPortal.Providers.Contract.Common.AdapterSearchParameters)
SR.BusPortal.Search.Steps.SearchStepOneWay`2+<SearchOneWayAsync>d__3[[System.__Canon, mscorlib],[System.__Canon, mscorlib]].MoveNext()+73
After some further investigation, it turned out that the app domains weren't the cause. I hope this saves someone else a lot of searching :-)
We had a big in-memory graph database (which used about 30 GB of RAM) in the Web API process. Running the Web API and the graph database in the same process was problematic: the GC never managed to finish its collection. With a non-concurrent (non-background) GC the problem was better, but the site was sometimes a little laggy.
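For anyone who wants to try the non-concurrent GC as a stopgap, this is a minimal sketch of the setting, assuming the standard gcConcurrent runtime element is used (the exact configuration we changed is not reproduced here). As far as I know, an IIS-hosted ASP.NET site reads GC runtime settings from the Aspnet.config file in the .NET Framework directory rather than from web.config, so treat the file location as an assumption to verify on your machine.

```xml
<!-- Sketch only: disables concurrent/background garbage collection. -->
<!-- Assumption: placed in Aspnet.config for an IIS-hosted site.      -->
<configuration>
  <runtime>
    <gcConcurrent enabled="false" />
  </runtime>
</configuration>
```

This trades shorter pauses for more predictable, blocking collections, which matches the "better but sometimes laggy" behavior described above.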
After separating this database into its own service, this behavior never happened again.
There are also many posts about how to optimize code for the GC, which may help.