We have a problem: our NServiceBus host crashes about 4-5 minutes after it loses its connection to the RabbitMQ server.
To reproduce, I started my app, confirmed that RabbitMQ saw the connections, disconnected my network cable, and waited. After about 5 minutes, the NServiceBus host crashed.
When running in Debug, I got the following error message:
Additional information: The runtime has encountered a fatal error. The address of the error was at 0xf6a94323, on thread 0xf8b8. The error code is 0x80131623. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
On our server we have the following in EventLog:
Application: NServiceBus.Host.exe
Framework Version: v4.0.30319
Description: The application requested process termination through System.Environment.FailFast(string message).
Message: The following critical error was encountered by NServiceBus:
Repeated failures when communicating with the broker
NServiceBus is shutting down.
Stack:
at System.Environment.FailFast(System.String, System.Exception)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
This is our RabbitMQ connection string:
<add name="NServiceBus/Transport" connectionString="host=our_host_address;VirtualHost=OurVirtualHost;UserName=OurUser;Password=******;PrefetchCount=1;DequeueTimeout=30" />
What's causing this crash? Is there a way to recover from it or catch it? How can we handle disconnections from the RabbitMQ server gracefully?
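This happens because the transport's circuit breaker makes sure the service shuts down, rather than hanging, when it is unable to do its work. The "Repeated failures when communicating with the broker" message in your event log is the critical error it raises once the connection has been down for too long.
You can configure the endpoint to wait longer before it gives up when the connection is dropped; see "Controlling behavior when the broker connection is lost" in the RabbitMQ transport documentation for more information.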
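As a rough sketch of what that setting can look like in app.config: the key names below are what I recall from the documentation for older versions of the RabbitMQ transport, and the values are arbitrary examples, so treat both as assumptions and check them against the documentation for the transport version you are running. Newer versions configure this in code instead.

```xml
<appSettings>
  <!-- Assumed key name: how long the transport waits after losing the
       broker connection before it raises the critical error.
       10 minutes is only an example value. -->
  <add key="NServiceBus/RabbitMqDequeueStrategy/TimeToWaitBeforeTriggering" value="00:10:00" />
  <!-- Assumed key name: how long to wait after a failure before trying again.
       5 seconds is only an example value. -->
  <add key="NServiceBus/RabbitMqDequeueStrategy/DelayAfterFailure" value="00:00:05" />
</appSettings>
```

A longer wait only postpones the shutdown. If the broker stays unreachable past that window, the endpoint will still fail fast.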
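In addition, you can set the Windows service's recovery options to restart the service on failure, so the endpoint comes back up on its own after the circuit breaker shuts it down.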