c# ado.net circuit-breaker polly exponential-backoff

Can a Polly Circuit Breaker have an exponential durationOfBreak?

We are trying to implement a retry policy for our database logic for when we receive timeout exceptions caused by exhausting the connection pool. This happens during short spikes of unusually high activity. We have increased our max pool size to try to avoid this situation, but we would also like to have retry logic in place as a backup plan.

The documentation for connection pooling states that:

When connection pooling is enabled, and if a timeout error or other login error occurs, an exception will be thrown and subsequent connection attempts will fail for the next five seconds, the "blocking period". If the application attempts to connect within the blocking period, the first exception will be thrown again. Subsequent failures after a blocking period ends will result in a new blocking period that is twice as long as the previous blocking period, up to a maximum of one minute.
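To make the quoted rule concrete, here is a minimal C# sketch that computes the blocking period after the nth consecutive failure. It only models the documented behaviour and is not ADO.NET's code. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical model of the documented blocking-period rule, for illustration only;
// this is not ADO.NET's actual implementation.
public static class BlockingPeriodModel
{
    static readonly TimeSpan Initial = TimeSpan.FromSeconds(5);
    static readonly TimeSpan Max = TimeSpan.FromMinutes(1);

    // consecutiveFailures is 1 for the first failure, 2 for the next one, and so on.
    public static TimeSpan For(int consecutiveFailures)
    {
        if (consecutiveFailures < 1)
            throw new ArgumentOutOfRangeException(nameof(consecutiveFailures));

        TimeSpan period = Initial;
        // Double once per additional failure, stopping as soon as the cap is reached.
        for (int i = 1; i < consecutiveFailures && period < Max; i++)
            period = TimeSpan.FromTicks(period.Ticks * 2);

        // Clamp to the one-minute maximum.
        return period < Max ? period : Max;
    }
}
```

Under this model, `For(1)` through `For(6)` return 5, 10, 20, 40, 60 and 60 seconds. Every blocking period is therefore a multiple of 5 seconds.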

Polly seems well suited to this problem, through a PolicyWrap combining Fallback, WaitAndRetry, and Circuit Breaker policies. There's a good picture of this here.

Ideally, I would like to specify an exponential durationOfBreak for the Circuit Breaker, to match the doubling period described above. I haven't seen any examples online of how to do this, so perhaps it isn't possible?

What is the recommended configuration here? Is it to specify a Circuit Breaker with a 5-second durationOfBreak and then use an exponential retry of 5, 10, 20, 40, and 60 seconds for the WaitAndRetry component? That seems unfortunate when connections have just become available: an existing operation may have just started its 40-second wait, while a new operation would succeed immediately.
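The concern about the 40-second wait can be quantified with a small C# sketch applied to the 5, 10, 20, 40, 60-second schedule proposed above. It assumes each attempt itself takes negligible time. The type and member names are hypothetical.

```csharp
using System;

// Hypothetical helper: shows how long an operation on a growing retry schedule
// may keep waiting after the connection pool has already recovered.
public static class RetryScheduleModel
{
    // Elapsed time (since the initial failure) at which each retry fires.
    // Assumes the attempts themselves take no time.
    public static TimeSpan[] RetryTimes(TimeSpan[] sleeps)
    {
        var times = new TimeSpan[sleeps.Length];
        TimeSpan elapsed = TimeSpan.Zero;
        for (int i = 0; i < sleeps.Length; i++)
        {
            elapsed += sleeps[i];
            times[i] = elapsed;
        }
        return times;
    }

    // How long the operation keeps waiting after the pool recovers at recoveredAt,
    // or null if no retry fires after recovery (the operation has already given up).
    public static TimeSpan? ExtraWaitAfterRecovery(TimeSpan[] sleeps, TimeSpan recoveredAt)
    {
        foreach (TimeSpan t in RetryTimes(sleeps))
        {
            if (t >= recoveredAt)
                return t - recoveredAt;
        }
        return null;
    }
}
```

With sleeps of 5, 10, 20, 40 and 60 seconds, retries fire at 5, 15, 35, 75 and 135 seconds after the first failure. If the pool becomes usable at 36 seconds, `ExtraWaitAfterRecovery` returns 39 seconds. The existing operation waits until 75 seconds, while a fresh operation would succeed at once.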

Another possibility is to use a 5-second durationOfBreak and have the WaitAndRetry component use a very short wait with many retries, even though we know many of these retries will fail if they arrive before the blocking period described in the documentation has ended.

I appreciate your feedback!

Solution

Polly does not provide circuit-breakers with varying (for example exponential) duration-of-break.

The following may seem counter-intuitive at first, but it sounds as if this situation does not require a circuit-breaker with exponential back-off, because the ADO.NET connection-pool algorithm described is already effectively providing that.

Reasoning: the purpose of a circuit-breaker is to stop putting calls through to a downstream system that is unlikely to cope with them, so as to (a) fail fast to the caller and (b) protect the underlying system from excessive load. It sounds as if ADO.NET's algorithm is already fulfilling both of these goals.
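To illustrate goals (a) and (b), here is a deliberately minimal, hypothetical circuit-breaker in C#. It is not how Polly implements its CircuitBreaker policy and it is not thread-safe. The clock is injected only so the behaviour can be exercised in a test.

```csharp
using System;

// Hypothetical, deliberately minimal breaker: not Polly's implementation, not thread-safe.
public sealed class SimpleCircuitBreaker
{
    private readonly int _failuresBeforeBreaking;
    private readonly TimeSpan _durationOfBreak;
    private readonly Func<DateTime> _clock;   // injectable so the behaviour can be tested
    private int _consecutiveFailures;
    private DateTime? _openUntil;

    public SimpleCircuitBreaker(int failuresBeforeBreaking, TimeSpan durationOfBreak, Func<DateTime> clock)
    {
        _failuresBeforeBreaking = failuresBeforeBreaking;
        _durationOfBreak = durationOfBreak;
        _clock = clock;
    }

    public T Execute<T>(Func<T> action)
    {
        // (a) Fail fast: while open, reject without invoking the downstream call,
        // which (b) also spares the downstream system that load.
        if (_openUntil is DateTime until && _clock() < until)
            throw new InvalidOperationException("Circuit is open; call rejected without being attempted.");

        try
        {
            T result = action();
            _consecutiveFailures = 0;   // success closes the circuit
            _openUntil = null;
            return result;
        }
        catch
        {
            // After a break expires the failure count is still at or above the threshold,
            // so one failed trial call re-opens the circuit (loosely like a half-open trial).
            _consecutiveFailures++;
            if (_consecutiveFailures >= _failuresBeforeBreaking)
                _openUntil = _clock() + _durationOfBreak;
            throw;
        }
    }
}
```

Polly's policy adds thread safety and a proper half-open state. The point here is only that an open circuit rejects calls without invoking the downstream action at all.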

Similarly, the goal of an exponential back-off retry policy is to prevent the retries themselves from "multiplying up" the load (in effect a self-inflicted DDoS attack on the underlying system: more requests come in while existing requests are also retrying). Again, it sounds as if ADO.NET's force-you-to-back-off algorithm is enforcing its own exponential back-off to protect the underlying database, so there may (*) not be any benefit in layering your own Polly exponential back-off on top of it.

On the basis that ADO.NET provides its own defences, I would be tempted to do something simple, such as a retry policy with a fixed retry interval of 5 seconds, or 5 seconds plus a tiny shim. (Whatever blocking period is in force, it seems it will always be a multiple of 5 seconds.)
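Here is a minimal sketch of the fixed-interval retry suggested above. It is written as a hand-rolled C# helper rather than with Polly so that it is self-contained. The helper name, its parameters and the injectable `sleep` delegate are hypothetical. Which exceptions count as transient is an assumption you would need to confirm for your driver.

```csharp
using System;
using System.Threading;

// Hypothetical hand-rolled helper: retry at a fixed interval, up to maxRetries times.
public static class FixedIntervalRetry
{
    public static T Execute<T>(
        Func<T> action,
        int maxRetries,
        TimeSpan interval,
        Func<Exception, bool> isTransient,   // which exceptions to retry is an assumption to confirm
        Action<TimeSpan>? sleep = null)      // injectable for testing; defaults to Thread.Sleep
    {
        if (sleep == null) sleep = Thread.Sleep;

        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex) when (attempt < maxRetries && isTransient(ex))
            {
                // Same wait every time: the blocking period itself supplies the back-off.
                sleep(interval);
            }
        }
    }
}
```

For the 5-plus-tiny-shim variant, pass `TimeSpan.FromSeconds(5) + TimeSpan.FromMilliseconds(100)` as `interval`. The 100 ms shim is arbitrary. In Polly v7 syntax, the equivalent is roughly a `WaitAndRetry` policy whose sleep-duration provider returns a constant, for example `Policy.Handle<SqlException>().WaitAndRetry(12, _ => TimeSpan.FromSeconds(5))`. The exception type and retry count there are only examples.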

This suggestion is based on the assumption that this ADO.NET connection-pool management (in respect of the blocking period) all happens caller-side. That is, the ADO.NET code embedded in the calling app decides that its connection pool is fully utilized and rejects further connection attempts during the blocking period, without making a network call to the underlying SQL Server to check. If that assumption is incorrect, then the advice marked (*) above may be bad, and you would be better off using an exponential back-off retry policy to avoid connection retry attempts overloading the database server.
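If the caller-side assumption does not hold, the retry interval should grow instead. The sketch below is a hypothetical C# sleep-duration function for that exponential alternative. It also applies random "full jitter" so that many callers do not retry in lockstep. Jitter is a common refinement not mentioned above. The initial delay, the cap and the jitter strategy are all assumptions.

```csharp
using System;

// Hypothetical sleep-duration function for an exponential back-off retry.
public static class ExponentialBackoff
{
    // retryAttempt is 1-based, like the value Polly passes to a sleepDurationProvider.
    // The ceiling is min(max, initial * 2^(retryAttempt - 1)); "full jitter" then picks
    // a random delay between zero and that ceiling.
    public static TimeSpan Delay(int retryAttempt, TimeSpan initial, TimeSpan max, Random random)
    {
        if (retryAttempt < 1)
            throw new ArgumentOutOfRangeException(nameof(retryAttempt));

        // Math.Min also absorbs overflow: a huge Math.Pow result (or Infinity) clamps to max.
        double ceilingTicks = Math.Min(max.Ticks, initial.Ticks * Math.Pow(2, retryAttempt - 1));
        return TimeSpan.FromTicks((long)(random.NextDouble() * ceilingTicks));
    }
}
```

Because the function takes a 1-based attempt number, a lambda such as `attempt => ExponentialBackoff.Delay(attempt, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(30), random)` could serve as the sleep-duration provider of a Polly `WaitAndRetry` policy. `System.Random` is not thread-safe, so a shared instance needs care.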

Caveat: I have not worked directly with this particular ADO.NET limit, and those who have may have better advice. Those who know the internal ADO.NET architecture better may also know how 'expensive' it is to keep making attempts every five seconds (as I have suggested) that may be rejected.

Addendum: this discussion also ignores any high parallel demand within the caller causing thread or CPU starvation, or similar. If that is a concern, consider proactive load shedding at some known tolerable limit.
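One way to shed load is to cap the number of concurrent database calls and reject the excess immediately. The C# class below is a hypothetical sketch built on `SemaphoreSlim`. The limit passed to it stands in for the "known tolerable limit", which would have to be found by measurement.

```csharp
using System;
using System.Threading;

// Hypothetical load shedder: admits at most maxConcurrent calls at once and
// rejects the rest immediately instead of letting them queue up.
public sealed class LoadShedder
{
    private readonly SemaphoreSlim _slots;

    public LoadShedder(int maxConcurrent)
    {
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Returns false, without running the action, when the limit is already reached.
    public bool TryExecute(Action action)
    {
        if (!_slots.Wait(0))   // zero timeout: do not wait for a free slot
            return false;

        try
        {
            action();
            return true;
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

Polly also offers a Bulkhead policy (in v7, `Policy.Bulkhead(maxParallelization, maxQueuingActions)`) that provides similar limiting with an optional queue.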