Tags: c#, polly, retry-logic, exponential-backoff

Polly DecorrelatedJitterBackoffV2 - how to calculate the maximum time required to complete all retries?


We have a listener that receives a message from a Service Bus queue and then sends the body to an API.

We use Polly for resilience in the cloud, namely a retry policy whose sleep durations come from DecorrelatedJitterBackoffV2.

Our concern is that we are unsure how to calculate the maximum time it could take to complete all retries, e.g. when medianFirstRetryDelay is set to 500 ms and retryCount is set to 3.

This is important to us because of the message lock duration on the Service Bus queue. We want to ensure that the lock duration exceeds the time required to complete all retries.


Solution

    Retry

    If you use DecorrelatedJitterBackoffV2 to generate the sleep durations, you can iterate through the result, since it is an IEnumerable<TimeSpan>.

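    using System;
    using System.Collections.Generic;
    using Polly.Contrib.WaitAndRetry;

    IEnumerable<TimeSpan> delays = Backoff.DecorrelatedJitterBackoffV2(
        medianFirstRetryDelay: TimeSpan.FromMilliseconds(500),
        retryCount: 3);

    // Print each generated sleep duration
    foreach (var delay in delays)
        Console.WriteLine(delay);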
    

    Bear in mind that the generated TimeSpans can vary a lot between calls.
    I generated the sequence five times and got these:

    [
       00:00:00.5042179,
       00:00:00.2196652,
       00:00:00.9364482
    ]
    
    [
       00:00:00.5060196,
       00:00:00.8691744,
       00:00:00.8905491
    ]
    
    [
       00:00:00.3786930,
       00:00:01.0092010,
       00:00:00.0805103
    ]
    
    [
       00:00:00.6507813,
       00:00:00.1045026,
       00:00:00.9623235
    ]
    
    [
       00:00:00.4164084,
       00:00:00.6975145,
       00:00:01.5628308
    ]
    

    If you calculate the sum of the TimeSpans in each sequence with delays.Select(t => t.TotalMilliseconds).Sum(), the five sequences above add up to between roughly 1.47 and 2.68 seconds.
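    Sampling only shows typical sums. To get a bound, note that the delays telescope. The sketch below is Polly-free and rests on our reading of the Polly.Contrib.WaitAndRetry source, which should be treated as an assumption. In that source, delay i is (f(i + r_i) - f(i - 1 + r_(i-1))) * medianFirstRetryDelay / 1.4, where f(t) = 2^t * tanh(sqrt(4t)), each r is uniform in [0, 1), the previous term is 0 for the first delay, and fastFirst is left at its default of false. Under that assumption the total collapses to f(retryCount - 1 + r) * medianFirstRetryDelay / 1.4. For 500 ms and 3 retries, that puts the total between roughly 1.42 s and 2.85 s. The constants 4.0 and 1/1.4 should be verified against the package version you reference. SimulatedDelays is a hypothetical re-implementation for sanity checks, not Polly's API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// ASSUMPTION: constants as we read them in Polly.Contrib.WaitAndRetry's
// DecorrelatedJitterBackoffV2 source; verify them for your package version.
const double PFactor = 4.0;
const double RpScalingFactor = 1.0 / 1.4;

// Cumulative curve of the formula: f(t) = 2^t * tanh(sqrt(PFactor * t)).
double F(double t) => Math.Pow(2, t) * Math.Tanh(Math.Sqrt(PFactor * t));

// Delay i is (f(i + r_i) - f(i - 1 + r_(i-1))) * scale, so the sum telescopes to
// f(retryCount - 1 + r) * scale with r in [0, 1): minimum at r = 0, supremum as r -> 1.
(TimeSpan Min, TimeSpan Max) TotalDelayBounds(TimeSpan median, int retryCount)
{
    if (retryCount <= 0) return (TimeSpan.Zero, TimeSpan.Zero);
    double scaleTicks = RpScalingFactor * median.Ticks;
    return (TimeSpan.FromTicks((long)(F(retryCount - 1) * scaleTicks)),
            TimeSpan.FromTicks((long)(F(retryCount) * scaleTicks)));
}

// Hypothetical re-implementation of the assumed formula (fastFirst: false),
// used only to sanity-check the bounds; it is not Polly's API.
IEnumerable<TimeSpan> SimulatedDelays(TimeSpan median, int retryCount, Random random)
{
    double prev = 0.0;
    for (int i = 0; i < retryCount; i++)
    {
        double next = F(i + random.NextDouble());
        yield return TimeSpan.FromTicks((long)((next - prev) * RpScalingFactor * median.Ticks));
        prev = next;
    }
}

var (min, max) = TotalDelayBounds(TimeSpan.FromMilliseconds(500), retryCount: 3);
Console.WriteLine($"Total sleep: {min.TotalMilliseconds:F0} ms to {max.TotalMilliseconds:F0} ms");

double oneRunMs = SimulatedDelays(TimeSpan.FromMilliseconds(500), 3, new Random())
    .Sum(d => d.TotalMilliseconds);
Console.WriteLine($"One simulated run: {oneRunMs:F0} ms");
```

    The five sampled sums above (about 1.47 s to 2.68 s) all fall inside that range. Under this assumption, rounding the total sleep up to 3 seconds is a safe ceiling for these settings.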

    Timeout

    You can cap the duration of each attempt by applying a local timeout policy to it.

    Local in this context means the following:

    • The retry and the timeout policies are chained
    • The timeout is the inner policy and the retry is the outer one
    • The retry also triggers on TimeoutRejectedException
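    The inner-timeout/outer-retry arrangement can be sketched without Polly to show how the per-attempt budget works. In Polly v7 the same shape is typically built from three pieces. The inner policy is Policy.TimeoutAsync. The outer policy is a WaitAndRetryAsync policy that also handles TimeoutRejectedException. Policy.WrapAsync(retry, timeout) combines them. The sketch below only mimics that structure with CancellationTokenSource, and RetryWithPerAttemptTimeout is a hypothetical helper. Like Polly's optimistic timeout, it only works if the operation honors the cancellation token.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: outer retry loop around an inner per-attempt timeout.
// Returns the number of attempts used; rethrows the last failure when retries run out.
async Task<int> RetryWithPerAttemptTimeout(
    Func<CancellationToken, Task> operation,
    TimeSpan perAttemptTimeout,
    IReadOnlyList<TimeSpan> delays)
{
    for (int attempt = 1; ; attempt++)
    {
        // Inner "timeout": this attempt's token is cancelled after perAttemptTimeout.
        // It only takes effect if the operation observes the token (optimistic style).
        using var cts = new CancellationTokenSource(perAttemptTimeout);
        try
        {
            await operation(cts.Token);
            return attempt;
        }
        catch (Exception) when (attempt <= delays.Count)
        {
            // Outer "retry": any failure, including the OperationCanceledException
            // raised by the timeout, is retried after the next delay.
            await Task.Delay(delays[attempt - 1]);
        }
    }
}

// Usage: an operation that finishes well within its per-attempt timeout.
int attemptsUsed = await RetryWithPerAttemptTimeout(
    ct => Task.Delay(TimeSpan.FromMilliseconds(10), ct),
    perAttemptTimeout: TimeSpan.FromMilliseconds(500),
    delays: new[] { TimeSpan.FromMilliseconds(100), TimeSpan.FromMilliseconds(200), TimeSpan.FromMilliseconds(400) });
Console.WriteLine($"Attempts used: {attemptsUsed}");
```

    With 3 delays the helper makes at most 4 attempts, so its worst-case running time is 4 timeouts plus the sum of the delays. That is the same reasoning as the calculation that follows.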

    Let's do the math

    To calculate the worst case scenario you can do the following:

    • In the five samples above, the delays added up to at most about 2.7 seconds
      • Let's round that up to 3 seconds for the sake of simplicity
    • With local timeouts you know the worst-case time each attempt can take before it fails
      • Since retryCount is 3, there are 4 attempts (the initial call plus the 3 retries)

    If you set the timeout to 1.5 seconds, then in the worst case everything finishes within 9 seconds (4 x 1.5 seconds of attempts + 3 seconds of sleep).

    Of course, if you execute time-consuming code inside the onTimeout or onRetry delegates, you should add that time to your calculation as well.
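    The worst-case arithmetic can be wrapped in a small helper, so it can be re-checked whenever the retry settings or the queue's lock duration change. This is a sketch. WorstCaseDuration and its parameters are hypothetical names. totalSleepUpperBound is the rounded 3 seconds from the text. onTimeoutCost and onRetryCost stand for the time spent in onTimeout (once per timed-out attempt) and onRetry (once per retry). The lock duration is only an example value, not a recommendation.

```csharp
using System;

// Hypothetical helper. Worst case for one message: every attempt runs until its
// local timeout fires, the retries sleep for the upper bound, and callbacks add their time.
TimeSpan WorstCaseDuration(
    int retryCount,
    TimeSpan perAttemptTimeout,
    TimeSpan totalSleepUpperBound,
    TimeSpan onTimeoutCost,  // time spent in onTimeout, once per timed-out attempt
    TimeSpan onRetryCost)    // time spent in onRetry, once per retry
{
    int attempts = retryCount + 1; // the initial call plus the retries
    return perAttemptTimeout * attempts
         + totalSleepUpperBound
         + onTimeoutCost * attempts
         + onRetryCost * retryCount;
}

TimeSpan worstCase = WorstCaseDuration(
    retryCount: 3,
    perAttemptTimeout: TimeSpan.FromSeconds(1.5),
    totalSleepUpperBound: TimeSpan.FromSeconds(3),
    onTimeoutCost: TimeSpan.Zero,
    onRetryCost: TimeSpan.Zero);
Console.WriteLine(worstCase); // 00:00:09

// Example value only; use your queue's actual lock duration.
TimeSpan lockDuration = TimeSpan.FromSeconds(30);
Console.WriteLine(worstCase < lockDuration
    ? "The lock outlasts the worst-case retries."
    : "Increase the lock duration or reduce the retry budget.");
```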