c# polly retry-logic exponential-backoff resiliency

Maximum number of retries using DecorrelatedJitterBackoff


I am using Polly's DecorrelatedJitterBackoff policy to retry an HTTP request. My use case is this: once the delay reaches 300 seconds, the policy should keep retrying every 300 seconds, up to int.MaxValue times.

I am trying to achieve this with the following code. I first used int.MaxValue, which threw an out-of-range exception, so I am using 2100000000 instead. The code works but takes far too long to execute. Please suggest an efficient way to achieve this.

    private static readonly List<int> ExceptionCodes = new List<int> { 408, 429, 500, 503, 504, 520 };
    var delay = Backoff.DecorrelatedJitterBackoffV2(medianFirstRetryDelay: TimeSpan.FromMilliseconds(2500), 10);

    var decorrelatedJitterDelay = this.GetTimeSpanList(delay.ToArray());

    this.RetryPolicy = Policy.HandleResult<HttpResponseMessage>(r => ExceptionCodes.Contains((int)r.StatusCode))
        .WaitAndRetryAsync(decorrelatedJitterDelay);

    var policyResult = this.RetryPolicy.ExecuteAndCaptureAsync(() => this.RequestServer(equipmentId));

    private IEnumerable<TimeSpan> GetTimeSpanList(TimeSpan[] delay)
    {
        var index = 0;
        var timeSpanList = new List<TimeSpan>();
        foreach (var time in delay)
        {
            if (time > TimeSpan.FromSeconds(300))
            {
                var timeDelay = TimeSpan.FromSeconds(300);
                delay[index] = timeDelay;
                timeSpanList.Add(delay[index]);
            }

            index++;
        }

        // 2100000000 is the maximum capacity of List<>.
        for (int i = index; i < 2100000000 - index; i++)
        {
            timeSpanList.Add(TimeSpan.FromSeconds(300));
        }

        return timeSpanList;
    }

Thanks in advance


Solution

  • Several small things need to be changed to achieve the desired behaviour.

    sleepDurationProvider

    If you look at the definition of the sleepDurationProvider parameter of the WaitAndRetry method, you can see that it is a function which produces a TimeSpan from inputs such as the current retry count, the context, etc.

    Func<int, TimeSpan>
    Func<int, DelegateResult<HttpResponseMessage>, Context, TimeSpan>
    ...
    

    So, instead of specifying every sleep duration in advance, we can calculate them on demand. This is really useful because we can take advantage of yield return to create each new TimeSpan only when it is needed, taking the previous ones into account.

    Here is a sample method, which will generate TimeSpans on demand:

    private static IEnumerable<TimeSpan> GetDelay()
    {
        TimeSpan fiveMinutes = TimeSpan.FromMinutes(5);
        var initialBackOff =  Backoff.DecorrelatedJitterBackoffV2(medianFirstRetryDelay: TimeSpan.FromMilliseconds(2500), 10);
        foreach (var delay in initialBackOff.Where(time => time < fiveMinutes))
        {
            yield return delay;
        }
    
        while (true)
        {
            yield return fiveMinutes;
        }
    }
    
    • We generate the first 10 durations with DecorrelatedJitterBackoffV2.
    • We filter out those that are 5 minutes or longer (Where(time => time < fiveMinutes)).
    • After the retry policy has exhausted the initial backoffs, we always return 5 minutes.
    • Please note that this iterator never terminates.
      • Because it is consumed on demand, this is not a problem.

    Let's test this method by querying the first 20 sleep durations:

    foreach (var ts in GetDelay().Take(20))
    {
        Console.WriteLine(ts.TotalSeconds);
    }
    

    A sample run prints something like this (the exact values vary from run to run because of the jitter):

    0.5985231
    4.0582524
    5.1969925
    15.4724158
    16.4869722
    15.8198397
    75.7497326
    118.5080045
    272.2401684
    300
    300
    300
    300
    300
    300
    300
    300
    300
    300
    300
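
    The same idea can be factored into a reusable helper. The following is only a minimal sketch and not part of Polly: DelaySequences and CapThenRepeat are hypothetical names. It also assumes that you want to stop reading the source at the first delay that reaches the cap, so it uses TakeWhile instead of Where (with Where, a later and shorter jittered delay may still be let through).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class DelaySequences
    {
        // Hypothetical helper (not a Polly API): yields the source delays until
        // one of them reaches the cap, then yields the cap forever.
        public static IEnumerable<TimeSpan> CapThenRepeat(IEnumerable<TimeSpan> source, TimeSpan cap)
        {
            // TakeWhile stops at the first delay that is not below the cap.
            foreach (var delay in source.TakeWhile(d => d < cap))
            {
                yield return delay;
            }

            // Infinite tail; safe only because callers pull values on demand.
            while (true)
            {
                yield return cap;
            }
        }
    }
    ```

    For example, calling CapThenRepeat with the delays 1 s, 2 s, 400 s, 3 s and a cap of 300 s, followed by Take(5), yields 1 s, 2 s, 300 s, 300 s, 300 s.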
    

    WaitAndRetry vs WaitAndRetryForever

    Even though WaitAndRetry has several overloads that accept an IEnumerable<TimeSpan> parameter, I would not recommend it here. Most of its overloads require an explicit retryCount, so most people think of this function as a predefined, finite retry executor.

    I suggest using WaitAndRetryForever instead because it expresses the intent: it is obvious what we want without having to look at the sleep duration generator.

    Here is the refined RetryPolicy definition:

    var sleepDurations = GetDelay().GetEnumerator();
    var retryPolicy = Policy
        .HandleResult<HttpResponseMessage>(r => ExceptionCodes.Contains((int)r.StatusCode))
        .WaitAndRetryForever(retry =>
        {
            sleepDurations.MoveNext();
            return sleepDurations.Current;
        });
    
    • WaitAndRetryForever does not have an overload that accepts an IEnumerable<TimeSpan>, which is why we need some boilerplate code.
    • sleepDurations is an iterator, and each time the retry policy needs to calculate the sleep duration we move it forward.
    • This algorithm does not take the current retry count (retry) into account, so you could use a discard there if you wish (.WaitAndRetryForever(_ => ...)).
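
    If you would rather not keep a shared enumerator around, the sleep duration can also be derived from the retry count itself. The following is a minimal sketch under a few assumptions: SleepDurations and FromRetryCount are hypothetical names, the initial delays are materialized into an array up front, and the retry count is assumed to start at 1 for the first retry (which is how Polly v7 behaves as far as I know). Unlike the Where filter above, a delay at or above the cap is clamped to the cap rather than skipped.

    ```csharp
    using System;

    public static class SleepDurations
    {
        // Hypothetical helper (not a Polly API): builds a stateless
        // Func<int, TimeSpan> that can be passed as a sleepDurationProvider.
        public static Func<int, TimeSpan> FromRetryCount(TimeSpan[] initialDelays, TimeSpan cap)
        {
            return retryAttempt =>
            {
                // Assumption: retryAttempt is 1 for the first retry.
                var index = retryAttempt - 1;

                if (index >= 0 && index < initialDelays.Length && initialDelays[index] < cap)
                {
                    return initialDelays[index];
                }

                // Past the initial delays, or a delay at/above the cap: wait the cap.
                return cap;
            };
        }
    }
    ```

    The returned delegate could then be passed directly, for instance .WaitAndRetryForever(SleepDurations.FromRetryCount(initialBackOff.ToArray(), TimeSpan.FromMinutes(5))). Because it holds no enumerator state, every execution of the policy starts again from the first delay and nothing has to be disposed.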

    Execute vs ExecuteAsync vs ExecuteAndCapture vs ...

    Depending on how you define your policy, you can call either Execute or ExecuteAsync. The former is for synchronous operations and the latter is for asynchronous I/O operations.

    RetryPolicy<HttpResponseMessage> retryPolicy = Policy.....WaitAndRetryForever(...
    retryPolicy.Execute(() => ....);
    

    or

    AsyncRetryPolicy<HttpResponseMessage> retryPolicy = Policy.....WaitAndRetryForeverAsync(...
    await retryPolicy.ExecuteAsync(async () => await ...)
    
    • ExecuteAsync expects an asynchronous function (Func<Task<...>>), which is why we pass async () => await ...
    • ExecuteAsync returns a Task, so it should be awaited as well.

    If your RequestServer is a synchronous method, use the former; if it is asynchronous, use the latter.

    I also encourage you to use the simple Execute instead of ExecuteAndCapture if you are only interested in the result itself and not in the policy-related information.

    var result = retryPolicy.Execute(() => this.RequestServer(equipmentId));
    sleepDurations.Dispose();
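
    Since the code in the question is asynchronous, here is how the pieces might fit together with WaitAndRetryForeverAsync and ExecuteAsync. Treat it as a sketch under assumptions rather than a definitive implementation: AsyncRetrySketch, BuildPolicy and SendWithRetryAsync are hypothetical names, the request is passed in as a delegate in place of RequestServer, and the delay sequence is assumed to be infinite (such as the one produced by GetDelay()).

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Polly;
    using Polly.Retry;

    public static class AsyncRetrySketch
    {
        private static readonly HashSet<int> RetryableStatusCodes =
            new HashSet<int> { 408, 429, 500, 503, 504, 520 };

        // Builds a retry-forever policy that pulls each wait time from the enumerator.
        public static AsyncRetryPolicy<HttpResponseMessage> BuildPolicy(IEnumerator<TimeSpan> sleepDurations)
        {
            return Policy
                .HandleResult<HttpResponseMessage>(r => RetryableStatusCodes.Contains((int)r.StatusCode))
                .WaitAndRetryForeverAsync(_ =>
                {
                    // Assumption: the sequence never ends, so MoveNext always succeeds.
                    sleepDurations.MoveNext();
                    return sleepDurations.Current;
                });
        }

        // requestServer stands in for the question's RequestServer call.
        public static async Task<HttpResponseMessage> SendWithRetryAsync(
            Func<Task<HttpResponseMessage>> requestServer,
            IEnumerable<TimeSpan> delays)
        {
            // The using block disposes the enumerator once the execution completes.
            using (var sleepDurations = delays.GetEnumerator())
            {
                var policy = BuildPolicy(sleepDurations);
                return await policy.ExecuteAsync(requestServer);
            }
        }
    }
    ```

    A call could look like await AsyncRetrySketch.SendWithRetryAsync(() => this.RequestServer(equipmentId), GetDelay()). The enumerator is created per call here, so each call starts again from the first delay.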