Retry strategy with exponential timeout


I use Polly to retry HTTP requests. The following code works fine:

IAsyncPolicy<HttpResponseMessage> waitTimeout = Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(5));

// Retry policy for transient errors
IAsyncPolicy<HttpResponseMessage> retry = HttpPolicyExtensions.HandleTransientHttpError()
    .Or<TaskCanceledException>()
    .Or<TimeoutRejectedException>()
    .OrResult(msg => msg.StatusCode == System.Net.HttpStatusCode.ServiceUnavailable)
    .OrResult(msg => msg.StatusCode == System.Net.HttpStatusCode.InternalServerError)
    .WaitAndRetryAsync(12, _ => TimeSpan.FromSeconds(10));

// Register the HTTP client with the Polly policies
builder.Services.AddHttpClient(
        "ApiClient",
        client => {client.Timeout = TimeSpan.FromSeconds(230); })
    .AddHeaderPropagation()
    .AddPolicyHandler(Policy.WrapAsync(retry, waitTimeout));

The delay before each retry is 10 seconds, and the timeout for each request within the retry is 5 seconds. The problem is that the API service can respond slowly. I would like the timeout of each request to grow exponentially with every retry (the timeout itself, not the delay between retries).


Solution

  • By design, every resiliency policy works independently. That means none of the policies knows where it sits in the policy chain. You can think of the policy chain as an extra resiliency policy, sometimes referred to as escalation.

    Whenever we deal with a problem, we do so in a given context, which might not be sufficient. The context determines our possibilities and scope. Others might live in a different context, which might open up other opportunities. I use the word "might" because escalation does not necessarily solve the problem. It is based on the assumption that if I have rights A and B, then my supervisor has more privileges than I do.

    Let's apply this knowledge to your problem. Your timeout and retry policies are working independently, but the WrapAsync method allows us to chain them. The timeout becomes the inner policy and the retry becomes the outer.

    Polly.Context is the means of communication between the policies. In your case that means the retry's onRetry{Async} delegate could store the retry attempt number in the context. The timeout policy could then read this information whenever it generates a new timeout threshold.

    Timeout

    The timeout policy has several overloads that accept a timeoutProvider. Some of them take a function that receives the Context and returns a TimeSpan, like this:

    public static AsyncTimeoutPolicy TimeoutAsync(Func<Context, TimeSpan> timeoutProvider)
    

    Here is how to use it (GetAttemptNumber is an extension method defined further down):

    var waitTimeout = Policy
       .TimeoutAsync(ctx => TimeSpan.FromSeconds(Math.Pow(2, ctx.GetAttemptNumber() + 1)));
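
    To see what this formula produces, here is a small sketch that does not depend on Polly. TimeoutSchedule and the maxTimeout parameter are hypothetical names introduced for illustration; the capped variant is not part of the approach described here, only one possible way to keep the growth bounded.

```csharp
using System;

public static class TimeoutSchedule
{
    // Same formula as the timeout provider above: 2^(attemptNumber + 1) seconds.
    // attemptNumber is 0 for the initial call, 1 after the first retry, and so on.
    public static TimeSpan ForAttempt(int attemptNumber) =>
        TimeSpan.FromSeconds(Math.Pow(2, attemptNumber + 1));

    // Hypothetical capped variant: the timeout never grows beyond maxTimeout.
    // The cap is applied to the raw seconds, so large attempt numbers
    // cannot overflow TimeSpan.
    public static TimeSpan ForAttempt(int attemptNumber, TimeSpan maxTimeout) =>
        TimeSpan.FromSeconds(
            Math.Min(Math.Pow(2, attemptNumber + 1), maxTimeout.TotalSeconds));
}
```

    Attempts 0 through 5 get 2, 4, 8, 16, 32 and 64 seconds. With the 12 retries from the question, the last attempt alone would be allowed 2^13 = 8192 seconds, far beyond the 230-second client.Timeout, which bounds the whole call including retries. A capped provider such as ctx => TimeoutSchedule.ForAttempt(ctx.GetAttemptNumber(), TimeSpan.FromSeconds(60)) may therefore be worth considering.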
    

    To make working with the Context easier, a few extension methods can be introduced:

    public static class ContextExtensions
    {
        private static readonly string key = "AttemptNumber";
    
        public static Context SetAttemptNumber(this Context context, int attemptNumber)
        {
            context[key] = attemptNumber;
            return context;
        }
    
        public static int GetAttemptNumber(this Context context)
        {
            context.TryGetValue(key, out object obj);
            return obj is int attemptNumber ? attemptNumber : 0;
        }
    }
    

    With these in hand, GetAttemptNumber can be used inside the timeout policy and SetAttemptNumber inside the retry policy.

    Retry

    The retry policy has many overloads. Here is one that gives access to the Context inside onRetry:

    public static AsyncRetryPolicy WaitAndRetryAsync(this PolicyBuilder policyBuilder, int retryCount, Func<int, TimeSpan> sleepDurationProvider, Action<Exception, TimeSpan, int, Context> onRetry)
    

    Here is how to use it:

    var retry = Policy
        .Handle<TimeoutRejectedException>()
        .WaitAndRetryAsync(
            retryCount: 5, 
            sleepDurationProvider: _ => TimeSpan.FromMilliseconds(100), 
            onRetry: (ex, ts, attemptCount, ctx) => ctx.SetAttemptNumber(attemptCount));
    

    Together

    var timeout = Policy.TimeoutAsync(
        timeoutProvider: ctx => 
        {
            var ts = TimeSpan.FromSeconds(Math.Pow(2, ctx.GetAttemptNumber() + 1));
            Console.WriteLine($"New timeout: {ts}");
            return ts;
        }, 
        onTimeoutAsync: (ctx, ts, t) => { Console.WriteLine($"Timeout: {ts}"); return Task.CompletedTask; });
    
    var retry = Policy
        .Handle<TimeoutRejectedException>()
        .WaitAndRetryAsync(
            retryCount: 5, 
            sleepDurationProvider: _ => TimeSpan.FromMilliseconds(100), 
            onRetry: (ex, ts, attemptCount, ctx) => 
            {
                Console.WriteLine($"Retry after {attemptCount}th attempt");
                ctx.SetAttemptNumber(attemptCount);
            });
    
    var strategy = Policy.WrapAsync(retry, timeout);
    
    var context = new Context();
    await strategy.ExecuteAsync(async (ctx, ct) => {
        Console.WriteLine($"-=< {ctx.GetAttemptNumber()}th attempt >=-");
        Console.WriteLine($"Execution started: {DateTime.UtcNow}");
        await Task.Delay(100_000, ct);
    }, context, CancellationToken.None);
    

    Output:

    New timeout: 00:00:02
    -=< 0th attempt >=-
    Execution started: 11/22/2024 10:39:09 AM
    Timeout: 00:00:02
    Retry after 1th attempt
    
    New timeout: 00:00:04
    -=< 1th attempt >=-
    Execution started: 11/22/2024 10:39:11 AM
    Timeout: 00:00:04
    Retry after 2th attempt
    
    New timeout: 00:00:08
    -=< 2th attempt >=-
    Execution started: 11/22/2024 10:39:15 AM
    Timeout: 00:00:08
    Retry after 3th attempt
    
    New timeout: 00:00:16
    -=< 3th attempt >=-
    Execution started: 11/22/2024 10:39:23 AM
    Timeout: 00:00:16
    Retry after 4th attempt
    
    New timeout: 00:00:32
    -=< 4th attempt >=-
    Execution started: 11/22/2024 10:39:40 AM
    Timeout: 00:00:32
    Retry after 5th attempt
    
    New timeout: 00:01:04
    -=< 5th attempt >=-
    Execution started: 11/22/2024 10:40:12 AM
    Timeout: 00:01:04
    
    Unhandled exception. Polly.Timeout.TimeoutRejectedException: The delegate executed asynchronously through TimeoutPolicy did not complete within the timeout.
    ...
    

    Working example: https://dotnetfiddle.net/J77BVq

    Please note that dotnet fiddle force-stops the execution after 10 seconds, so the full output shown above will not appear there.
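
    Since the question registers its policies on an HttpClient, the same idea can be sketched with the generic IAsyncPolicy<HttpResponseMessage> variants. This is a sketch under assumptions, not a drop-in implementation: ExponentialTimeoutPolicies, Create and its parameters are hypothetical names, the context key is read inline so that the block stands on its own, and the maxTimeout cap is an addition that the answer itself does not use.

```csharp
using System;
using System.Net.Http;
using Polly;
using Polly.Extensions.Http;
using Polly.Timeout;

public static class ExponentialTimeoutPolicies
{
    private const string AttemptKey = "AttemptNumber";

    // Builds retry (outer) wrapped around a per-attempt timeout (inner).
    // The timeout is baseTimeout * 2^attempt, capped at maxTimeout (assumed cap).
    public static IAsyncPolicy<HttpResponseMessage> Create(
        int retryCount, TimeSpan baseTimeout, TimeSpan maxTimeout, TimeSpan retryDelay)
    {
        IAsyncPolicy<HttpResponseMessage> timeout =
            Policy.TimeoutAsync<HttpResponseMessage>(ctx =>
            {
                // No value is stored before the first retry, so the initial call is attempt 0.
                var attempt = ctx.TryGetValue(AttemptKey, out var value) && value is int n ? n : 0;
                var seconds = Math.Min(
                    baseTimeout.TotalSeconds * Math.Pow(2, attempt),
                    maxTimeout.TotalSeconds);
                return TimeSpan.FromSeconds(seconds);
            });

        IAsyncPolicy<HttpResponseMessage> retry = HttpPolicyExtensions
            .HandleTransientHttpError()
            .Or<TimeoutRejectedException>()
            .WaitAndRetryAsync(
                retryCount: retryCount,
                sleepDurationProvider: _ => retryDelay,
                // Store the 1-based retry number so the timeout policy can read it.
                onRetry: (outcome, delay, attempt, ctx) => ctx[AttemptKey] = attempt);

        return Policy.WrapAsync(retry, timeout);
    }
}
```

    It could then be registered with .AddPolicyHandler(ExponentialTimeoutPolicies.Create(5, TimeSpan.FromSeconds(2), TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(10))) in place of Policy.WrapAsync(retry, waitTimeout). A base of 2 seconds reproduces the 2, 4, 8, ... sequence shown in the output. Keep in mind that client.Timeout still bounds the whole call, retries included.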