Demystify HTTP timeout and retry with Polly


Register services:

var host = new HostBuilder().ConfigureServices(services =>
{
    services.AddHttpClient<Downloader>(client =>
    {
        client.Timeout = TimeSpan.FromSeconds(1); // -- T1
    })
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .Or<HttpRequestException>()
        .WaitAndRetryAsync(Backoff.DecorrelatedJitterBackoffV2(
            TimeSpan.FromSeconds(5), // -- T2
            retryCount: 3)))
    .AddPolicyHandler(Policy.TimeoutAsync<HttpResponseMessage>(10)) // -- T3
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30))); // -- T4

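    // AddHttpClient<Downloader> above already registers Downloader as a
    // transient typed client, so no separate AddTransient<Downloader>() is needed.

}).Build();

The implementation of Downloader:

class Downloader
{
    private readonly HttpClient _client;

    // When Downloader is resolved through its AddHttpClient<Downloader>
    // registration, it receives the configured HttpClient (timeout and
    // policy handlers included). IHttpClientFactory.CreateClient() without
    // a name would return the default, unconfigured client instead.
    public Downloader(HttpClient client)
    {
        _client = client;
    }

    public async Task Download(List<Uri> links)
    {
        await Parallel.ForEachAsync(
            links,
            async (link, cancellationToken) =>
            {
                await using var stream =
                    await _client.GetStreamAsync(link, cancellationToken);
            });
    }
}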

In this pseudocode, I'm confused about the correlation between the timeouts, and how/when an HTTP request will be resubmitted. Specifically:

  • How are T1, T2, T3, and T4 "orchestrated"? I am assuming that if the endpoint does not respond within T1, await _client.GetStreamAsync throws a timeout exception; then, at intervals related to T2, the HTTP request is resubmitted at most 3 times, or until the circuit breaker timer reaches T4. Then what is the role of T3?

  • Is all the configuration related to the client and HttpMessageHandler, and I still need to wrap the call to GetStreamAsync as the following?!

Policy
    .Handle<Exception>()
    .RetryAsync(3)
    .ExecuteAsync(
        async () => await _client.GetStreamAsync(uri, _cancellationToken));

Solution

  • First, let me offer some suggestions and then discuss your questions.

    Prefer Wrap over multiple AddPolicyHandlers

    AddPolicyHandler registers a PolicyHttpMessageHandler, which is a DelegatingHandler. In your case you have three DelegatingHandlers, so exceptions propagate from one policy to the next through the HttpClient handler pipeline rather than through Polly.

    If you use Policy.WrapAsync instead, you can chain the policies the Polly way (escalation).

    var T2 = HttpPolicyExtensions
            .HandleTransientHttpError()
            .Or<HttpRequestException>()
            .WaitAndRetryAsync(Backoff.DecorrelatedJitterBackoffV2(
                TimeSpan.FromSeconds(5),
                retryCount: 3));
    
    var T3 = Policy.TimeoutAsync<HttpResponseMessage>(10);
    
    var T4 = HttpPolicyExtensions
            .HandleTransientHttpError()
            .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));
    
    var resilienceStrategy = Policy.WrapAsync<HttpResponseMessage>(T2, T3, T4);
    var host = new HostBuilder().ConfigureServices(services =>
    {
        services
          .AddHttpClient<Downloader>(client =>
            client.Timeout = TimeSpan.FromSeconds(1))
          .AddPolicyHandler(resilienceStrategy);
    
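        // Note: AddHttpClient<Downloader> already registers Downloader; this
        // later registration takes precedence over that typed-client one.
        services.AddTransient<Downloader>();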
    
    }).Build();
    

    Further suggestions

    Swapping Timeout and Circuit Breaker

    It might also make sense to swap the Timeout and Circuit Breaker policies. If the Timeout is the innermost policy and the Circuit Breaker is made aware of timeouts (.Or<TimeoutRejectedException>()), then the circuit can break on timeouts as well.

    var T3 = Policy.TimeoutAsync<HttpResponseMessage>(10);
    
    var T4 = HttpPolicyExtensions
            .HandleTransientHttpError()
            .Or<TimeoutRejectedException>()
            .CircuitBreakerAsync(5, TimeSpan.FromSeconds(30));
    
    var resilienceStrategy = Policy.WrapAsync<HttpResponseMessage>(T2, T4, T3);
    

    Retry in more cases

    It might make sense to retry on a timeout or a broken circuit as well:

    var T2 = HttpPolicyExtensions
            .HandleTransientHttpError()
            .Or<HttpRequestException>()
            .Or<TimeoutRejectedException>()
            .Or<BrokenCircuitException>()
            .WaitAndRetryAsync(Backoff.DecorrelatedJitterBackoffV2(
                TimeSpan.FromSeconds(5),
                retryCount: 3));
    

    Reply to question #2

    Please allow me to start with your second question.

    Is all the configuration related to the client and HttpMessageHandler, and I still need to wrap the call to GetStreamAsync as the following?!

    No, you don't need to do that. Because you have decorated your HttpClient with the resilience policies, every call made through that HttpClient is already protected; you don't need to wrap each method call separately.

    Reply to question #1

    Please allow me to split this into multiple questions.

    How are T1, T2, T3, and T4 "orchestrated"?

    • T1 (HttpClient's Timeout) acts as a global timeout. It is an overarching time constraint covering all attempts and the waits between them, so any retries still pending are cancelled once 1 second has elapsed.
    • T2 (the retry backoff's medianFirstRetryDelay) determines the sequence of sleep durations between retries, using exponential backoff with jitter. In other words, rather than always waiting the same amount of time between retry attempts, it tends to wait longer each time.
    • T3 (Timeout policy) acts as a local timeout. In other words, this timeout is reset and enforced for each and every retry attempt.
      • This is in contrast to HttpClient's Timeout, which is a global timeout.
      • If you define a timeout policy as the outermost policy (the left-most parameter of Policy.WrapAsync), then it also acts as a global timeout.
    • T4 (Circuit Breaker's breakDuration) acts as a gatekeeper. If the CB breaks (in your case after 5 consecutive failures), it transitions into the Open state.
      • While the CB is Open, every request is rejected with a BrokenCircuitException.
      • After the breakDuration has elapsed, it transitions into the HalfOpen state and allows a single probe. If the probe succeeds, the CB moves back to Closed; otherwise it returns to Open.

    I am assuming that if the endpoint does not respond within T1, await _client.GetStreamAsync throws a timeout exception; then, at intervals related to T2, the HTTP request is resubmitted at most 3 times, or until the circuit breaker timer reaches T4. Then what is the role of T3?

    I think the previous point addresses this question :). Because you have set the global timeout to 1 second, your local timeout (10 seconds) will never fire.

    If you set a higher value for the global timeout, then the per-request (per-retry-attempt) timeout can fire.
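
    As a minimal sketch of one way to arrange this, the global limit can be moved into an outermost Polly timeout while HttpClient's own Timeout is disabled. The client name "downloader" and the 90-second overall limit are illustrative assumptions, not values from the question.

    ```csharp
    using System;
    using System.Net.Http;
    using Microsoft.Extensions.DependencyInjection;
    using Polly;
    using Polly.Contrib.WaitAndRetry;
    using Polly.Extensions.Http;
    using Polly.Timeout;

    var services = new ServiceCollection();

    // Local timeout: innermost, so it restarts for every attempt.
    var perAttemptTimeout =
        Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(10));

    // The retry must handle TimeoutRejectedException, otherwise an attempt
    // that hits the local timeout is not retried.
    var retry = HttpPolicyExtensions
        .HandleTransientHttpError()
        .Or<TimeoutRejectedException>()
        .WaitAndRetryAsync(Backoff.DecorrelatedJitterBackoffV2(
            TimeSpan.FromSeconds(5),
            retryCount: 3));

    // Global timeout: outermost, so it spans all attempts and the waits
    // between them. 90 seconds is an assumed, illustrative budget.
    var overallTimeout =
        Policy.TimeoutAsync<HttpResponseMessage>(TimeSpan.FromSeconds(90));

    var strategy = Policy.WrapAsync<HttpResponseMessage>(
        overallTimeout, retry, perAttemptTimeout);

    services
        .AddHttpClient("downloader", client =>
        {
            // Leave the overall limit to the outermost Polly policy.
            client.Timeout = System.Threading.Timeout.InfiniteTimeSpan;
        })
        .AddPolicyHandler(strategy);
    ```

    With this ordering the 10-second limit applies to each attempt, while the outer limit caps the whole operation, which is the role HttpClient's Timeout played before.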


    If any of the above points is unclear, please let me know and I will link some of my earlier posts which discuss them in greater detail.


    UPDATE #1

    Could you please elaborate a bit more on T1 vs T3?

    In the following SO topics I try to make clear the difference between global and local timeouts:

    And finally here is an SO topic which covers how to have longer Timeout than the HttpClient's Timeout.

    Should I implement the logic of catching CB's Open, HalfOpen and Closed states and buffering and holding requests w.r.t. the state, or CB internally buffers and resubmits requests when appropriate?

    Circuit Breaker does not work like that. It does not maintain anything like a request queue. It is just a proxy that can short-circuit the execution of requests while the downstream system is considered temporarily unavailable. The CB itself does not perform any retry logic.

    The Rate limiter policy also works in the same way. It does not hold the requests until there is enough throughput.
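
    The fail-fast behaviour can be observed with a small console sketch that uses Polly alone. FailAsync is a hypothetical stand-in for a failing downstream call, and the threshold of 2 is chosen only to keep the loop short.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using Polly;
    using Polly.CircuitBreaker;

    // Hypothetical stand-in for a downstream call that always fails.
    static Task FailAsync() =>
        Task.FromException(new InvalidOperationException("downstream failure"));

    // Opens after 2 consecutive handled exceptions and stays open for 30 seconds.
    var breaker = Policy
        .Handle<InvalidOperationException>()
        .CircuitBreakerAsync(2, TimeSpan.FromSeconds(30));

    for (var attempt = 1; attempt <= 4; attempt++)
    {
        try
        {
            await breaker.ExecuteAsync(() => FailAsync());
        }
        catch (BrokenCircuitException)
        {
            // The delegate was not run at all, and nothing is queued for later.
            Console.WriteLine($"Attempt {attempt}: rejected, circuit is {breaker.CircuitState}");
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine($"Attempt {attempt}: executed and failed, circuit is {breaker.CircuitState}");
        }
    }
    ```

    Once the circuit is open, later attempts are rejected immediately instead of being held until the downstream system recovers; resubmitting them is the caller's job.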

    What you can do is create CB-aware retry logic and combine the two policies.
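
    A minimal sketch of such a combination, assuming Polly v7: the retry handles BrokenCircuitException and, in that case, waits for the break duration before trying again. The fallback delay of 2^attempt seconds is an illustrative choice, not something from the question.

    ```csharp
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Polly;
    using Polly.CircuitBreaker;
    using Polly.Extensions.Http;

    var breakDuration = TimeSpan.FromSeconds(30);

    var circuitBreaker = HttpPolicyExtensions
        .HandleTransientHttpError()
        .CircuitBreakerAsync(5, breakDuration);

    var retry = HttpPolicyExtensions
        .HandleTransientHttpError()
        .Or<BrokenCircuitException>()
        .WaitAndRetryAsync(
            3,
            // If the circuit is open, retrying earlier than breakDuration is
            // pointless, so wait that long; otherwise back off exponentially.
            (attempt, outcome, context) =>
                outcome.Exception is BrokenCircuitException
                    ? breakDuration
                    : TimeSpan.FromSeconds(Math.Pow(2, attempt)),
            // This overload requires an onRetryAsync callback; nothing to do here.
            (outcome, delay, attempt, context) => Task.CompletedTask);

    // Retry is the outer policy, so it sees the breaker's BrokenCircuitException.
    var strategy = Policy.WrapAsync<HttpResponseMessage>(retry, circuitBreaker);
    ```

    The resulting strategy can be passed to AddPolicyHandler in the same way as the wrapped policy shown earlier.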