c# .net polly retry-logic exponential-backoff

Polly WaitAndRetryAsync hangs after one retry


I'm using Polly in a very basic scenario to do exponential backoff if an HTTP call fails:

protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
{
    return await HandleTransientHttpError()
        .Or<TimeoutException>()
        .WaitAndRetryAsync(4, retryAttempt => TimeSpan.FromSeconds(Math.Pow(3, retryAttempt)))
        .ExecuteAsync(async () => await base.SendAsync(request, cancellationToken).ConfigureAwait(false));
}

private static PolicyBuilder<HttpResponseMessage> HandleTransientHttpError()
{
    return Policy
        .HandleResult<HttpResponseMessage>(response => (int)response.StatusCode >= 500 || response.StatusCode == System.Net.HttpStatusCode.RequestTimeout)
        .Or<HttpRequestException>();
}

I have a test API that just creates an HttpListener and loops in a while(true). Currently, I'm trying to test if the client retries correctly when receiving 500 for every single call.

while (true)
{
    listener.Start();
    Console.WriteLine("Listening...");
    HttpListenerContext context = listener.GetContext();
    HttpListenerRequest request = context.Request;

    HttpListenerResponse response = context.Response;
    response.StatusCode = (int)HttpStatusCode.InternalServerError;

    //Thread.Sleep(1000 * 1);
    string responseString = "<HTML><BODY> Hello world!</BODY></HTML>";
    byte[] buffer = System.Text.Encoding.UTF8.GetBytes(responseString);
    response.ContentLength64 = buffer.Length;
    System.IO.Stream output = response.OutputStream;
    output.Write(buffer, 0, buffer.Length);
    output.Close();
    listener.Stop();
}

With the above code all works well and the retries happen after 3, 9, 27 and 81 seconds of waiting, respectively.
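The schedule quoted above follows from the sleepDurationProvider passed to WaitAndRetryAsync, which Polly calls with a 1-based retry number. A minimal sketch of that arithmetic, where the BackoffSchedule class and DelayFor helper are names made up for illustration and are not part of Polly:

```csharp
using System;
using System.Linq;

public static class BackoffSchedule
{
    // Same formula as the sleepDurationProvider in the question:
    // wait 3^retryAttempt seconds before retry number retryAttempt.
    public static TimeSpan DelayFor(int retryAttempt) =>
        TimeSpan.FromSeconds(Math.Pow(3, retryAttempt));

    public static void Main()
    {
        // Retry numbers 1..4, matching the retry count of 4 in the question.
        var delays = Enumerable.Range(1, 4).Select(DelayFor).ToList();

        foreach (var delay in delays)
        {
            Console.WriteLine(delay.TotalSeconds); // 3, 9, 27, 81
        }

        // Total time spent waiting between attempts, excluding the requests themselves.
        Console.WriteLine(delays.Sum(d => d.TotalSeconds)); // 120
    }
}
```

The waits alone add up to 120 seconds before counting the time each request takes, which matters for any timeout that covers the whole retry sequence.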

However, if I uncomment the Thread.Sleep call, the client retries once and then just hangs until the call times out for the other 3 retries, which is not the correct behavior.

The same thing also happens with the actual production API, which leads me to believe it's not a problem with my test API.


Solution

Using Polly inside an HttpClient handler doesn't work well, because a single SendAsync is intended to be a single call. In particular:

  • Any HttpClient timeout applies to the single SendAsync call, so it covers every retry and every wait between retries.
  • Some versions of HttpClient also dispose the request's content after sending, so it can't be reused in the next SendAsync call.
  • As noted in the comments, this kind of hang is a known issue and cannot be fixed by Polly.

Bottom line: overriding SendAsync is great for adding pre-request and post-request logic. It's not the right place to retry.

Instead, use a regular HttpClient and have your Polly logic retry outside the GetStringAsync (or equivalent) call.
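One way this might look is sketched below. It reuses the handling rules and backoff from the question but wraps the HttpClient call itself instead of living in a SendAsync override. The RetryingHttp class and the GetWithRetryAsync helper are hypothetical names, and GetAsync is used rather than GetStringAsync so the policy can inspect the status code.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Polly;

public static class RetryingHttp
{
    // Plain HttpClient with no retrying handler in its pipeline.
    private static readonly HttpClient Client = new HttpClient();

    // Same rules as the question: retry on 5xx, 408, HttpRequestException and
    // TimeoutException, waiting 3^retryAttempt seconds before each of up to 4 retries.
    public static readonly IAsyncPolicy<HttpResponseMessage> RetryPolicy = Policy
        .HandleResult<HttpResponseMessage>(response =>
            (int)response.StatusCode >= 500 ||
            response.StatusCode == HttpStatusCode.RequestTimeout)
        .Or<HttpRequestException>()
        .Or<TimeoutException>()
        .WaitAndRetryAsync(4, retryAttempt => TimeSpan.FromSeconds(Math.Pow(3, retryAttempt)));

    // Hypothetical helper: every attempt is a separate GetAsync call, so each one
    // builds a fresh request and is subject to HttpClient.Timeout on its own.
    public static Task<HttpResponseMessage> GetWithRetryAsync(
        string url, CancellationToken cancellationToken)
    {
        return RetryPolicy.ExecuteAsync(ct => Client.GetAsync(url, ct), cancellationToken);
    }
}
```

Because the policy sits outside the client, the timeout and the request content are scoped to one attempt rather than to the whole retry sequence. For requests with a body, the same idea would apply as long as the request and its content are created inside the delegate passed to ExecuteAsync.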