Search code examples
c#restsharppollyretry-logiccircuit-breaker

Using Polly v8 and RestSharp, how would I build a Generic ResiliencePipeline to account for response header retry-after, exceptions, and logging


I have an API client class built using _client = new RestSharp.RestClient(...); with methods of this form:

public async Task<TResponse> PostAsync<TRequest, TResponse>(string resource, TRequest payload) where TRequest : class
{
    var request = new RestRequest(resource);
    request.AddJsonBody(payload);
    var response = await PolicyFactory.RestSharpPolicy<B2BResponse<TResponse>>(_logger).ExecuteAsync(
        async token => await _client.ExecuteAsync<B2BResponse<TResponse>>(request, Method.Post, token));
    LogAndRaiseErrors(request, response);
    return response.Data.Data;
}

Here's the LogAndRaiseErrors method:

protected void LogAndRaiseErrors(RestRequest request, RestResponse response)
{
    if (response.StatusCode != HttpStatusCode.OK)
    {
        _logger.LogError("API Error:{response}", SerializeResponse(response));
        throw new BigCommerceException(response.Content);
    }
    _logger.LogDebug("B2B API Call:\n{response}", SerializeResponse(response));
} 

I've had a look at the Polly documentation, but it is a bit sparse.
How would I construct the ResiliencePipeline used in the PostAsync method to achieve the following primary goals:

  • Read the response.Headers.FirstOrDefault(h=>h.Name?.ToLower() == "retry-after")?.Value and delay for the specified seconds, which the existing code does not do at all.
  • I Believe retry-after should affect all threads which may be calling the given API? add Circuit Breaker?

Any pointers on secondary goals would be great:

  • Log every request (currently done in LogAndRaiseErrorsMethod)
  • Log when retries occur including the delay specified in the header.
  • Move the LogAndRaiseErrors(...) into the Policy so the policy can deal with logging and any exceptions
  • Add Policy to retry for network failures/timeouts etc.
  • I suspect we need some jitter (is that what that's for?) so not all threads pile on the API at once?

UPDATE here's what I tried:
I think I have retry working correctly, but not circuit breaker. I moved pipeline creation to the constructor so there is only one pipeline (logging shows it's called once)

public ApiClient(..., ILogger<B2BClient> logger)
{
    ...
    _client = client;
    _resiliencePipeline = PolicyFactory.GetRestSharpPolicy(_logger);
}

public async Task<TResponse> GetAsync<TResponse>(string resource)
{...}

public async Task<TResponse> PostAsync<TRequest, TResponse>(string resource, TRequest payload) where TRequest : class
{
    var request = new RestRequest(resource);
    request.AddJsonBody(payload);
    var response = await _resiliencePipeline.ExecuteAsync(
        async token => await _client.ExecuteAsync<BigCommerceResponse<TResponse>>(request, Method.Post, token));
    LogAndRaiseErrors(request, response);
    return response.Data.Data;
}

Here's my pipeline creation, but I never see any Circuit Breaker log entries, but I do see a lot of Retry log entries for Retry.

public static class PolicyFactory
{

    public static ResiliencePipeline<RestResponse> GetRestSharpPolicy(ILogger logger)
    {
        logger.LogInformation("Building ResiliencePipeline");
        return new ResiliencePipelineBuilder<RestResponse>()
            .AddCircuitBreaker(new CircuitBreakerStrategyOptions<RestResponse>
            {
                FailureRatio = 0,
                ShouldHandle = new PredicateBuilder<RestResponse>()
                    .HandleResult(static result => result.StatusCode == HttpStatusCode.TooManyRequests),
            
                OnOpened = args =>
                {
                    logger.LogWarning("Circuit Breaker Opened on {StatusCode} for {Duration}s ({ResponseUri})",
                        args.Outcome.Result.StatusCode, args.BreakDuration.TotalSeconds, args.Outcome.Result.ResponseUri);
                    return ValueTask.CompletedTask;
                },
                OnClosed = args =>
                {
                    logger.LogWarning("Circuit Breaker Closed on {StatusCode} ({ResponseUri})",
                        args.Outcome.Result.StatusCode, args.Outcome.Result.ResponseUri);
                    return ValueTask.CompletedTask;
                }
            })
            .AddRetry(new RetryStrategyOptions<RestResponse>
            {
                ShouldHandle = new PredicateBuilder<RestResponse>()
                    .HandleResult(static result => result.StatusCode == HttpStatusCode.TooManyRequests),
                DelayGenerator = delayArgs =>
                {
                    var retryAfter = delayArgs.Outcome.Result.Headers.FirstOrDefault(h => h.Name?.ToLower() == "retry-after")?.Value.ToString();
                    return int.TryParse(retryAfter, out var seconds)
                        ? new ValueTask<TimeSpan?>(TimeSpan.FromSeconds(seconds))
                        : new ValueTask<TimeSpan?>(TimeSpan.FromSeconds(0.5));
                },
                MaxRetryAttempts = 5,
                OnRetry = args =>
                {
                    logger.LogWarning("Retry Attempt:{Attempt} Delay:{Delay}",
                        args.AttemptNumber, args.RetryDelay);
                    return ValueTask.CompletedTask;
                }
            })
            .Build();
    }
}

Solution

  • There are a couple of tiny things that need to be fixed to make it work.

    Registration order

    There is a huge difference between the following two pipelines

    var pipeline = new ResiliencePipelineBuilder()
        .AddCircuitBreaker( new() { ... }) // outer
        .AddRetry(new() { ... }) // inner
        .Build();
    
    var pipeline = new ResiliencePipelineBuilder()
        .AddRetry(new() { ... }) // outer
        .AddCircuitBreaker( new() { ... }) // inner
        .Build();
    

    There is a concept called escalation. If the inner strategy does not handle the response/exception then it passes the outcome to next strategy in the chain.

    So, if you define your pipeline that the inner strategy handles the 429 status code then the outer will never trigger. That's why you have to change the order and move the CB to be the innermost.

    We have created a dedicated section on pollydoc to illustrate the differences between the registration orders.

    Circuit Breaker settings

    Please bear in mind that the default values for CB are sensible for production load, not for testing.

    Property Default Value Description
    FailureRatio 0.1 The failure-success ratio that will cause the circuit to break/open. 0.1 means 10% failed of all sampled executions.
    MinimumThroughput 100 The minimum number of executions that must occur within the specified sampling duration.
    SamplingDuration 30 seconds The time period over which the failure-success ratio is calculated.

    This means during 30 seconds you should have at least 100 requests with at least 10 failures to open the circuit.

    The FailureRatio should be read like this

    • 0: A single failed request is enough to open the circuit
    • 1: 100 requests should fail to open the circuit
    • 0.2: 20 requests should fail to open the circuit

    I'm not sure that 0 was intentional or not. If not then I would suggest to play with this code (with different FailureRatio values and different greater than values) to better understand how CB works:

    var pipeline = new ResiliencePipelineBuilder<int>()
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions<int>
        {
            FailureRatio = 1, //0, 0.2, 0.5
            ShouldHandle = new PredicateBuilder<int>().HandleResult(result => result > 100), // 50, 100, 150, 190
            OnOpened = static args => { Console.WriteLine("Opened"); return default; },
            OnClosed = static args => { Console.WriteLine("Closed"); return default; }
        }).Build();
    
    for(int i = 1; i < 200; i++)
    {
        try
        {
           _ = pipeline.Execute(() => i);
        }
        catch(BrokenCircuitException)
        {
            Console.WriteLine($"{i}: BCE");
        }
    }
    

    Retry after

    In your solution you are using the RetryAfter header only for the retry. But you could use that for the Circuit Breaker as well.

    By default the CB breaks for 5 seconds. But you can set the break duration dynamically via the BreakDurationGenerator.

    That would shortcut any outgoing requests with a BrokenCircuitException until the RetryAfter is not elapsed. That would require some change in your Retry strategy to handle BrokenCircuitException as well.

    Please also bear in mind RetryAfter could contain not just delta value but also date time. That means your TimeSpan.FromSeconds could fail if the retry after was set like this: Retry-After: Wed, 10 Jul 2024 09:35:00 GMT


    UPDATE #1

    I'm not sure how I should retrieve and share the delay, I don't have access to args.Outcome in the BreakDurationGenerator

    Hmmm, that's strange. I thought we have added the Outcome to the BreakDurationGeneratorArguments. Gladly that seems an easy fix I'll fly a PR for that this week. Adding the Outcome in a backward compatible way seems quite challenging. I'm looking for another alternative.

    What is the best way to inform Retry the same delay that the CircuitBreaker has.... Should I ditch the circuit breaker and just use retry? am I overcomplicating things?

    The suggested way is to use the Context to exchange information between multiple strategies. Here you can see the details.