Tags: c#, polly, circuit-breaker, retry-logic, ocelot

Polly re-try policy not working in conjunction with circuit breaker with Ocelot


I want to use Polly retry and circuit breaker policies with the Ocelot API gateway. I am trying to wrap the policies in a DelegatingHandler. The circuit breaker works, but the retry does not.

The code below just throws the exception, but no retry happens. When I call the API 3 times, the circuit opens.

"ExceptionsAllowedBeforeBreaking": 3,
.CircuitBreakerAsync(route.QosOptions.ExceptionsAllowedBeforeBreaking,

[HttpGet("RaiseException")]
    public async Task<int> RaiseException()
    {
        await Task.Delay(1);
        throw new Exception("Mock Exception");
    }

[Image: pollyQosProvider]

Custom Handler:

public class PollyWithInternalServerErrorCircuitBreakingDelegatingHandler : DelegatingHandler
{
    private readonly IOcelotLogger _logger;
    private readonly Polly.Wrap.AsyncPolicyWrap<HttpResponseMessage> _circuitBreakerPolicies;
    public PollyWithInternalServerErrorCircuitBreakingDelegatingHandler(DownstreamRoute route, IOcelotLoggerFactory loggerFactory)
    {
        _logger = loggerFactory.CreateLogger<PollyWithInternalServerErrorCircuitBreakingDelegatingHandler>();

        var pollyQosProvider = new PollyQoSProvider(route, loggerFactory);

        var retryPolicy = HttpPolicyExtensions.HandleTransientHttpError()
                            .OrResult(r => r.StatusCode == HttpStatusCode.NotFound)
                            .WaitAndRetryAsync(2, retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));

        var responsePolicy = Policy.HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.InternalServerError)
            .CircuitBreakerAsync(route.QosOptions.ExceptionsAllowedBeforeBreaking,
                TimeSpan.FromMilliseconds(route.QosOptions.DurationOfBreak));
        _circuitBreakerPolicies = Policy.WrapAsync(pollyQosProvider.CircuitBreaker.Policies)
            .WrapAsync(retryPolicy).WrapAsync(responsePolicy);
    }

    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        try
        {
            return await _circuitBreakerPolicies.ExecuteAsync(() => base.SendAsync(request, cancellationToken));
        }
        catch (BrokenCircuitException ex)
        {
            _logger.LogError($"Reached to allowed number of exceptions. Circuit is open", ex);
            throw;
        }
        catch (HttpRequestException ex)
        {
            _logger.LogError($"Error in CircuitBreakingDelegatingHandler.SendAsync", ex);
            throw;
        }
    }
}

Ocelot Builder Extensions:

public static class OcelotBuilderExtensions
{
    public static IOcelotBuilder AddPollyWithInternalServerErrorHandling(this IOcelotBuilder builder)
    {
        var errorMapping = new Dictionary<Type, Func<Exception, Error>>
        {
            {typeof(TaskCanceledException), e => new RequestTimedOutError(e)},
            {typeof(TimeoutRejectedException), e => new RequestTimedOutError(e)},
            {typeof(BrokenCircuitException), e => new RequestTimedOutError(e)}
        };

        builder.Services.AddSingleton(errorMapping);

        DelegatingHandler QosDelegatingHandlerDelegate(DownstreamRoute route, IOcelotLoggerFactory logger)
        {
            return new PollyWithInternalServerErrorCircuitBreakingDelegatingHandler(route, logger);
        }

        builder.Services.AddSingleton((QosDelegatingHandlerDelegate)QosDelegatingHandlerDelegate);

        return builder;
    }
}

Program.cs

var builder = WebApplication.CreateBuilder(args);

// Add Ocelot's configuration file
builder.Configuration.AddJsonFile($"ocelot.config.{builder.Environment.EnvironmentName}.json", optional: false, reloadOnChange: true);
builder.Services.AddOcelot(builder.Configuration)
    .AddPollyWithInternalServerErrorHandling();

Ocelot Configuration

"UpstreamHttpMethod": [ "GET" ],
"QoSOptions": {
  // Number of exceptions allowed before the circuit breaker is triggered.
  "ExceptionsAllowedBeforeBreaking": 3,
  // Duration in milliseconds for which the circuit breaker remains open after being tripped.
  "DurationOfBreak": 5000,
  // Duration after which the request is considered timed out.
  "TimeoutValue": 100000
}

Solution

  • Even though the problem has been solved by removing Ocelot, let me share my thoughts about your policies.


    The escalation policy

    The way you chain the policies to each other defines an escalation order.

    So, first let's see what your policy chain looks like.

    The PollyQoSProvider helper class defines two policies in the following order:

    • A Circuit Breaker which triggers for HttpRequestException, TimeoutRejectedException and TimeoutException
    • And a Timeout

    These policies do not return any value.

    You have defined two other policies:

    • A Retry which triggers for HttpRequestException or when the status code is either 404 or 408 or 5XX
    • A Circuit Breaker which triggers when the status code is 500

    These policies return an HttpResponseMessage.

    You have chained them in the following order (from the outermost to the innermost):

    • Circuit Breaker which triggers for HttpRequestException, TimeoutRejectedException and TimeoutException
    • Timeout
    • Retry which triggers for HttpRequestException or when the status code is either 404 or 408 or 5XX
    • Circuit Breaker which triggers when the status code is 500

    This may or may not be your desired resiliency strategy. I would advise you to reassess whether this is what you really want/need.
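
    To make the idea of an escalation order concrete, here is a minimal sketch (not your actual chain). It assumes Polly v7 and uses a hypothetical class name, a stubbed delegate that always answers with 500, and made-up thresholds. The retry is placed outside the circuit breaker, so every attempt passes through the breaker first and only then escalates to the retry.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

// Hypothetical demo class; names and thresholds are illustrative only.
public static class EscalationOrderDemo
{
    public static async Task<int> CountAttemptsAsync()
    {
        // Outer policy: retries twice on any 5XX result, without waiting.
        var retry = Policy
            .HandleResult<HttpResponseMessage>(r => (int)r.StatusCode >= 500)
            .RetryAsync(2);

        // Inner policy: breaks after 5 consecutive 500 results.
        var breaker = Policy
            .HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.InternalServerError)
            .CircuitBreakerAsync(5, TimeSpan.FromSeconds(5));

        // The first argument is the outermost policy, the last one the innermost.
        var strategy = Policy.WrapAsync<HttpResponseMessage>(retry, breaker);

        var attempts = 0;
        await strategy.ExecuteAsync(() =>
        {
            attempts++; // counts how often the wrapped delegate is actually invoked
            return Task.FromResult(new HttpResponseMessage(HttpStatusCode.InternalServerError));
        });

        return attempts;
    }
}
```

    With these assumed numbers the delegate is invoked three times (the initial attempt plus two retries), and the breaker stays closed because its threshold of 5 is not reached. Swapping the two arguments of WrapAsync would change which policy sees the failure first.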

    Chaining the policies

    The policies of the PollyQoSProvider are defined for async methods (Task), whereas yours are defined for async functions (Task<HttpResponseMessage>). The static WrapAsync does not allow combining these two types of policies. On the other hand, the instance-level WrapAsync does. (For more information about these constraints, please read this SO topic.)

    Because you have used a combination of the two, the chained policy is an IAsyncPolicy<HttpResponseMessage>. Even though it works, I usually suggest using only the static WrapAsync to chain policies because of its compile-time compatibility guarantees.
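
    The difference between the static and the instance-level WrapAsync can be sketched as follows. This assumes Polly v7; the class name, the timeout value and the retry condition are hypothetical and only serve the illustration.

```csharp
using System;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.Wrap;

// Hypothetical demo class; values are illustrative only.
public static class WrapCompatibilityDemo
{
    public static AsyncPolicyWrap<HttpResponseMessage> Build()
    {
        // Non-generic policy: defined for delegates that return a plain Task.
        var timeout = Policy.TimeoutAsync(TimeSpan.FromSeconds(10));

        // Generic policy: defined for delegates that return Task<HttpResponseMessage>.
        var retry = Policy
            .HandleResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.NotFound)
            .RetryAsync(2);

        // The static overloads expect all policies to be of the same kind,
        // so the following line would not compile:
        // var wrap = Policy.WrapAsync(timeout, retry);

        // The instance-level WrapAsync accepts the mix and yields a generic wrap.
        return timeout.WrapAsync(retry);
    }
}
```

    The mixed chain compiles only through the instance-level method, which is why the compiler cannot warn you about an accidental mismatch the way the static overloads do.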

    Multi level circuit breakers

    I have been using Polly for a while and I haven't encountered any use case where multiple (nested) circuit breakers were really required. Most of the time you can (and should) solve it with a single CB that triggers for multiple different conditions:

    Circuit Breaker which triggers when the status code is 500 or for the following exceptions: HttpRequestException, TimeoutRejectedException and TimeoutException

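    Policy<HttpResponseMessage>
      .HandleResult(r => r.StatusCode == HttpStatusCode.InternalServerError)
      .Or<HttpRequestException>()
      .Or<TimeoutRejectedException>()
      .Or<TimeoutException>()
      .CircuitBreakerAsync(route.QosOptions.ExceptionsAllowedBeforeBreaking,
          TimeSpan.FromMilliseconds(route.QosOptions.DurationOfBreak));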

    Circuit breaker and its shared state

    The CB was designed in a way that it can be shared between multiple components. If you have already detected that the downstream is temporarily inaccessible then use this information everywhere rather than issue new requests and come to the same conclusion.

    So, defining a CB inside a DelegatingHandler goes against this design. Each DelegatingHandler instance will have its own CB, so they do not share state via the ICircuitController. Aim to reuse the CB policy.
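
    One possible way to reuse a CB is to keep the policy instances in a shared lookup and let every handler fetch the instance that belongs to its downstream. The following is only a sketch under assumptions: the SharedBreakers class, the downstreamKey parameter and the trigger conditions are hypothetical, and it assumes Polly v7.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical helper: one circuit breaker instance per downstream key.
public static class SharedBreakers
{
    private static readonly ConcurrentDictionary<string, AsyncCircuitBreakerPolicy<HttpResponseMessage>> Breakers = new();

    public static AsyncCircuitBreakerPolicy<HttpResponseMessage> For(
        string downstreamKey, int allowedFailures, TimeSpan breakDuration)
    {
        // GetOrAdd hands every caller with the same key the same policy instance,
        // so all of them observe the same circuit state.
        return Breakers.GetOrAdd(downstreamKey, _ => Policy<HttpResponseMessage>
            .HandleResult(r => r.StatusCode == HttpStatusCode.InternalServerError)
            .Or<HttpRequestException>()
            .CircuitBreakerAsync(allowedFailures, breakDuration));
    }
}
```

    A handler could then call SharedBreakers.For(...) in its constructor instead of building a new CB. Note that in this sketch the thresholds only take effect when the entry is first created; later calls with the same key get the existing instance.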

    Timeout strategies

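    Timeout can work in optimistic or in pessimistic mode. Even though at first glance your code looks like it uses the optimistic strategy, unfortunately it does not: the delegate passes the handler's cancellationToken to base.SendAsync rather than a token supplied by the policy, so the timeout cannot cancel the request.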

    
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        try
        {
            return await _circuitBreakerPolicies.ExecuteAsync(() => base.SendAsync(request, cancellationToken));
        }
        ...
    }
    

    The proper way is to use a different overload of ExecuteAsync, one that passes the policy's CancellationToken on to the delegate:

    await _circuitBreakerPolicies.ExecuteAsync((ct) => base.SendAsync(request,ct), cancellationToken);
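
    The effect of passing the token through can be tried in isolation. The sketch below assumes Polly v7 and replaces the HTTP call with a Task.Delay; the class name and the durations are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

// Hypothetical demo class; durations are illustrative only.
public static class OptimisticTimeoutDemo
{
    public static async Task<bool> TimesOutAsync()
    {
        var timeout = Policy.TimeoutAsync(TimeSpan.FromMilliseconds(100), TimeoutStrategy.Optimistic);

        try
        {
            // The delegate receives the policy's token (ct) and hands it to the
            // slow operation, so the operation is cancelled once the timeout elapses.
            await timeout.ExecuteAsync(
                ct => Task.Delay(TimeSpan.FromSeconds(5), ct),
                CancellationToken.None);
            return false;
        }
        catch (TimeoutRejectedException)
        {
            return true;
        }
    }
}
```

    If the delegate ignored ct (as in () => Task.Delay(TimeSpan.FromSeconds(5))), nothing would observe the timeout token and the delay would simply run to completion.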
    

    Off-topic: the PollyQoSProvider's CB was defined so that it can break for an optimistic timeout (TimeoutRejectedException) as well as for a pessimistic timeout (TimeoutException).