Tags: c#, .net, polly, retry-logic, circuit-breaker

How to migrate a standard CircuitBreaker policy from Polly V7 to V8?


Up until now, I used Polly's standard CircuitBreaker policy like this:

int retryCount = 2;

PolicyBuilder policyBuilder = Policy
    .HandleInner<WebException>()
    .Or<WebException>()
    .Or<TimeoutException>()
    .Or<TimeoutRejectedException>();

AsyncCircuitBreakerPolicy asyncCircuitBreakerPolicy = policyBuilder
    .CircuitBreakerAsync(1, TimeSpan.FromSeconds(1), OnBreak, OnReset);

AsyncRetryPolicy asyncWaitAndRetry = policyBuilder
    .WaitAndRetryAsync(retryCount, SleepDurationProvider, OnRetry);

AsyncPolicyWrap defaultPolicy = Policy.WrapAsync(
    asyncCircuitBreakerPolicy,
    asyncWaitAndRetry);

That is, retry twice when an exception is thrown; after a total of three failures, the exception bubbles up to the CircuitBreaker, which trips immediately.

Polly V8 doesn't have this "standard" CircuitBreaker anymore, but something similar to the old "advanced" one. It wants a MinimumThroughput of at least 2, and a failure rate instead of a fixed count. Additionally, it rethrows all Exceptions.

Now I wonder how to migrate to V8. I was thinking about flipping the order of Retry and CircuitBreaker, setting MinimumThroughput = retryCount + 1 and FailureRatio = 1. But then there's also SamplingDuration, so I'd need to make that somehow depend on the expected timeouts, plus the waiting time between retries etc.

Is there another approach to do this? Should I just write my own ResilienceStrategy?


Solution

  • At the time of writing there is an interoperability layer (wrapper) between V8 and V7, but unfortunately it works in one direction only: from V8 to V7 via .AsAsyncPolicy() and .AsSyncPolicy(). In other words, currently there is no built-in support to wrap your existing V7 policy into a V8 strategy.
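
    To make the supported direction concrete, here is a minimal sketch of exposing a V8 pipeline as a V7 policy. It assumes the Polly package (version 8 or later), which ships both APIs; the variable names are illustrative only.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using Polly;
    using Polly.Retry;

    // A V8 pipeline with a simple retry strategy.
    ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
        .AddRetry(new RetryStrategyOptions { MaxRetryAttempts = 2, Delay = TimeSpan.Zero })
        .Build();

    // Expose the V8 pipeline through the V7 IAsyncPolicy interface.
    IAsyncPolicy legacyPolicy = pipeline.AsAsyncPolicy();

    // The wrapper can now be combined with existing V7 policies
    // (NoOpAsync stands in for a real V7 policy here).
    IAsyncPolicy combined = Policy.WrapAsync(Policy.NoOpAsync(), legacyPolicy);
    await combined.ExecuteAsync(() => Task.CompletedTask);
    ```

    This helps when V7 code has to consume new V8 pipelines, but it does not help with the question's case, where the V7 circuit breaker itself would have to be hosted inside a V8 pipeline.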

    So, what can we do?

    Option #1

    Based on your description there are two blockers to using V8's circuit breaker:

    • MinimumThroughput's minimal value (2)
    • Setting the correct value to SamplingDuration
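
    For reference, the configuration considered in the question might look roughly like the fragment below. It is only a sketch: the SamplingDuration value is an arbitrary placeholder (choosing it is exactly the open problem), BreakDuration is taken from the V7 code, and the exception list mirrors the V7 policy builder.

    ```csharp
    using System;
    using System.Net;
    using Polly;
    using Polly.CircuitBreaker;
    using Polly.Timeout;

    const int retryCount = 2;

    var breakerOptions = new CircuitBreakerStrategyOptions
    {
        // Same exception types as the V7 policy builder in the question.
        ShouldHandle = new PredicateBuilder()
            .HandleInner<WebException>()
            .Handle<WebException>()
            .Handle<TimeoutException>()
            .Handle<TimeoutRejectedException>(),
        FailureRatio = 1.0,                          // every sampled call must fail
        MinimumThroughput = retryCount + 1,          // must be at least 2 in V8
        SamplingDuration = TimeSpan.FromSeconds(30), // placeholder, not a recommendation
        BreakDuration = TimeSpan.FromSeconds(1),     // as in the V7 CircuitBreakerAsync call
    };
    ```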

    Fortunately, when you switch the Circuit Breaker to manual control, none of these settings matters: FailureRatio, MinimumThroughput, SamplingDuration and BreakDuration.

    In other words, regardless of their values, you can ask the Circuit Breaker to transition to the Isolated state or back to Closed.

    Here is sample code showing how to do that:

    static void Report(string message) => Console.WriteLine($"{DateTime.UtcNow.TimeOfDay}:{message}");
    const int MaxRetries = 2;
    
    var manualControl = new CircuitBreakerManualControl();
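    // Note: System.Timers.Timer has AutoReset = true by default, so once started it keeps
    // firing every 1500 ms; this is why the Close entries in the sample output are evenly spaced.
    var timer = new System.Timers.Timer(1500);
    timer.Elapsed += async (s, e) => await manualControl.CloseAsync();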
    
    var pipeline = new ResiliencePipelineBuilder()
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions
        {
            ManualControl = manualControl,
            OnOpened = static args =>
            {
                Report("Open");
                return default;
            },
            OnClosed = static args =>
            {
                Report("Close");
                return default;
            }
        })
        .AddRetry(new RetryStrategyOptions
        {
            ShouldHandle = new PredicateBuilder().Handle<SomeExceptionType>(),
            MaxRetryAttempts = MaxRetries,
            Delay = TimeSpan.Zero,
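            OnRetry = async args =>
            {
                // AttemptNumber is zero-based, so this matches the OnRetry call
                // that runs just before the final retry attempt.
                if (args.AttemptNumber == MaxRetries - 1)
                {
                    await manualControl.IsolateAsync();
                    timer.Start();
                }
            }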
        })
        .Build();
    
    for (int i = 0; i < 10; i++)
    {
        try
        {
            await pipeline.ExecuteAsync((ct) => { Report("Called"); throw new SomeExceptionType(); }, CancellationToken.None);
        }
        catch(Exception ex)
        {
            Report(ex.GetType().Name);
            await Task.Delay(450);
        }
    }
    
    • We created a "blank" Circuit Breaker and control its state transitions manually via manualControl.
    • We created a Retry strategy that triggers for SomeExceptionType and retries at most MaxRetries times.
      • In its OnRetry, just before the last attempt, we break the circuit.
      • We also start a timer to transition the circuit breaker back to Closed after a certain period of time.

    Here is a sample output:

    13:04:55.2422830:Called
    13:04:55.2486120:Called
    13:04:55.2505380:Open
    13:04:55.2523110:Called
    13:04:55.2536750:SomeExceptionType
    13:04:55.7084040:IsolatedCircuitException
    13:04:56.1594240:IsolatedCircuitException
    13:04:56.6119060:IsolatedCircuitException
    13:04:56.7744830:Close
    
    13:04:57.0636680:Called
    13:04:57.0637870:Called
    13:04:57.0638610:Open
    13:04:57.0640420:Called
    13:04:57.0654120:SomeExceptionType
    13:04:57.5166330:IsolatedCircuitException
    13:04:57.9679080:IsolatedCircuitException
    13:04:58.2533750:Close
    
    13:04:58.4195490:Called
    13:04:58.4199850:Called
    13:04:58.4218660:Open
    13:04:58.4220250:Called
    13:04:58.4244720:SomeExceptionType
    13:04:58.8763750:IsolatedCircuitException
    13:04:59.3277210:IsolatedCircuitException
    13:04:59.7532630:Close
    

    There are a couple of gotchas here:

    First, we are moving the Circuit Breaker into the Isolated state before the last attempt, not after it. This is because OnRetry runs before the next retry attempt. One way to solve this is to move the OnRetry logic into the catch block:

    ...
    var pipeline = new ResiliencePipelineBuilder()
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions
        {
            ManualControl = manualControl,
            OnOpened = static args =>
            {
                Report("Open");
                return default;
            },
            OnClosed = args =>
            {
                Report("Close");
                return default;
            }
        })
        .AddRetry(new RetryStrategyOptions
        {
            ShouldHandle = new PredicateBuilder().Handle<SomeExceptionType>(),
            MaxRetryAttempts = MaxRetries,
            Delay = TimeSpan.Zero,
        })
        .Build();
    
    for (int i = 0; i < 10; i++)
    {
        try
        {
            await pipeline.ExecuteAsync(_ => { Report("Called"); throw new SomeExceptionType(); }, CancellationToken.None);
        }
        catch (SomeExceptionType)
        {
            // All retries are exhausted: break the circuit and schedule the reset.
            await manualControl.IsolateAsync();
            timer.Start();
        }
        catch (Exception ex)
        {
            Report(ex.GetType().Name);
        }
        await Task.Delay(450);
    }
    

    Now a sample output looks like this:

    13:09:21.2632760:Called
    13:09:21.2694840:Called
    13:09:21.2695320:Called
    13:09:21.2724070:Open
    13:09:21.7255970:IsolatedCircuitException
    13:09:22.1786920:IsolatedCircuitException
    13:09:22.6303060:IsolatedCircuitException
    13:09:22.8046920:Close
    
    13:09:23.0817970:Called
    13:09:23.0821610:Called
    13:09:23.0822510:Called
    13:09:23.0826860:Open
    13:09:23.5354960:IsolatedCircuitException
    13:09:23.9883700:IsolatedCircuitException
    13:09:24.2743980:Close
    
    13:09:24.4395530:Called
    13:09:24.4398200:Called
    13:09:24.4398690:Called
    13:09:24.4401100:Open
    13:09:24.8934820:IsolatedCircuitException
    13:09:25.3463760:IsolatedCircuitException
    13:09:25.7748690:Close
    

    The second problem, and the bigger one, is that the Circuit Breaker's control code is scattered everywhere. It is hard to encapsulate this solution in a reusable form.

    Option #2

    Because your Circuit Breaker should break immediately after the retries have been exhausted, you can implement the required CB functionality yourself. Here is an example:

    public class Foo
    {
        static readonly ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
            .AddRetry(new RetryStrategyOptions
            {
                ShouldHandle = new PredicateBuilder().Handle<SomeExceptionType>(),
                MaxRetryAttempts = 2,
                Delay = TimeSpan.Zero,
            })
            .Build();
    
        const int NotAllowed = 0;
        const int Allowed = 1;
        private static int isAllowed = Allowed;
    
        public async ValueTask Bar(Func<ValueTask> callback)
        {
            if(isAllowed == NotAllowed)
                throw new BrokenCircuitException();
    
            try
            {
                await pipeline.ExecuteAsync(_ => callback(), CancellationToken.None);
            }
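            catch(SomeExceptionType)
            {
                // Retries are exhausted: open the circuit. Note that the exception is swallowed here.
                // Report is the logging helper from Option #1; it must be accessible from this class.
                Report("Open");
                Interlocked.Exchange(ref isAllowed, NotAllowed);
                // Caveat: AutoReset defaults to true and the timer is never disposed, so every
                // timer created here keeps firing every 1500 ms.
                System.Timers.Timer timer = new (1500);
                timer.Elapsed += (s, e) => { Report("Close"); Interlocked.Exchange(ref isAllowed, Allowed); };
                timer.Start();
            }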
        }
    }
    

    Here we basically mimic an atomic boolean flag that records whether the circuit breaker is closed or open.
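
    The flag handling on its own can be sketched without Polly. The snippet below is not part of the code above; TryOpen, Close and IsAllowed are hypothetical helpers. It uses Interlocked.CompareExchange so that, if several callers fail at the same time, only one of them actually flips the flag (and would be the one to start the reset timer).

    ```csharp
    using System;
    using System.Threading;

    const int NotAllowed = 0;
    const int Allowed = 1;
    int isAllowed = Allowed;

    // Atomically flips Allowed -> NotAllowed; returns true only for the caller that made the change.
    bool TryOpen() => Interlocked.CompareExchange(ref isAllowed, NotAllowed, Allowed) == Allowed;

    // Atomically resets the flag so that calls are allowed again.
    void Close() => Interlocked.Exchange(ref isAllowed, Allowed);

    // Reads the current value of the flag.
    bool IsAllowed() => Volatile.Read(ref isAllowed) == Allowed;

    Console.WriteLine(TryOpen());   // True: this caller opened the circuit
    Console.WriteLine(TryOpen());   // False: the circuit was already open
    Close();
    Console.WriteLine(IsAllowed()); // True: calls are allowed again
    ```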

    If you run a test against Foo like this

    var foo = new Foo();
    for (int i = 0; i < 10; i++)
    {
        try
        {
            await foo.Bar(() => { Report("Called"); throw new SomeExceptionType(); });
        }
        catch(BrokenCircuitException)
        {
            Report("Broken");
            await Task.Delay(450);
        }
    }
    

    then the output will be something like this:

    13:50:50.2469980:Called
    13:50:50.2539870:Called
    13:50:50.2540370:Called
    13:50:50.2546650:Open
    13:50:50.2552080:Broken
    13:50:50.7074720:Broken
    13:50:51.1621140:Broken
    13:50:51.6145440:Broken
    13:50:51.7849980:Close
    
    13:50:52.0653560:Called
    13:50:52.0657160:Called
    13:50:52.0658150:Called
    13:50:52.0660950:Open
    13:50:52.0663610:Broken
    13:50:52.5180710:Broken
    13:50:52.9704980:Broken
    13:50:53.2559450:Close
    
    13:50:53.4215020:Called
    13:50:53.4221810:Called
    13:50:53.4222580:Called
    13:50:53.4225390:Open
    

    Here the core logic is encapsulated and reusable. With minimal effort the hard-coded values (retry count, break duration, exception type) can be passed in as parameters. Unfortunately, this solution is not composable with other resilience strategies.


    So, if you want to implement this specific resilience logic, there are several options. If you also need to compose it with other strategies, you have to implement a custom ResilienceStrategy.
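
    As a rough illustration of that last route, here is a minimal sketch of a custom strategy that opens for a fixed duration as soon as a handled exception reaches it. BreakAfterFailureStrategy and BreakAfterFailureOptions are hypothetical names, InvalidOperationException stands in for SomeExceptionType, and the sketch deliberately has no half-open state and uses the wall clock, so treat it as a starting point rather than a finished implementation.

    ```csharp
    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Polly;
    using Polly.CircuitBreaker;
    using Polly.Retry;

    var pipeline = new ResiliencePipelineBuilder()
        // Added first, so it is the outermost strategy and only sees the outcome after all retries.
        .AddStrategy(
            _ => new BreakAfterFailureStrategy(TimeSpan.FromMilliseconds(1500), e => e is InvalidOperationException),
            new BreakAfterFailureOptions())
        .AddRetry(new RetryStrategyOptions
        {
            ShouldHandle = new PredicateBuilder().Handle<InvalidOperationException>(),
            MaxRetryAttempts = 2,
            Delay = TimeSpan.Zero,
        })
        .Build();

    int calls = 0;
    for (int i = 0; i < 3; i++)
    {
        try
        {
            await pipeline.ExecuteAsync(_ => { calls++; throw new InvalidOperationException(); }, CancellationToken.None);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"{ex.GetType().Name} (callback invocations so far: {calls})");
        }
    }

    // Hypothetical options type; AddStrategy expects a ResilienceStrategyOptions instance.
    public sealed class BreakAfterFailureOptions : ResilienceStrategyOptions
    {
        public BreakAfterFailureOptions() => Name = "BreakAfterFailure";
    }

    // Hypothetical strategy: rejects calls while open, opens when a handled exception passes through.
    public sealed class BreakAfterFailureStrategy : ResilienceStrategy
    {
        private readonly TimeSpan _breakDuration;
        private readonly Func<Exception, bool> _shouldBreak;
        private long _openUntilUtcTicks; // 0 means the circuit has never been opened

        public BreakAfterFailureStrategy(TimeSpan breakDuration, Func<Exception, bool> shouldBreak)
        {
            _breakDuration = breakDuration;
            _shouldBreak = shouldBreak;
        }

        protected override async ValueTask<Outcome<TResult>> ExecuteCore<TResult, TState>(
            Func<ResilienceContext, TState, ValueTask<Outcome<TResult>>> callback,
            ResilienceContext context,
            TState state)
        {
            // While the break window is active, reject the call without invoking the callback.
            if (DateTime.UtcNow.Ticks < Interlocked.Read(ref _openUntilUtcTicks))
            {
                return Outcome.FromException<TResult>(new BrokenCircuitException());
            }

            Outcome<TResult> outcome = await callback(context, state)
                .ConfigureAwait(context.ContinueOnCapturedContext);

            // A handled exception that reaches this point has survived the inner strategies: open the circuit.
            if (outcome.Exception is { } exception && _shouldBreak(exception))
            {
                Interlocked.Exchange(ref _openUntilUtcTicks, DateTime.UtcNow.Add(_breakDuration).Ticks);
            }

            return outcome;
        }
    }
    ```

    With this arrangement the first execution makes three attempts and rethrows the original exception, while executions that arrive within the break window are rejected with BrokenCircuitException without invoking the callback. Because it is an ordinary strategy, it can be combined with timeouts, fallbacks and other strategies in the same pipeline.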