I have attempted to upgrade to V8 of Polly, but now the state of my circuit breaker becomes stuck at Open indefinitely.
My classic AdvancedCircuitBreakerAsync continued to work after a NuGet update to the V8 Polly package, but then I tried to adopt the substantial V8 API changes, including ResiliencePipelineBuilder, CircuitBreakerStrategyOptions and CircuitBreakerStateProvider. Now I am experiencing the stuck open circuit problem.
The V8 problem only occurs when I skip repeated calls to _resiliencePipeline.ExecuteAsync(...) with the following logic:
if (_circuitBreakerStateProvider.CircuitState == CircuitState.Open)
{
    // return fallback value indicating remote service is unavailable but
    // the CircuitState never progresses to HalfOpen
    return null;
}
await _resiliencePipeline.ExecuteAsync( ... );
The equivalent logic with a classic circuit breaker works as expected:
if (_breakerPolicy.CircuitState == CircuitState.Open)
{
    return null;
}
// after the configured durationOfBreak timespan expires we get here with circuit at HalfOpen
await _breakerPolicy.ExecuteAsync( ... );
It seems that the circuit state will not progress from Open to HalfOpen without a further invocation of _resiliencePipeline.ExecuteAsync().
Update #1
As requested by @Peter Csala.
Here is my pipeline config; it is designed for a fail-fast requirement in a low-traffic situation. There is no direct DI configuration for the ResiliencePipeline. The pipeline is declared and held in an application RpcFacade class that is itself a singleton. A breakpoint confirms the pipeline is built once, and all testing is single-user manual UI testing of a Blazor Server application.
services.AddSingleton<RpcFacade>();
_breakerState = new CircuitBreakerStateProvider();
var builder = new ResiliencePipelineBuilder<List<LiveAgentInfo>>().AddCircuitBreaker(
    new CircuitBreakerStrategyOptions<List<LiveAgentInfo>>
    {
        FailureRatio = 0.1,
        SamplingDuration = TimeSpan.FromSeconds( 10 ),
        MinimumThroughput = 2,
        BreakDuration = TimeSpan.FromSeconds( 15 ),
        ShouldHandle = new PredicateBuilder<List<LiveAgentInfo>>().Handle<RpcException>(),
        StateProvider = _breakerState
    } );
Update #2
I have enabled additional Polly logging. This logs an entry "Resilience event occurred. EventName: 'OnCircuitOpened'" as the circuit transitions from Closed to Open on the second failing gRPC call to a remote gRPC service that is not reachable.
Logging of _circuitBreakerStateProvider.CircuitState just before the _resiliencePipeline.ExecuteAsync( ... ) call confirms the Closed to Open transition, so the _circuitBreakerStateProvider instance is maintaining a valid observation of the circuit breaker's internal state.
Update #3
Further testing reveals another insight. In the original code above I showed how I was returning a fallback value when CircuitState == CircuitState.Open in order to avoid calling _resiliencePipeline.ExecuteAsync(...) during the 15-second Open window.
If I always call _resiliencePipeline.ExecuteAsync( ... ), even when the circuit state is Open, then I get BrokenCircuitExceptions during the Open window. After 15 seconds the circuit breaker lets through a remote call, which triggers an RpcException. At this point I see the entry "Resilience event occurred. EventName: 'OnCircuitHalfOpened'" in the log.
It seems the circuit breaker only identifies that it has reached the HalfOpen state during _resiliencePipeline.ExecuteAsync( ... ). It then makes the external gRPC call expressed in my lambda, which fails, and the state returns to Open. From the external perspective of the CircuitBreakerStateProvider the state appears stuck at Open.
As a workaround, I can return a fallback value in my BrokenCircuitException catch block; this fails fast and is the outcome I wanted.
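The workaround described above could look roughly like the following. This is only a sketch, assuming the Polly V8 (Polly.Core) package is referenced; the FailFast class and GetOrFallbackAsync helper are hypothetical names, not part of Polly or of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical helper (not part of Polly): always goes through the pipeline so the
// breaker gets a chance to evaluate Open -> HalfOpen, and maps a rejected call
// (BrokenCircuitException) to the fallback value.
public static class FailFast
{
    public static async Task<List<T>?> GetOrFallbackAsync<T>(
        ResiliencePipeline<List<T>> pipeline,
        Func<CancellationToken, ValueTask<List<T>>> remoteCall,
        CancellationToken cancellationToken = default)
    {
        try
        {
            // While the circuit is Open this throws immediately without invoking remoteCall.
            return await pipeline.ExecuteAsync(remoteCall, cancellationToken);
        }
        catch (BrokenCircuitException)
        {
            // Fallback: the remote service is treated as unavailable.
            return null;
        }
    }
}
```

Exceptions other than BrokenCircuitException, such as the RpcException raised by the HalfOpen probe call, still propagate to the caller in this sketch.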
TL;DR: V7's CircuitState property's getter is more complex than V8's.
In order to understand why V7 does and V8 does not transition from Open into HalfOpen state without invoking ExecuteAsync, we need to look a bit under the hood. It won't be painful, I promise.
In both cases we have a controller class which is stateful and does the heavy lifting. The policy/strategy asks the controller to perform the state transitions.
In V7 we have a CircuitStateController with a bunch of fields.
The important ones from this question's perspective:
protected readonly TimeSpan _durationOfBreak;
protected long _blockedTill;
protected CircuitState _circuitState;
...
protected readonly Action _onHalfOpen;
_blockedTill captures the point in time until which the CB must remain in Open state before it can transition into HalfOpen. It is calculated from _durationOfBreak.
_onHalfOpen is called ONLY when the CircuitState is evaluated:
public CircuitState CircuitState
{
    get
    {
        if (_circuitState != CircuitState.Open)
        {
            return _circuitState;
        }
        using (TimedLock.Lock(_lock))
        {
            if (_circuitState == CircuitState.Open && !IsInAutomatedBreak_NeedsLock)
            {
                _circuitState = CircuitState.HalfOpen;
                _onHalfOpen();
            }
            return _circuitState;
        }
    }
}
Why is this important from the question's perspective? Because in your early exit condition you are directly asking the circuit breaker to re-evaluate the circuit state (_breakerPolicy.CircuitState == CircuitState.Open). That's why it transitions from Open to HalfOpen.
This also means that V7 does not transition from Open to HalfOpen automatically. If your CB breaks and you don't assess the CircuitState, either directly or via ExecuteAsync, it will remain in Open state (and your onHalfOpen won't be triggered).
In the new version (V8) the _onHalfOpen user delegate is called only from the ScheduleHalfOpenTask method:
private Task ScheduleHalfOpenTask(ResilienceContext context)
{
    _executor.ScheduleTask(() => _onHalfOpen!(new OnCircuitHalfOpenedArguments(context)).AsTask(), context, out var task);
    return task;
}
and this method is only called from OnActionPreExecuteAsync.
Here is an excerpt of the method:
public async ValueTask<Outcome<T>?> OnActionPreExecuteAsync(ResilienceContext context)
{
    ...
    lock (_lock)
    {
        // check if circuit can be half-opened
        if (_circuitState == CircuitState.Open && PermitHalfOpenCircuitTest_NeedsLock())
        {
            _circuitState = CircuitState.HalfOpen;
            _telemetry.Report(new(ResilienceEventSeverity.Warning, CircuitBreakerConstants.OnHalfOpenEvent), context, new OnCircuitHalfOpenedArguments(context));
            isHalfOpen = true;
        }
        exception = ...
        if (isHalfOpen && _onHalfOpen is not null)
        {
            task = ScheduleHalfOpenTask(context);
        }
    }
    ...
}
As you have already figured out, OnActionPreExecuteAsync is called ONLY by the strategy, whenever its ExecuteCore is executed.
The CircuitBreakerStateProvider does not do much when you retrieve the circuit state:
public CircuitState CircuitState => _circuitStateProvider?.Invoke() ?? CircuitState.Closed;
The invoked method is specified inside the Initialize call (from the strategy).
stateProvider?.Initialize(() => _controller.CircuitState);
The controller's CircuitState property's getter is super simple.
public CircuitState CircuitState
{
    get
    {
        EnsureNotDisposed();
        lock (_lock)
        {
            return _circuitState;
        }
    }
}
As you can see, it does not check whether it should transition to HalfOpen or not.
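To make the difference between the two getters concrete, here is a toy model. It is a deliberately simplified sketch: ToyState, FakeClock, LazyGetterBreaker, PassiveGetterBreaker and ToyDemo are hypothetical types written for illustration only, with no locking, delegates or telemetry, and they are not Polly's real classes.

```csharp
using System;

public enum ToyState { Closed, Open, HalfOpen }

// Hypothetical manual clock so the example does not need to wait in real time.
public sealed class FakeClock
{
    public DateTime UtcNow { get; private set; } = new DateTime(2024, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    public void Advance(TimeSpan by) => UtcNow += by;
}

// V7-like: reading the state may itself flip Open -> HalfOpen once the break has elapsed.
public sealed class LazyGetterBreaker
{
    private readonly FakeClock _clock;
    private ToyState _state = ToyState.Closed;
    private DateTime _blockedTill;
    public LazyGetterBreaker(FakeClock clock) => _clock = clock;

    public void Break(TimeSpan duration)
    {
        _state = ToyState.Open;
        _blockedTill = _clock.UtcNow + duration;
    }

    public ToyState CircuitState
    {
        get
        {
            if (_state == ToyState.Open && _clock.UtcNow >= _blockedTill)
            {
                _state = ToyState.HalfOpen;
            }
            return _state;
        }
    }
}

// V8-like: the getter is passive; only the pre-execute hook performs the transition.
public sealed class PassiveGetterBreaker
{
    private readonly FakeClock _clock;
    private ToyState _state = ToyState.Closed;
    private DateTime _blockedTill;
    public PassiveGetterBreaker(FakeClock clock) => _clock = clock;

    public void Break(TimeSpan duration)
    {
        _state = ToyState.Open;
        _blockedTill = _clock.UtcNow + duration;
    }

    public ToyState CircuitState => _state;

    // Stands in for the check that runs at the start of every execution.
    public void OnPreExecute()
    {
        if (_state == ToyState.Open && _clock.UtcNow >= _blockedTill)
        {
            _state = ToyState.HalfOpen;
        }
    }
}

public static class ToyDemo
{
    // Returns (V7-like state, V8-like state before execute, V8-like state after execute).
    public static (ToyState, ToyState, ToyState) Run()
    {
        var clock = new FakeClock();
        var v7 = new LazyGetterBreaker(clock);
        var v8 = new PassiveGetterBreaker(clock);
        v7.Break(TimeSpan.FromSeconds(15));
        v8.Break(TimeSpan.FromSeconds(15));
        clock.Advance(TimeSpan.FromSeconds(16));

        var v7State = v7.CircuitState;    // the getter performs the transition
        var v8Before = v8.CircuitState;   // the getter only reports
        v8.OnPreExecute();
        var v8After = v8.CircuitState;
        return (v7State, v8Before, v8After);
    }
}
```

ToyDemo.Run() returns (HalfOpen, Open, HalfOpen): polling the V8-like breaker after the break duration still reports Open until an execution attempt runs the pre-execute check.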
V7 transitions from Open to HalfOpen because you access the CircuitState and the getter might change the state of the circuit breaker.
V8 does not transition from Open to HalfOpen automatically because the state provider simply returns the current state of the controller and does NOT induce any state change.
I hope it was not painful and the description clarified certain things. :)
Update #1
I forgot to mention that we have documented the suggested solution in the Polly docs. Basically, the suggestion is to use ExecuteOutcomeAsync so that no exception is thrown.
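A minimal sketch of that suggestion, assuming the Polly V8 (Polly.Core) package is referenced. The OutcomeFailFast class and GetOrFallbackAsync helper are hypothetical names chosen for illustration, and treating only BrokenCircuitException as the fallback case is an assumption about the desired behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.ExceptionServices;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical helper (not part of Polly): uses ExecuteOutcomeAsync so that a rejected
// call is reported as an Outcome carrying the exception instead of being thrown.
public static class OutcomeFailFast
{
    public static async Task<List<T>?> GetOrFallbackAsync<T>(
        ResiliencePipeline<List<T>> pipeline,
        Func<CancellationToken, ValueTask<List<T>>> remoteCall,
        CancellationToken cancellationToken = default)
    {
        ResilienceContext context = ResilienceContextPool.Shared.Get(cancellationToken);
        try
        {
            Outcome<List<T>> outcome = await pipeline.ExecuteOutcomeAsync(
                async (ctx, call) =>
                {
                    // The callback itself must not throw: failures are wrapped in the Outcome.
                    try
                    {
                        return Outcome.FromResult(await call(ctx.CancellationToken));
                    }
                    catch (Exception ex)
                    {
                        return Outcome.FromException<List<T>>(ex);
                    }
                },
                context,
                remoteCall);

            if (outcome.Exception is BrokenCircuitException)
            {
                return null; // circuit is Open: fail fast with the fallback value
            }

            if (outcome.Exception is not null)
            {
                // Assumption: any other failure is rethrown with its original stack trace.
                ExceptionDispatchInfo.Capture(outcome.Exception).Throw();
            }

            return outcome.Result;
        }
        finally
        {
            // Contexts taken from the pool should be returned to it.
            ResilienceContextPool.Shared.Return(context);
        }
    }
}
```

Because every call still goes through the pipeline, the breaker gets the chance to move from Open to HalfOpen once the break duration has elapsed.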