I am new to circuit breakers and have recently implemented them in one of my services. I was going through the documentation Resilience 4J official documentation and found two properties that we can configure for circuit breakers.
Both the above properties specify the number of calls that must be made to the services to determine whether the circuit breaker should remain open or should be closed. I need to understand the subtle difference between the two properties and any relationship that they might have with each other? Should they be configured irrespective of each other or should there be a relation between the two?
Also, is there a relationship between the above two and permittedNumberOfCallsInHalfOpenState
. For instance, if I am configuring permittedNumberOfCallsInHalfOpenState
as 5 but my slidingWindowSize
/minimumNumberOfCalls
are configured as 10, then how will the circuit breaker state be re-validated? Because it needs a minimum of 10 requests before it could re-evaluate the new state for circuit breaker but we are permitting only 5 requests when it is in open-state?
This answer from the father of Resilience4j helped me:
In a production system you should not set minimumNumberOfCalls to 1. For testing it is okay, but 3 is better.
Let's assume you have minimumNumberOfCalls=3, slidingWindowSize = 10 and slidingWindowType = COUNT_BASED: That means the CircuitBreaker is calculating the failure rate and slow call rate based on the last 10 calls, as soon as 3 calls have been recorded.
Let's assume 2 calls are slow and 1 call is fast: That means the slow call rate is above 50% and the CircuitBreaker will transition to OPEN.
The minimumNumberOfCalls setting makes even more sense, if slidingWindowType = TIME_BASED and the failure rate is calculated based on the calls from the last N seconds.
As for the permittedNumberOfCallsInHalfOpenState
question, after the wait-duration-in-open-state
period is over, it goes into max-wait-duration-in-half-open-state
and will allow a max of 5 calls. As long as the threshold is not met/exceeded, the circuit will close. It does not depend on the slidingWindowSize
/minimumNumberOfCalls
as per here.
The CircuitBreaker rejects calls with a CallNotPermittedException when it is OPEN. After a wait time duration has elapsed, the CircuitBreaker state changes from OPEN to HALF_OPEN and permits a configurable number of calls to see if the backend is still unavailable or has become available again. Further calls are rejected with a CallNotPermittedException, until all permitted calls have completed. If the failure rate or slow call rate is then equal or greater than the configured threshold, the state changes back to OPEN. If the failure rate and slow call rate is below the threshold, the state changes back to CLOSED.
P.S If you don't configure any property, the default value is used as per here.