Currently, the Polly retry policy retries all failed requests independently. So if 10 requests are failing and I have set a retry-forever policy, it sends 10 more requests every time a retry happens, and the server never heals.
How can I asynchronously hold back all failed requests, retry only one of them, and resume the normal flow once that retry succeeds?
I can't (don't want to) use Circuit Breaker because my service is a Background worker service and Circuit Breaker breaks the whole background service logic.
// Current code with only retry policy
var retry = HttpPolicyExtensions.HandleTransientHttpError().WaitAndRetryForeverAsync(retryNo => new TimeSpan(0, retryNo > 3 ? 10 : (retryNo * 2), 0));
builder.Services.AddHttpClient<TestClient>().AddPolicyHandler(retry);
Use Case: I have written a background service that continuously scrapes a website containing 30000+ pages. In order to prevent overloading the site, I am using SemaphoreSlim (or Bulkhead) to limit the number of requests sent to the server at any point in time.
Still, there is a chance that the server rejects my requests. When that happens, I need to retry only one failed request until the server starts accepting my requests again. Since I am sending multiple requests at the same time, Polly retries all the failed requests, and this makes the server unhappy.
Expectation:
10 requests fail -> Retry 1 request (until success) -> If successful, then resend the remaining 9 requests.
According to my understanding, you have a single HttpClient which is used to issue N rate-limited, concurrent requests against the same downstream system.
You want to handle the following failure scenarios:
The Circuit Breaker policy works as a proxy. It tracks the outgoing communication, and if there are too many successive failures it prevents further requests. It does that by short-circuiting the requests: it throws a BrokenCircuitException instead of sending them.
After a certain period of time the CB allows a single request to go out against the downstream system. If that request succeeds, the CB allows all outgoing communication again; if it fails, the CB keeps short-circuiting the requests. Here I have detailed how CB works.
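To make the states concrete, here is a minimal sketch of such a circuit breaker definition. The threshold (3 consecutive failures) and the break duration (30 seconds) are assumed values for illustration, not something from the question, and the `Console.WriteLine` calls are only there to show when each state transition happens.

```csharp
using System;
using System.Net.Http;
using Polly;
using Polly.CircuitBreaker;
using Polly.Extensions.Http;

// Assumed values: open the circuit after 3 consecutive handled failures
// and keep it open for 30 seconds before allowing a trial request.
AsyncCircuitBreakerPolicy<HttpResponseMessage> cbPolicy = HttpPolicyExtensions
    .HandleTransientHttpError()
    .CircuitBreakerAsync(
        handledEventsAllowedBeforeBreaking: 3,
        durationOfBreak: TimeSpan.FromSeconds(30),
        // Closed -> Open: from now on calls fail fast with BrokenCircuitException.
        onBreak: (outcome, breakDelay) =>
            Console.WriteLine($"Circuit opened for {breakDelay.TotalSeconds}s"),
        // Half-Open -> Closed: the trial request succeeded, traffic flows again.
        onReset: () => Console.WriteLine("Circuit closed"),
        // Open -> Half-Open: the next call is let through as a trial request.
        onHalfOpen: () => Console.WriteLine("Circuit half-open"));
```

The half-open state is what gives you the "retry only one request" behaviour: while the trial request is in flight, the other calls are still rejected locally.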
You can adjust your retry policy to be aware of this exception. This means that your retry attempts will still be issued, but while the circuit is open they will not leave your application domain. Fortunately, in Polly you can define multiple triggers for a policy:
HttpPolicyExtensions
.HandleTransientHttpError()
.Or<BrokenCircuitException>()
.WaitAndRetryForeverAsync(retryNo => new TimeSpan(0, retryNo > 3 ? 10 : (retryNo * 2), 0));
So the retry triggers on either an HttpRequestException or a BrokenCircuitException. It also triggers if the HttpStatusCode is either 408 or 5xx.
Now what's left is to combine the retry and circuit breaker policies into a resilient strategy. You can do that by using one of the following:
.AddPolicyHandler(retryPolicy.WrapAsync(cbPolicy))
//OR
.AddPolicyHandler(Policy.WrapAsync(retryPolicy, cbPolicy))
Please be aware of the ordering. It is important to register the CB as the inner policy and the retry as the outer one, so that the BrokenCircuitException thrown by the CB escalates to the retry and is handled there. Here I have detailed this exact scenario.
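Putting the pieces together, the registration could look roughly like the sketch below. The `AddResilientTestClient` extension method and the empty `TestClient` body are hypothetical helpers added only to make the sample self-contained, and the circuit breaker numbers are assumptions. The retry delay is the one from the question.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.CircuitBreaker;
using Polly.Extensions.Http;

// Placeholder for the typed client from the question.
public sealed class TestClient
{
    private readonly HttpClient _http;
    public TestClient(HttpClient http) => _http = http;
}

public static class ResilienceRegistration
{
    // Hypothetical helper: registers TestClient with retry (outer) + circuit breaker (inner).
    public static IHttpClientBuilder AddResilientTestClient(this IServiceCollection services)
    {
        // Inner policy: one shared instance, so all concurrent requests see the same circuit state.
        var cbPolicy = HttpPolicyExtensions
            .HandleTransientHttpError()
            .CircuitBreakerAsync(
                handledEventsAllowedBeforeBreaking: 3,      // assumed value
                durationOfBreak: TimeSpan.FromSeconds(30)); // assumed value

        // Outer policy: also retries when the inner circuit breaker rejects the call.
        var retryPolicy = HttpPolicyExtensions
            .HandleTransientHttpError()
            .Or<BrokenCircuitException>()
            .WaitAndRetryForeverAsync(retryNo =>
                TimeSpan.FromMinutes(retryNo > 3 ? 10 : retryNo * 2));

        return services
            .AddHttpClient<TestClient>()
            .AddPolicyHandler(retryPolicy.WrapAsync(cbPolicy));
    }
}
```

With this in place, `builder.Services.AddResilientTestClient();` would replace the `AddHttpClient<TestClient>().AddPolicyHandler(retry)` call from the question.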
NOTE: If you want to, you can use a different delay while the Circuit Breaker is Open. I have detailed here how you can do that by using the Context object.
The above solution works fine if the application does not crash. If it does then you have to start the whole processing from the beginning.
If you need to avoid this situation, then you need to store your work items (the to-be-processed URLs) somewhere.
I would suggest the following architecture:
With this architecture you don't need an explicit retry policy, since your queue/database preserves those items that did not succeed. So your fetch logic would retrieve the same job until it eventually completes.
You can further extend this concept by creating a dead letter queue where you can store those work items that have failed N times. With that, your queue won't be polluted with "permanent" work items.
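As a rough illustration of the queue plus dead letter idea, here is a minimal in-memory sketch. Everything in it (`WorkItem`, `WorkQueueProcessor`, the `fetch` delegate, `maxAttempts`) is a hypothetical name chosen for the example. The in-memory `Queue<T>` only stands in for a durable queue or database table, so this version on its own would not survive a crash, and it also retries without any delay between attempts.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// A to-be-processed URL together with the number of failed attempts so far.
public sealed record WorkItem(string Url, int Attempts = 0);

public sealed class WorkQueueProcessor
{
    // Stand-in for a durable queue/database (assumption for the sketch).
    private readonly Queue<WorkItem> _pending = new();
    private readonly List<WorkItem> _deadLetters = new();
    private readonly Func<string, Task<bool>> _fetch;
    private readonly int _maxAttempts;

    public WorkQueueProcessor(IEnumerable<string> urls, Func<string, Task<bool>> fetch, int maxAttempts)
    {
        foreach (var url in urls) _pending.Enqueue(new WorkItem(url));
        _fetch = fetch;
        _maxAttempts = maxAttempts;
    }

    public IReadOnlyList<WorkItem> DeadLetters => _deadLetters;
    public int PendingCount => _pending.Count;

    // Processes items until the pending queue is empty.
    public async Task RunAsync()
    {
        while (_pending.TryDequeue(out var item))
        {
            bool succeeded = await _fetch(item.Url);
            if (succeeded) continue; // done, the item is not re-queued

            var failed = item with { Attempts = item.Attempts + 1 };
            if (failed.Attempts >= _maxAttempts)
                _deadLetters.Add(failed);  // failed N times: park it for later inspection
            else
                _pending.Enqueue(failed);  // goes to the back of the queue and is retried later
        }
    }
}
```

A failed item goes to the back of the queue instead of being retried immediately, so the other URLs keep flowing, and the items that end up in `DeadLetters` can be inspected or replayed separately.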