I'd like to use WaitAndRetryAsync to help retry HTTP 429 (throttling) errors. The retry delay is returned as a property on the exception itself.
But I need to add up the accumulated time and abandon the retry loop if the overall duration exceeds a certain amount.
policy = Policy.Handle<DocumentClientException>(ex => ex.StatusCode == (HttpStatusCode)429)
    .WaitAndRetryAsync(
        retryCount: retries,
        sleepDurationProvider: (retryCount, exception, context) => {
            DocumentClientException dce = exception as DocumentClientException;
            // Here I would like to check the total time and NOT return a RetryAfter value if my overall time is exceeded. Instead re-throw the 'exception'.
            return dce.RetryAfter;
        },
        onRetryAsync: async (res, timespan, retryCount, context) => {
        });
When the overall time is exceeded, I'd like to re-throw the 'exception' handled in the sleepDurationProvider.
Is there a better way to handle this?
The first example below caps the sum of the waits between retries at a total timespan myWaitLimit, but takes no account of how long each call to Cosmos DB takes before it throws DocumentClientException. Because Polly Context is execution-scoped, this is thread-safe. Something like:
policy = Policy.Handle<DocumentClientException>(ex => ex.StatusCode == (HttpStatusCode)429)
    .WaitAndRetryAsync(
        retryCount: retries,
        sleepDurationProvider: (retryCount, exception, context) => {
            DocumentClientException dce = exception as DocumentClientException;
            TimeSpan toWait = dce.RetryAfter;
            // Context stores its values as object, so the running total has to be cast back to TimeSpan.
            TimeSpan waitedSoFar = context.TryGetValue("WaitedSoFar", out object stored) ? (TimeSpan)stored : TimeSpan.Zero;
            waitedSoFar = waitedSoFar + toWait;
            if (waitedSoFar > myWaitLimit)
                throw dce; // or use ExceptionDispatchInfo to preserve stack trace
            context["WaitedSoFar"] = waitedSoFar; // (magic string "WaitedSoFar" only for readability; of course you can factor this out)
            return toWait;
        },
        onRetryAsync: async (res, timespan, retryCount, context) => {
        });
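The limitation noted above (time spent inside the calls themselves is not counted) could be addressed by putting a running Stopwatch into the Context instead of a sum of waits. The following is only a sketch under assumptions: it targets the Polly v7 API, ThrottledException is a hypothetical stand-in for DocumentClientException so that the code compiles without the Cosmos DB SDK, and the "Stopwatch" key and the ElapsedBudgetRetry helper are names invented for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.ExceptionServices;
using System.Threading.Tasks;
using Polly;

// Hypothetical stand-in for DocumentClientException (assumption, not the real SDK type).
public class ThrottledException : Exception
{
    public TimeSpan RetryAfter { get; }

    public ThrottledException(TimeSpan retryAfter) : base("429 Too Many Requests")
        => RetryAfter = retryAfter;
}

public static class ElapsedBudgetRetry
{
    private const string StopwatchKey = "Stopwatch"; // hypothetical Context key

    public static IAsyncPolicy Create(int retries, TimeSpan budget) =>
        Policy.Handle<ThrottledException>()
            .WaitAndRetryAsync(
                retryCount: retries,
                sleepDurationProvider: (attempt, exception, context) =>
                {
                    var throttled = (ThrottledException)exception;

                    // Elapsed covers both the calls and the waits since the execution started.
                    TimeSpan elapsed =
                        context.TryGetValue(StopwatchKey, out object value) && value is Stopwatch sw
                            ? sw.Elapsed
                            : TimeSpan.Zero;

                    // If the next wait would exceed the budget, rethrow the original
                    // exception with its stack trace preserved instead of retrying.
                    if (elapsed + throttled.RetryAfter > budget)
                        ExceptionDispatchInfo.Capture(exception).Throw();

                    return throttled.RetryAfter;
                },
                onRetryAsync: (exception, delay, attempt, context) => Task.CompletedTask);

    // Starts a fresh Stopwatch per execution, so concurrent executions do not share state.
    public static Task<T> ExecuteAsync<T>(IAsyncPolicy policy, Func<Task<T>> action)
    {
        var context = new Context { [StopwatchKey] = Stopwatch.StartNew() };
        return policy.ExecuteAsync(ctx => action(), context);
    }
}
```

As with the example above, the budget is only checked when a throttling exception reaches the sleepDurationProvider; a single call that hangs is not interrupted by this.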
An alternative approach could limit the overall execution time (when 429s occur) using a timing-out CancellationToken. The below approach will not retry further after the CancellationToken has been signalled. This approach is modelled to be close to the functionality requested in the question, but the timeout clearly only takes effect if a 429 response is returned and the sleepDurationProvider delegate is invoked.
CancellationTokenSource cts = new CancellationTokenSource();
cts.CancelAfter(/* my timeout */);
var policy = Policy.Handle<DocumentClientException>(ex => ex.StatusCode == (HttpStatusCode)429)
    .WaitAndRetryAsync(
        retryCount: retries,
        sleepDurationProvider: (retryCount, exception, context) => {
            if (cts.IsCancellationRequested) throw exception; // or use ExceptionDispatchInfo to preserve stack trace
            DocumentClientException dce = exception as DocumentClientException;
            return dce.RetryAfter;
        },
        onRetryAsync: async (res, timespan, retryCount, context) => {
        });
If you don't wish to define policy in the same scope as using it and close over the variable cts (as the above example does), you can pass the CancellationTokenSource around using Polly Context as described in this blog post.
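Passing the CancellationTokenSource through Context might look roughly like the sketch below. It assumes the Polly v7 API; ThrottledException is a hypothetical stand-in for DocumentClientException, and the "TimeoutCts" key and the ContextCancellationRetry helper are invented names, not something taken from the blog post.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;
using System.Threading.Tasks;
using Polly;

// Hypothetical stand-in for DocumentClientException (assumption, not the real SDK type).
public class ThrottledException : Exception
{
    public TimeSpan RetryAfter { get; }

    public ThrottledException(TimeSpan retryAfter) : base("429 Too Many Requests")
        => RetryAfter = retryAfter;
}

public static class ContextCancellationRetry
{
    private const string CtsKey = "TimeoutCts"; // hypothetical Context key

    // The policy no longer closes over a cts variable, so it can be defined once and shared.
    public static IAsyncPolicy Create(int retries) =>
        Policy.Handle<ThrottledException>()
            .WaitAndRetryAsync(
                retryCount: retries,
                sleepDurationProvider: (attempt, exception, context) =>
                {
                    // Look up the per-execution CancellationTokenSource, if one was supplied.
                    if (context.TryGetValue(CtsKey, out object value)
                        && value is CancellationTokenSource cts
                        && cts.IsCancellationRequested)
                    {
                        // Time is up: rethrow the original exception, stack trace preserved.
                        ExceptionDispatchInfo.Capture(exception).Throw();
                    }

                    return ((ThrottledException)exception).RetryAfter;
                },
                onRetryAsync: (exception, delay, attempt, context) => Task.CompletedTask);

    // Each execution gets its own timing-out CancellationTokenSource via Context.
    public static async Task<T> ExecuteAsync<T>(
        IAsyncPolicy policy, TimeSpan timeout, Func<Task<T>> action)
    {
        using (var cts = new CancellationTokenSource(timeout))
        {
            var context = new Context { [CtsKey] = cts };
            return await policy.ExecuteAsync(ctx => action(), context);
        }
    }
}
```

The CancellationTokenSource is created and disposed per call to the helper, so two concurrent executions of the same policy instance each get their own deadline.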
Alternatively, Polly provides a TimeoutPolicy. Using PolicyWrap you can wrap this outside the retry policy. A timeout can then be imposed on the overall execution whether a 429 occurs or not.
If the strategy is intended to manage Cosmos DB async calls which do not inherently take a CancellationToken, you would need to use TimeoutStrategy.Pessimistic if you wanted to enforce timeout at that time interval. However, note from the wiki how TimeoutStrategy.Pessimistic operates: it allows the calling thread to walk away from the uncancellable call, but doesn't unilaterally cancel the uncancellable call. That call might either later fault, or continue to completion.
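A minimal sketch of the PolicyWrap option, assuming the Polly v7 API: a pessimistic timeout is wrapped outside the retry policy, so the limit covers all tries and waits together. ThrottledException is again a hypothetical stand-in for DocumentClientException, and TimeoutWrappedRetry is an invented helper name.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Timeout;

// Hypothetical stand-in for DocumentClientException (assumption, not the real SDK type).
public class ThrottledException : Exception
{
    public TimeSpan RetryAfter { get; }

    public ThrottledException(TimeSpan retryAfter) : base("429 Too Many Requests")
        => RetryAfter = retryAfter;
}

public static class TimeoutWrappedRetry
{
    public static IAsyncPolicy Create(int retries, TimeSpan overallTimeout)
    {
        // Inner policy: retry on throttling, waiting for the server-suggested delay.
        var retry = Policy.Handle<ThrottledException>()
            .WaitAndRetryAsync(
                retryCount: retries,
                sleepDurationProvider: (attempt, exception, context) =>
                    ((ThrottledException)exception).RetryAfter,
                onRetryAsync: (exception, delay, attempt, context) => Task.CompletedTask);

        // Outer policy: one timeout across the whole retry sequence.
        // Pessimistic lets the caller walk away even if the call ignores cancellation.
        var timeout = Policy.TimeoutAsync(overallTimeout, TimeoutStrategy.Pessimistic);

        // The policy on which WrapAsync is called is the outermost one.
        return timeout.WrapAsync(retry);
    }
}
```

With this arrangement the caller sees a TimeoutRejectedException rather than the original throttling exception once the overall limit is reached, so calling code that inspects the 429 details would need to handle both.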
Obviously, consider what is best from among the above options, according to your context.