Search code examples
c#azure-functionsazure-application-insightspollyappinsights

Polly based retry logs intermittently not showing up in App Insights for Windows Azure Function


We have a Windows Azure function where we have Polly implemented as -

public async Task<T> ExecuteTaskWithRetryAsync<T>(int maxRetryAttempts, int delayBetweenFirstRetryInSeconds, string invocationIdentifier, Func<Task<T>> function)
{
    logger.LogInformation($"Executing {invocationIdentifier} with retries.");
    var retryPolicy = Policy
        .Handle<HttpRequestException>(ex => ex.StatusCode != HttpStatusCode.BadRequest && ex.StatusCode != HttpStatusCode.Unauthorized)
        .WaitAndRetryAsync(
        maxRetryAttempts,
        retryAttempt => TimeSpan.FromSeconds(delayBetweenFirstRetryInSeconds * Math.Pow(2, retryAttempt - 1)),
        (exception, timeSpan, retryCount, context) =>
        {
            logger.LogWarning("Execution of {identifier} failed. Waiting {sec} seconds before retry. Attempt  number {retryCount}. Exception - {exceptionMessage}", invocationIdentifier, timeSpan.Seconds, retryCount, exception.Message);
        });
    return await retryPolicy.ExecuteAsync<T>(async () =>
    {
        T result = await function();
        logger.LogInformation($"Successfully executed {invocationIdentifier}");
        return result;
    });

And being used as

var response = await retryService.ExecuteTaskWithRetryAsync(retryOptions.MaxRetryAttempts, retryOptions.DelayBetweenRetriesInSeconds, nameof(GetFromMetaRPAsync), async () =>
{
    return await httpClient.GetAsync(endpoint).ConfigureAwait(false);
});
if (!response.IsSuccessStatusCode)
{
    throw new HttpRequestException($"Failed to retrieve any results. Response with status code {response?.StatusCode} - {response?.ReasonPhrase}.");
}

But in App Insights, we can only see the initial log

"Executing {invocationIdentifier} with retries"

And then the HttpRequestException is thrown with status code 504 (GatewayTimeout).

I can see an open issue which talks about Polly retries not being logged as separate dependencies, but they are able to see the retry log. We are not even able to see the retry logs of

"Waiting {sec} seconds before retry."

when there's an issue on the server, such as BadGateway, GatewayTimeout, etc.

However, we are able to intermittently see this log line

"Waiting {sec} seconds before retry... Exception - There was error sending the request".

In cases where the retries are not being captured, we are getting the response of the call after approx. 20 seconds. Is it possible that the retries are happening internally but are just not being captured by App Insights? Shouldn't the retry logs still be captured, even if dependencies are not? How can we verify that the retries are actually taking place or not?

We have run the code in our local and also have unit tests in place, and in both cases, we can see the retries taking place. Can the behaviour be different on the cloud?


Solution

  • Your retry policy triggers only if the decorated HttpClient throws an HttpRequestException:

    .Handle<HttpRequestException>(...
    

    That means if your HttpClient does not throw an HRE exception, but does return an HttpResponseMessage with 504 status code then your retry logic won't be triggered.

    You might observe the HttpRequestException with Gateway Timeout because after the policy execution your code performs a success check

    if (!response.IsSuccessStatusCode)
    {
        throw new HttpRequestException($"Failed to retrieve any results. Response with status code {response?.StatusCode} - {response?.ReasonPhrase}.");
    }
    

    Option #1

    Call the EnsureSuccessStatusCode method inside the policy execution.

    var response = await retryService.ExecuteTaskWithRetryAsync(retryOptions.MaxRetryAttempts, retryOptions.DelayBetweenRetriesInSeconds, nameof(GetFromMetaRPAsync), async () =>
    {
        var httpResponse = await httpClient.GetAsync(endpoint).ConfigureAwait(false);
        httpResponse.EnsureSuccessStatusCode();
        return httpResponse;
    });
    

    As you can see in dotnet source code the potentially thrown HRE contains the status code as well as the reason phrase.

    Option #2

    Define your policy to trigger for HttpResponseMessage as well

    var retryPolicy = Policy
        .Handle<HttpRequestException>(ex => ex.StatusCode != HttpStatusCode.BadRequest && ex.StatusCode != HttpStatusCode.Unauthorized)
        .OrResult<HttpResponseMessage>(res => res.StatusCode != HttpStatusCode.BadRequest && res.StatusCode != HttpStatusCode.Unauthorized)
        .WaitAndRetryAsync(...
    

    OR

    var nonRetryableStatusCodes = [ HttpStatusCode.BadRequest,  HttpStatusCode.Unauthorized];
    ...
    var retryPolicy = Policy
        .Handle<HttpRequestException>(ex => nonRetryableStatusCodes.Contains(ex.StatusCode) is false)
        .OrResult<HttpResponseMessage>(res => nonRetryableStatusCodes.Contains(res.StatusCode) is false)
        .WaitAndRetryAsync(