Tags: c#, .net, azureservicebus, azure-managed-identity, kubernetes-health-check

Avoiding Azure.Identity.CredentialUnavailableException on a fresh startup


Short summary

When I deploy my application, which uses Managed Identity to connect to a Service Bus instance, I see a lot of Azure.Identity.CredentialUnavailableException exceptions in the logs for the first couple of minutes. Then the connection is established and everything works fine.

Longer explanation

My setup is as follows:

  • a web app hosted in Azure (using Kubernetes)
  • an Azure Service Bus instance which the app connects to
  • a Managed Identity assigned to the app and used to connect to the Service Bus

In my application I use Azure.Messaging.ServiceBus. In my Startup file I register the Service Bus Client as follows:

        services.AddAzureClients(clientBuilder =>
            clientBuilder.AddServiceBusClient(Configuration.GetSection("ServiceBus"))
                .WithCredential(new DefaultAzureCredential()));

Just as a side note, I use DefaultAzureCredential instead of ManagedIdentityCredential because I want my application to work locally as well (so that VisualStudioCredential or AzureDeveloperCliCredential can be used).

Later, I register my IHostedService implementation, where I handle Service Bus messages (I can provide simplified but complete code for this, but I don't think it's that important). In this class, I create a ServiceBusProcessor instance by calling the CreateProcessor() method on the ServiceBusClient instance. Then I invoke the StartProcessingAsync() method on the created ServiceBusProcessor. As far as I understand, this is the point where the connection to the Service Bus is actually initialized.
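For illustration, the setup described above might look roughly like the following. This is only a sketch, not the actual code from the application: the class name QueueListener and the queue name "orders" are hypothetical placeholders, and the handlers do nothing beyond logging.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// Hypothetical hosted service; the name and the queue are placeholders.
public sealed class QueueListener : IHostedService, IAsyncDisposable
{
    private readonly ServiceBusProcessor _processor;
    private readonly ILogger<QueueListener> _logger;

    public QueueListener(ServiceBusClient client, ILogger<QueueListener> logger)
    {
        _logger = logger;

        // Creating the processor does not open a connection yet.
        _processor = client.CreateProcessor("orders");
        _processor.ProcessMessageAsync += OnMessageAsync;
        _processor.ProcessErrorAsync += OnErrorAsync;
    }

    // The processor starts receiving (and authenticating) from here on.
    public Task StartAsync(CancellationToken cancellationToken) =>
        _processor.StartProcessingAsync(cancellationToken);

    public Task StopAsync(CancellationToken cancellationToken) =>
        _processor.StopProcessingAsync(cancellationToken);

    private Task OnMessageAsync(ProcessMessageEventArgs args)
    {
        // With the default processor options the message is completed
        // automatically once this handler returns without throwing.
        _logger.LogInformation("Received message {MessageId}", args.Message.MessageId);
        return Task.CompletedTask;
    }

    private Task OnErrorAsync(ProcessErrorEventArgs args)
    {
        // Failures raised by the processor are reported through this handler.
        _logger.LogError(args.Exception, "Service Bus error from {Source}", args.ErrorSource);
        return Task.CompletedTask;
    }

    public ValueTask DisposeAsync() => _processor.DisposeAsync();
}
```

It would be registered with services.AddHostedService<QueueListener>() after the AddAzureClients call.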

When I deployed my app for the first time I saw a lot of Azure.Identity.CredentialUnavailableException exceptions:

DefaultAzureCredential failed to retrieve a token from the included credentials. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/defaultazurecredential/troubleshoot
- EnvironmentCredential authentication unavailable. Environment variables are not fully configured. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/environmentcredential/troubleshoot
- ManagedIdentityCredential authentication unavailable. Multiple attempts failed to obtain a token from the managed identity endpoint.
- Operating system Linux 5.4.0-1103-azure #109~18.04.1-Ubuntu SMP Wed Jan 25 20:53:00 UTC 2023 isn't supported.
- Stored credentials not found. Need to authenticate user in VSCode Azure Account. See the troubleshooting guide for more information. https://aka.ms/azsdk/net/identity/vscodecredential/troubleshoot
- Azure CLI not installed
- PowerShell is not installed.

However, after 2 or 3 minutes the connection was established and it all started working fine. As I understand it, there is a process under the hood which calls Azure to create some additional resources to 'link' the Managed Identity with a freshly created virtual machine. When this succeeds, the connection is established and the app starts processing Service Bus messages. In the meantime, though, some code in the package (I assume) keeps retrying to connect to the Service Bus, and that's why I see so many Azure.Identity.CredentialUnavailableException exceptions (as many as 70000 for some deployments).

Question

Is there any way to check whether the connection has been established before continuing to run the application? Or is there any way to control how the connection is established, e.g. through the retry policy? My ultimate goal is to avoid those exceptions in the logs.

I tried registering a custom IHealthCheck so that the app doesn't report ready before the connection is established, but I can't find an easy way to tell that the connection is actually established.

I also tried to understand how Azure.Messaging.ServiceBus establishes the connection and whether I can control it, but I couldn't find any information on this.


Solution

  • This type of failure most often traces back to the local managed identity endpoint on the host not being available when the application is launched. Unfortunately, that's not something the Azure SDK libraries can control or influence; it would need to be investigated on the host.

    From the client perspective, the simplest way to limit these exceptions would likely be to tune the retry options for DefaultAzureCredential by passing a set of DefaultAzureCredentialOptions to it. That would still allow the credential to do its work, but it would slow down the flow and reduce the number of retries and failures. Since this retry policy is specific to the credential, it would not impact the Service Bus operations that you're performing.
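A minimal sketch of tuning the credential's retry options is shown below. The CredentialFactory class is a hypothetical helper, and the specific values (5 retries, 2 to 30 second delays) are illustrative assumptions rather than recommendations. How much of this applies to the managed identity probe may also vary between Azure.Identity versions, so it is worth checking against the version you have deployed.

```csharp
using System;
using Azure.Core;
using Azure.Identity;

// Hypothetical helper; the retry values below are illustrative assumptions.
public static class CredentialFactory
{
    public static DefaultAzureCredentialOptions CreateOptions() =>
        new DefaultAzureCredentialOptions
        {
            // Retry is inherited from Azure.Core.ClientOptions and only
            // affects the credential's own token requests.
            Retry =
            {
                Mode = RetryMode.Exponential,       // back off between attempts
                MaxRetries = 5,                     // fewer attempts per token request
                Delay = TimeSpan.FromSeconds(2),    // initial delay
                MaxDelay = TimeSpan.FromSeconds(30) // upper bound for the backoff
            }
        };

    public static DefaultAzureCredential Create() =>
        new DefaultAzureCredential(CreateOptions());
}
```

The registration from the question would then pass the tuned credential, for example .WithCredential(CredentialFactory.Create()).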

    Another option would be to introduce a delay in your application and not start the processor until you've confirmed that the MI endpoint address is available and responding. More information on how to interact with the local Managed Identity REST API can be found in the Azure managed identities documentation. This would be more involved for the application but would give you the most precise understanding and avoid SDK errors.
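The waiting approach could be sketched as follows. The ManagedIdentityProbe class is hypothetical, and the URL assumes the standard IMDS token endpoint (169.254.169.254) with a system-assigned identity and the Service Bus resource. A user-assigned identity would need an extra client_id query parameter, and some Kubernetes identity setups expose a different endpoint, so treat the address as an assumption to verify for your cluster.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for waiting until the managed identity endpoint responds.
public static class ManagedIdentityProbe
{
    // Assumption: standard IMDS token endpoint, system-assigned identity,
    // token requested for the Service Bus resource.
    private const string TokenUrl =
        "http://169.254.169.254/metadata/identity/oauth2/token" +
        "?api-version=2018-02-01&resource=https%3A%2F%2Fservicebus.azure.net";

    // Returns true only if the endpoint answered with a success status code.
    public static async Task<bool> IsRespondingAsync(HttpClient http, CancellationToken ct)
    {
        using var request = new HttpRequestMessage(HttpMethod.Get, TokenUrl);
        request.Headers.Add("Metadata", "true"); // required by IMDS

        try
        {
            using var response = await http.SendAsync(request, ct);
            return response.IsSuccessStatusCode;
        }
        catch (HttpRequestException)
        {
            return false; // endpoint not reachable yet
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            return false; // HttpClient timeout, not a shutdown request
        }
    }

    // Calls the probe until it returns true, pausing between attempts.
    public static async Task WaitUntilAsync(
        Func<CancellationToken, Task<bool>> probe,
        TimeSpan interval,
        CancellationToken ct)
    {
        while (!await probe(ct))
        {
            await Task.Delay(interval, ct);
        }
    }
}
```

In the hosted service, the wait would go right before the processor is started, for example: await ManagedIdentityProbe.WaitUntilAsync(ct => ManagedIdentityProbe.IsRespondingAsync(http, ct), TimeSpan.FromSeconds(5), cancellationToken); followed by await processor.StartProcessingAsync(cancellationToken);. Taking the probe as a delegate keeps the wait loop independent of the endpoint, so a different check can be swapped in without changing it.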