We use notebooks to ingest data from another system and place it in an Azure Storage account.
Our team recently enabled a system-assigned managed identity on Azure Databricks, and when I try to connect to the Azure Storage account, I get this error:
DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment
variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to
troubleshoot this issue.
ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
AzureCliCredential: Azure CLI not found on path
AzurePowerShellCredential: PowerShell is not installed
AzureDeveloperCliCredential: Azure Developer CLI could not be found. Please visit
https://aka.ms/azure-dev for installation instructions and then,once installed,
authenticate to your Azure account using 'azd auth login'.
To mitigate this issue, please refer to the troubleshooting guidelines here at
https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.
We assigned the Storage Blob Data Contributor role to both the user-assigned managed identity (dbmanagedidentity) and the system-assigned managed identity, so missing permissions don't seem to be the issue.
It looks like a compute cluster in Shared mode cannot authenticate with Azure: if I use my personal compute on Databricks, authentication works and I am able to list blobs.
Is there a way to authenticate with an Azure Storage account from a cluster in Shared mode? I can't use access keys because our enterprise team does not recommend them.
Turns out this is a documented limitation.
I traced the managed identity authentication flow on older (DBR 14) clusters and in their public GitHub codebase. The token request is made by calling 169.254.169.254, which is the instance metadata service (IMDS) of the current Azure virtual machine. That matches the "no response from the IMDS endpoint" message from ManagedIdentityCredential in the error above. As the documentation states:
You cannot connect to the instance metadata service [using Unity Catalog shared access mode]
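If you want to confirm this on your own cluster, a probe along these lines may help. It is a minimal sketch using only the standard library. The endpoint path, `api-version=2018-02-01`, and the `Metadata: true` header follow the standard IMDS token request; the function names `build_imds_request` and `imds_reachable` are my own, made up for illustration.

```python
import urllib.error
import urllib.parse
import urllib.request

# Link-local address of the Azure Instance Metadata Service (IMDS).
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_request(resource="https://storage.azure.com/", client_id=None):
    """Build the token request that managed identity auth sends to IMDS.

    client_id is only needed for a user-assigned managed identity.
    """
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        params["client_id"] = client_id
    url = IMDS_TOKEN_URL + "?" + urllib.parse.urlencode(params)
    # IMDS rejects requests that do not carry this header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def imds_reachable(timeout=2.0):
    """Return True if IMDS answers at all, False if it cannot be reached."""
    # IMDS must be called directly, so ignore any configured proxies.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(build_imds_request(), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still means the endpoint responded.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, blocked, or timed out.
        return False
```

Running `imds_reachable()` in a notebook cell on each cluster type should show whether the endpoint can be reached from that access mode. I would expect `False` on a Shared cluster, given the limitation quoted above.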
Solution: use a single-user cluster (and maybe create a dedicated user for executing such workflows).
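On a single-user cluster, listing blobs with the managed identity could look roughly like this. It is a sketch, assuming the `azure-identity` and `azure-storage-blob` packages are installed on the cluster and that the identity has a role such as Storage Blob Data Contributor on the account. The helper names `account_url` and `list_blob_names`, and the account and container names in the usage line, are hypothetical.

```python
def account_url(account_name):
    """Blob endpoint URL for a storage account in the public Azure cloud."""
    return f"https://{account_name}.blob.core.windows.net"


def list_blob_names(account_name, container, client_id=None):
    """List blob names in a container, authenticating via managed identity.

    Pass client_id to select a user-assigned identity; leave it as None
    to use the system-assigned one.
    """
    # Imported here so the module loads even where the SDK is not installed.
    from azure.identity import ManagedIdentityCredential
    from azure.storage.blob import BlobServiceClient

    if client_id:
        credential = ManagedIdentityCredential(client_id=client_id)
    else:
        credential = ManagedIdentityCredential()

    service = BlobServiceClient(account_url(account_name), credential=credential)
    container_client = service.get_container_client(container)
    return [blob.name for blob in container_client.list_blobs()]
```

Usage would be something like `list_blob_names("mystorageaccount", "ingest")`. Using `ManagedIdentityCredential` directly instead of `DefaultAzureCredential` skips the other credential types, so a failure points straight at the IMDS call.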
Summary:
Access mode | Managed Identity | Unity Catalog | Team attach |
---|---|---|---|
No isolation shared | ✅ | ⛔ | ✅ |
Shared | ⛔ | ✅ | ✅ |
Single user | ✅ | ✅ | ⛔ |