I am trying to follow this tutorial: https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage
I have followed it to the letter, but when I got to Step 6 (Connect to Azure Data Lake Storage Gen2 using Python), I get an error trying to access the key vault.
The error is:
com.databricks.common.client.DatabricksServiceHttpClientException: PERMISSION_DENIED: Invalid permissions on the specified KeyVault https:X. Wrapped Message: Status code 403, "{"error":{"code":"Forbidden","message":"Caller is not authorized to perform action on resource.\r\nIf role assignments, deny assignments or role definitions were changed recently, please observe propagation time.\r\nCaller: name=AzureDatabricksnDecisionReason: 'DeniedWithNoValidRBAC' \r\nVault: KV;location=uksouth\r\n","innererror":{"code":"ForbiddenByRbac"}}}"
I assumed this was a role issue, so I added the Key Vault Administrator, Key Vault Certificates Officer, and Key Vault Secrets User roles to try to resolve it, but I still get the same error. I'm new to this and can't figure out why it isn't working.
Any advice would be appreciated.
This error usually occurs when the proper RBAC role has not been assigned to the AzureDatabricks service principal on your key vault.
When I tried the same in my environment without assigning a role to the AzureDatabricks service principal, I got the same error:
# Read the client secret from the Azure Key Vault-backed secret scope
service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")

# Configure OAuth access to the ADLS Gen2 account using the service principal
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
Response:
com.databricks.common.client.DatabricksServiceHttpClientException: PERMISSION_DENIED: Invalid permissions on the specified KeyVault https://srikv151.vault.azure.net/. Wrapped Message: Status code 403, "{"error":{"code":"Forbidden","message":"Caller is not authorized to perform action on resource.\r\nIf role assignments, deny assignments or role definitions were changed recently, please observe propagation time.\r\nCaller: name=AzureDatabricks;appid=2ff814a6-3304-4ab8-85cb-xxxxxx;oid=fe597bb2-377c-44f1-8515-xxxxxxx;iss=https://sts.windows.net/72f988bf-86f1-41af-91ab-xxxxxxx/\r\nAction: 'Microsoft.KeyVault/vaults/secrets/getSecret/action'\r\nResource: '/subscriptions/b83c1ed3-c5b6-44fb-b5ba-xxxxx/resourcegroups/v-sridevim/providers/microsoft.keyvault/vaults/srikv151/secrets/secret'\r\nAssignment: (not found)\r\nDenyAssignmentId: null\r\nDecisionReason: 'DeniedWithNoValidRBAC' \r\nVault: srikv151;location=eastus\r\n","innererror":{"code":"ForbiddenByRbac"}}}"
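Note that the Caller field of the 403 error already contains the object id (oid) of the AzureDatabricks service principal, which is exactly the value the role assignment needs. A small helper sketch for pulling it out of the error text (the error string and oid below are placeholders):

```python
import re

def extract_caller_oid(error_text):
    """Pull the caller's object id (oid) out of a Key Vault
    403 'ForbiddenByRbac' error message, or return None."""
    match = re.search(r"oid=([0-9a-fA-F-]+)", error_text)
    return match.group(1) if match else None

# Example with a redacted error fragment (placeholder oid):
error = "Caller: name=AzureDatabricks;appid=2ff814a6-...;oid=fe597bb2-377c-44f1-8515-0123456789ab;iss=..."
print(extract_caller_oid(error))  # fe597bb2-377c-44f1-8515-0123456789ab
```

The extracted value is what you pass as the assignee in the role-assignment command below.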
To resolve the error, assign the Key Vault Administrator role to the AzureDatabricks service principal on your key vault (Key Vault Secrets User, which grants the getSecret action, should also be enough if you only need to read secrets). The key point is that the role must be granted to the AzureDatabricks service principal itself, not just to your own user account.
You can use the following CLI command to assign the role to the AzureDatabricks service principal on your key vault:
az role assignment create --assignee-object-id <oid_from_error> --role "Key Vault Administrator" --scope "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.KeyVault/vaults/{keyVaultName}"
When I checked in the Azure portal, the role had been assigned successfully to the AzureDatabricks service principal on the key vault.
You should now be able to access the Azure Data Lake Storage Gen2 account using Python from the Azure Databricks workspace.
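Since the five spark.conf.set calls from Step 6 all follow one naming pattern, they can also be generated from a small helper to avoid typos in the long key names. A minimal sketch (placeholder account, application, and tenant ids; the secret would still come from dbutils.secrets.get):

```python
def adls_oauth_conf(storage_account, application_id, directory_id, client_secret):
    """Build the Spark conf entries for OAuth access to an ADLS Gen2
    account (the same keys set one by one in Step 6 of the tutorial)."""
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": application_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{directory_id}/oauth2/token",
    }

# In a notebook you would then apply the entries:
# for key, value in adls_oauth_conf("<storage-account>", app_id, tenant_id, secret).items():
#     spark.conf.set(key, value)
```

This keeps the storage-account suffix in one place, so a typo fails loudly instead of silently configuring the wrong account.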