Tags: azure, databricks, azure-databricks, azure-storage-account, azure-service-principal

Access Control from Databricks to Azure Storage Accounts and Containers


Our Databricks workspace needs to access different datasets, and we need to be able to grant access at the role or individual-user level. The datasets will be available as files on Data Lake Storage Gen2 and read into DataFrames, etc. The files can be organized in whatever way best fits the access rights: either one storage account per dataset (which might soon hit the default limit of 250 storage accounts per region per subscription) or one container per dataset, with several datasets in one storage account.
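The two layouts translate into different ABFS URIs that Spark would read from. A minimal sketch with hypothetical account and container names (`stsales`, `sthr`, `stdatasets`); only the `abfss://<container>@<account>.dfs.core.windows.net/<path>` URI format is standard ADLS Gen2 usage.

```python
def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Layout 1 (hypothetical names): one storage account per dataset.
per_account = {
    "sales": abfss_uri("stsales", "data"),
    "hr": abfss_uri("sthr", "data"),
}

# Layout 2 (hypothetical names): one container per dataset in a shared account.
per_container = {
    "sales": abfss_uri("stdatasets", "sales"),
    "hr": abfss_uri("stdatasets", "hr"),
}

for name in per_container:
    print(name, per_account[name], per_container[name])
```

On Databricks such a URI would be passed to a reader, e.g. `spark.read.parquet(per_container["sales"])`, assuming the files are Parquet.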

Our architectural guidelines require access to go through a service principal. However, I think this would give every user in the Databricks workspace the same access rights to the different storage accounts (datasets).

Is there a feasible way to access storage accounts from Databricks via a service principal while still having fine-grained control over the access rights of individual users, or at least at the role level? Can this be achieved at the container level, or only at the storage account level?

I tried using a service principal to access the storage accounts from within a Databricks workspace, but this grants every user in the workspace access to those storage accounts.
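For context, a service-principal setup like the one described is typically expressed as Spark configuration for the ABFS driver using OAuth client credentials. A minimal sketch, assuming placeholder account, client, tenant, and secret values (all hypothetical); the configuration keys follow the Hadoop ABFS driver's OAuth settings. Because these settings apply to the whole cluster or Spark session, every user running code there reads storage with the same identity, which is the problem raised in the question.

```python
def sp_oauth_conf(account: str, client_id: str, client_secret: str,
                  tenant_id: str) -> dict[str, str]:
    """Return ABFS OAuth (client-credentials) settings for one storage account."""
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values only; a real secret would normally come from a secret
# scope (e.g. dbutils.secrets.get(...)) rather than a string literal.
conf = sp_oauth_conf("stdatasets", "<client-id>", "<client-secret>", "<tenant-id>")

# On Databricks each pair would be applied with spark.conf.set(key, value)
# or placed in the cluster's Spark config; here we only list the keys.
for key in conf:
    print(key)
```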


Solution

  • Usually, when a user works with data, access happens in two steps:

    1. Checking permissions for accessing a specific piece of data
    2. Actually accessing the data in the storage account if it's allowed

    This pattern is fully supported on Databricks in either of the following ways:

    • If your organization has already adopted Unity Catalog (UC), it's easy: add the storage accounts/containers as external locations, create tables for the data in these locations, and then grant permissions to work with specific tables to users or, better, to groups (roles). Actual data access is then done by Unity Catalog using the storage credential (for example, a service principal or managed identity) attached to the external location.
    • If you haven't adopted UC yet, you can enforce access via Table Access Control (TACL). In this case you attach the service principal to a TACL-enabled cluster, but enforcement is done by TACL, so data is read or written only if the user or role has permission to do so.
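The Unity Catalog route boils down to a few SQL statements run by an admin. A minimal sketch that only builds the statement strings in Python, assuming hypothetical names for the storage credential (`sp_datasets`, which would wrap the service principal), the external location (`sales_loc`), the table (`main.datasets.sales`), and the group (`sales-readers`), and assuming Delta-format data under the container; on Databricks each string would be passed to `spark.sql(...)` or run in a SQL cell. Check the exact syntax and required privileges against the Unity Catalog documentation for your workspace.

```python
def uc_statements(location: str, url: str, credential: str,
                  table: str, group: str, fmt: str = "DELTA") -> list[str]:
    """Build Unity Catalog DDL/GRANT statements for one dataset.

    `table` is a three-level name (catalog.schema.table); the table's files
    are assumed to live in a sub-directory of `url` named after the table.
    """
    catalog, schema, name = table.split(".")
    table_url = f"{url.rstrip('/')}/{name}"
    return [
        # Register the container as an external location backed by a
        # storage credential (which wraps the service principal).
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {location} "
        f"URL '{url}' WITH (STORAGE CREDENTIAL {credential})",
        # Define an external table over the files under that location.
        f"CREATE TABLE IF NOT EXISTS {table} USING {fmt} LOCATION '{table_url}'",
        # Give the group just enough to find and read this one table.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`",
        f"GRANT SELECT ON TABLE {table} TO `{group}`",
    ]


stmts = uc_statements(
    "sales_loc",
    "abfss://sales@stdatasets.dfs.core.windows.net/",
    "sp_datasets",
    "main.datasets.sales",
    "sales-readers",
)
for s in stmts:
    print(s)
```

Granting to a group rather than to individual users keeps the role-level model from the question: membership changes in the group, not the grants. Under TACL the GRANT statements look similar (legacy Hive metastore privileges such as `USAGE` and `SELECT`), but there are no external locations.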