Tags: azure, networking, databricks, azure-databricks, databricks-unity-catalog

Databricks file trigger - how to whitelist storage firewall


Recently, Databricks added a new feature - file triggers. However, this functionality seems to require the storage account to allow all network traffic.

My storage account has a firewall configured that denies traffic from unknown sources. The Databricks workspace is deployed to our internal network using VNet injection. All necessary subnets are whitelisted, and storage access generally works fine - but not with the file trigger. If I turn off the storage firewall, the file trigger works fine. The external location and the Azure Databricks access connector are configured correctly.
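For context, the firewall setup described above (default deny plus whitelisted VNet-injection subnets) can be expressed with the Azure CLI roughly like this - a sketch only, where the resource group, storage account, VNet, and subnet names are placeholders:

```shell
# Deny all traffic by default on the storage account firewall
az storage account update \
  --resource-group my-rg \
  --name mystorageaccount \
  --default-action Deny

# Whitelist the Databricks VNet-injection subnets
# (repeat for the public/host and private/container subnets as needed)
az storage account network-rule add \
  --resource-group my-rg \
  --account-name mystorageaccount \
  --vnet-name my-databricks-vnet \
  --subnet databricks-private-subnet
```

Note that the subnet must have the `Microsoft.Storage` service endpoint enabled for a VNet rule like this to take effect.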

The error I get:

Invalid credentials for storage location abfss://@.dfs.core.windows.net/. The credentials for the external location in the Unity Catalog cannot be used to read the files from the configured path. Please grant the required permissions.

If I look at the logs in my storage account, it appears the file trigger lists the storage account from a private IP address starting with 10.120.x.x. How do I whitelist this service? I want to keep my storage account behind the firewall.


Solution

  • Update, 3rd April 2023: ADLS firewall support isn't available out of the box right now; work is in progress to solve this issue.

    It's described in the documentation - you need to:

    • Create a managed identity by creating a Databricks access connector
    • Grant this managed identity permission to access your storage account
    • Create a Unity Catalog external location using the managed identity
    • Grant the access connector access through the storage account firewall - in "Networking", select "Resource instances", then select the resource type Microsoft.Databricks/accessConnectors and select your Azure Databricks access connector.
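The last step (adding the access connector as a trusted resource instance on the storage firewall) can also be done from the Azure CLI. A sketch, assuming placeholder subscription, tenant, resource group, and connector names:

```shell
# Resource ID of the Azure Databricks access connector (placeholder values)
CONNECTOR_ID="/subscriptions/<subscription-id>/resourceGroups/my-rg/providers/Microsoft.Databricks/accessConnectors/my-access-connector"

# Allow this specific resource instance through the storage account firewall,
# equivalent to the "Resource instances" selection in the portal's Networking blade
az storage account network-rule add \
  --resource-group my-rg \
  --account-name mystorageaccount \
  --resource-id "$CONNECTOR_ID" \
  --tenant-id "<tenant-id>"
```

This scopes the firewall exception to the single access connector rather than opening the account to all network traffic.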