azure, azure-devops, databricks, azure-databricks, azure-virtual-network

Databricks, Storage Account and VNet peering


I have two virtual networks on Azure, and I have deployed the following for testing purposes:

  • an ADLS storage account is deployed behind vnet1
  • Databricks is deployed behind vnet2, with its two subnets (public and private)
  • vnet1 and vnet2 are peered; the peering status is connected and synced
  • the Databricks cluster gets a NIC with an IP address in the vnet2 range

When I try to access the storage account's abfss path and run dbutils.fs.ls(abfss_url) to list its contents, I get this error:

"This request is not authorized to perform this operation.", 403

When I explicitly add the Databricks vnet2 to the storage account firewall, it works.

The question is: how does VNet peering work here? Shouldn't the peering extend the network and let me access the storage account from the Databricks cluster without having to add vnet2 to the storage account firewall?


Solution

  • A storage account cannot be deployed directly into a VNet because it is a PaaS service. Instead, you would use either a service endpoint or a private endpoint, which is tied to a specific subnet within vnet1.

    Since Databricks is deployed in your custom vnet2 and that VNet is peered with vnet1, you can think of the two as one larger network (call it vnet3) that is a superset of vnet1 and vnet2. Peering only connects the private address spaces, though; it does not change how the storage account's firewall sees your traffic. You still need to set up a service endpoint or a private endpoint so that Databricks can talk to your storage account. Otherwise, by default, requests from Databricks go to the storage account's public endpoint, and the firewall blocks them because they do not come from an allowed network. That is the 403 you are seeing.

    You can refer to this article on how to use service or private endpoint.
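
One way to check which path your cluster is using is to resolve the storage account's hostname from a notebook and see whether it maps to a private or a public address. The sketch below is only an illustration, not an official diagnostic: the function names `classify_ip` and `resolve_endpoint` are made up for this example, and it assumes that a private endpoint with working DNS makes the `dfs.core.windows.net` name resolve to a private IP. With a service endpoint the name still resolves to a public IP, so a "public" result alone does not prove the traffic is blocked.

```python
import ipaddress
import socket
from typing import List, Tuple


def classify_ip(ip: str) -> str:
    """Return 'private' if the address is in a private range, else 'public'."""
    return "private" if ipaddress.ip_address(ip).is_private else "public"


def resolve_endpoint(hostname: str) -> List[Tuple[str, str]]:
    """Resolve hostname to its IPv4 addresses and classify each one.

    Needs working DNS from wherever it runs (for example, a notebook cell
    on the cluster), so the result reflects that cluster's view.
    """
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # info[4] is the socket address tuple; its first element is the IP string.
    addresses = sorted({info[4][0] for info in infos})
    return [(ip, classify_ip(ip)) for ip in addresses]
```

In a notebook cell you could call `resolve_endpoint("<your-account>.dfs.core.windows.net")`, substituting your own account name. A private address suggests the private endpoint is reachable over the peering; a public address means the request depends on the storage firewall allowing the cluster's subnets.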