Tags: azure-blob-storage, databricks, azure-databricks, azure-authentication

Databricks Azure Blob Storage access


I am trying to access files stored in Azure blob storage and have followed the documentation linked below:

https://docs.databricks.com/external-data/azure-storage.html

I was able to mount the Azure blob storage on DBFS, but mounting no longer seems to be the recommended method. So I tried to set up direct access via URI with SAS authentication.

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", "<token>")

Now when I try to access any file using:

spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")

I get the following error:

Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD,

I am able to mount the storage account with the same SAS token, but direct access is not working. What needs to change for this to work?


Solution

  • If you are using blob storage, then you have to use wasbs and not abfss. I tried the same code as yours with my SAS token and got the same error with my blob storage.

    spark.conf.set("fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net", "SAS")
    spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
    
    spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.dfs.core.windows.net", "<token>")
    
    df = spark.read.load("abfss://<container>@<storage_account>.dfs.core.windows.net/input/sample1.csv")
    


    • When I used the following modified code, I was able to read the data successfully:

    spark.conf.set("fs.azure.account.auth.type.<storage_account>.blob.core.windows.net", "SAS")
    spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.blob.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
    spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.blob.core.windows.net", "<token>")
    
    df = spark.read.format("csv").load("wasbs://<container>@<storage_account>.blob.core.windows.net/input/sample1.csv")
    

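    For reference, the Databricks documentation configures SAS for wasbs with a single per-container property rather than the provider-style settings above; if those settings do not take effect on your cluster, this variant is worth trying (same placeholder names):

    # Per-container SAS property documented for the WASB driver.
    spark.conf.set("fs.azure.sas.<container>.<storage_account>.blob.core.windows.net", "<token>")

    df = spark.read.format("csv").load("wasbs://<container>@<storage_account>.blob.core.windows.net/input/sample1.csv")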


    UPDATE:

    To access files from Azure blob storage when the firewall is set to allow access only from selected networks, you need to configure a virtual network (VNet) for the Databricks workspace.


    • Now add the same virtual network to your storage account as well.


    • I have also selected service endpoints and subnet delegation for the subnets.


    • Now when I run the same code again using the file path wasbs://<container>@<storage_account>.blob.core.windows.net/<path>, the file is read successfully (a quick connectivity check is sketched below).

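    As a quick check that the firewall change has taken effect, you can list the container from a notebook before reading (hypothetical container/path placeholders; the call fails with a 403 while the subnet is still blocked):

    # Hypothetical container/path; dbutils.fs.ls raises an exception
    # while the storage firewall still blocks the Databricks subnet.
    try:
        files = dbutils.fs.ls("wasbs://<container>@<storage_account>.blob.core.windows.net/input/")
        print([f.name for f in files])
    except Exception as e:
        print("Still blocked by the storage firewall:", e)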