Tags: azure-blob-storage, databricks, azure-databricks, azure-authentication

Databricks Azure Blob Storage access


I am trying to access files stored in Azure blob storage and have followed the documentation linked below:

https://docs.databricks.com/external-data/azure-storage.html

I was able to mount the Azure blob storage on DBFS, but mounting no longer seems to be the recommended method. So I tried to set up direct access via URI with SAS authentication.

spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", "<token>")

Now when I try to access any file using:

spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")

I get the following error:

Operation failed: "Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.", 403, HEAD,

I am able to mount the storage account with the same SAS token, but direct access is not working. What needs to change for this to work?


Solution

  • If you are using blob storage, then you have to use wasbs and not abfss. I tried the same code as yours with my SAS token and got the same error with my blob storage.

    spark.conf.set("fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net", "SAS")
    spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
    
    spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.dfs.core.windows.net", "<token>")
    
    df = spark.read.load("abfss://<container>@<storage_account>.dfs.core.windows.net/input/sample1.csv")
    


    • When I used the following modified code, I was able to read the data successfully:

    spark.conf.set("fs.azure.account.auth.type.<storage_account>.blob.core.windows.net", "SAS")
    spark.conf.set("fs.azure.sas.token.provider.type.<storage_account>.blob.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
    spark.conf.set("fs.azure.sas.fixed.token.<storage_account>.blob.core.windows.net", "<token>")
    
    df = spark.read.format("csv").load("wasbs://<container>@<storage_account>.blob.core.windows.net/input/sample1.csv")
    

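    For reference, the Databricks documentation configures SAS for wasbs with a single per-container property rather than the provider-style settings above; if those settings do not take effect on your cluster, this variant is worth trying (same placeholder names):

    # Per-container SAS property documented for the WASB driver.
    spark.conf.set("fs.azure.sas.<container>.<storage_account>.blob.core.windows.net", "<token>")

    df = spark.read.format("csv").load("wasbs://<container>@<storage_account>.blob.core.windows.net/input/sample1.csv")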


    UPDATE:

    To access files from Azure blob storage when the firewall is set to allow access only from selected networks, you need to configure a virtual network (VNet) for the Databricks workspace.


    • Now add the same virtual network to your storage account as well.


    • I have also selected service endpoints and subnet delegation for the subnets.


    • Now when I run the same code again using the file path wasbs://<container>@<storage_account>.blob.core.windows.net/<path>, the file is read successfully (a quick connectivity check is sketched below).

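    As a quick check that the firewall change has taken effect, you can list the container from a notebook before reading (hypothetical container/path placeholders; the call fails with a 403 while the subnet is still blocked):

    # Hypothetical container/path; dbutils.fs.ls raises an exception
    # while the storage firewall still blocks the Databricks subnet.
    try:
        files = dbutils.fs.ls("wasbs://<container>@<storage_account>.blob.core.windows.net/input/")
        print([f.name for f in files])
    except Exception as e:
        print("Still blocked by the storage firewall:", e)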