I mounted my Azure Storage account with dbutils in Python, using the Azure service principal method described on this page: https://learn.microsoft.com/en-us/azure/databricks/dbfs/mounts
configs = {"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": "<application-id>",
"fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
"fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}
# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)
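Once the mount exists, I can read the data through the DBFS path like a normal directory. A minimal sketch of how I use it (the mount name and file path are placeholders):

```python
# List the contents of the mounted container (placeholder mount name).
display(dbutils.fs.ls("/mnt/<mount-name>"))

# Read a file through the mount point; "path/to/data.csv" is a placeholder.
df = spark.read.csv("/mnt/<mount-name>/path/to/data.csv", header=True)
df.show()
```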
but I also saw that there is an option to connect through Spark configuration to the Azure Blob Filesystem (ABFS) driver, as described on this page: https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage
service_credential = dbutils.secrets.get(scope="<scope>", key="<service-credential-key>")
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
I couldn't find any information about the difference between the two. In which use cases is it better to use one or the other? And is one method faster than the other for reading the data stored in the Azure Storage account?
Thanks a lot in advance!
When you mount your storage account, you make it accessible to everyone who has access to your Databricks workspace. But when you use spark.conf.set to connect to your storage account, access is limited to those who have access to that cluster.
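You can see this workspace-wide visibility for yourself: any user in the workspace can enumerate all mount points with dbutils.fs.mounts(). A short sketch:

```python
# Every mount point is visible to all users of the workspace.
for mount in dbutils.fs.mounts():
    print(mount.mountPoint, "->", mount.source)
```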
As highlighted in the same Microsoft documentation, Access Azure Data Lake Storage Gen2 and Blob Storage, mounting is among the deprecated ways of accessing storage accounts and is no longer recommended. Therefore, choose between mounting and setting the Spark configuration based on your requirements, taking security into consideration.
If you do want to use mounting, you can try setting up the mount point with credential passthrough, as sketched below.
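For reference, a credential-passthrough mount looks roughly like this (following the pattern in the Microsoft docs; the container, account, and mount names are placeholders, and the cluster must have Azure AD credential passthrough enabled):

```python
# Mount ADLS Gen2 using Azure AD credential passthrough. Requires a
# cluster with credential passthrough enabled; names are placeholders.
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName"),
}

dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs,
)
```

With this setup, each user's reads through the mount are authorized with their own Azure AD identity rather than a shared service principal.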
> Is one method faster than the other to get information from the stored data in the Azure Storage Account?

No. Both methods read the data through the same ABFS driver, so performance is effectively the same. A mount is merely a pointer stored at the workspace level, which is also why it is accessible to all users; spark.conf.set only changes where the credentials live, not how the data is read.