Tags: python, databricks, azure-databricks, azure-data-lake-gen2

In Databricks using Python, dbutils.fs.mount gives java.lang.NullPointerException: authEndpoint when trying to mount with abfss; wasbs works fine


When using dbutils.fs.mount in Databricks to connect to an Azure Gen2 data lake, an authEndpoint error (java.lang.NullPointerException: authEndpoint) is raised when attempting to connect to "abfss://<container>@theDataLake.dfs.core.windows.net/". HOWEVER, connecting to "wasbs://<container>@theDataLake.blob.core.windows.net/" works fine. I am trying to understand why abfss causes the authEndpoint error while wasbs does not.

#fails
endpoint = "abfss://<container>@theDataLake.dfs.core.windows.net/"
dbutils.fs.mount(
   source = endpoint,
   mount_point = "/mnt/test",
   extra_configs = {"fs.azure.account.key.theDataLake.blob.core.windows.net" : "xxxxxx"})

#works
endpoint = "wasbs://<container>@theDataLake.blob.core.windows.net/"
dbutils.fs.mount(
   source = endpoint,
   mount_point = "/mnt/test",
   extra_configs = {"fs.azure.account.key.theDataLake.blob.core.windows.net" : "xxxxxx"})

Solution

  • You can't mount the ABFSS scheme using a storage account key. You can mount with ABFSS only when authenticating with a service principal (docs), and it requires a different set of parameters for extra_configs (a complete mount call is sketched after the config block):

    {"fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}