I am trying to access Azure Data Lake Storage Gen2 with a Service Principal via Unity Catalog.
The Service Principal has All Privileges on the external location. In PySpark I set the Spark config according to the Azure Data Lake Storage Gen2 documentation:
from pyspark.sql.types import StringType
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
# create and write dataframe
df = spark.createDataFrame(["10","11","13"], StringType()).toDF("values")
df.write \
.format("delta") \
.mode("overwrite") \
.save(f"abfss://{container}@{storage_account}.dfs.core.windows.net/example/example-0")
Unfortunately, this returns an unexpected error:
Operation failed: "This request is not authorized to perform this operation using this permission.", 403, HEAD, https://{storage-account}.dfs.core.windows.net/{container-name}/example/example-0?upn=false&action=getStatus&timeout=90
When you use Unity Catalog you don't need these properties. They were required before Unity Catalog, and today they are only used for direct data access on clusters without UC:
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")
With Unity Catalog, authentication to the given storage location happens through the storage credential that is mapped to the external location covering that path.
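So with the external location in place, the write from the question should work without any of the fs.azure.* settings. A minimal sketch, reusing the container and storage_account variables from the question:

from pyspark.sql.types import StringType

# No fs.azure.* OAuth configs needed - Unity Catalog resolves the storage
# credential from the external location that covers this abfss:// path
df = spark.createDataFrame(["10", "11", "13"], StringType()).toDF("values")
df.write \
    .format("delta") \
    .mode("overwrite") \
    .save(f"abfss://{container}@{storage_account}.dfs.core.windows.net/example/example-0")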
But permissions are checked for the user or service principal who is actually running the code, so that user/principal needs the corresponding privileges on the external location. If you run this code as a job that runs as the SP, it will have access; but if you run it interactively as yourself, it won't work until you are granted those permissions.
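Granting that access could look like the sketch below, run by someone who owns or manages the external location. The location name example_location and the principal are placeholders, not values from the question:

# Hypothetical names - replace with your external location and your user/SP
spark.sql("""
  GRANT READ FILES, WRITE FILES
  ON EXTERNAL LOCATION example_location
  TO `user@example.com`
""")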