Access ADLS Gen2 using pem/certificate from Apache Spark

I have an Azure SPN that allows me to read data from ADLS Gen2 using a certificate (.pem) file. With the Azure SDK, I can easily create the following credential object:

from azure.identity import CertificateCredential

azure_credential = CertificateCredential(

and then use it to access ADLS Gen2.


I need to use the same SPN from Apache Spark, but I cannot find a way to do this. Judging by the authentication methods listed in the ABFS documentation, it does not seem to be possible. My goal is to set the certificate in the Spark conf the same way I used to set the client secret, like below.

spark_session.conf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
spark_session.conf.set("", client_id)
spark_session.conf.set("fs.adl.oauth2.credential", client_secret)
spark_session.conf.set("fs.adl.oauth2.refresh.url", f"{tenant_id}/oauth2/token")


  • Connecting to ADLS Gen2 in Databricks: Configuration Options

    According to the official Databricks documentation, there are multiple ways to connect to Azure Data Lake Storage (ADLS) Gen2. The available methods include:

    1. OAuth 2.0 with an Azure Service Principal
    2. Shared Access Signatures (SAS)
    3. Account Keys

    Note that configuring a certificate (.pem) directly in the Spark configuration, the way a client secret is set, is currently not supported.
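
For reference, the first option in the list (OAuth 2.0 with a service principal) is normally configured with a client secret, not a certificate. Below is a minimal sketch of that setup, assuming the per-account ABFS key names shown in the Databricks documentation. `abfs_oauth_conf` and `apply_conf` are hypothetical helper names, and the storage account, client ID, secret, and tenant values are placeholders you would supply yourself.

```python
def abfs_oauth_conf(storage_account, client_id, client_secret, tenant_id):
    """Build ABFS OAuth settings (client-secret auth) for one ADLS Gen2 account.

    Assumption: key names follow the per-account pattern documented for the
    ABFS driver; verify them against the Hadoop/Databricks version you run.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def apply_conf(spark_session, settings):
    """Copy every key/value pair into the session's runtime configuration."""
    for key, value in settings.items():
        spark_session.conf.set(key, value)
```

With an existing `spark_session`, this would be used as `apply_conf(spark_session, abfs_oauth_conf(account, client_id, client_secret, tenant_id))`. Note that every value here is a plain string, which is why a .pem file has no obvious slot in this scheme.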