.NET for Apache Spark authentication against ADLS (Azure Data Lake Store) Gen1

I am new to Apache Spark. I am trying to use Microsoft's .NET for Apache Spark NuGet library to read data from ADLS. I can't figure out how to authenticate using Spark, and there seems to be no documentation on this at all. Is this even possible? I am writing a .NET Framework console app.

Any help/pointers would be greatly appreciated!


  • If you want to access Azure Data Lake Store from Spark, follow the steps below. Note that I tested this with Spark 3.0.1 and Hadoop 3.2.

    1. Create a Service Principal
    az login
    az ad sp create-for-rbac --name "myApp" --role contributor --scopes /subscriptions/<subscription-id>/resourceGroups/<group-name> --sdk-auth
    1. Grant Service Principal access to Data Lake
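    # Get the service principal's object ID from its client (application) ID
    $sp = Get-AzADServicePrincipal -ApplicationId 42e0d080-b1f3-40cf-8db6-c4c522d988c4
    # Access and default ACL entries for the service principal (rwx here; adjust as needed)
    $fullAcl = "user:$($sp.Id):rwx,default:user:$($sp.Id):rwx"
    # Split on commas so each ACL entry becomes one array element
    $newFullAcl = $fullAcl.Split(",")
    Set-AdlStoreItemAclEntry -Account <account name> -Path / -Acl $newFullAcl -Recurse -Debug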
    1. Code
    using System.Collections.Generic;
    using Microsoft.Spark.Sql;
    using Microsoft.Spark.Sql.Types;

    // The statements below go inside Main(). Replace every <...> placeholder.
    string filePath = "adl://<account name>.azuredatalakestore.net/<path>";
    // Create SparkSession configured for service principal (OAuth2 client credential) auth
    SparkSession spark = SparkSession
        .Builder()
        .AppName("Azure Data Lake Storage example using .NET for Apache Spark")
        .Config("fs.adl.impl", "org.apache.hadoop.fs.adl.AdlFileSystem")
        .Config("fs.adl.oauth2.access.token.provider.type", "ClientCredential")
        .Config("fs.adl.oauth2.client.id", "<sp appid>")
        .Config("fs.adl.oauth2.credential", "<sp password>")
        .Config("fs.adl.oauth2.refresh.url", "https://login.microsoftonline.com/<tenant>/oauth2/token")
        .GetOrCreate();
    // Create sample data
    var data = new List<GenericRow>
    {
        new GenericRow(new object[] { 1, "John Doe" }),
        new GenericRow(new object[] { 2, "Jane Doe" }),
        new GenericRow(new object[] { 3, "Foo Bar" })
    };
    // Create schema for sample data
    var schema = new StructType(new List<StructField>()
    {
        new StructField("Id", new IntegerType()),
        new StructField("Name", new StringType()),
    });
    // Create DataFrame using data and schema
    DataFrame df = spark.CreateDataFrame(data, schema);
    // Print DataFrame
    df.Show();
    // Write DataFrame to Azure Data Lake Gen1 (overwrites any existing data at filePath)
    df.Write().Mode(SaveMode.Overwrite).Parquet(filePath);
    // Read saved DataFrame from Azure Data Lake Gen1
    DataFrame readDf = spark.Read().Parquet(filePath);
    // Print DataFrame
    readDf.Show();
    // Stop Spark session
    spark.Stop();
    1. Run
    spark-submit ^
    --packages org.apache.hadoop:hadoop-azure-datalake:3.2.0 ^
    --class org.apache.spark.deploy.dotnet.DotnetRunner ^
    --master local ^
    microsoft-spark-3-0_2.12-<version>.jar ^
    dotnet <application name>.dll
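
As a side note, the same Hadoop settings can likely be supplied at submit time instead of being hard-coded in the C# source, because Spark copies any property prefixed with `spark.hadoop.` into the Hadoop configuration. Below is a minimal sketch for a bash shell, not something I tested as part of the steps above. The function name `adl_conf_args` and the `ADLS_*` environment variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: print spark-submit "--conf" arguments (one per line) for
# ADLS Gen1 service principal auth. Function name and argument order are
# hypothetical, chosen for this example.
adl_conf_args() {
  local tenant="$1" client_id="$2" secret="$3"
  # The "spark.hadoop." prefix makes Spark pass each setting to Hadoop.
  printf '%s\n' \
    "--conf" "spark.hadoop.fs.adl.oauth2.access.token.provider.type=ClientCredential" \
    "--conf" "spark.hadoop.fs.adl.oauth2.client.id=${client_id}" \
    "--conf" "spark.hadoop.fs.adl.oauth2.credential=${secret}" \
    "--conf" "spark.hadoop.fs.adl.oauth2.refresh.url=https://login.microsoftonline.com/${tenant}/oauth2/token"
}

# Possible usage (ADLS_* are assumed environment variable names):
#   mapfile -t conf < <(adl_conf_args "$ADLS_TENANT_ID" "$ADLS_CLIENT_ID" "$ADLS_CLIENT_SECRET")
#   spark-submit "${conf[@]}" --packages org.apache.hadoop:hadoop-azure-datalake:3.2.0 ...
```

With this approach the `fs.adl.oauth2.*` calls to `.Config(...)` could be dropped from the application. Keep in mind that arguments passed on the command line may be visible in process listings, so a properties file or a secret store may be a better fit outside of local testing.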

