scala azure apache-spark spark-streaming spark-streaming-kafka

How to set streaming app checkpointing to Azure storage?


I am trying to set up checkpointing for a Spark Streaming application on Azure storage. I was previously using S3, and the code worked fine there.

Here is my latest code for setting the checkpoint directory on Azure:

    sc.hadoopConfiguration
      .set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    sc.hadoopConfiguration
      .set(
        "fs.azure.account.key.[name].blob.core.windows.net",
        [key]
      )
    ssc.checkpoint(
      "https://[name].blob.core.windows.net/[blob]")

Here is the error message I get at startup:

    Exception in thread "main" java.io.IOException: No FileSystem for scheme: https


Solution

  • See here - it's for Databricks, but it should still apply.

    val df = spark.read.parquet("wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>")
    

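    ==> So, use wasbs instead of https. Hadoop has no FileSystem registered for the https scheme, whereas wasbs:// URIs are handled by the Azure Blob connector (hadoop-azure).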
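
Applied to the checkpoint in the question, the change could look roughly like the sketch below. `WasbCheckpoint` and its two methods are hypothetical helpers made up for illustration (they are not part of Spark or Hadoop); they only assemble the account-key property name from the question and a wasbs URI in the form shown above. The sketch also assumes the hadoop-azure connector and its dependencies are on the classpath.

```scala
// Hypothetical helper (not a Spark or Hadoop API): builds the two strings
// needed to point a checkpoint at Azure Blob Storage via the wasbs scheme.
object WasbCheckpoint {
  private val Host = "blob.core.windows.net"

  /** Name of the Hadoop configuration property holding the account access key. */
  def accountKeyProperty(account: String): String =
    s"fs.azure.account.key.${account}.${Host}"

  /** Checkpoint location as wasbs://<container>@<account>.blob.core.windows.net/<dir>. */
  def checkpointUri(account: String, container: String, dir: String): String =
    s"wasbs://${container}@${account}.${Host}/${dir.stripPrefix("/")}"
}

// Intended use inside the streaming app, with sc and ssc as in the question
// (account, container, directory and storageKey are placeholders):
//   sc.hadoopConfiguration.set(WasbCheckpoint.accountKeyProperty("myaccount"), storageKey)
//   ssc.checkpoint(WasbCheckpoint.checkpointUri("myaccount", "mycontainer", "checkpoints"))
```

Note that the container name goes before the `@` and the storage account name after it, which differs from the https URL where the container is the first path segment.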