scala · azure · apache-spark · azure-databricks

Azure Databricks - Running a Spark Jar from Gen2 Data Lake Storage


I am trying to run a spark-submit from Azure Databricks. Currently I can create a job with the jar uploaded to the Databricks workspace, and run it.

My queries are:

  1. Is there a way to access a jar residing on Gen2 Data Lake Storage and do a spark-submit from the Databricks workspace, or even from Azure ADF? (The communication between the workspace and Gen2 storage is protected by "fs.azure.account.key".)

  2. Is there a way to do a spark-submit from a Databricks notebook?


Solution

  • Finally, I figured out how to run this:

    1. You can run a Databricks jar activity from ADF and attach it to an existing cluster, which will have the ADLS key configured in its Spark config.
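
    The cluster-side setup mentioned above can be sketched as a single Spark config entry (Clusters > Advanced Options > Spark Config). The storage account name and the secret scope/key names below are placeholders, not values from the original post:

    ```
    # Hypothetical example: give the cluster access to the Gen2 storage account,
    # so a jar under abfss:// is readable. "mystorageaccount", "my-scope" and
    # "storage-key" are placeholder names.
    fs.azure.account.key.mystorageaccount.dfs.core.windows.net {{secrets/my-scope/storage-key}}
    ```

    Referencing a secret scope instead of pasting the raw account key keeps the key out of the cluster configuration UI.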

    2. It is not possible to do a spark-submit from a notebook. But you can create a Spark jar job in Jobs, or use the Databricks Runs Submit API to do a spark-submit.
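
    The second option can be sketched as a request body for the Jobs Runs Submit API (`POST /api/2.1/jobs/runs/submit`) with a `spark_submit_task`. The run name, class name, jar path, Spark version, and node type below are all placeholder assumptions:

    ```json
    {
      "run_name": "spark-submit-example",
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1
      },
      "spark_submit_task": {
        "parameters": [
          "--class", "com.example.Main",
          "dbfs:/jars/my-app.jar"
        ]
      }
    }
    ```

    One caveat: a `spark_submit_task` runs on a new cluster, so if the jar lives on Gen2 storage, the ADLS key would need to be supplied in the `new_cluster` `spark_conf` rather than relying on an existing cluster's configuration.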