In Databricks, I would like to build a workflow with more than one task and pass information between those tasks. According to https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/share-task-context, this can be achieved in Python using
dbutils.jobs.taskValues.set(key = 'name', value = 'Some User')
and fetch it in the second task with
dbutils.jobs.taskValues.get(taskKey = "prev_task_name", key = "name", default = "Jane Doe")
I am, however, using JAR libraries written in Scala 2.12 for my tasks.
Is there any way to achieve this in Scala? Or any ideas for workarounds?
As per the documentation, Scala is not supported by the taskValues subutility.
However, if you still want to pass values between tasks, you can write them to a global temporary view in one task and read it in the next. Note that global temporary views are scoped to the Spark application, so both tasks must run on the same (shared) job cluster for this to work.
In the example below, I tried this with Scala notebooks; the same Scala code can be included in your JAR when building it.
Code in scalatask1:
import spark.implicits._  // auto-imported in notebooks, needed explicitly in a JAR

case class TaskValues(key: String, value: String)
val df = Seq(TaskValues("task1key1", "task1key1value"), TaskValues("task1key2", "task1key2value"), TaskValues("task1key3", "task1key3value")).toDF()
df.createOrReplaceGlobalTempView("task1Values")
Code in scalatask2:
spark.sql("select * from global_temp.task1Values").filter($"key" === "task1key2").select("value").first().getString(0)
Here, task 2 queries the view created by task 1 and filters for the required key to retrieve its value.
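For JAR-based tasks, here is a minimal sketch of how the same code might be packaged as Spark entry points (the object names, view name, and keys are illustrative, not prescribed by Databricks):

import org.apache.spark.sql.SparkSession

// Entry point for the first task: publishes key/value pairs as a global temp view.
object ScalaTask1 {
  case class TaskValues(key: String, value: String)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val df = Seq(
      TaskValues("task1key1", "task1key1value"),
      TaskValues("task1key2", "task1key2value"),
      TaskValues("task1key3", "task1key3value")
    ).toDF()
    df.createOrReplaceGlobalTempView("task1Values")
  }
}

// Entry point for the second task: reads the view from the reserved global_temp database.
object ScalaTask2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    val value = spark.sql("select * from global_temp.task1Values")
      .filter($"key" === "task1key2")
      .select("value")
      .first()
      .getString(0)
    println(value)
  }
}

Once task 2 has read the value, the view can be dropped with spark.catalog.dropGlobalTempView("task1Values"); otherwise it persists until the Spark application (i.e. the shared job cluster) terminates.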