Tags: scala, databricks, azure-databricks

Databricks: Sharing information between workflow tasks using Scala


In Databricks, I would like to have a workflow with more than one task and to pass information between those tasks. According to https://learn.microsoft.com/en-us/azure/databricks/workflows/jobs/share-task-context, this can be achieved in Python using

dbutils.jobs.taskValues.set(key = 'name', value = 'Some User')

and the value can be fetched in the second task using

dbutils.jobs.taskValues.get(taskKey = "prev_task_name", key = "name", default = "Jane Doe")

I am, however, using jar libraries written in Scala 2.12 for my tasks.

Is there any way to achieve this in Scala? Or any ideas for workarounds?


Solution

  • As per the documentation, the taskValues subutility does not support Scala.

    However, if you still want to pass values between tasks, you can create a global temporary view in one task and read it in another.

    In the example below, I tried this with a Scala notebook; the same Scala code can be added to your JAR when you build it (a rough JAR-style sketch follows at the end of this answer).

    Output: (screenshot of the workflow run omitted)

    Code in scalatask1:

    import spark.implicits._

    // Key/value pairs to share with the downstream task
    case class TaskValues(key: String, value: String)

    val df = Seq(TaskValues("task1key1", "task1key1value"), TaskValues("task1key2", "task1key2value"), TaskValues("task1key3", "task1key3value")).toDF()

    // Register a global temp view; other tasks on the same cluster can read it from the global_temp database
    df.createOrReplaceGlobalTempView("task1Values")
    

    Code in scalatask2:

    import spark.implicits._

    // Read the shared view and pull out the value for the required key
    val result = spark.sql("select * from global_temp.task1Values").filter($"key" === "task1key2").select("value").collect()(0)(0)

    Here, you read the view registered by task1 from the global_temp database and filter for the required key.
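
    Note that collect()(0)(0) returns the value typed as Any. If the downstream task needs a plain String, one small variation (assuming the same view and key names as above) is:

    // Select only the value column and read it back as a typed String
    val name: String = spark.sql("select value from global_temp.task1Values where key = 'task1key2'")
      .as[String]
      .head()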

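    For completeness, here is a rough sketch of how the same pattern might look inside standalone JAR tasks rather than notebooks. It assumes both tasks run on the same shared job cluster, since a global temporary view only lives for the lifetime of a single Spark application; the object names ScalaTask1 and ScalaTask2 are just placeholders for your own main classes, and each task picks up the existing session with SparkSession.builder().getOrCreate().

    import org.apache.spark.sql.SparkSession

    // Hypothetical main class for the first JAR task: publish the shared values
    object ScalaTask1 {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().getOrCreate()
        import spark.implicits._

        Seq(("task1key1", "task1key1value"), ("task1key2", "task1key2value"))
          .toDF("key", "value")
          .createOrReplaceGlobalTempView("task1Values")
      }
    }

    // Hypothetical main class for the second JAR task: read one value back
    object ScalaTask2 {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().getOrCreate()
        import spark.implicits._

        val value = spark.sql("select value from global_temp.task1Values where key = 'task1key2'")
          .as[String]
          .head()
        println(s"Shared value from task1: $value")
      }
    }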