azure-devops yaml databricks azure-databricks

How to schedule one Databricks notebook and run a different notebook once from the same Azure DevOps pipeline

I am trying to schedule notebook1, and I want to run notebook2 just once, both from the same Azure DevOps pipeline. This is the code I used:

- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: | 
      
      databricksUrl='https://adb-13946046.6.azuredatabricks.net/api/2.0'
      notebookPath1='/Users/user/notebook1.py'
      notebookPath2='/Users/user/notebook2.py'
      

      jobName='ScheduledJobName1'
      jobName2='testjob'

      requestUri="$databricksUrl/jobs/create"
      requestUriRun="$databricksUrl/jobs/run-now"

      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath1'"
        },
        "schedule": {
            "quartz_cron_expression": "45 10 * * * ?",
            "timezone_id": "Canada/Eastern"
        }
      }'

      body2='{
        "name": "'$jobName2'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath2'"
        }
      }'

      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUri"
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body2" "$requestUriRun"

But I only see that notebook1 is scheduled; notebook2 is never run. Please assist.

Thanks

Solution

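The jobs/run-now endpoint runs an existing job identified by its job ID. It does not accept a job definition, which is why posting body2 (a name, a cluster and a notebook task, but no job_id) to it does not start notebook2. You need to define the job first and get its job ID.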

For example, I created a notebook with the following code:

x = int(dbutils.widgets.get("x"))
y = int(dbutils.widgets.get("y"))

# Perform some calculation
z = x + y

# Return the output as JSON
import json
output = {"result": z}
dbutils.notebook.exit(json.dumps(output))

Then create a job named jobtest under Workflows and get the job ID from the browser.

To trigger the job, use the REST API call below in the DevOps YAML:

- task: Bash@3
  displayName: 'Run Existing Databricks Job'
  inputs:
    targetType: 'inline'
    script: | 
      databricksUrl='https://adb-8152044424115302.3.azuredatabricks.net/api/2.0'
      requestUriRunnow="$databricksUrl/jobs/run-now"

      body3='{
        "job_id": 880800206549193,
        "notebook_params": {
            "x": "30",
            "y": "50"
          }
      }'
      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body3" "$requestUriRunnow"

This runs the job on Databricks with the parameters passed in notebook_params.
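
If you would rather not copy the job ID by hand, the pipeline can read it from the jobs/create response and pass it straight to jobs/run-now. Below is a minimal Python sketch of that flow, assuming the same REST API 2.0 endpoints as above. The names post_json and create_and_run are made up for illustration, and the post parameter is there only so the flow can be exercised without a real workspace.

```python
import json
import urllib.request


def post_json(url, token, payload):
    """POST a JSON payload with a bearer token and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_and_run(base_url, token, job_settings, notebook_params, post=post_json):
    """Create a job, then trigger it once via run-now using the returned job_id.

    `post` is a hypothetical injection point (not part of any Databricks API)
    so the HTTP call can be swapped out when trying this locally.
    """
    created = post(f"{base_url}/jobs/create", token, job_settings)
    job_id = created["job_id"]  # jobs/create replies with the new job's ID
    run = post(
        f"{base_url}/jobs/run-now",
        token,
        {"job_id": job_id, "notebook_params": notebook_params},
    )
    return job_id, run["run_id"]
```

Here job_settings would be the same dictionary you currently send as body2 (name, new_cluster, notebook_task), and base_url is the databricksUrl value from the YAML. Note that this leaves a job definition behind in the workspace after the run.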

Another endpoint is jobs/runs/submit, which lets you create and trigger a one-time run without creating a job first.

YAML sample:

- task: Bash@3
  displayName: 'Submit One-Time Databricks Run'
  inputs:
    targetType: 'inline'
    script: | 
      databricksUrl='https://adb-8152044424015301.3.azuredatabricks.net/api/2.0'
      notebookPath1='/Users/user/Notebook2'
      requestUriRun="$databricksUrl/jobs/runs/submit"
      body='{
                "run_name": "test-run",
                "new_cluster": {
                  "spark_version": "7.3.x-scala2.12",
                  "node_type_id": "Standard_DS3_v2",
                  "num_workers": 1
                },
                "notebook_task": {
                  "notebook_path": "'$notebookPath1'",
                  "base_parameters": {
                      "x": "15",
                      "y": "20"
                  }
                }
      }'

      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUriRun"
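
One thing to watch in the inline script: the notebook path is spliced into a hand-written JSON string, so a double quote or backslash in the value would produce invalid JSON. As a rough alternative, the body could be built with a JSON serializer. The sketch below assumes the same cluster settings as the sample above; build_submit_body is a hypothetical helper name, not something Databricks provides.

```python
import json


def build_submit_body(notebook_path, parameters, run_name="test-run"):
    """Return the jobs/runs/submit request body as a JSON string.

    The cluster values are copied from the sample above and are assumptions;
    adjust them to whatever your workspace supports.
    """
    body = {
        "run_name": run_name,
        "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1,
        },
        "notebook_task": {
            "notebook_path": notebook_path,
            # The sample passes widget values as strings, so convert here too.
            "base_parameters": {key: str(value) for key, value in parameters.items()},
        },
    }
    # json.dumps escapes quotes and backslashes that manual splicing would not.
    return json.dumps(body)
```

For example, build_submit_body("/Users/user/Notebook2", {"x": 15, "y": 20}) gives a body equivalent to the one in the YAML sample.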

Then check the job runs in Databricks to confirm that the run started.
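
If you want the pipeline itself to wait for the result instead of checking the UI, one option is to poll jobs/runs/get with the run_id returned by run-now or runs/submit. The following is only a sketch under that assumption: fetch_run and wait_for_run are illustrative names, and the poll interval and retry limit are arbitrary defaults.

```python
import json
import time
import urllib.parse
import urllib.request


def fetch_run(base_url, token, run_id):
    """GET jobs/runs/get for one run and return the decoded JSON reply."""
    query = urllib.parse.urlencode({"run_id": run_id})
    request = urllib.request.Request(
        f"{base_url}/jobs/runs/get?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_run(get_run, run_id, poll_seconds=30, max_polls=60, sleep=time.sleep):
    """Poll until the run reaches a terminal life cycle state.

    `get_run` is any callable taking a run_id and returning the runs/get reply.
    Returns result_state when present, otherwise the terminal life cycle state.
    """
    terminal_states = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    for _ in range(max_polls):
        state = get_run(run_id)["state"]
        if state["life_cycle_state"] in terminal_states:
            return state.get("result_state", state["life_cycle_state"])
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

A call could look like wait_for_run(lambda rid: fetch_run(base_url, token, rid), run_id), and the pipeline step can fail when the returned value is not SUCCESS.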