
How to schedule a Databricks notebook and run a different notebook from the same DevOps pipeline


I am trying to schedule notebook1 and run notebook2 just once from the same DevOps pipeline. Below is the code I used.

- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: | 
      
      databricksUrl='https://adb-13946046.6.azuredatabricks.net/api/2.0'
      notebookPath1='/Users/user/notebook1.py'
      notebookPath2='/Users/user/notebook2.py'
      

      jobName='ScheduledJobName1'
      jobName2='testjob'

      requestUri="$databricksUrl/jobs/create"
      requestUriRun="$databricksUrl/jobs/run-now"

      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath1'"
        },
        "schedule": {
            "quartz_cron_expression": "45 10 * * * ?",
            "timezone_id": "Canada/Eastern"
        }
      }'

      body2='{
        "name": "'$jobName2'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath2'"
        }
      }'

      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUri"
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body2" "$requestUriRun"

However, only notebook1 gets scheduled; notebook2 never runs. Please assist.

Thanks


Solution

  • The jobs/run-now endpoint runs an existing job, identified by its job ID. Your second request sends a new job definition to jobs/run-now instead of a job_id, so nothing runs. You need to define the job first and get its job ID.

    For example, I created a notebook containing the following code:

    import json

    # Widget values arrive as strings, so convert them to integers
    x = int(dbutils.widgets.get("x"))
    y = int(dbutils.widgets.get("y"))

    # Perform some calculation
    z = x + y

    # Return the output as JSON
    output = {"result": z}
    dbutils.notebook.exit(json.dumps(output))
    

    Then create a job named jobtest under Workflows and get the job ID from the browser.

    To trigger the job, call the REST API from the DevOps YAML:

    - task: Bash@3
      displayName: 'Run Databricks Job'
      inputs:
        targetType: 'inline'
        script: |
          databricksUrl='https://adb-8152044424115302.3.azuredatabricks.net/api/2.0'
          requestUriRunnow="$databricksUrl/jobs/run-now"

          # job_id of the existing job, plus values for the notebook widgets
          body3='{
            "job_id": 880800206549193,
            "notebook_params": {
              "x": "30",
              "y": "50"
            }
          }'

          # Make the API request
          curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body3" "$requestUriRunnow"
    

    This runs the job in Databricks with the parameters passed in notebook_params.
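    If the job should also be created from the pipeline rather than by hand, the job ID can be taken from the jobs/create response and passed to jobs/run-now. The following is only a minimal sketch of that idea, using Python's standard library instead of curl. The function names (post_json, create_and_run) and the placeholder workspace URL are my own, not part of the Databricks API; the sketch assumes jobs/create answers with a job_id field and jobs/run-now with a run_id field.

```python
import json
import urllib.request

# Placeholder (assumption): replace with your own workspace URL.
DATABRICKS_URL = "https://<workspace>.azuredatabricks.net/api/2.0"

# Job definition for notebook2, reusing the cluster values from the samples in this answer.
JOB_SPEC = {
    "name": "testjob",
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 1,
    },
    "notebook_task": {"notebook_path": "/Users/user/Notebook2"},
}


def post_json(url, token, payload):
    """POST a JSON payload with a bearer token and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_and_run(base_url, token, job_spec, notebook_params=None, post=post_json):
    """Create a job, then trigger it once with the job_id from the create response."""
    created = post(f"{base_url}/jobs/create", token, job_spec)
    job_id = created["job_id"]  # assumed response shape: {"job_id": ...}

    run_body = {"job_id": job_id}
    if notebook_params:
        run_body["notebook_params"] = notebook_params

    started = post(f"{base_url}/jobs/run-now", token, run_body)
    return job_id, started["run_id"]  # assumed response shape: {"run_id": ...}
```

    A call would look like create_and_run(DATABRICKS_URL, token, JOB_SPEC, {"x": "30", "y": "50"}), with token read from wherever the pipeline exposes it. Note that every call creates a new job, so this only makes sense if the job does not exist yet.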

    Another option is the jobs/runs/submit endpoint, which creates and runs a one-time job run without creating a job first.

    YAML sample:

    - task: Bash@3
      displayName: 'Run Databricks Notebook Once'
      inputs:
        targetType: 'inline'
        script: |
          databricksUrl='https://adb-8152044424015301.3.azuredatabricks.net/api/2.0'
          notebookPath2='/Users/user/Notebook2'
          requestUriRun="$databricksUrl/jobs/runs/submit"

          # One-time run definition: cluster spec plus the notebook and its parameters
          body='{
            "run_name": "test-run",
            "new_cluster": {
              "spark_version": "7.3.x-scala2.12",
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 1
            },
            "notebook_task": {
              "notebook_path": "'$notebookPath2'",
              "base_parameters": {
                "x": "15",
                "y": "20"
              }
            }
          }'

          # Make the API request
          curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUriRun"
    

    The run then shows up under the job runs in Databricks.
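    If the pipeline should wait for the run and read the value the notebook passes to dbutils.notebook.exit, the run can be polled through jobs/runs/get and its output fetched from jobs/runs/get-output. This is a hedged sketch, not a definitive implementation: the function names (get_json, wait_for_notebook_result) and the polling defaults are my own choices, and it assumes a single-task run whose run_id comes from jobs/run-now or jobs/runs/submit, with the response fields state.life_cycle_state, state.result_state and notebook_output.result.

```python
import json
import time
import urllib.parse
import urllib.request

# Life cycle states after which the run will not change any more.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def get_json(url, token, params):
    """GET a JSON resource with query parameters and a bearer token."""
    query = urllib.parse.urlencode(params)
    request = urllib.request.Request(
        f"{url}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_notebook_result(base_url, token, run_id, get=get_json,
                             poll_seconds=15, max_polls=120, sleep=time.sleep):
    """Poll jobs/runs/get until the run ends, then return the notebook's exit value."""
    for _ in range(max_polls):
        run = get(f"{base_url}/jobs/runs/get", token, {"run_id": run_id})
        state = run["state"]
        if state["life_cycle_state"] in TERMINAL_STATES:
            if state.get("result_state") != "SUCCESS":
                raise RuntimeError(f"Run {run_id} did not succeed: {state}")
            output = get(f"{base_url}/jobs/runs/get-output", token, {"run_id": run_id})
            # The notebook above exits with a JSON string, so decode it here.
            return json.loads(output["notebook_output"]["result"])
        sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} was still running after {max_polls} polls")
```

    With the sample notebook and the parameters x=15 and y=20, the decoded value would be a dictionary with a single "result" key.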