Tags: azure-devops, yaml, databricks, azure-databricks

How to schedule a Databricks notebook in a DevOps pipeline


Below is the code I am using to schedule a Databricks notebook from a DevOps pipeline.

- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      personalAccessToken='dapie7'  
      databricksUrl='https://adb-13946.6.azuredatabricks.net/api/2.0'
      notebookPath='/Users/notebook.py/'
      jobName='ScheduledJobName'

      requestUri="$databricksUrl/jobs/create"

      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2"
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath'"
        },
        "schedule": "@daily",
        "max_retries": 0,
        "timezone_id": "Canada/Eastern",
        "cron_schedule": "45 8 * * *"
      }'

      # Encode PAT to Base64
      patBase64=$(echo -n ":$personalAccessToken" | base64)

      # Make the API request
      curl -X POST -H "Authorization: Basic $patBase64" -H "Content-Type: application/json" -d "$body" "$requestUri"

This is my log:

Starting: Schedule Databricks Notebook
==============================================================================
Task         : Bash
Description  : Run a Bash script on macOS, Linux, or Windows
Version      : 3.229.0
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/bash
==============================================================================
Generating script.
========================== Starting Command Output ===========================
/usr/bin/bash /home/vsts/work/_temp/d7e8884a-073dfc.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   402  100    54  100   348    277   1789 --:--:-- --:--:-- --:--:--  2072
Finishing: Schedule Databricks Notebook

I do not see the job being created in Databricks. Is there any other way to schedule a job from a DevOps pipeline, or any corrections to the above code?

Thanks
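One thing worth noting: the transfer stats in the log show 54 bytes were received, which is the API's response body (almost certainly an error message) being lost in the progress output. A minimal sketch of capturing the response and HTTP status so the failure is visible in the pipeline log; the variable names `DATABRICKS_URL` and `DATABRICKS_TOKEN` are placeholders for pipeline variables, not names from the original script:

```shell
#!/usr/bin/env bash
# Sketch: surface the Jobs API response body and HTTP status instead of
# letting curl's progress meter hide them in the pipeline log.
# DATABRICKS_URL and DATABRICKS_TOKEN are assumed pipeline variables.
set -euo pipefail

call_jobs_api() {
  # POSTs $2 to $1; prints the response body, fails on a non-200 status.
  local uri="$1" body="$2"
  local out status
  out=$(curl -s -w '\n%{http_code}' -X POST \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body" "$uri")
  status=${out##*$'\n'}          # last line appended by -w: the HTTP status
  printf '%s\n' "${out%$'\n'*}"  # everything before it: the response body
  [ "$status" = "200" ]
}

# Only hit the API when credentials are actually configured.
if [ -n "${DATABRICKS_TOKEN:-}" ]; then
  call_jobs_api "${DATABRICKS_URL}/jobs/create" '{"name": "probe"}' ||
    echo "jobs/create failed; see response body above" >&2
fi
```

With this pattern the actual error returned by the workspace (bad auth, malformed body, etc.) appears in the task output rather than being silently discarded.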

Running a different notebook along with scheduling the first notebook:

- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: | 
      
      databricksUrl='https://adb-13946046.6.azuredatabricks.net/api/2.0'
      notebookPath='/Users/user/notebook1.py'
      notebookPath1='/Users/user/notebook2.py'
      

      jobName='ScheduledJobName1'
      jobName2='testjob'

      requestUri="$databricksUrl/jobs/create"
      requestUriRun="$databricksUrl/jobs/run-now"

      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath'"
        },
        "schedule": {
            "quartz_cron_expression": "45 10 * * * ?",
            "timezone_id": "Canada/Eastern"
        }
      }'

      body2='{
        "name": "'$jobName2'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath1'"
        }
      }'

      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUri"
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body2" "$requestUriRun"
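One caveat with the second call above: the Jobs API 2.0 `jobs/run-now` endpoint expects the `job_id` of an already-created job, not a full job specification, so posting `$body2` to it will not start a run. A sketch of chaining the two calls instead; `DATABRICKS_URL` and `DATABRICKS_TOKEN` are placeholder pipeline-variable names, and the `sed` extraction is a dependency-free simplification (`jq -r '.job_id'` would be more robust):

```shell
#!/usr/bin/env bash
# Sketch: jobs/run-now takes {"job_id": <id>}, so create the job first
# and extract the id from the jobs/create response.
set -euo pipefail

extract_job_id() {
  # jobs/create responds with e.g. {"job_id": 123}; pull out the number.
  printf '%s\n' "$1" | sed -n 's/.*"job_id"[: ]*\([0-9][0-9]*\).*/\1/p'
}

# Placeholder job spec, mirroring body2 from the pipeline above.
body2='{
  "name": "testjob",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 0
  },
  "notebook_task": { "notebook_path": "/Users/user/notebook2.py" }
}'

# Only hit the API when credentials are actually configured.
if [ -n "${DATABRICKS_TOKEN:-}" ]; then
  createResponse=$(curl -s -X POST \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body2" "${DATABRICKS_URL}/jobs/create")
  jobId=$(extract_job_id "$createResponse")
  curl -s -X POST \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{"job_id": '"$jobId"'}' "${DATABRICKS_URL}/jobs/run-now"
fi
```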
     

Solution

  • Fixed with the YAML below and confirmed it works:

    - task: Bash@3
      displayName: 'Schedule Databricks Notebook'
      inputs:
        targetType: 'inline'
        script: | 
          databricksUrl='https://adb-1305924866893174.5.azuredatabricks.net/api/2.0'
          notebookPath='/Users/user/Notebook1'
          jobName='ScheduledJobName1'
    
          requestUri="$databricksUrl/jobs/create"
    
          body='{
            "name": "'$jobName'",
            "new_cluster": {
              "spark_version": "7.0.x",
              "node_type_id": "Standard_DS3_v2",
              "num_workers": 0
            },
            "notebook_task": {
              "notebook_path": "'$notebookPath'"
            },
            "schedule": {
                "quartz_cron_expression": "45 8 * * * ?",
                "timezone_id": "Canada/Eastern"
            }
          }'
    
          # Make the API request
          curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUri"
    
    

    Checking in Databricks confirms the job is created.


    1. Generate the token from Databricks; it is a bearer token, not basic auth. Add it as a variable in the pipeline.

    2. Set num_workers to 0 for a single-node cluster.

    3. The cron syntax is different: the Jobs API expects a Quartz cron expression inside a schedule object, not a plain 5-field cron string; see the Quartz cron documentation.
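Putting points 2 and 3 together, a sketch of a more fully specified job body. The notebook path, job name, and spark_version are placeholders; the spark_conf and custom_tags entries are what the Databricks docs list for a true single-node cluster, and Quartz cron has a leading seconds field, so daily at 08:45 is "0 45 8 * * ?":

```shell
# Placeholder values; substitute your own workspace paths and names.
notebookPath='/Users/user/Notebook1'
jobName='ScheduledJobName1'

# Job body for jobs/create: single-node cluster plus a Quartz cron schedule.
body='{
  "name": "'$jobName'",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 0,
    "spark_conf": {
      "spark.databricks.cluster.profile": "singleNode",
      "spark.master": "local[*]"
    },
    "custom_tags": { "ResourceClass": "SingleNode" }
  },
  "notebook_task": { "notebook_path": "'$notebookPath'" },
  "schedule": {
    "quartz_cron_expression": "0 45 8 * * ?",
    "timezone_id": "Canada/Eastern"
  }
}'
```

This body is POSTed to $databricksUrl/jobs/create exactly as in the accepted YAML above.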