Below is the code I am using to schedule a Databricks notebook from an Azure DevOps pipeline.
- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      personalAccessToken='dapie7'
      databricksUrl='https://adb-13946.6.azuredatabricks.net/api/2.0'
      notebookPath='/Users/notebook.py/'
      jobName='ScheduledJobName'
      requestUri="$databricksUrl/jobs/create"
      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2"
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath'"
        },
        "schedule": "@daily",
        "max_retries": 0,
        "timezone_id": "Canada/Eastern",
        "cron_schedule": "45 8 * * *"
      }'
      # Encode the PAT to Base64 for Basic auth
      patBase64=$(echo -n ":$personalAccessToken" | base64)
      # Make the API request
      curl -X POST -H "Authorization: Basic $patBase64" -H "Content-Type: application/json" -d "$body" "$requestUri"
This is my log.
Starting: Schedule Databricks Notebook
==============================================================================
Task : Bash
Description : Run a Bash script on macOS, Linux, or Windows
Version : 3.229.0
Author : Microsoft Corporation
Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/bash
==============================================================================
Generating script.
========================== Starting Command Output ===========================
/usr/bin/bash /home/vsts/work/_temp/d7e8884a-073dfc.sh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 402 100 54 100 348 277 1789 --:--:-- --:--:-- --:--:-- 2072
Finishing: Schedule Databricks Notebook
I do not see the job being created in Databricks. Is there another way to schedule a job from a DevOps pipeline, or are there any corrections to the above code?
Thanks
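For debugging, the response body from curl (which the log above does not show) can be captured and checked for an error code. A minimal sketch, where sampleResponse is a made-up stand-in for the real API reply, not actual Databricks output:

```shell
# Parse an API reply for an "error_code" field (sampleResponse is hypothetical).
sampleResponse='{"error_code":"MALFORMED_REQUEST","message":"Invalid JSON given"}'
errorCode=$(echo "$sampleResponse" | sed -n 's/.*"error_code"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
if [ -n "$errorCode" ]; then
  echo "Job creation failed: $errorCode"
fi
```

If the create call succeeds, the reply carries a job_id instead of an error_code, so an empty errorCode here means the request went through.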
Update: running a different notebook in addition to scheduling the first notebook.
- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      databricksUrl='https://adb-13946046.6.azuredatabricks.net/api/2.0'
      notebookPath='/Users/user/notebook1.py'
      notebookPath1='/Users/user/notebook2.py'
      jobName='ScheduledJobName1'
      jobName2='testjob'
      requestUri="$databricksUrl/jobs/create"
      requestUriRun="$databricksUrl/jobs/run-now"
      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath'"
        },
        "schedule": {
          "quartz_cron_expression": "45 10 * * * ?",
          "timezone_id": "Canada/Eastern"
        }
      }'
      body2='{
        "name": "'$jobName2'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath1'"
        }
      }'
      # Make the API requests
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUri"
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body2" "$requestUriRun"
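Note that jobs/run-now expects a job_id, not a full job definition like body2 above. One way to chain the two calls is to pull the id out of the create response; the sketch below uses a sample response for illustration (the commented-out curl reuses requestUriRun from the script above, with TOKEN as a placeholder):

```shell
# jobs/create returns {"job_id": <n>}; jobs/run-now wants exactly that id back.
# createResponse is a sample here; in the pipeline it would be the captured curl output.
createResponse='{"job_id": 42}'
jobId=$(echo "$createResponse" | sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p')
runBody='{"job_id": '$jobId'}'
echo "$runBody"
# curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d "$runBody" "$requestUriRun"
```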
Fixed with the YAML below; confirmed it works:
- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      databricksUrl='https://adb-1305924866893174.5.azuredatabricks.net/api/2.0'
      notebookPath='/Users/user/Notebook1'
      jobName='ScheduledJobName1'
      requestUri="$databricksUrl/jobs/create"
      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath'"
        },
        "schedule": {
          "quartz_cron_expression": "0 45 8 * * ?",
          "timezone_id": "Canada/Eastern"
        }
      }'
      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUri"
Checking in Databricks, the job is created.
Generate the token from Databricks; it is a bearer token, not basic auth. Add it as a variable in the pipeline.
Set the num_workers value to 0 for a single-node cluster.
The cron syntax is different (Quartz, not Unix cron); please check the link.
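On that last point: Quartz cron expressions carry a leading seconds field (seconds, minutes, hours, day-of-month, month, day-of-week, optional year), so the Unix-style 45 8 * * * (08:45 daily) becomes 0 45 8 * * ? in Quartz. A quick field-count sanity check, as a sketch:

```shell
# Quartz cron has 6 (or 7, with the optional year) fields; Unix cron has 5.
quartzExpr='0 45 8 * * ?'   # 08:45:00 every day
set -f                      # disable globbing so '*' is not expanded to filenames
set -- $quartzExpr          # split the expression into positional fields
fieldCount=$#
seconds=$1; minutes=$2; hours=$3
set +f
echo "fields: $fieldCount, seconds: $seconds, minutes: $minutes, hours: $hours"
```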