I am trying to schedule notebook1, and I want to run notebook2 only once, using an Azure DevOps pipeline. Below is the code I used.
- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      databricksUrl='https://adb-13946046.6.azuredatabricks.net/api/2.0'
      notebookPath1='/Users/user/notebook1.py'
      notebookPath2='/Users/user/notebook2.py'
      jobName='ScheduledJobName1'
      jobName2='testjob'
      requestUri="$databricksUrl/jobs/create"
      requestUriRun="$databricksUrl/jobs/run-now"
      body='{
        "name": "'$jobName'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath1'"
        },
        "schedule": {
          "quartz_cron_expression": "45 10 * * * ?",
          "timezone_id": "Canada/Eastern"
        }
      }'
      body2='{
        "name": "'$jobName2'",
        "new_cluster": {
          "spark_version": "7.0.x",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 0
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath2'"
        }
      }'
      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUri"
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body2" "$requestUriRun"
But I only see that notebook1 gets scheduled; notebook2 never runs. Please assist.
Thanks
The jobs/run-now endpoint runs an existing job by its job ID; it does not accept a job definition, which is why your second curl call does nothing. You need to create the job first, get its job ID, and pass that as job_id in the run-now request body.
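If you don't want to read the job ID out of the browser, you can look it up by name from the jobs/list endpoint. A minimal sketch of that lookup, using a made-up sample response shaped like the Jobs API 2.0 jobs/list schema (the IDs below are invented; the job names are the ones from the question):

```python
import json

# Made-up sample shaped like a /api/2.0/jobs/list response.
sample_response = json.dumps({
    "jobs": [
        {"job_id": 101, "settings": {"name": "ScheduledJobName1"}},
        {"job_id": 102, "settings": {"name": "testjob"}},
    ]
})

def find_job_id(list_response, job_name):
    """Return the job_id whose settings.name matches job_name, or None."""
    for job in json.loads(list_response).get("jobs", []):
        if job.get("settings", {}).get("name") == job_name:
            return job["job_id"]
    return None

print(find_job_id(sample_response, "testjob"))  # → 102
```

In the pipeline you would feed the body of the curl response to jobs/list into a lookup like this, then pass the returned ID as job_id to run-now.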
For example, I created a notebook with this code inside:
x = int(dbutils.widgets.get("x"))
y = int(dbutils.widgets.get("y"))
# Perform some calculation
z = x + y
# Return the output as JSON
import json
output = {"result": z}
dbutils.notebook.exit(json.dumps(output))
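The JSON string passed to dbutils.notebook.exit comes back as the notebook_output.result field of the jobs/runs/get-output endpoint once the run finishes. A sketch of reading it back, using a made-up sample response matching that shape:

```python
import json

# Made-up sample shaped like a /api/2.0/jobs/runs/get-output response
# for the notebook above, run with x=30, y=50.
get_output_response = json.dumps({
    "notebook_output": {
        "result": json.dumps({"result": 80}),
        "truncated": False,
    }
})

# notebook_output.result is itself a JSON string (whatever was passed
# to dbutils.notebook.exit), so it needs a second json.loads.
payload = json.loads(get_output_response)
exit_value = json.loads(payload["notebook_output"]["result"])
print(exit_value["result"])  # → 80
```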
Then create a job jobtest in Workflows and get the job ID from the job's URL in the browser.
To trigger the job, use the REST API call below in the DevOps YAML:
- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      databricksUrl='https://adb-8152044424115302.3.azuredatabricks.net/api/2.0'
      requestUriRunnow="$databricksUrl/jobs/run-now"
      body3='{
        "job_id": 880800206549193,
        "notebook_params": {
          "x": "30",
          "y": "50"
        }
      }'
      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body3" "$requestUriRunnow"
This runs the job with the widget values supplied through notebook_params.
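One hardening note: splicing shell variables into a JSON string by hand, as the inline scripts above do, breaks as soon as a value contains quotes. If you build the body in Python instead, json.dumps handles all quoting. A sketch, reusing the job ID and parameters from the example above:

```python
import json

# Build the run-now request body programmatically; json.dumps takes
# care of quoting and escaping, unlike hand-concatenated shell strings.
body = json.dumps({
    "job_id": 880800206549193,
    "notebook_params": {"x": "30", "y": "50"},
})
print(body)
```

The printed string is what you would pass to curl with -d, exactly like body3 above.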
Another endpoint is jobs/runs/submit, which lets you create and run a one-time run without creating a job at all. YAML sample:
- task: Bash@3
  displayName: 'Schedule Databricks Notebook'
  inputs:
    targetType: 'inline'
    script: |
      databricksUrl='https://adb-8152044424015301.3.azuredatabricks.net/api/2.0'
      notebookPath1='/Users/user/Notebook2'
      requestUriRun="$databricksUrl/jobs/runs/submit"
      body='{
        "run_name": "test-run",
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "Standard_DS3_v2",
          "num_workers": 1
        },
        "notebook_task": {
          "notebook_path": "'$notebookPath1'",
          "base_parameters": {
            "x": "15",
            "y": "20"
          }
        }
      }'
      # Make the API request
      curl -X POST -H "Authorization: Bearer $(token)" -H "Content-Type: application/json" -d "$body" "$requestUriRun"
You can then check the run under Job Runs in the Databricks workspace.
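Both run-now and runs/submit return a run_id, so you can also check completion from the pipeline itself by polling jobs/runs/get until life_cycle_state reaches TERMINATED. A sketch of evaluating one such response, using a made-up sample shaped like the runs/get schema:

```python
import json

# Made-up sample shaped like a /api/2.0/jobs/runs/get response.
runs_get_response = json.dumps({
    "run_id": 455644833,
    "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": "",
    },
})

state = json.loads(runs_get_response)["state"]
# result_state is only meaningful once the run has terminated.
finished = state["life_cycle_state"] == "TERMINATED"
succeeded = finished and state.get("result_state") == "SUCCESS"
print(succeeded)  # → True
```

In a real pipeline you would loop, sleeping between curl calls to runs/get, and fail the task when result_state is anything other than SUCCESS.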