python · pyspark · databricks · azure-databricks

Notebook to execute a Databricks job


Is there an API or another way to programmatically run a Databricks job? Ideally, we would like to trigger a Databricks job from a notebook. The following only returns the run ID of the currently running job, which is not very useful:

dbutils.notebook.entry_point.getDbutils().notebook().getContext().currentRunId().toString()

Solution

  • To run a Databricks job, you can use the Jobs API. I have a Databricks job called for_repro, which I ran from a Databricks notebook in the two ways shown below.

    Using requests library:

    • You can create an access token by navigating to Settings -> User Settings. Under the Access tokens tab, click Generate new token.
    • Use the generated token with the following code:
    import requests

    # <your-job-id> is the numeric job id; <your-access-token> is the token generated above
    my_json = {"job_id": <your-job-id>}

    auth = {"Authorization": "Bearer <your-access-token>"}

    # Trigger the job through the Jobs API run-now endpoint
    response = requests.post('https://<databricks-instance>/api/2.0/jobs/run-now', json=my_json, headers=auth).json()
    print(response)
    

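    If you want to wait for the triggered run to finish, the run-now response includes a run_id that you can poll with the Jobs API runs/get endpoint. A minimal sketch, reusing the response and auth variables and the placeholders from the block above:

    import time

    run_id = response["run_id"]

    # Poll until the run reaches a terminal life-cycle state
    while True:
        run = requests.get('https://<databricks-instance>/api/2.0/jobs/runs/get', params={"run_id": run_id}, headers=auth).json()
        if run["state"]["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            break
        time.sleep(10)

    print(run["state"])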


    • The <databricks-instance> value in the above code is the host name of your workspace URL, e.g. adb-1234567890123456.7.azuredatabricks.net for an Azure Databricks workspace.

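    If you prefer not to hardcode these values, a Databricks notebook can also derive them at runtime. A sketch under the assumption that the undocumented spark.databricks.workspaceUrl conf and the notebook context's apiToken() getter are available (they work on Databricks but are internal APIs, so prefer a personal access token for production use):

    # Workspace host, e.g. adb-1234567890123456.7.azuredatabricks.net (undocumented conf)
    databricks_instance = spark.conf.get("spark.databricks.workspaceUrl")

    # Short-lived API token of the current user from the notebook context (internal API)
    token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().getOrElse(None)

    auth = {"Authorization": f"Bearer {token}"}
    url = f"https://{databricks_instance}/api/2.0/jobs/run-now"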


    Using the %sh magic command:

    • You can also use the %sh magic command in a Python notebook cell to trigger the job with curl.
    %sh

    curl --request POST --header "Authorization: Bearer <access_token>" \
    https://<databricks-instance>/api/2.0/jobs/run-now \
    --data '{"job_id": <your-job-id>}'
    

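    The run-now endpoint also accepts parameters for the triggered run. For a job with a notebook task, a sketch using the same requests pattern as above (the parameter name "env" is purely illustrative):

    import requests

    my_json = {
        "job_id": <your-job-id>,
        # Passed to the job's notebook via widgets; notebook tasks only
        "notebook_params": {"env": "dev"},
    }

    auth = {"Authorization": "Bearer <your-access-token>"}

    response = requests.post('https://<databricks-instance>/api/2.0/jobs/run-now', json=my_json, headers=auth).json()
    print(response)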

    • For reference, my job's details and run history show the runs triggered by both methods above.

    Refer to the Microsoft documentation to see all other operations that the Jobs API supports, such as listing, creating, and deleting jobs.
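    For example, the jobs/list endpoint can be used to look up a job_id by name, with the same authentication header (a minimal sketch with the same placeholders as above):

    import requests

    auth = {"Authorization": "Bearer <your-access-token>"}

    jobs = requests.get('https://<databricks-instance>/api/2.0/jobs/list', headers=auth).json()

    # Each entry carries the job_id and its settings, including the job name
    for job in jobs.get("jobs", []):
        print(job["job_id"], job["settings"]["name"])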