python · pyspark · databricks · azure-databricks

Notebook to execute a Databricks job


Is there an API or another way to programmatically run a Databricks job? Ideally, we would like to trigger a Databricks job from a notebook. The following only returns the run ID of the currently running job, which is not very useful:

dbutils.notebook.entry_point.getDbutils().notebook().getContext().currentRunId().toString()

Solution

  • To run a Databricks job, you can use the Jobs API. I have a Databricks job called for_repro, which I ran from a Databricks notebook in the two ways shown below.

    Using requests library:

    • You can create an access token by navigating to Settings -> User Settings. Under the Access tokens tab, click Generate new token.
    • Use the generated token with the following code:
    import requests

    # <your-job-id> is the numeric job id; <your-access-token> is the token generated above
    my_json = {"job_id": <your-job-id>}

    auth = {"Authorization": "Bearer <your-access-token>"}

    # Trigger the job through the Jobs API run-now endpoint
    response = requests.post('https://<databricks-instance>/api/2.0/jobs/run-now', json=my_json, headers=auth).json()
    print(response)
    

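    If you want to wait for the triggered run to finish, the run-now response includes a run_id that you can poll with the Jobs API runs/get endpoint. A minimal sketch, reusing the response and auth variables and the placeholders from the block above:

    import time

    run_id = response["run_id"]

    # Poll until the run reaches a terminal life-cycle state
    while True:
        run = requests.get('https://<databricks-instance>/api/2.0/jobs/runs/get', params={"run_id": run_id}, headers=auth).json()
        if run["state"]["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            break
        time.sleep(10)

    print(run["state"])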


    • The <databricks-instance> value in the above code is the host name of your workspace URL, e.g. adb-1234567890123456.7.azuredatabricks.net for an Azure Databricks workspace.

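    If you prefer not to hardcode these values, a Databricks notebook can also derive them at runtime. A sketch under the assumption that the undocumented spark.databricks.workspaceUrl conf and the notebook context's apiToken() getter are available (they work on Databricks but are internal APIs, so prefer a personal access token for production use):

    # Workspace host, e.g. adb-1234567890123456.7.azuredatabricks.net (undocumented conf)
    databricks_instance = spark.conf.get("spark.databricks.workspaceUrl")

    # Short-lived API token of the current user from the notebook context (internal API)
    token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().getOrElse(None)

    auth = {"Authorization": f"Bearer {token}"}
    url = f"https://{databricks_instance}/api/2.0/jobs/run-now"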


    Using the %sh magic command:

    • You can also use the %sh magic command in a Python notebook cell to trigger the job with curl.
    %sh

    curl --request POST --header "Authorization: Bearer <access_token>" \
    https://<databricks-instance>/api/2.0/jobs/run-now \
    --data '{"job_id": <your-job-id>}'
    

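    The run-now endpoint also accepts parameters for the triggered run. For a job with a notebook task, a sketch using the same requests pattern as above (the parameter name "env" is purely illustrative):

    import requests

    my_json = {
        "job_id": <your-job-id>,
        # Passed to the job's notebook via widgets; notebook tasks only
        "notebook_params": {"env": "dev"},
    }

    auth = {"Authorization": "Bearer <your-access-token>"}

    response = requests.post('https://<databricks-instance>/api/2.0/jobs/run-now', json=my_json, headers=auth).json()
    print(response)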

    • For reference, my job's details and run history show the runs triggered by both methods above.

    Refer to the Microsoft documentation to see all other operations that the Jobs API supports, such as listing, creating, and deleting jobs.
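    For example, the jobs/list endpoint can be used to look up a job_id by name, with the same authentication header (a minimal sketch with the same placeholders as above):

    import requests

    auth = {"Authorization": "Bearer <your-access-token>"}

    jobs = requests.get('https://<databricks-instance>/api/2.0/jobs/list', headers=auth).json()

    # Each entry carries the job_id and its settings, including the job name
    for job in jobs.get("jobs", []):
        print(job["job_id"], job["settings"]["name"])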