python azure pyspark pid azure-databricks

How to get the run ID or process ID in Azure Databricks?

I've been trying to get the run ID or process ID in Databricks. By that I mean a unique ID that is generated every time the notebook runs. I tried a few methods, but they return the session ID rather than the run ID. Here they are:

dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply('sessionId')

I also tried invoking a bash environment:

%sh
ps -fe

The code below returns null:

%scala
dbutils.notebook.getContext.rootRunId

Can you please help me with this?

Thanks,

Solution

Note: Only jobs started by the Databricks executor are displayed under the job ID specified in the stage. The job ID is the same for all instances of the job.

You can find the run ID for a particular instance in the Data Collector log.

The Databricks executor also writes the run ID of the job to the event record. To keep a record of all run IDs, enable event generation for the stage.

There are several ways to get the run ID for any given job:

Azure Databricks Portal (User Interface): Click the Jobs tab to view all the jobs you have created.

Select any job to see the run ID for each of its runs.

Azure Portal (User Interface) using Kusto Query Language: If you have configured diagnostic log delivery, you can use KQL queries to get the job ID and run ID.

Databricks REST API: You can use the REST API command below to list job runs along with their job IDs and run IDs.

curl "https://centralus.azuredatabricks.net/api/2.0/jobs/runs/list" -X GET -H "Authorization: Bearer dapia08sjflksjs9jfra6a34a"
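If you would rather make the same call from Python, here is a minimal sketch using only the standard library. It targets the same /api/2.0/jobs/runs/list endpoint as the curl command. The function names (build_runs_list_request, extract_run_ids, list_run_ids) are made up for illustration, and I am assuming the response body carries a "runs" array whose entries have "job_id" and "run_id" fields, so check that against what your workspace actually returns.

```python
import json
import urllib.request


def build_runs_list_request(host: str, token: str) -> urllib.request.Request:
    """Build the GET request for the jobs/runs/list endpoint (same call as the curl command)."""
    url = f"{host.rstrip('/')}/api/2.0/jobs/runs/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def extract_run_ids(payload: dict) -> list:
    """Return (job_id, run_id) pairs from a parsed response body.

    Assumes the body looks like {"runs": [{"job_id": ..., "run_id": ...}, ...]}.
    A body without a "runs" key yields an empty list.
    """
    return [(run.get("job_id"), run.get("run_id")) for run in payload.get("runs", [])]


def list_run_ids(host: str, token: str) -> list:
    """Call the endpoint and return the (job_id, run_id) pairs it reports."""
    request = build_runs_list_request(host, token)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return extract_run_ids(payload)
```

You would call it as list_run_ids("https://centralus.azuredatabricks.net", token), passing your own workspace URL and personal access token. Keeping the request building and the parsing in separate functions lets you check both without touching the network.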

Hope this helps.