
How can I save the STATUS of a Databricks job into a variable based on run_id?



I need to save the status of the job (IN PROGRESS/FAILED/SUCCEEDED) into a variable in Databricks. The Databricks notebook is triggered via ADF.

I tried the code below, but it isn't working; it returns None:

import json

# Grab the notebook context as a JSON string
context_str = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
context = json.loads(context_str)
# 'currentRunState' is missing from the context, so this ends up None
run_status_obj = context.get('currentRunState', {})
run_status = run_status_obj.get('Status', None) if run_status_obj else None
print(run_status)
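One option, if you want the status programmatically, is the Databricks Jobs REST API: `GET /api/2.1/jobs/runs/get?run_id=...` returns a `state` object with `life_cycle_state` (e.g. `PENDING`, `RUNNING`, `TERMINATED`) and, once the run finishes, `result_state` (e.g. `SUCCESS`, `FAILED`). A minimal sketch of mapping that response to the three statuses you want; `host`, `token`, and `run_id` are placeholders you would have to supply yourself:

```python
import json


def run_status(run: dict) -> str:
    """Map a Jobs API runs/get response to IN PROGRESS/SUCCEEDED/FAILED."""
    state = run.get("state", {})
    life_cycle = state.get("life_cycle_state")
    if life_cycle in ("PENDING", "RUNNING", "TERMINATING"):
        return "IN PROGRESS"
    if state.get("result_state") == "SUCCESS":
        return "SUCCEEDED"
    # Treat FAILED, TIMEDOUT, CANCELED, INTERNAL_ERROR, etc. as failures
    return "FAILED"


# Calling the API itself (placeholders, run against your workspace):
# import requests
# resp = requests.get(
#     f"https://{host}/api/2.1/jobs/runs/get",
#     headers={"Authorization": f"Bearer {token}"},
#     params={"run_id": run_id},
# )
# print(run_status(resp.json()))

# Demo against a sample response shape
sample = {"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}}
print(run_status(sample))  # SUCCEEDED
```

Note this requires knowing the run_id and having a token with permission to read the run, which is why the accepted answer below tracks the status from ADF instead.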

Solution

  • You can track the status from ADF itself instead of changing your notebooks. Use Lookup activities to update a SQL table.

    1. Lookup activity to insert a new row with a unique id, the date, and status = 'InProgress'.
    2. Execute your ADB notebook.
    3. If it succeeds, update the SQL row to 'Success'.
    4. If the ADB notebook fails, update the SQL row to 'Failed'.

    A simplistic ADF flow will look like this.

    When I executed the pipeline, a new SQL row ('0002') was first added.

    When the pipeline completed, the same row showed a 'Failed' status.

    I purposely failed my ADB notebook, but of course, if you test with a notebook that runs properly, you will see the row updated to 'Success'.

    For the Lookup activities, I used the queries below, which you can use as a base to build upon.

    INSERT INTO [dbo].[ADBstatus] ([id],[date],[status]) VALUES ('0002', '2022-02-02', 'InProgress');
    select 0 id;
    
    UPDATE [dbo].[ADBstatus] SET [status] = 'Success' WHERE [id] = '0002';
    select 0 id;
    
    UPDATE [dbo].[ADBstatus] SET [status] = 'Failed' WHERE [id] = '0002';
    select 0 id;
    

    The additional SELECT statement returns a dummy response, since the Lookup activity requires the query to return a result.

    If you prefer, you can also use stored procedures instead of updating the table directly. You can choose your own id convention: the notebook name, linked service name, pipeline name, etc.
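    A complementary option is to have the notebook hand its own outcome back to ADF with `dbutils.notebook.exit`; the Notebook activity then exposes that value as `activity('<your activity name>').output.runOutput`, which you can feed into the Lookup/update steps above. A minimal sketch (the status value here is hard-coded for illustration; in a real notebook you would set it from your own logic):

    ```python
    import json

    # In a real notebook, derive this from a try/except around your workload
    status = "SUCCEEDED"

    payload = json.dumps({"status": status})

    # On Databricks, this ends the notebook and returns the payload to ADF,
    # where it appears as output.runOutput of the Notebook activity.
    # dbutils.notebook.exit(payload)
    print(payload)
    ```

    Note that a notebook can only report success this way; if it throws, ADF sees the activity fail, which is exactly what the 'Failed' branch above handles.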