
How can I save the STATUS of a Databricks job into a variable based on run_id?



I need to save the status of the job (IN PROGRESS/FAILED/SUCCEEDED) into a variable in Databricks. The Databricks notebook is triggered via ADF.

I tried the code below, but it isn't working; it returns None:

import json

# Grab the notebook context as a JSON string
context_str = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
context = json.loads(context_str)
# 'currentRunState' is missing from the context, so this ends up None
run_status_obj = context.get('currentRunState', {})
run_status = run_status_obj.get('Status', None) if run_status_obj else None
print(run_status)
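One option, if you want the status programmatically, is the Databricks Jobs REST API: `GET /api/2.1/jobs/runs/get?run_id=...` returns a `state` object with `life_cycle_state` (e.g. `PENDING`, `RUNNING`, `TERMINATED`) and, once the run finishes, `result_state` (e.g. `SUCCESS`, `FAILED`). A minimal sketch of mapping that response to the three statuses you want; `host`, `token`, and `run_id` are placeholders you would have to supply yourself:

```python
import json


def run_status(run: dict) -> str:
    """Map a Jobs API runs/get response to IN PROGRESS/SUCCEEDED/FAILED."""
    state = run.get("state", {})
    life_cycle = state.get("life_cycle_state")
    if life_cycle in ("PENDING", "RUNNING", "TERMINATING"):
        return "IN PROGRESS"
    if state.get("result_state") == "SUCCESS":
        return "SUCCEEDED"
    # Treat FAILED, TIMEDOUT, CANCELED, INTERNAL_ERROR, etc. as failures
    return "FAILED"


# Calling the API itself (placeholders, run against your workspace):
# import requests
# resp = requests.get(
#     f"https://{host}/api/2.1/jobs/runs/get",
#     headers={"Authorization": f"Bearer {token}"},
#     params={"run_id": run_id},
# )
# print(run_status(resp.json()))

# Demo against a sample response shape
sample = {"state": {"life_cycle_state": "TERMINATED", "result_state": "SUCCESS"}}
print(run_status(sample))  # SUCCEEDED
```

Note this requires knowing the run_id and having a token with permission to read the run, which is why the accepted answer below tracks the status from ADF instead.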

Solution

  • You can track the status from ADF itself instead of changing your notebooks. Use Lookup activities to update a SQL table.

    1. Lookup activity to insert a new row with a unique id, the date, and status = 'InProgress'.
    2. Execute your ADB notebook.
    3. If it succeeds, update the SQL row to 'Success'.
    4. If the ADB notebook fails, update the SQL row to 'Failed'.

    A simplistic ADF flow will look like this.

    When I executed the pipeline, a new SQL row ('0002') was first added.

    When the pipeline completed, the same row showed a 'Failed' status.

    I purposely failed my ADB notebook, but of course, if you test with a notebook that runs properly, you will see the row updated to 'Success'.

    For the Lookup activities, I used the queries below, which you can use as a base to build upon.

    INSERT INTO [dbo].[ADBstatus] ([id],[date],[status]) VALUES ('0002', '2022-02-02', 'InProgress');
    select 0 id;
    
    UPDATE [dbo].[ADBstatus] SET [status] = 'Success' WHERE [id] = '0002';
    select 0 id;
    
    UPDATE [dbo].[ADBstatus] SET [status] = 'Failed' WHERE [id] = '0002';
    select 0 id;
    

    The additional SELECT statement returns a dummy response, since the Lookup activity requires the query to return a result.

    If you prefer, you can also use stored procedures instead of updating the table directly. You can choose your own id convention: the notebook name, linked service name, pipeline name, etc.
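    A complementary option is to have the notebook hand its own outcome back to ADF with `dbutils.notebook.exit`; the Notebook activity then exposes that value as `activity('<your activity name>').output.runOutput`, which you can feed into the Lookup/update steps above. A minimal sketch (the status value here is hard-coded for illustration; in a real notebook you would set it from your own logic):

    ```python
    import json

    # In a real notebook, derive this from a try/except around your workload
    status = "SUCCEEDED"

    payload = json.dumps({"status": status})

    # On Databricks, this ends the notebook and returns the payload to ADF,
    # where it appears as output.runOutput of the Notebook activity.
    # dbutils.notebook.exit(payload)
    print(payload)
    ```

    Note that a notebook can only report success this way; if it throws, ADF sees the activity fail, which is exactly what the 'Failed' branch above handles.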