Search code examples
azureazure-data-factorydatabricksazure-databricks

Sending databricks notebook output as json via POST in ADF


I have a pipeline which ingest data from ERP system to datalake (bronze layer) then invoke notebook in Databricks, clean, do some transformations and save it into silver layer and then once again, some business transformations and golden layer. Everything is stored as either parquet or delta file. It works very good, however I need to POST the output from gold layer via REST API in Azure Data Factory. It means all the columns with associated values needs to be posted this way. I've tried to create a ForEach folder with Web Activity inside it, put url api end point, authorization and body, but I struggle to get output from the last notebook. Tried @activity('SilverToGoldenBusinessTransformations').output.value but the result is :

The expression 'length(activity('SilverToGoldenBusinessTransformations').output.value)' cannot be evaluated because property 'value' doesn't exist, available properties are 'runPageUrl, effectiveIntegrationRuntime, executionDuration, durationInQueue, billingReference'.

enter image description here

The body of web activity looks like this (not posting all of fields, just first 4)

{
"id":"@item().id",
"product":"@item().product",
"plant":"@item().plant",
"status":"@item().status"
}

Could you please help me as its the last step before finalizing this pipeline and I seem to oversight something. Thank you in advance


Solution

  • I tried the same expression @activity('Notebook1').output.value in ForEach and I got same error.

    enter image description here

    If you want to get the output from the Notebook, first you need to return the JSON array from the Notebook in databricks.

    You need to use dbutils.notebook.exit(<your_json_array>) at the end of the Notebook.

    Here, for sample I am returning the below JSON array from Notebook to notebook activity.

    json_array=[{"id":1,"product":"Amara","plant":"Taplin","status":"Available"},
    {"id":2,"product":"Laptop","plant":"Green","status":"Available"},
    {"id":3,"product":"Phone","plant":"Brinjal","status":"Unavailable"}]
    
    dbutils.notebook.exit(json_array)
    

    You need to store your data or dataframe as JSON array and return your JSON array like above.

    You can see that JSON array in the Notebook activity output.

    enter image description here

    To give this array to ForEach, use the below expression in ForEach.

    @activity('Notebook1').output.runOutput
    

    You can see that my pipeline executed successfully after giving above expression.

    enter image description here

    NOTE: Notebook activity return output has a limitation of 4 MB (5000 rows), if your returning JSON size more than this, then this method won't work for you.