I have a pipeline which ingest data from ERP system to datalake (bronze layer) then invoke notebook in Databricks, clean, do some transformations and save it into silver layer and then once again, some business transformations and golden layer. Everything is stored as either parquet or delta file. It works very good, however I need to POST the output from gold layer via REST API in Azure Data Factory. It means all the columns with associated values needs to be posted this way. I've tried to create a ForEach folder with Web Activity inside it, put url api end point, authorization and body, but I struggle to get output from the last notebook. Tried @activity('SilverToGoldenBusinessTransformations').output.value
but the result is :
The expression 'length(activity('SilverToGoldenBusinessTransformations').output.value)' cannot be evaluated because property 'value' doesn't exist, available properties are 'runPageUrl, effectiveIntegrationRuntime, executionDuration, durationInQueue, billingReference'.
The body of web activity looks like this (not posting all of fields, just first 4)
{
"id":"@item().id",
"product":"@item().product",
"plant":"@item().plant",
"status":"@item().status"
}
Could you please help me as its the last step before finalizing this pipeline and I seem to oversight something. Thank you in advance
I tried the same expression @activity('Notebook1').output.value
in ForEach and I got same error.
If you want to get the output from the Notebook, first you need to return the JSON array from the Notebook in databricks.
You need to use dbutils.notebook.exit(<your_json_array>)
at the end of the Notebook.
Here, for sample I am returning the below JSON array from Notebook to notebook activity.
json_array=[{"id":1,"product":"Amara","plant":"Taplin","status":"Available"},
{"id":2,"product":"Laptop","plant":"Green","status":"Available"},
{"id":3,"product":"Phone","plant":"Brinjal","status":"Unavailable"}]
dbutils.notebook.exit(json_array)
You need to store your data or dataframe as JSON array and return your JSON array like above.
You can see that JSON array in the Notebook activity output.
To give this array to ForEach, use the below expression in ForEach.
@activity('Notebook1').output.runOutput
You can see that my pipeline executed successfully after giving above expression.
NOTE: Notebook activity return output has a limitation of 4 MB (5000 rows), if your returning JSON size more than this, then this method won't work for you.