I am trying to pass a DataFrame as JSON from Databricks to Azure Data Factory. I use:
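import json
from pyspark.sql.functions import col, struct, to_json

# Cast every column to string and replace nulls with empty strings.
for column in df.schema:
    df = df.withColumn(column.name, col(column.name).cast("string"))
df = df.fillna('')
# Serialize each row to one JSON string, then decode it so the exit value is a JSON array of objects.
json_df = df.select(to_json(struct(*df.columns)).alias("json"))
json_array = [row["json"] for row in json_df.collect()]
decoded_list = [json.loads(row_str) for row_str in json_array]
dbutils.notebook.exit(json.dumps(decoded_list))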
My problem is on the ADF side. I first set a variable ("to send") from the notebook output as an array, and then iterate over it with a ForEach activity. Depending on the variable type, one of two things goes wrong. When I change the variable type to Array, my code sends dictionaries where string values are expected. When the variable stays a String, the pipeline fails at the variable step with an error saying a string was expected but an array was given. When I send a string instead and loop over it with ForEach, I get backslashes, which the Web activity rejects as invalid JSON. How do I convert the data properly so that
it is sent as a string/array in JSON format without backslashes?
My solution: I adjusted my notebook a little. I got rid of
decoded_list = [json.loads(row_str) for row_str in json_array]
and instead wrapped each row in brackets and replaced escaped double quotes (\") with plain double quotes in the notebook, because of the API endpoint's requirements:
wrapped_json_array = ['[' + s.replace('\\"', '"') +']' for s in json_array]
I then pass wrapped_json_array to dbutils.notebook.exit.
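To make the effect of that wrapping concrete, here is a minimal sketch. The two sample rows are made up for illustration and stand in for the real json_array that to_json produces in the notebook.

```python
import json

# Hypothetical stand-in for json_array: one JSON string per row.
json_array = [
    '{"metadata#uniqueid":"1","metadata#FTA":"true"}',
    '{"metadata#uniqueid":"2","metadata#FTA":"false"}',
]

# Wrap each row in brackets so every element is itself a one-item JSON array.
# The replace() turns any backslash-escaped quote into a plain quote.
wrapped_json_array = ['[' + s.replace('\\"', '"') + ']' for s in json_array]

for s in wrapped_json_array:
    print(s)              # raw text of one element
    print(json.loads(s))  # parses as a list holding one dict
```

With these sample rows the replace() call changes nothing, because no value contains an escaped quote. If a value did contain one, removing the backslash could leave that element unparseable, so the replacement is only safe when the data is known to be free of quotes.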
Your exit value from the notebook is fine.
The problem is that you need to set the variable correctly.
Below is the expression I used in the Set Variable activity:
@activity('Notebook1').output.runOutput
Then I used a ForEach activity with a Web activity inside it to make the web request.
It executed successfully.
Set Variable activity output:
{
    "name": "send",
    "value": [
        {
            "metadata#uniqueid": "1",
            "metadata#FTA": "true"
        },
        {
            "metadata#uniqueid": "2",
            "metadata#FTA": "false"
        }
    ]
}
This value was passed to the ForEach activity, which made the web requests.
Make the Set Variable activity of type Array and assign it the notebook activity output mentioned above. Don't send strings, because calling json.dumps on strings that already hold JSON escapes their inner quotes, which is where the backslashes come from.
Keep the notebook exit value the same: dbutils.notebook.exit(json.dumps(decoded_list))
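The difference can be reproduced outside Databricks with plain Python. This is only a sketch: the sample rows below are hypothetical and mirror the output shown above, not real notebook data.

```python
import json

# Hypothetical rows, one JSON string per row, shaped like the to_json output.
json_array = [
    '{"metadata#uniqueid":"1","metadata#FTA":"true"}',
    '{"metadata#uniqueid":"2","metadata#FTA":"false"}',
]

# Dumping the strings directly encodes them a second time,
# so every inner double quote is escaped with a backslash.
double_encoded = json.dumps(json_array)

# Decoding each row first gives a list of dicts,
# which dumps to a plain JSON array of objects.
decoded_list = [json.loads(row_str) for row_str in json_array]
clean = json.dumps(decoded_list)

print(double_encoded)
print(clean)
```

The first print shows an array of strings full of \" sequences. The second shows an array of objects with no backslashes, which is the form an Array variable and a ForEach loop can consume item by item.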