Tags: azure, pyspark, azure-data-factory, databricks

Pass a JSON array from Databricks to ADF as a parameter/variable


I am trying to pass a DataFrame as JSON from Databricks to Azure Data Factory (ADF). My notebook code is:

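import json
from pyspark.sql.functions import col, struct, to_json

# Cast every column to string so all values serialize uniformly
for column in df.schema:
    df = df.withColumn(column.name, col(column.name).cast("string"))

# Replace nulls with empty strings
df = df.fillna('')

# Build one JSON string per row and collect them to the driver
json_df = df.select(to_json(struct(*df.columns)).alias("json"))
json_array = [row["json"] for row in json_df.collect()]

# Parse each row back into a dict so the final dump encodes only once
decoded_list = [json.loads(row_str) for row_str in json_array]
dbutils.notebook.exit(json.dumps(decoded_list))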

In ADF, I store the notebook output in a first variable ("to send") and then iterate over it with a ForEach activity. I run into one of two errors. If I change the variable's data type to Array, it fails because my code sends a dictionary where a string value is expected. If I leave the variable as a String, it fails at the Set Variable step, saying that a string was expected but an array was given. And when I do send a string and loop over it with ForEach, I get backslashes, which the Web activity reads as malformed JSON. How do I convert it properly so that:

it sends a string/array in JSON format without backslashes?
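
The backslashes described above are what JSON text looks like when it has been serialized twice. A minimal sketch of the difference, using plain Python and a hypothetical two-row sample in place of the `json_array` collected from the DataFrame:

```python
import json

# Hypothetical sample standing in for `json_array`
# (the JSON strings that to_json + collect() would return).
json_array = ['{"id":"1","flag":"true"}', '{"id":"2","flag":"false"}']

# Serializing the strings directly encodes them a second time,
# so every inner double quote is escaped with a backslash.
double_encoded = json.dumps(json_array)

# Parsing each row first and serializing once gives a plain
# JSON array of objects with no escaping.
decoded_list = [json.loads(row_str) for row_str in json_array]
single_encoded = json.dumps(decoded_list)

print(double_encoded)
print(single_encoded)
```

After one parse, the first form yields a list of strings and the second yields a list of objects, which is the shape a loop over rows needs.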

2023-10-02 10:00:00

My solution: I adjusted my notebook a little and removed

decoded_list = [json.loads(row_str) for row_str in json_array]

Instead, because of the API endpoint's requirements, I wrapped each row in brackets and replaced escaped double quotes (\") with plain double quotes in the notebook:

wrapped_json_array = ['[' + s.replace('\\"', '"') +']' for s in json_array]

and passed wrapped_json_array to dbutils.notebook.exit.
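
A small sketch of what this wrapping does, assuming hypothetical sample rows (the helper `is_valid_json` is illustrative only). Each row becomes its own one-element JSON array. Note that the quote replacement only stays safe while no value contains an embedded double quote:

```python
import json

# Hypothetical rows standing in for `json_array`; the second value
# contains embedded double quotes, which JSON stores as \".
json_array = ['{"id":"1"}', '{"note":"say \\"hi\\""}']

# Same transformation as in the post: unescape quotes, wrap in brackets.
wrapped_json_array = ['[' + s.replace('\\"', '"') + ']' for s in json_array]

def is_valid_json(text):
    """Return True if `text` parses as JSON (illustrative helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

validity = [is_valid_json(s) for s in wrapped_json_array]
print(wrapped_json_array[0])
print(validity)
```

The first row parses as a one-element array, while the second no longer parses once its escaped quotes are stripped.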


Solution

  • Your notebook's exit value is fine. The problem is how the variable is set. Use the following expression in the Set Variable activity:

    @activity('Notebook1').output.runOutput

    Then use a ForEach activity to make the web requests.

    [Screenshot: Web activity configuration]

    The pipeline executed successfully.

    [Screenshot: Notebook activity output]

    Set Variable activity output:

    {
        "name": "send",
        "value": [
            {
                "metadata#uniqueid": "1",
                "metadata#FTA": "true"
            },
            {
                "metadata#uniqueid": "2",
                "metadata#FTA": "false"
            }
        ]
    }
    

    This value was passed to the ForEach activity, which made the web requests.

    Set the variable's type to Array in the Set Variable activity and assign it the notebook activity output expression mentioned above. Don't pass strings, because calling json.dumps on strings adds backslashes. Keep the notebook exit value the same: dbutils.notebook.exit(json.dumps(decoded_list))
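
The flow described in this answer can be checked locally with a rough sketch in plain Python. It assumes the pipeline parses the exit value exactly once into an array. The `json.loads` call and the loop are only stand-ins for the Set Variable and ForEach activities, not ADF code, and the sample rows are copied from the Set Variable output shown above:

```python
import json

# Sample rows copied from the Set Variable output in this answer.
decoded_list = [
    {"metadata#uniqueid": "1", "metadata#FTA": "true"},
    {"metadata#uniqueid": "2", "metadata#FTA": "false"},
]

# The string the notebook would hand to dbutils.notebook.exit(...).
exit_value = json.dumps(decoded_list)

# Stand-in for the Array variable: the exit value parsed once.
items = json.loads(exit_value)

# Stand-in for the ForEach loop: one request body per item.
bodies = [json.dumps(item) for item in items]

for body in bodies:
    print(body)
```

Each body is a single JSON object with no backslashes, because every value is encoded only once on the way out.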