Search code examples
jsonazuremappingazure-data-factorygoogle-cloud-dataflow

Mapping nested JSON Object to SQL table through Data flow in Azure Data Factory


I have a JSON object containing a nested array without keys. I want to extract all elements from this nested array, assign each element to a separate column with appropriate names, and sink the data into a SQL table. The goal is to generate a single record for each "name" and "display_name" pair.

Below is the input JSON object.

{
"took": 573,
"_revision": "8ac1368",
"response": {
    "accounts": {
        "hits": [{
                "name": "Nilesh",
                "display_name": "Nilesh",
                "selected_fields": [
                    "Class",
                    null,
                    10,
                    "Physics",
                    17,
                    14,
                    "Rohit"
                ]
            },
            {
                "name": "Ronak",
                "display_name": "Ronak",
                "selected_fields": [
                    "Class",
                    null,
                    10,
                    "Chemistry",
                    18,
                    14,
                    "Rahul"
                ]
            },
            {
                "name": "Puneet",
                "display_name": "Puneet",
                "selected_fields": [
                    "Class",
                    null,
                    10,
                    "Mathematics",
                    13,
                    15,
                    "Priyam"
                ]
            },
            {
                "name": "Sanyam",
                "display_name": "Sanyam",
                "selected_fields": [
                    "Class",
                    null,
                    10,
                    "English",
                    14,
                    15,
                    "Priyanshu"
                ]
            }
        ],
        "stats": {},
        "total_hits": 5811
    }
},
"service_id": "25163",
"_type": "status",
"_version": 1,
"status": "success"

}

Below is the output I'm looking for: enter image description here

I am trying the below approach in data flow: enter image description here

I am stuck at this point and struggling to expand the row data into columns (see the output screenshot above). I am looking for guidance on the next steps.


Solution

  • To flatten the array data as per the output, you need to give below settings in ADF dataflow.

    • Take the flatten transformation next to source. Give the Unroll by and Unroll root value as response.accounts.hits[] array.
    • Then map only the below three columns.
    name = response.accounts.hits.name,
    Display_name = response.accounts.hits.display_name,
    response = response.accounts.hits.selected_fields
    

    • Then take the derived column transformation to convert each array element into a new column.
    identify = response[1],
    {Blood group} = response[2],
    Standard = response[3],
    Subject = response[4],
    {Roll no} = response[5],
    Age = response[6],
    {Class Teacher} = response[7]
    

    • Then take the select transformation and deselect the response array from it. Output of select transformation: