Search code examples
azure-data-factory

How to modify subcolumn names in a nested array in Azure Data Factory


In Azure Data Factory, I tried to modify a JSON source's attribute names, where those attributes(subcolumns) are in a nested array.

Source JSON:

[
    {
        "brand": "Toyota",
        "models": [
            { "name": "Corolla", "year": 2020 },
            { "name": "Yaris", "year": 2021}
        ]
    },
    {
        "brand": "BMW",
        "models": [
            { "name": "X5", "year": 2020 },
            { "name": "X6", "year": 2023 }
        ]
    }
]

Expected result:

[
    {
        "carBrand": "Toyota",
        "models": [
            { "modelName": "Corolla", "marketingYear": 2020 },
            { "modelName": "Yaris", "marketingYear": 2021 }
        ]
    },
    {
        "carBrand": "BMW",
        "models": [
            { "modelName": "X5", "marketingYear": 2020 },
            { "modelName": "X6", "marketingYear": 2023 }
        ]
    }
]

From pipeline / dataflow, I used Derived Columns: models = @(modelName=models.name, marketingYear=models.year)

It did create 'modelName' and 'marketingYear', whereas both became array format and collected all items result into one which is not what I need.

{"brand":"Toyota","models":{"modelName":["Corolla","Yaris"],"marketingYear":[2020,2021]}}
{"brand":"BMW","models":{"modelName":["X5","X6"],"marketingYear":[2020,2023]}}

Do you have any suggestion? Thanks!


Solution

    • In order to rename the columns in the complex array data, you can flatten the data and rename and then aggregate the data. Below is the approach

    • Take the flatten activity and Unroll by models[] value. While mapping the input and output columns , rename the column as required. Check the below screenshot.

    enter image description here

    • Then take the Aggregate transformation to aggregate the data back to original Json form.
    group by: carBrand
    Aggregates: model=collect(@(modelName=modelName, marketingYear=marketingYear))
    

    enter image description here

    Output

    {"carBrand":"BMW","models":[{"modelName":"X5","marketingYear":2020},{"modelName":"X6","marketingYear":2023}]}
    {"carBrand":"Toyota","models":[{"modelName":"Corolla","marketingYear":2020},{"modelName":"Yaris","marketingYear":2021}]}