In Azure Data Factory, I tried to modify a JSON source's attribute names, where those attributes(subcolumns) are in a nested array.
Source JSON:
[
{
"brand": "Toyota",
"models": [
{ "name": "Corolla", "year": 2020 },
{ "name": "Yaris", "year": 2021}
]
},
{
"brand": "BMW",
"models": [
{ "name": "X5", "year": 2020 },
{ "name": "X6", "year": 2023 }
]
}
]
Expected result:
[
{
"carBrand": "Toyota",
"models": [
{ "modelName": "Corolla", "marketingYear": 2020 },
{ "modelName": "Yaris", "marketingYear": 2021 }
]
},
{
"carBrand": "BMW",
"models": [
{ "modelName": "X5", "marketingYear": 2020 },
{ "modelName": "X6", "marketingYear": 2023 }
]
}
]
From pipeline / dataflow, I used Derived Columns:
models = @(modelName=models.name, marketingYear=models.year)
It did create 'modelName' and 'marketingYear', whereas both became array format and collected all items result into one which is not what I need.
{"brand":"Toyota","models":{"modelName":["Corolla","Yaris"],"marketingYear":[2020,2021]}}
{"brand":"BMW","models":{"modelName":["X5","X6"],"marketingYear":[2020,2023]}}
Do you have any suggestion? Thanks!
In order to rename the columns in the complex array data, you can flatten the data and rename and then aggregate the data. Below is the approach
Take the flatten activity and Unroll by models[]
value. While mapping the input and output columns , rename the column as required. Check the below screenshot.
group by: carBrand
Aggregates: model=collect(@(modelName=modelName, marketingYear=marketingYear))
Output
{"carBrand":"BMW","models":[{"modelName":"X5","marketingYear":2020},{"modelName":"X6","marketingYear":2023}]}
{"carBrand":"Toyota","models":[{"modelName":"Corolla","marketingYear":2020},{"modelName":"Yaris","marketingYear":2021}]}