Search code examples
azureazure-web-app-serviceazure-data-factory

I need to create a pipeline to set it in the required format given below and load it in the datalake


I have a data in Azure datalake I need to create a pipeline to set it in the required format given below and load it in the datalake (in my case destination is an API) without using dataflow transformation.

enter image description here

sample input text file:

BannerNumber    StoreGroupNumber    StoreGroupName  MemberNumber    MemberName  StoreNumbers
21                      83              SD           225             SD2        "5","10","20","30","40","41","42","43","44","93"

Required json output:

{
    "BannerNumber": "21",
    "StoreGroupNumber": "83",
    "StoreGroupName": "SD",
    "StoreGroupMembers": [
        {
            "MemberId": "225",
            "MemberLabel": "SD2",
            "StoreNumbers": [
                {
                    "StoreId": "5"
                },
                {
                    "StoreId": "10"
                },
                {
                    "StoreId": "20"
                },
                {
                    "StoreId": "30"
                },
                {
                    "StoreId": "40"
                },
                {
                    "StoreId": "41"
                },
                {
                    "StoreId": "42"
                },
                {
                    "StoreId": "43"
                },
                {
                    "StoreId": "44"
                },
                {
                    "StoreId": "93"
                }
            ],
            "MemberAddedOn": "2023-10-23 14:09:29.0430000",---Need to be todays date 
        }
        ]
}

With dataflow i have done but as per the requirement they need it without using the dataflow


Solution

  • As these kind of requirement involves row and column operations Dataflow is the better option than below method.

    Could you please do it for this sample data.....i.imgur.com/CE6HYbt.png? which you have created

    NOTE: This will only work for this kind of data and in this method it involves some manual steps to build the required JSON.

    I took a tab seperated csv file with multiple rows as sample data:

    BannerNumber    StoreGroupNumber    StoreGroupName  MemberNumber    MemberName  StoreNumbers
    21  83  SD  225 SD2 "5","10","20","30","40","41","42","43","44","93"
    22  84  SA  226 SD3 "6","11","21","31","41","42","43","44","45","94"
    23  85  SB  227 SD4 "7","12","22","32","42","43","44","45","46","95"
    

    As it is Tab seperated, Give column delimiter as \t in the dataset settings.

    enter image description here

    In the pipeline, take a lookup activity for the above dataset and uncheck the First row only in it. This lookup will give the JSON array and we can use this JSON array to build the required JSON array.

    Create two variables json_arr,curr_date of array and string type respectively. The extra variable is only for showing the result here and its not necessary.

    enter image description here

    Then, take a set variable activity and give the below expression for the curr_date expression. This is for the date in required JSON.

    @utcnow('yyyy-MM-dd HH:mm:ss.fffffff')
    

    Give the lookup activity output array @activity('Lookup1').output.value to the ForEach activity and make sure you check the Seuential in the ForEach.

    Inside ForEach activity, take an append variable activity for the json_arr variable with the below expression.

    @json(concat('{"BannerNumber":"',item().BannerNumber,'","StoreGroupNumber":"',item().StoreGroupNumber,'","StoreGroupName":"',item().StoreGroupName,'","StoreGroupMembers":[{"MemberId":"',item().MemberNumber,'","MemberLabel":"',item().MemberName,'","StoreNumbers":[{"StoreId":',replace(item().StoreNumbers,',','},{"StoreId":'),'}],"MemberAddedOn": "',variables('curr_date'),'"}]}'))
    

    enter image description here

    This will append every row as object to the json_arr variable in each iteration.

    After ForEach, the result JSON array will be like this:

    enter image description here

    [
            {
                "BannerNumber": "21",
                "StoreGroupNumber": "83",
                "StoreGroupName": "SD",
                "StoreGroupMembers": [
                    {
                        "MemberId": "225",
                        "MemberLabel": "SD2",
                        "StoreNumbers": [
                            {
                                "StoreId": "5"
                            },
                            {
                                "StoreId": "10"
                            },
                            {
                                "StoreId": "20"
                            },
                            {
                                "StoreId": "30"
                            },
                            {
                                "StoreId": "40"
                            },
                            {
                                "StoreId": "41"
                            },
                            {
                                "StoreId": "42"
                            },
                            {
                                "StoreId": "43"
                            },
                            {
                                "StoreId": "44"
                            },
                            {
                                "StoreId": "93"
                            }
                        ],
                        "MemberAddedOn": "2023-11-15 07:06:10.6985713"
                    }
                ]
            },
            {
                "BannerNumber": "22",
                "StoreGroupNumber": "84",
                "StoreGroupName": "SA",
                "StoreGroupMembers": [
                    {
                        "MemberId": "226",
                        "MemberLabel": "SD3",
                        "StoreNumbers": [
                            {
                                "StoreId": "6"
                            },
                            {
                                "StoreId": "11"
                            },
                            {
                                "StoreId": "21"
                            },
                            {
                                "StoreId": "31"
                            },
                            {
                                "StoreId": "41"
                            },
                            {
                                "StoreId": "42"
                            },
                            {
                                "StoreId": "43"
                            },
                            {
                                "StoreId": "44"
                            },
                            {
                                "StoreId": "45"
                            },
                            {
                                "StoreId": "94"
                            }
                        ],
                        "MemberAddedOn": "2023-11-15 07:06:10.6985713"
                    }
                ]
            },
            {
                "BannerNumber": "23",
                "StoreGroupNumber": "85",
                "StoreGroupName": "SB",
                "StoreGroupMembers": [
                    {
                        "MemberId": "227",
                        "MemberLabel": "SD4",
                        "StoreNumbers": [
                            {
                                "StoreId": "7"
                            },
                            {
                                "StoreId": "12"
                            },
                            {
                                "StoreId": "22"
                            },
                            {
                                "StoreId": "32"
                            },
                            {
                                "StoreId": "42"
                            },
                            {
                                "StoreId": "43"
                            },
                            {
                                "StoreId": "44"
                            },
                            {
                                "StoreId": "45"
                            },
                            {
                                "StoreId": "46"
                            },
                            {
                                "StoreId": "95"
                            }
                        ],
                        "MemberAddedOn": "2023-11-15 07:06:10.6985713"
                    }
                ]
            }
        ]
    

    in my case destination is an API

    You can send this JSON to your destination API using web activity POST method.

    This is my pipeline JSON for your reference:

    {
        "name": "pipeline2",
        "properties": {
            "activities": [
                {
                    "name": "Lookup1",
                    "type": "Lookup",
                    "dependsOn": [],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "source": {
                            "type": "DelimitedTextSource",
                            "storeSettings": {
                                "type": "AzureBlobFSReadSettings",
                                "recursive": true,
                                "enablePartitionDiscovery": false
                            },
                            "formatSettings": {
                                "type": "DelimitedTextReadSettings"
                            }
                        },
                        "dataset": {
                            "referenceName": "source_csv",
                            "type": "DatasetReference"
                        },
                        "firstRowOnly": false
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "Date",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('Lookup1').output.value",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Append variable1",
                                "type": "AppendVariable",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "json_arr",
                                    "value": {
                                        "value": "@json(concat('{\"BannerNumber\":\"',item().BannerNumber,'\",\"StoreGroupNumber\":\"',item().StoreGroupNumber,'\",\"StoreGroupName\":\"',item().StoreGroupName,'\",\"StoreGroupMembers\":[{\"MemberId\":\"',item().MemberNumber,'\",\"MemberLabel\":\"',item().MemberName,'\",\"StoreNumbers\":[{\"StoreId\":',replace(item().StoreNumbers,',','},{\"StoreId\":'),'}],\"MemberAddedOn\": \"',variables('curr_date'),'\"}]}'))",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                },
                {
                    "name": "Set variable1",
                    "type": "SetVariable",
                    "dependsOn": [
                        {
                            "activity": "ForEach1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "variableName": "res_show_arr",
                        "value": {
                            "value": "@variables('json_arr')",
                            "type": "Expression"
                        }
                    }
                },
                {
                    "name": "Date",
                    "type": "SetVariable",
                    "dependsOn": [
                        {
                            "activity": "Lookup1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "variableName": "curr_date",
                        "value": {
                            "value": "@utcnow('yyyy-MM-dd HH:mm:ss.fffffff')",
                            "type": "Expression"
                        }
                    }
                }
            ],
            "variables": {
                "json_arr": {
                    "type": "Array"
                },
                "res_show_arr": {
                    "type": "Array"
                },
                "curr_date": {
                    "type": "String"
                }
            },
            "annotations": []
        }
    }