I have a data in Azure datalake I need to create a pipeline to set it in the required format given below and load it in the datalake (in my case destination is an API) without using dataflow transformation.
sample input text file:
BannerNumber StoreGroupNumber StoreGroupName MemberNumber MemberName StoreNumbers
21 83 SD 225 SD2 "5","10","20","30","40","41","42","43","44","93"
Required json output:
{
"BannerNumber": "21",
"StoreGroupNumber": "83",
"StoreGroupName": "SD",
"StoreGroupMembers": [
{
"MemberId": "225",
"MemberLabel": "SD2",
"StoreNumbers": [
{
"StoreId": "5"
},
{
"StoreId": "10"
},
{
"StoreId": "20"
},
{
"StoreId": "30"
},
{
"StoreId": "40"
},
{
"StoreId": "41"
},
{
"StoreId": "42"
},
{
"StoreId": "43"
},
{
"StoreId": "44"
},
{
"StoreId": "93"
}
],
"MemberAddedOn": "2023-10-23 14:09:29.0430000",---Need to be todays date
}
]
}
With dataflow i have done but as per the requirement they need it without using the dataflow
As these kind of requirement involves row and column operations Dataflow is the better option than below method.
Could you please do it for this sample data.....i.imgur.com/CE6HYbt.png? which you have created
NOTE: This will only work for this kind of data and in this method it involves some manual steps to build the required JSON.
I took a tab seperated csv file with multiple rows as sample data:
BannerNumber StoreGroupNumber StoreGroupName MemberNumber MemberName StoreNumbers
21 83 SD 225 SD2 "5","10","20","30","40","41","42","43","44","93"
22 84 SA 226 SD3 "6","11","21","31","41","42","43","44","45","94"
23 85 SB 227 SD4 "7","12","22","32","42","43","44","45","46","95"
As it is Tab seperated, Give column delimiter as \t
in the dataset settings.
In the pipeline, take a lookup activity for the above dataset and uncheck the First row only in it. This lookup will give the JSON array and we can use this JSON array to build the required JSON array.
Create two variables json_arr
,curr_date
of array and string type respectively. The extra variable is only for showing the result here and its not necessary.
Then, take a set variable activity and give the below expression for the curr_date
expression. This is for the date in required JSON.
@utcnow('yyyy-MM-dd HH:mm:ss.fffffff')
Give the lookup activity output array @activity('Lookup1').output.value
to the ForEach activity and make sure you check the Seuential
in the ForEach.
Inside ForEach activity, take an append variable activity for the json_arr
variable with the below expression.
@json(concat('{"BannerNumber":"',item().BannerNumber,'","StoreGroupNumber":"',item().StoreGroupNumber,'","StoreGroupName":"',item().StoreGroupName,'","StoreGroupMembers":[{"MemberId":"',item().MemberNumber,'","MemberLabel":"',item().MemberName,'","StoreNumbers":[{"StoreId":',replace(item().StoreNumbers,',','},{"StoreId":'),'}],"MemberAddedOn": "',variables('curr_date'),'"}]}'))
This will append every row as object to the json_arr
variable in each iteration.
After ForEach, the result JSON array will be like this:
[
{
"BannerNumber": "21",
"StoreGroupNumber": "83",
"StoreGroupName": "SD",
"StoreGroupMembers": [
{
"MemberId": "225",
"MemberLabel": "SD2",
"StoreNumbers": [
{
"StoreId": "5"
},
{
"StoreId": "10"
},
{
"StoreId": "20"
},
{
"StoreId": "30"
},
{
"StoreId": "40"
},
{
"StoreId": "41"
},
{
"StoreId": "42"
},
{
"StoreId": "43"
},
{
"StoreId": "44"
},
{
"StoreId": "93"
}
],
"MemberAddedOn": "2023-11-15 07:06:10.6985713"
}
]
},
{
"BannerNumber": "22",
"StoreGroupNumber": "84",
"StoreGroupName": "SA",
"StoreGroupMembers": [
{
"MemberId": "226",
"MemberLabel": "SD3",
"StoreNumbers": [
{
"StoreId": "6"
},
{
"StoreId": "11"
},
{
"StoreId": "21"
},
{
"StoreId": "31"
},
{
"StoreId": "41"
},
{
"StoreId": "42"
},
{
"StoreId": "43"
},
{
"StoreId": "44"
},
{
"StoreId": "45"
},
{
"StoreId": "94"
}
],
"MemberAddedOn": "2023-11-15 07:06:10.6985713"
}
]
},
{
"BannerNumber": "23",
"StoreGroupNumber": "85",
"StoreGroupName": "SB",
"StoreGroupMembers": [
{
"MemberId": "227",
"MemberLabel": "SD4",
"StoreNumbers": [
{
"StoreId": "7"
},
{
"StoreId": "12"
},
{
"StoreId": "22"
},
{
"StoreId": "32"
},
{
"StoreId": "42"
},
{
"StoreId": "43"
},
{
"StoreId": "44"
},
{
"StoreId": "45"
},
{
"StoreId": "46"
},
{
"StoreId": "95"
}
],
"MemberAddedOn": "2023-11-15 07:06:10.6985713"
}
]
}
]
in my case destination is an API
You can send this JSON to your destination API using web activity POST method.
This is my pipeline JSON for your reference:
{
"name": "pipeline2",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"dataset": {
"referenceName": "source_csv",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Date",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('Lookup1').output.value",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Append variable1",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "json_arr",
"value": {
"value": "@json(concat('{\"BannerNumber\":\"',item().BannerNumber,'\",\"StoreGroupNumber\":\"',item().StoreGroupNumber,'\",\"StoreGroupName\":\"',item().StoreGroupName,'\",\"StoreGroupMembers\":[{\"MemberId\":\"',item().MemberNumber,'\",\"MemberLabel\":\"',item().MemberName,'\",\"StoreNumbers\":[{\"StoreId\":',replace(item().StoreNumbers,',','},{\"StoreId\":'),'}],\"MemberAddedOn\": \"',variables('curr_date'),'\"}]}'))",
"type": "Expression"
}
}
}
]
}
},
{
"name": "Set variable1",
"type": "SetVariable",
"dependsOn": [
{
"activity": "ForEach1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "res_show_arr",
"value": {
"value": "@variables('json_arr')",
"type": "Expression"
}
}
},
{
"name": "Date",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"variableName": "curr_date",
"value": {
"value": "@utcnow('yyyy-MM-dd HH:mm:ss.fffffff')",
"type": "Expression"
}
}
}
],
"variables": {
"json_arr": {
"type": "Array"
},
"res_show_arr": {
"type": "Array"
},
"curr_date": {
"type": "String"
}
},
"annotations": []
}
}