
Copy Files from a folder to multiple folders based on the file name in Azure Data Factory


I have a parent folder in ADLS Gen2 called Source, which has a number of subfolders. These subfolders contain the actual data files, as shown in the example below:

Source:

Folder Name: 20221212

A_20221212.txt B_20221212.txt C_20221212.txt

Folder Name: 20221219

A_20221219.txt B_20221219.txt C_20221219.txt

Folder Name: 20221226

A_20221226.txt B_20221226.txt C_20221226.txt

How can I use Azure Data Factory to copy the files from these subfolders into folders named after each file's prefix, creating the folder if it does not exist? Please see the example below:

Target:

Folder Name: A

A_20221212.txt A_20221219.txt A_20221226.txt

Folder Name: B

B_20221212.txt B_20221219.txt B_20221226.txt

Folder Name: C

C_20221212.txt C_20221219.txt C_20221226.txt

I would really appreciate your help.


Solution

  • I reproduced the scenario above and got the results below.

    You can follow the procedure below, which uses a Get Metadata activity, provided the subfolders are all at the same level.

    This is my source folder structure.

    data
        20221212
            A_20221212.txt
            B_20221212.txt
            C_20221212.txt
        20221219
            A_20221219.txt
            B_20221219.txt
            C_20221219.txt
        20221226
            A_20221226.txt
            B_20221226.txt
            C_20221226.txt
    

    Source dataset: sourcetxt (see the pipeline JSON below).

    Pass this dataset to a Get Metadata activity and add childItems to its field list.

    Then pass the childItems array from the Get Metadata activity to a ForEach activity. Inside the ForEach, I used a Set variable activity to store the target folder name, which is the part of the item name before the first underscore:

    @split(item().name,'_')[0]
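
To see what this expression produces, here is a rough Python equivalent. It is an illustration only, not Data Factory code, and the helper name target_folder is hypothetical. Python's str.split plays the role of the ADF split function, and index 0 picks the prefix.

```python
def target_folder(item_name: str) -> str:
    """Mimic @split(item().name,'_')[0]: the text before the first underscore."""
    return item_name.split("_")[0]


# File names taken from the question's example.
names = ["A_20221212.txt", "B_20221219.txt", "C_20221226.txt"]
folders = [target_folder(n) for n in names]
print(folders)  # ['A', 'B', 'C']
```

In this Python version a name without an underscore comes back unchanged. I would expect the ADF expression to behave the same way, so such a file would get a folder named after the whole file name; it is worth checking for that case in your own data.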
    


    Now add a Copy activity. In its source, use a wildcard file path: * as the wildcard folder path and @item().name as the wildcard file name, as in the pipeline JSON below.

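    For the sink, create dataset parameters (folder_name and file_name here) and use them in the sink dataset's folder and file name, for example as @dataset().folder_name and @dataset().file_name. Then, in the Copy activity sink, pass @variables('folder_name') and @item().name as their values.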

    My pipeline JSON:

    {
        "name": "pipeline1",
        "properties": {
            "activities": [
                {
                    "name": "Get Metadata1",
                    "type": "GetMetadata",
                    "dependsOn": [],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "sourcetxt",
                            "type": "DatasetReference"
                        },
                        "fieldList": [
                            "childItems"
                        ],
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "Get Metadata1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('Get Metadata1').output.childItems",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Copy data1",
                                "type": "Copy",
                                "dependsOn": [
                                    {
                                        "activity": "Set variable1",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "policy": {
                                    "timeout": "0.12:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "source": {
                                        "type": "DelimitedTextSource",
                                        "storeSettings": {
                                            "type": "AzureBlobFSReadSettings",
                                            "recursive": true,
                                            "wildcardFolderPath": "*",
                                            "wildcardFileName": {
                                                "value": "@item().name",
                                                "type": "Expression"
                                            },
                                            "enablePartitionDiscovery": false
                                        },
                                        "formatSettings": {
                                            "type": "DelimitedTextReadSettings"
                                        }
                                    },
                                    "sink": {
                                        "type": "DelimitedTextSink",
                                        "storeSettings": {
                                            "type": "AzureBlobFSWriteSettings"
                                        },
                                        "formatSettings": {
                                            "type": "DelimitedTextWriteSettings",
                                            "quoteAllText": true,
                                            "fileExtension": ".txt"
                                        }
                                    },
                                    "enableStaging": false,
                                    "translator": {
                                        "type": "TabularTranslator",
                                        "typeConversion": true,
                                        "typeConversionSettings": {
                                            "allowDataTruncation": true,
                                            "treatBooleanAsNumber": false
                                        }
                                    }
                                },
                                "inputs": [
                                    {
                                        "referenceName": "sourcetxt",
                                        "type": "DatasetReference"
                                    }
                                ],
                                "outputs": [
                                    {
                                        "referenceName": "targettxts",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "folder_name": {
                                                "value": "@variables('folder_name')",
                                                "type": "Expression"
                                            },
                                            "file_name": {
                                                "value": "@item().name",
                                                "type": "Expression"
                                            }
                                        }
                                    }
                                ]
                            },
                            {
                                "name": "Set variable1",
                                "type": "SetVariable",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "folder_name",
                                    "value": {
                                        "value": "@split(item().name,'_')[0]",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "variables": {
                "folder_name": {
                    "type": "String"
                }
            },
            "annotations": []
        }
    }
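
As a sanity check of the routing logic, independent of Data Factory, the sketch below replays it on the local file system. It builds the question's example source tree in a temporary directory and copies each file to a folder named after its prefix. The function name route_files and the directory names are made up for this illustration; the pipeline itself does not run any Python.

```python
import shutil
import tempfile
from pathlib import Path


def route_files(source: Path, target: Path) -> None:
    """Copy every file in source/<subfolder>/ to target/<prefix>/<file name>."""
    for path in sorted(source.glob("*/*")):          # any subfolder, like wildcard folder path "*"
        if not path.is_file():
            continue
        prefix = path.name.split("_")[0]             # same rule as the Set variable step
        dest_dir = target / prefix
        dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
        shutil.copy2(path, dest_dir / path.name)


with tempfile.TemporaryDirectory() as tmp:
    source, target = Path(tmp) / "Source", Path(tmp) / "Target"
    # Recreate the example layout from the question.
    for date in ("20221212", "20221219", "20221226"):
        (source / date).mkdir(parents=True)
        for letter in "ABC":
            (source / date / f"{letter}_{date}.txt").write_text("sample\n")

    route_files(source, target)
    # Record the resulting layout before the temporary directory is removed.
    layout = {d.name: sorted(f.name for f in d.iterdir())
              for d in sorted(target.iterdir())}

print(layout["A"])  # ['A_20221212.txt', 'A_20221219.txt', 'A_20221226.txt']
```

The layout dictionary should match the Target listing in the question, which makes it a quick way to confirm the prefix rule before building the pipeline.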
    

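    Result: each file is written to the sink folder named after its prefix, so A_20221212.txt goes to folder A, B_20221212.txt to folder B, and so on.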