Tags: foreach, azure-data-factory, azure-blob-storage, copy, dataset

Copy folders into Azure Blob Storage (Azure Data Factory)


I am trying to copy folders and their files from FTP into Azure Blob Storage by looping through the folders and, for each folder, copying its content into a container that has the folder's name. For this I used a Get Metadata, a ForEach, and a Copy Data activity. For now I am able to copy all the folders into the same container, but what I want is to end up with multiple containers in the output, each named after a folder and containing that folder's files from the FTP.

PS: I am still new to Azure Data Factory.

Any advice or help is very welcome :)


Solution

  • You need to add a Get Metadata activity before the ForEach. The Get Metadata activity gets the items in the current directory and passes them to the ForEach. Connect it to your Blob storage folder.

    Try something like this:

    Set up a JSON source:


    Create a pipeline and use a Get Metadata activity to list all the folders in the container/storage. Select childItems in the Field list.

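    As a rough sketch, the Get Metadata activity JSON could look like this (the activity name and SourceRootDataset are placeholders; the dataset is assumed to point at the root of the source container):

    ```json
    {
        "name": "GetFolderList",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {
                "referenceName": "SourceRootDataset",
                "type": "DatasetReference"
            },
            "fieldList": [ "childItems" ]
        }
    }
    ```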

    Feed the Get Metadata output (the list of container contents) into a Filter activity and keep only the folders.

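    A minimal sketch of that Filter activity, assuming the Get Metadata activity above is named GetFolderList:

    ```json
    {
        "name": "FilterFolders",
        "type": "Filter",
        "typeProperties": {
            "items": {
                "value": "@activity('GetFolderList').output.childItems",
                "type": "Expression"
            },
            "condition": {
                "value": "@equals(item().type, 'Folder')",
                "type": "Expression"
            }
        }
    }
    ```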

    Pass the filtered list of folders to a ForEach activity.

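    The ForEach items expression then points at the Filter output, roughly as below (the inner activities from the next steps go under "activities"):

    ```json
    {
        "name": "ForEachFolder",
        "type": "ForEach",
        "typeProperties": {
            "items": {
                "value": "@activity('FilterFolders').output.Value",
                "type": "Expression"
            },
            "activities": []
        }
    }
    ```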

    Inside the ForEach, set the current item() to a variable and use it as a parameter for a parameterized source dataset, which is a clone of the original source.

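    For illustration, assuming a variable named currentFolder and a parameterized dataset named SourceFolderDataset with a folderName parameter (all placeholder names and values), the Set Variable activity and the dataset could look roughly like this:

    ```json
    {
        "name": "SetCurrentFolder",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "currentFolder",
            "value": {
                "value": "@item().name",
                "type": "Expression"
            }
        }
    }
    ```

    and the parameterized source dataset:

    ```json
    {
        "name": "SourceFolderDataset",
        "properties": {
            "type": "Json",
            "linkedServiceName": {
                "referenceName": "AzureBlobStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "parameters": {
                "folderName": { "type": "String" }
            },
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": "source",
                    "folderPath": {
                        "value": "@dataset().folderName",
                        "type": "Expression"
                    }
                }
            }
        }
    }
    ```

    The Get Metadata activity that lists the files would then use this dataset and pass the current folder name as the folderName parameter.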

    This would result in listing the files from each folder in your container.

    Feed this to another Filter activity and this time filter on files. Use @equals(item().type,'File')

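    For example, assuming the second Get Metadata activity (the one using the parameterized dataset) is named GetFolderContent:

    ```json
    {
        "name": "FilterFiles",
        "type": "Filter",
        "typeProperties": {
            "items": {
                "value": "@activity('GetFolderContent').output.childItems",
                "type": "Expression"
            },
            "condition": {
                "value": "@equals(item().type, 'File')",
                "type": "Expression"
            }
        }
    }
    ```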

    Now create another (child) pipeline in which the Copy activity runs for each file whose name matches the name of its parent folder.

    Create parameters in the new child pipeline to receive the current folder and file names of the iteration from the parent pipeline, so they can be evaluated for the copy.

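    As a sketch, with a folder parameter named foldernamereceived (an assumed name; only filesnamesreceived appears below) and the child pipeline called CopyMatchingFiles (also a placeholder), the child pipeline parameters could be declared as:

    ```json
    "parameters": {
        "foldernamereceived": { "type": "String" },
        "filesnamesreceived": { "type": "Array" }
    }
    ```

    and the parent pipeline would call it with an Execute Pipeline activity, passing the current folder name and the filtered file list:

    ```json
    {
        "name": "RunChildCopyPipeline",
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": {
                "referenceName": "CopyMatchingFiles",
                "type": "PipelineReference"
            },
            "parameters": {
                "foldernamereceived": {
                    "value": "@item().name",
                    "type": "Expression"
                },
                "filesnamesreceived": {
                    "value": "@activity('FilterFiles').output.Value",
                    "type": "Expression"
                }
            },
            "waitOnCompletion": true
        }
    }
    ```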

    Inside the child pipeline, start with a ForEach whose input is the list of file names inside the folder, received through the parameter: @pipeline().parameters.filesnamesreceived

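    Its items expression would reference that parameter, roughly as below (the Set Variable, If Condition, and Copy activities from the next steps go under "activities"):

    ```json
    {
        "name": "ForEachFileName",
        "type": "ForEach",
        "typeProperties": {
            "items": {
                "value": "@pipeline().parameters.filesnamesreceived",
                "type": "Expression"
            },
            "activities": []
        }
    }
    ```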

    Use a variable to hold the current item and an If Condition activity to check whether the file name and the folder name match.


    Note: you may need to drop the file extension for this comparison, since the metadata holds the complete file name along with its extension.

    If True -> the names match; copy from source to sink.

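    A sketch of that If Condition, assuming the variable holding the current item is named currentFileName, the files are .json, and the folder name arrives in the foldernamereceived parameter (all assumed names); the extension is stripped with replace(), per the note above, and the Copy activity goes under ifTrueActivities:

    ```json
    {
        "name": "IfFileMatchesFolder",
        "type": "IfCondition",
        "typeProperties": {
            "expression": {
                "value": "@equals(replace(variables('currentFileName'), '.json', ''), pipeline().parameters.foldernamereceived)",
                "type": "Expression"
            },
            "ifTrueActivities": []
        }
    }
    ```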

    Here the hierarchy is preserved, and you can also use "Prefix" to specify the file path, since the copy preserves the hierarchy. The prefix uses the service-side filter for Blob storage, which performs better than a wildcard filter.

    The sub-path after the last "/" in the prefix is preserved. For example, if you have the source container/folder/subfolder/file.txt and configure the prefix as folder/sub, then the preserved file path is subfolder/file.txt, which fits your scenario.


    This copies files like /source/source/source.json to /sink/source/source.json
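    As a final sketch, the prefix sits on the Copy activity's source store settings; dataset names here are placeholders, the prefix value is taken from the example above, and the JsonSource/JsonSink types assume a JSON source as earlier (in practice the prefix would usually be built dynamically from the folder name parameter):

    ```json
    {
        "name": "CopyMatchedFiles",
        "type": "Copy",
        "inputs": [
            { "referenceName": "SourceBlobDataset", "type": "DatasetReference" }
        ],
        "outputs": [
            { "referenceName": "SinkBlobDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
            "source": {
                "type": "JsonSource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "prefix": "folder/sub"
                }
            },
            "sink": {
                "type": "JsonSink",
                "storeSettings": {
                    "type": "AzureBlobStorageWriteSettings"
                }
            }
        }
    }
    ```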