azure azure-data-factory azure-blob-storage data-ingestion

How to copy multiple folders and their files from one blob container into multiple sub-folders of another blob container using ADF?


I am using an ADF pipeline to copy Parquet files for multiple datasets, stored in multiple folders, from a source blob container (the client's container) to our own blob container.

However, I don't want to copy them as they are. Instead, I want the layout shown below:

enter image description here

As shown above, I want each of these folders, along with the files in it, to be copied to the destination folder shown. I tried a regular copy activity, but it doesn't let me add multiple wildcards, and it doesn't let me specify multiple destinations in the Sink section.

How can I achieve this with an ADF pipeline?

Lookup output array for reference.

enter image description here


Solution

  • Since you have the source and destination folder paths in a file, you can use a copy activity with dataset parameters inside a For-Each activity.

    First, store the mapping file in a temporary location.

    I used the same folder paths as yours:

    SourceBlob_Container,DestinationBlob_Container
    blobsource/demo/raad/indicated/inbound/IMSONE_CHANEL_M_AB_202402_20240415,blobdestination/Demo_Tables/Load_04_15_2024
    blobsource/demo/raad/indicated/inbound/IMSONE_CONTROL_M_AB_202402_20240415,blobdestination/Demo_Tables/Load_04_15_2025
    blobsource/demo/raad/indicated/inbound/IMSONE_COMPATIENT_M_665_202402_20240415,blobdestination/M_665/Load_04_15_2024
    blobsource/demo/raad/indicated/inbound/IMSONE_DIAG_M_665_202402_20240415,blobdestination/M_665/Load_04_15_2025
    blobsource/demo/raad/indicated/inbound/IMSONE_COMPATIENT_M_667_202402_20240415,blobdestination/M_667/Load_04_15_2024
    blobsource/demo/raad/indicated/inbound/IMSONE_DIAG_M_667_202402_20240415,blobdestination/M_667/Load_04_15_2025
    

    Create a delimited text dataset that points to this file and pass it to a Lookup activity with the configuration below.

    enter image description here

    The Lookup activity returns the source and sink folder paths as a JSON array. Add a For-Each activity and pass this array, @activity('Lookup1').output.value, as the For-Each items expression.
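To make the shape of that array concrete, here is a minimal Python sketch (not ADF code) that parses two rows of the mapping file the way a Lookup over a delimited text dataset would be expected to. It assumes the dataset treats the first row as a header and that "First row only" is unchecked on the Lookup, so each row becomes one object keyed by the column names. The names `MAPPING_CSV` and `lookup_rows` are made up for this illustration.

```python
import csv
import io

# Two rows copied from the mapping file; the header names become the keys.
MAPPING_CSV = """SourceBlob_Container,DestinationBlob_Container
blobsource/demo/raad/indicated/inbound/IMSONE_CHANEL_M_AB_202402_20240415,blobdestination/Demo_Tables/Load_04_15_2024
blobsource/demo/raad/indicated/inbound/IMSONE_DIAG_M_665_202402_20240415,blobdestination/M_665/Load_04_15_2025
"""


def lookup_rows(text: str) -> list[dict[str, str]]:
    """Rough stand-in for the Lookup activity's output.value array."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


rows = lookup_rows(MAPPING_CSV)
for item in rows:  # inside the For-Each, each element is available as item()
    print(item["SourceBlob_Container"], "->", item["DestinationBlob_Container"])
```

Each dictionary here plays the role of one item() in the For-Each, which is why the later expressions read item().SourceBlob_Container and item().DestinationBlob_Container.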

    enter image description here

    Inside the For-Each, add a copy activity. For the source of the copy activity, create a Parquet dataset with a dataset parameter named container_name.

    enter image description here

    Reference it as @dataset().container_name in the dataset's container field and leave the rest of the path empty.

    enter image description here

    For the sink of the copy activity, create another Parquet dataset. Here, create two dataset parameters: container_name and folder_path.

    enter image description here

    Use them in the dataset the same way as in the source dataset, and leave the file name empty.

    enter image description here

    Set these two datasets as the source and sink of the copy activity. Then, in the source, select Wildcard file path and enter the expressions below.

    container_name : @first(split(item().SourceBlob_Container,'/'))
    wildcard folder path : @join(skip(split(item().SourceBlob_Container,'/'),1),'/')
    wildcard file name : *.parquet
    

    enter image description here

    Similarly, enter the expressions below for the sink dataset parameters.

    container_name : @first(split(item().DestinationBlob_Container,'/'))
    folder_path : @join(skip(split(item().DestinationBlob_Container,'/'),1),'/')
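To see what these expressions evaluate to, the following Python sketch mirrors them step by step. It is only an analogue of the ADF functions split, first, skip and join, not ADF code, and the helper name `split_container_and_folder` is hypothetical.

```python
def split_container_and_folder(path: str) -> tuple[str, str]:
    """Mirror first(split(path,'/')) and join(skip(split(path,'/'),1),'/')."""
    parts = path.split("/")        # split(path, '/')
    container = parts[0]           # first(...)
    folder = "/".join(parts[1:])   # join(skip(..., 1), '/')
    return container, folder


source = "blobsource/demo/raad/indicated/inbound/IMSONE_CHANEL_M_AB_202402_20240415"
sink = "blobdestination/Demo_Tables/Load_04_15_2024"

print(split_container_and_folder(source))
# ('blobsource', 'demo/raad/indicated/inbound/IMSONE_CHANEL_M_AB_202402_20240415')
print(split_container_and_folder(sink))
# ('blobdestination', 'Demo_Tables/Load_04_15_2024')
```

The first path segment is treated as the container and everything after it as the folder path, so every row in the mapping file needs to start with a container name.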
    

    enter image description here

    Now debug the pipeline. In each iteration, all the Parquet files in the source folder path are copied to the corresponding target folder.

    enter image description here

    UPDATE:

    To copy the source folder itself, along with its files, to the target location, use the expressions below in the copy activity sink.

    container_name : @first(split(item().DestinationBlob_Container,'/'))
    folder_path : @concat(join(skip(split(item().DestinationBlob_Container,'/'),1),'/'),'/',last(split(item().SourceBlob_Container,'/')))
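As a sanity check on the folder_path expression, this Python sketch reproduces the same string operations for one row of the mapping file. It is an illustration only, and the helper name `sink_folder_with_source_name` is hypothetical.

```python
def sink_folder_with_source_name(source_path: str, destination_path: str) -> str:
    """Mirror concat(join(skip(split(dest,'/'),1),'/'), '/', last(split(src,'/')))."""
    destination_folder = "/".join(destination_path.split("/")[1:])  # drop the container
    source_folder_name = source_path.split("/")[-1]                 # last(split(...))
    return destination_folder + "/" + source_folder_name            # concat(..., '/', ...)


source = "blobsource/demo/raad/indicated/inbound/IMSONE_CHANEL_M_AB_202402_20240415"
destination = "blobdestination/Demo_Tables/Load_04_15_2024"

print(sink_folder_with_source_name(source, destination))
# Demo_Tables/Load_04_15_2024/IMSONE_CHANEL_M_AB_202402_20240415
```

In this sketch a source path with a trailing slash would yield an empty last segment, so it is worth keeping the paths in the mapping file free of trailing slashes.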
    

    enter image description here

    Keep the copy activity source the same as above.

    This copies all the files in the folder, along with the folder itself, to the target location.

    enter image description here