azure zip azure-data-factory azure-blob-storage azure-storage

ADF Copy activity unzips archives recursively, but that is not needed


I have two binary datasets in ADF: one uses .zip compression, the other has no compression.

I use these two datasets as the source and sink of a Copy data activity to unzip files on Blob storage, and it works perfectly fine when the zip file just contains some text files.

But unexpected behaviour occurs when I need to unzip a file that has other zip files inside. As the expected result, I want to see a folder named after the main archive with a few zip archives inside it.

main_archive.zip/
|- nested1.zip
|- nested2.zip

But instead of zip archives, I see folders named after the nested zip archives, with the unzipped files inside them.

main_archive.zip/
|- nested1.zip/
   |- file1.txt
|- nested2.zip/
   |- file2.txt

I am unsure why I am seeing this behaviour, while others are asking "how to unzip nested archives at once" and getting the response "ADF doesn't support nested unzipping in one operation".

I need these nested archives to stay compressed. Any thoughts?


Solution

  • I tried your scenario and got the same results.


    It unzips every inner zip file recursively. I tried the same scenario in Synapse integrated pipelines and the result is the same there as well.

    Previously, the copy activity unzipped only the given zip file. It is currently unclear whether this behavior is a new feature or a bug. I have raised a request on GitHub which you can follow.

    Since your root zip file contains zip files only at a single sub-folder level, you can try the workaround below. This approach re-creates the required zip files from the unzipped folders and then deletes those folders.

    After the copy activity, add a Get Metadata activity with a Binary dataset and select the Child items field. The Binary dataset path should be your target unzipped folder, which is zipsoutout/mainzip.zip in my case. Don't set any compression type on it.


    This returns all the folder names and file names as a list. Filter the unzipped folder names out of this list with a Filter activity. The folder names that end with .zip are the unzipped folders.

    Use the expressions below as Items and Condition for the Filter activity.

    Items: @activity('Get Metadata1').output.childItems
    
    Condition: @endswith(item().name, '.zip')
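
To make the Filter step concrete, here is a minimal Python sketch of the same selection outside ADF. The sample `child_items` list is hypothetical; it only mirrors the `name`/`type` shape that Get Metadata returns for child items. The lower-casing reflects my understanding that ADF's `endswith` ignores case, which is an assumption worth verifying.

```python
def filter_unzipped_folders(child_items):
    """Mimic the Filter activity: keep items whose name ends with '.zip'."""
    # Lower-casing assumes the ADF endswith function is case-insensitive.
    return [item for item in child_items if item["name"].lower().endswith(".zip")]


# Hypothetical childItems output for the unzipped target folder.
child_items = [
    {"name": "nested1.zip", "type": "Folder"},
    {"name": "nested2.zip", "type": "Folder"},
    {"name": "readme.txt", "type": "File"},
]

selected = filter_unzipped_folders(child_items)
selected_names = [item["name"] for item in selected]
print(selected_names)
```

If real files ending in .zip could also sit in that folder, the condition might additionally check the item type, for example `@and(endswith(item().name, '.zip'), equals(item().type, 'Folder'))`. That is a suggestion, not part of the pipeline tested above.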
    


    Now pass the Filter activity output array @activity('Filter1').output.value to a ForEach activity as its Items expression.

    Inside the ForEach, add a copy activity to zip the folders. Use the same dataset that was used in the Get Metadata activity as the copy activity source, with the expression below pointing it at each unzipped folder.

    @concat('mainzip.zip/',item().name)
    


    Create a new Binary dataset with the same folder path, but for the file name create a dataset parameter and use it there. Set the required compression type as well.


    Use this dataset as the copy activity sink and pass @item().name as the dataset parameter value in the copy activity.


    This copy activity creates the required zip file. To delete the existing unzipped folders, use a Delete activity. This requires another Binary dataset.

    Create a dataset parameter and use it in the folder name of that dataset.


    In the Delete activity, use the expression below as the value for that parameter.

    @concat('mainzip.zip/',item().name)
    


    This deletes all the contents of the unzipped folders. As you are using Blob storage, the empty folders are removed automatically.
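
The automatic removal of empty folders follows from how a flat-namespace Blob container represents folders: they are only prefixes of blob names. The Python sketch below illustrates the idea on a plain list of names. It assumes the storage account does not have hierarchical namespace enabled, and the blob names and helper functions are made up for illustration.

```python
def list_folders(blob_names):
    """Derive the 'folders' implied by the '/'-separated blob name prefixes."""
    folders = set()
    for name in blob_names:
        parts = name.split("/")[:-1]  # drop the final segment (the blob itself)
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]))
    return sorted(folders)


def delete_prefix(blob_names, prefix):
    """Remove every blob stored under the given virtual folder."""
    folder_prefix = prefix.rstrip("/") + "/"
    return [name for name in blob_names if not name.startswith(folder_prefix)]


# Hypothetical container listing after nested1.zip has been re-created.
blobs = [
    "mainzip.zip/nested1.zip",            # the re-created archive
    "mainzip.zip/nested1.zip/file1.txt",  # leftover unzipped content
    "mainzip.zip/nested2.zip/file2.txt",
]

remaining = delete_prefix(blobs, "mainzip.zip/nested1.zip")
folders_left = list_folders(remaining)
print(remaining)
print(folders_left)
```

Once the last blob under `mainzip.zip/nested1.zip/` is gone, no name carries that prefix any more, so the folder no longer appears, while the archive blob with the same name stays.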

    Now debug the pipeline; it will create the required inner zip files.
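
As a rough local analogue of the whole workaround, the following Python sketch does on a normal file system what the ForEach loop does on Blob storage: it finds directories whose names end in .zip, zips their contents, and replaces each directory with the archive. It is only an illustration of the logic under that assumption, not ADF code, and `rezip_unzipped_folders` is a hypothetical helper name.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def rezip_unzipped_folders(root: Path) -> list[str]:
    """Turn each '<name>.zip' directory under root back into a '<name>.zip' file."""
    rezipped = []
    folders = sorted(p for p in root.iterdir() if p.is_dir() and p.name.endswith(".zip"))
    for folder in folders:
        # A directory and a file cannot share a name locally, so zip to a temp name first.
        tmp_archive = root / (folder.name + ".tmp")
        with zipfile.ZipFile(tmp_archive, "w", zipfile.ZIP_DEFLATED) as archive:
            for file in sorted(folder.rglob("*")):
                if file.is_file():
                    archive.write(file, file.relative_to(folder).as_posix())
        shutil.rmtree(folder)        # counterpart of the Delete activity
        tmp_archive.rename(folder)   # archive takes over the folder's name
        rezipped.append(folder.name)
    return rezipped


# Build a small sample tree that looks like the recursively unzipped output.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "mainzip.zip"
    for nested, member in [("nested1.zip", "file1.txt"), ("nested2.zip", "file2.txt")]:
        (root / nested).mkdir(parents=True)
        (root / nested / member).write_text("sample")

    result = rezip_unzipped_folders(root)
    all_files = all((root / name).is_file() for name in result)
    members = {}
    for name in result:
        with zipfile.ZipFile(root / name) as archive:
            members[name] = archive.namelist()

print(result, members)
```

The temporary name is needed only on a local file system. On Blob storage the archive blob and the blobs under the folder prefix of the same name can coexist, which is why the pipeline can copy first and delete afterwards.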
