Tags: azure, azure-storage, azure-data-factory, azure-synapse

Azure Synapse Analytics - deleting pipeline Folder


I am new to Synapse and need to build a pipeline that deletes files from folders in a hierarchy like the one in the attached image (expected hierarchy). The red half circles mark the files I would like to delete, for example files older than 2 months.

So far I have built a pipeline for a single folder: a For Each loop reaches the files and deletes the matching ones, and it works. Since I have about 60-70 folders and even more files, I wanted to go one level up and run that logic for each folder. This is where the problem is. When I use a Get Metadata activity on the top folder and a For Each loop over the folder names, I can access only the folders, not the files inside them. Could someone help me solve this?

(Image: delete pipeline for a single folder using a For Each loop)


Solution

  • We can achieve this using nested For Each activities with the help of the Execute Pipeline activity. A For Each activity cannot contain another For Each directly, so the inner loop lives in a separate pipeline that the outer loop calls. As mentioned, Get Metadata with wildcards returns all files without folders, and the Delete activity does not recognize wildcard folder paths (Folder/*).

    • I created a similar folder structure for the demo. In my pipeline, I first created an array parameter req_files (sample1.csv and sample2.csv) holding the names of the files to keep.

    Note: To build this list dynamically, you can use the Append Variable activity to generate the required file names (file09/22 and file08/22).
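
The idea in the note can be sketched outside the pipeline. The Python below is only an illustration: it assumes the `fileMM/yy` naming pattern hinted at above, and `recent_file_names` is a hypothetical helper, not an ADF or Synapse function. In the pipeline itself the same list would be built with Append Variable.

```python
from datetime import date


def recent_file_names(today: date, months: int = 2, prefix: str = "file") -> list[str]:
    """Return names for the current month and the months before it, newest first.

    Assumes the hypothetical pattern "<prefix>MM/yy", e.g. "file09/22".
    """
    names = []
    year, month = today.year, today.month
    for _ in range(months):
        names.append(f"{prefix}{month:02d}/{year % 100:02d}")
        # Step back one month, rolling over into the previous year in January.
        month -= 1
        if month == 0:
            month, year = 12, year - 1
    return names


print(recent_file_names(date(2022, 9, 15)))  # ['file09/22', 'file08/22']
```

The resulting list plays the role of req_files: every file whose name is not in it becomes a candidate for deletion.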

    • I used one Get Metadata activity to get the folder names inside the root folder. My For Each activity iterates over its output (items value is @activity('root folder contents').output.childItems).
    • Inside the For Each, I used another Get Metadata activity on the current sub folder to list the files it contains.
    • Now I have the folder name and the list of files inside it. I use Execute Pipeline to implement the nested For Each. Create 3 parameters in a new pipeline called delete_pipeline (where the delete happens): current_folder, folder_files and files_needed.
    • Pass the following dynamic content for each of them from the parent pipeline.
    current_folder: @item().name
    folder_files: @activity('sub folder contents').output.childItems
    files_needed: @pipeline().parameters.req_files
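
To make the hand-off to delete_pipeline concrete, here is a rough Python model of what the parent pipeline does. It is a sketch under assumptions: `plan_child_runs` and the in-memory `root` dictionary are hypothetical stand-ins for the two Get Metadata activities, and nothing here calls Azure.

```python
def plan_child_runs(root: dict[str, list[str]], req_files: list[str]) -> list[dict]:
    """Build one set of child-pipeline parameters per sub folder.

    `root` is a hypothetical stand-in for storage: folder name -> file names.
    """
    runs = []
    for folder_name, file_names in root.items():      # outer For Each over the root childItems
        # Get Metadata reports each child as an object with `name` and `type`.
        child_items = [{"name": n, "type": "File"} for n in file_names]
        runs.append({
            "current_folder": folder_name,            # @item().name
            "folder_files": child_items,              # childItems of the sub folder
            "files_needed": req_files,                # parent parameter req_files
        })
    return runs


root = {"folder1": ["sample1.csv", "old.csv"], "folder2": ["sample2.csv"]}
runs = plan_child_runs(root, ["sample1.csv", "sample2.csv"])
for run in runs:
    print(run["current_folder"], [f["name"] for f in run["folder_files"]])
```

Each dictionary corresponds to one Execute Pipeline call, so the child pipeline always knows which folder its file list came from.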
    

    • In delete_pipeline, a For Each loops through the list of files passed in (items value is @pipeline().parameters.folder_files).
    • Inside this For Each, I use an If Condition activity, because I want to delete only the files that are not in my req_files parameter (the array from the parent pipeline, passed to the files_needed parameter of delete_pipeline). The condition for the If Condition activity is as follows:
    @contains(pipeline().parameters.files_needed,item().name)
    

    • We need to delete a file only when it is not present in req_files (files_needed). So the Delete activity goes in the False branch of the If Condition.
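
The effect of the If Condition can be modeled in a few lines of Python. This is a sketch only; `files_to_delete` is a hypothetical name, and the membership test stands in for the `contains` expression evaluated once per item.

```python
def files_to_delete(folder_files: list[dict], files_needed: list[str]) -> list[str]:
    """Return the names that would reach the Delete activity.

    A file is kept when its name is in `files_needed` (condition true);
    it is deleted when the name is absent (condition false).
    """
    return [item["name"] for item in folder_files if item["name"] not in files_needed]


folder_files = [
    {"name": "sample1.csv", "type": "File"},
    {"name": "old.csv", "type": "File"},
]
print(files_to_delete(folder_files, ["sample1.csv", "sample2.csv"]))  # ['old.csv']
```

Names listed in files_needed that do not exist in the folder are simply ignored, which is why the same req_files array can be reused for every sub folder.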

    • In the dataset used by the Delete activity, I created 2 parameters, file_namepath_of_file_to_delete and file_name_to_delete, with the following dynamic content.

    file_namepath_of_file_to_delete: Folder/@{pipeline().parameters.current_folder}
    file_name_to_delete: @item().name
    

    When I run the pipeline, it keeps the required files and deletes the rest.
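
For readers who want to check the overall behaviour without a Synapse workspace, the whole flow can be imitated on a local temporary directory. This is a sketch under assumptions: the folder names `sub1` and `sub2`, the file `old.csv`, and the function `clean_root` are made up for the demo, and local file operations stand in for the Get Metadata and Delete activities.

```python
import tempfile
from pathlib import Path


def clean_root(root: Path, req_files: list[str]) -> list[str]:
    """Delete every file in each sub folder of `root` that is not in `req_files`."""
    deleted = []
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):       # outer For Each
        for file in sorted(p for p in folder.iterdir() if p.is_file()):  # inner For Each
            if file.name not in req_files:                               # If Condition is false
                file.unlink()                                            # Delete activity
                deleted.append(f"{folder.name}/{file.name}")
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "Folder"
    for folder in ("sub1", "sub2"):
        (root / folder).mkdir(parents=True)
        for name in ("sample1.csv", "sample2.csv", "old.csv"):
            (root / folder / name).write_text("demo")

    deleted = clean_root(root, ["sample1.csv", "sample2.csv"])
    remaining = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))
    print(deleted)    # ['sub1/old.csv', 'sub2/old.csv']
    print(remaining)  # the two sample files in each sub folder
```

Unlike the pipeline, plain Python can nest the two loops directly; in Synapse the inner loop has to sit in delete_pipeline and be reached through Execute Pipeline.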