Search code examples
azure-data-factory

How do i fetch the list of date folders in ADLS and pass them to delete activity?


I have created a ADF pipeline with only one activity and that is delete activity

Below is my ADLS folders , here raw is container name

 /raw/2022-05-28/omega/omega3.txt
 /raw/2022-05-28/demo/demo3.txt

/raw/2022-05-29/omega/omega2.txt
/raw/2022-05-29/demo/demo2.txt

/raw/2022-05-30/omega/omega1.txt
/raw/2022-05-30/demo/demo1.txt

My intention is to delete all the folders inside the raw container expect the current date folder

The folders to be deleted are below .

2022-05-28
2022-05-29

So basically once the pipeline get completed only the below folders and files needs to available because they belong to current date

/raw/2022-05-30/omega/omega1.txt
/raw/2022-05-30/demo/demo1.txt

Right now This is what doing

  1. Created a dataset for ADLS and gave the container name and 2022-05-28 in folder
  2. Created a pipeline with delete activity and with #1 dataset
  3. Running the pipeline two times manually by altering the dataset folder for 2022-05-28 and 2022-05-29

I dont want to have manual intervention like this , I want to pass array of folder dates to passed automatically based on number of old folders in ADLS , so How do i fetch the list of folders in ADLS and how to extract the date folder of that list and pass that list of folder dates as array to my delete pipeline

Can you please help


Solution

  • Since it is not ideal to change the name to delete each folder manually, you can use Dynamic parameters. We can use Get Metadata activity to get the folder names, For Each activity to loop through each folder name, If conditional activity to compare the folder name to current date folder, and finally Delete activity to delete the folders.

    • Create a dataset pointing to your container that holds all these folders (raw). Create a parameter folder_name for this dataset and give its value as @dataset().folder_name

    enter image description here

    • Use Get Metadata activity referring to the dataset just created with field list as child items. Give '/' as the value for parameter folder_name (We do not need dynamic parameter value in this activity).

    enter image description here

    • Create a for each activity. The output of get metadata activity is passed to this for each activity. In For Each -> Settings, give the items field value as @activity(‘get_foldername’).output.childItems where get_foldername is the name of the get metadata activity.

    enter image description here

    • Under For Each -> Activities, create an activity for this activity. Using If conditional activity, under activities tab, build an expression, @not(equals(utcNow('yyyy-MM-dd'), item().Name)) (if current date folder name is not equal to the for each activity folder_name). When this condition is true, we need to perform delete activity (create delete activity for true case).

    enter image description here

    • In Delete activity, use the dataset created initially, and give the value for folder_name as @item().Name (dynamic parameter).

    enter image description here

    Publish and run the pipeline. It will run successfully and delete all the other folders except the folder with current date. This way you can delete the folders from your container which do not belong to the current date.