I am trying to delete folders that are older than 7 days from ADLS. The folders I want to delete have date as name and are under subfolders of the main folder. Please see below example for folder structure (raw is my container)
/raw/MainFolder/SubfolderA/20230430/File.csv
/raw/MainFolder/SubfolderA/20230415/File.csv
/raw/MainFolder/SubfolderA/20230410/File.csv
/raw/MainFolder/SubfolderB/20230430/File.csv
/raw/MainFolder/SubfolderB/20230420/File.csv
/raw/MainFolder/SubfolderB/20230405/File.csv
I need to delete all the folders that have the date as name and is more than 7 days old. In this example the pipeline should delete 2 folders from SubFolderA (20230415, 20230410) and 2 folders from SubFolderB (20230420, 20230405).
How do I create a pipeline in Azure Data Factory that dynamically deletes these old folders using a GetMetadata and Delete Activity?
I created a pipeline with GetMetadata activity and am able to see the folders under MainFolder in debug output. But I need help with making it dynamic so that it will fetch all the folders under the subfolders and creating the delete activity.
I am able to achieve your requirement like below.
This is my folder structure:
raw
MainFolder
SubfolderA
20230425
//files
20230427
//files
20230429
//files
20230523
//files
SubfolderB
20230425
//files
20230427
//files
20230429
//files
20230523
//files
As you want to delete the folders which are more than 7 days old, first I have created a dates array using a ForEach with @range(0,7)
. This expression gives the array[0,1,2,3,4,5,6]
.
Inside ForEach I have used append variable activity to an array to append the date in yyyyMMdd
format with the below expression.
@formatDateTime(subtractFromTime(utcNow(),item(),'Day'),'yyyyMMdd')
This gives the dates array for the last 7 days list as below.
This is my pipeline flow:
Use a Get Meta data activity first to get the subfolders list(SubfolderA,SubfolderA
) and pass this child items array to ForEach.
@item().name
) to get the date folders list.Use the dataset with a parameter like below:
My Parent pipeline JSON:
{
"name": "parent",
"properties": {
"activities": [
{
"name": "get subfolders",
"type": "GetMetadata",
"dependsOn": [
{
"activity": "ForEach1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "sourcecsv",
"type": "DatasetReference",
"parameters": {
"folderpath": "MainFolder"
}
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "iterate subfolders",
"type": "ForEach",
"dependsOn": [
{
"activity": "get subfolders",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('get subfolders').output.childItems",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "get date folders",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "sourcecsv",
"type": "DatasetReference",
"parameters": {
"folderpath": {
"value": "@concat('MainFolder/',item().name)",
"type": "Expression"
}
}
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Execute Pipeline1",
"type": "ExecutePipeline",
"dependsOn": [
{
"activity": "Filter1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"pipeline": {
"referenceName": "child",
"type": "PipelineReference"
},
"waitOnCompletion": true,
"parameters": {
"date_folder": {
"value": "@activity('Filter1').output.value",
"type": "Expression"
},
"path": {
"value": "@concat('MainFolder/',item().name)",
"type": "Expression"
}
}
}
},
{
"name": "Filter1",
"type": "Filter",
"dependsOn": [
{
"activity": "get date folders",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@activity('get date folders').output.childItems",
"type": "Expression"
},
"condition": {
"value": "@not(contains(variables('daysarr'),item().name))",
"type": "Expression"
}
}
}
]
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@range(0,7)",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Append variable1",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "daysarr",
"value": {
"value": "@formatDateTime(subtractFromTime(utcNow(),item(),'Day'),'yyyyMMdd')",
"type": "Expression"
}
}
}
]
}
}
],
"variables": {
"counter": {
"type": "String"
},
"daysarr": {
"type": "Array"
},
"temp": {
"type": "String"
},
"new": {
"type": "Array"
}
},
"annotations": [],
"lastPublishTime": "2023-05-02T07:27:09Z"
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
Child Pipeline JSON:
{
"name": "child",
"properties": {
"activities": [
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@pipeline().parameters.date_folder",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Delete1",
"type": "Delete",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "sourcecsv",
"type": "DatasetReference",
"parameters": {
"folderpath": {
"value": "@concat(pipeline().parameters.path,'/',item().name)",
"type": "Expression"
}
}
},
"enableLogging": false,
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
}
}
}
]
}
}
],
"parameters": {
"date_folder": {
"type": "array"
},
"path": {
"type": "string"
}
},
"annotations": []
}
}
Folders before pipeline execution:
You can see the folders which are more than 7 days folders were deleted after pipeline execution.