Tags: azure, azure-data-lake, u-sql, azure-data-factory

Delete temporary files from Azure Data Lake Storage in an Azure Data Factory pipeline (U-SQL preferred)


We are using ADLS (Azure Data Lake Storage) as temporary storage in our ADF (Azure Data Factory V2) pipeline. What is the best way to delete the data that is stored temporarily in ADLS?

U-SQL only supports DDL and not DML, so we can't delete the temporary data (files) stored in ADLS using ADLA (Azure Data Lake Analytics).

I plan on using the DELETE method of ADF's "Web Activity", but that relies on tokens that expire, so I would have to keep updating them.

Can anyone please let me know what other options we have?


Solution

  • The best way is to use the new Delete activity in ADF. In the top-right corner of the ADF UI there is a Code section; click it and write the JSON for the Delete activity. (I couldn't find a Delete activity widget/icon, so I had to write the JSON directly.)

    You can check syntax here

    Example pipeline containing only a Delete activity:

    {
        "name": "DeleteFilePipeline",
        "properties": {
            "activities": [
                {
                    "name": "DeleteActivity",
                    "type": "Delete",
                    "policy": {
                        "timeout": "7.00:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "deleteTest",
                            "type": "DatasetReference"
                        },
                        "enableLogging": false,
                        "maxConcurrentConnections": 1
                    }
                }
            ]
        }
    }

    The dataset deleteTest is an Azure Data Lake Storage Gen1 dataset.
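
    For reference, a dataset that the Delete activity points at might look roughly like the sketch below. This is an illustration under assumptions, not the actual deleteTest definition from this answer: the linked service name AdlsGen1LinkedService and the folder path temp/staging are hypothetical placeholders, and AzureDataLakeStoreFile is assumed to be the dataset type your factory uses for ADLS Gen1.

    ```json
    {
        "name": "deleteTest",
        "properties": {
            "description": "Hypothetical sketch: linked service name and folder path are placeholders",
            "linkedServiceName": {
                "referenceName": "AdlsGen1LinkedService",
                "type": "LinkedServiceReference"
            },
            "type": "AzureDataLakeStoreFile",
            "typeProperties": {
                "folderPath": "temp/staging"
            }
        }
    }
    ```

    The Delete activity removes whatever the dataset resolves to, so it is worth scoping the path to the temporary folder only and checking it against a test folder before running it on real data.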