Search code examples
azurecopyazure-data-factory

Copy most recent txt file based on timestamp in filename


I'm completely new to ADF. I'm working with an existing ADF set-up, and need to modify it to fulfill new requirements due to a modified source folder structure.

Previously, the source folder contained txt files of the type TableName.TXT, and the pipeline consisted out of a lookup activity followed by a foreach activity that copies all files returned by the lookup.

The current issue is that the source folder has been changed and now there can be multiple files for the same table, i.e., files of the type 'TableName_YYYYMMDD_HHMMSS.TXT'. I want to modify the pipeline such that only the most recent one is being copied (using the YYYYMMDD_HHMMSS timestamp in the filename). I've been playing around with a Get Metadata activity, which successfully returns the different filenames of the TableName of interest, but I'm struggling with the next step. How can I select the most recent one out of the output?

"childItems": [
    {
        "name": "Table1_20250115_115024.TXT",
        "type": "File"
    },
    {
        "name": "Table1_20250116_085717.TXT",
        "type": "File"
    }
]

Note that I'm already in a ForEach, and from what I understood a nested foreach is not permitted.

I'm thinking about solutions like...

  • order Get Metadata output alphabetically (which would automatically select the most recent timestamp given the structure) and then select the first/last one as input for the copy activity?
  • use a 'set variable' activity to determine the max of the substring? then what?

Any guidance would be greatly appreciated!


Solution

  • You can try your first approach inside the loop. As you already getting the file names list with the same table name, you will get the file names list in the alphabetical order. As the date format also in YYYYMMDD_HHMMSS, it will give the latest date in the last item of the array.

    "childItems": [
            {
                "name": "Table1_20250115_115024.txt",
                "type": "File"
            },
            {
                "name": "Table1_20250116_085717.txt",
                "type": "File"
            },
            {
                "name": "Table1_20250216_085717.txt",
                "type": "File"
            }
        ]
    

    After getting this array, use the below expression to get the last item (latest file name) of the array. Here, for sample I have given the Get meta data activity output, in your case, it should be the above array that you got inside your loop.

    @last(activity('Get Metadata1').output.childItems).name
    

    enter image description here

    Latest file name:

    enter image description here

    Now, you can use this variable as per your requirement.