Tags: azure, azure-data-factory, azure-data-lake, last-modified

Get the latest added file in a folder [Azure Data Factory]


Inside the Data Lake, we have a folder that contains files pushed by an external source every day. However, we only want to process the most recently added file in that folder. Is there any way to achieve that with Azure Data Factory?


Solution

  • You can set modifiedDatetimeStart and modifiedDatetimeEnd to filter the files in the folder when you use the ADLS connector in a copy activity.

    There are two possible situations:

    1. The external source pushes the data on a schedule. In that case you need to know the schedule so you can configure the time window.

    2. The push frequency is random. In that case you may have to log the push time somewhere else, then pass that time as a parameter into the copy activity pipeline before you execute it.
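
    For the scheduled case, the window can be derived from the run time and the push interval. The sketch below is only an illustration: the helper name modified_window and the sample dates are hypothetical, and it assumes the two properties take UTC timestamps in the form 2019-05-01T00:00:00Z.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def modified_window(run_time: datetime, interval: timedelta) -> Tuple[str, str]:
    """Return (start, end) as UTC strings covering one push interval.

    run_time must be timezone-aware; a naive datetime would be
    interpreted as local time by astimezone().
    """
    end = run_time.astimezone(timezone.utc)
    start = end - interval
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


if __name__ == "__main__":
    # Hypothetical daily push, pipeline triggered at midnight UTC.
    run = datetime(2019, 5, 2, 0, 0, 0, tzinfo=timezone.utc)
    start, end = modified_window(run, timedelta(days=1))
    print("modifiedDatetimeStart:", start)
    print("modifiedDatetimeEnd:  ", end)
```

    The two strings would then be passed to the pipeline as parameters. If more than one file lands inside the window, the filter alone does not pick the newest one.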


    Here is a possible flow for an ADF pipeline:

    My sample files, all in the same folder:

    [screenshot]

    Step 1: Create two variables, maxtime and filename:

    maxtime starts as a baseline datetime for a specific date (the threshold that a file's last-modified time must exceed), and filename starts as an empty string.

    [screenshot]

    Step 2: Use a GetMetadata activity and a ForEach activity to get the files under the folder.

    [screenshot]

    GetMetadata 1 configuration:

    [screenshot]

    ForEach activity configuration:

    [screenshot]

    Step 3: Inside the ForEach activity, use a second GetMetadata activity and an If-Condition activity, structured as below:

    [screenshot]

    GetMetadata 2 configuration:

    [screenshot]

    If-Condition activity configuration:

    [screenshot]

    Step 4: Inside the If-Condition's True branch, use two Set Variable activities:

    [screenshot]

    Set variable1 configuration:

    [screenshot]

    Set variable2 configuration:

    [screenshot]

    All of the steps above aim to find the name of the latest file. Once the loop finishes, the filename variable holds exactly that target.
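
    Since the screenshots are not reproduced here, the logic of steps 1 to 4 can be sketched in plain Python. This is not ADF code, only a model of the loop. It assumes that the If-Condition compares each file's last-modified time with maxtime and that the two Set Variable activities update maxtime and filename respectively. The function name latest_file and the sample file names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple


def latest_file(
    files: Iterable[Tuple[str, datetime]],
    baseline: datetime,
) -> Optional[str]:
    """Return the name of the newest file modified after baseline, else None."""
    maxtime = baseline  # Step 1: newest last-modified time seen so far
    filename = None     # Step 1: the pipeline uses an empty string here
    for name, last_modified in files:  # Step 2: ForEach over the folder listing
        if last_modified > maxtime:    # Step 3: If-Condition (assumed comparison)
            maxtime = last_modified    # Step 4: Set variable 1 (assumed)
            filename = name            # Step 4: Set variable 2 (assumed)
    return filename


if __name__ == "__main__":
    # Hypothetical folder listing: (file name, last-modified time).
    listing = [
        ("a.csv", datetime(2019, 5, 1, tzinfo=timezone.utc)),
        ("b.csv", datetime(2019, 5, 3, tzinfo=timezone.utc)),
        ("c.csv", datetime(2019, 5, 2, tzinfo=timezone.utc)),
    ]
    print(latest_file(listing, datetime(2019, 1, 1, tzinfo=timezone.utc)))
```

    Because every iteration reads and writes the same two variables, the result is only reliable if the iterations run one after another. In ADF that means running the ForEach sequentially rather than in parallel.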


    Addendum: GetMetadata 2 uses another, new dataset:

    [screenshot]