Search code examples
etlazure-data-factory

Pipeline filter modifies date every time I run the pipeline which prevent me from pulling only the last modified date to SQL


Once a week, a file is generated to an onprem folder. My pipeline pulls from that onprem file to blob storage, then from blob to blob, during this part the pipeline filters my data goes to sql. Problem is, when it gets filtered the modified date changes and all the files in the blob storage are pulled rather than the one that got originally pulled for that week. I have attached images of my pipeline and the onprem files and what I filter for. OnPrem Files Pipeline Pipeline If Condition Storage sql pipeline


Solution

    • Instead of trying to proceed with last modified date of the file, you can proceed using file name instead.

    • Since you have date (yyyyddMM format) in the filename itself, you can dynamically create the filename and check if this file is present in the filtered files list or not.

    • Look at the following demonstration. Let's say I have the following 2 files as my filtered files. I used Get metadata activity (child items) on the blob storage.

    enter image description here

    • Since we know the format of how the filename would be (SalesWeekly_yyyyddMM.csv), create the present filename value dynamically using the following dynamic content in set variable activity (variable name is file_name_required).
    @concat('SalesWeekly_',formatDateTime(utcnow(),'yyyyddMM'),'.csv')
    

    enter image description here

    • Now, create an array containing all the filenames returned by our get metadata activity. The for each activity items value is given as @activity('Get Metadata1').output.childItems.

    • Inside this, use an append variable activity with value as @item().name.

    enter image description here

    • Now, you have file name you actually need (dynamically build) and the filtered file names array. You can check if the filename is present in the array of filtered file names or not and take necessary actions. I used if condition activtiy with the following dynamic content.
    @contains(variables('files_names'),variables('file_name_required'))
    

    enter image description here

    The following are reference images of the output flow.

    • When current week file is not present in the filtered files.

    enter image description here

    • When current week file is present in the filtered files.

    enter image description here

    I have used wait for demo here. You can replace it with copy activity (from blob to SQL) in True case. If you don't want to insert when current week file is missing, then leave the false case empty.