To give a bit of context:
The S3 bucket folder receives CSV files every day, e.g. TableA_20230802, which is the table name with a date suffix in YYYYMMDD format. The next day the folder will contain the updated file TableA_20230803, and so on.
I have managed to copy files from S3 to a specific container in Azure Blob Storage. I want to take the next step and move only the latest files that were dropped in S3 on a daily basis. I am trying to get the last modified date (or last written date) using the Get Metadata activity in ADF, but I'm struggling to get this info.
I'm also aware we can filter by last modified date, but this didn't work for me.
Is there an efficient way of doing this?
Get Metadata --> loop through each child item --> loop again to get the last modified date for each item... this didn't work either.
I want to know the best way to incrementally load the files from AWS S3 to Blob Storage.
As you mentioned, every new file of yours will be in the TableA_YYYYMMDD format, so you can use a wildcard path in the copy activity source to get the latest files.
Here, I have taken ADLS Gen2 as my source, but the process is the same for an Amazon S3 source.
These are my source CSV files (ignore the last modified dates here; I created yesterday's file just for this demo):
First, store the current date in YYYYMMDD format in a string variable (mine is named mydate):
@utcNow('yyyyMMdd')
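If you prefer to see it in the pipeline JSON, the Set Variable activity would look roughly like the sketch below. Treat it as an approximation: the variable name mydate is just what I used, and the nested value/Expression wrapping is how ADF typically exports a dynamic value.

```json
{
    "name": "Set mydate",
    "type": "SetVariable",
    "description": "Sketch only: stores today's UTC date as yyyyMMdd in the mydate variable",
    "typeProperties": {
        "variableName": "mydate",
        "value": {
            "value": "@utcNow('yyyyMMdd')",
            "type": "Expression"
        }
    }
}
```

Note that utcNow() returns UTC time, so if your files are named using a different time zone you may need to wrap it with something like convertFromUtc() first.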
Now, use this variable to build the wildcard file path like below.
*@{variables('mydate')}*
Here, my files end with .csv, which is why I have given * at the end of the expression. Keep that trailing * if your files have a file extension as well.
Use the Recursively option if you also need to copy files from sub-folders.
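For reference, the source side of the copy activity JSON would then look roughly like this sketch. I'm assuming a delimited-text dataset over Amazon S3; the type names (DelimitedTextSource, AmazonS3ReadSettings) and the wildcardFileName / recursive properties are from the copy activity schema as I recall it, so compare against your own pipeline's exported JSON.

```json
"source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
        "type": "AmazonS3ReadSettings",
        "recursive": true,
        "wildcardFileName": {
            "value": "*@{variables('mydate')}*",
            "type": "Expression"
        }
    },
    "formatSettings": {
        "type": "DelimitedTextReadSettings"
    }
}
```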
In the sink dataset, give your target folder path.
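The sink side of the same copy activity would be along these lines (again a sketch; I'm assuming a delimited-text dataset on Azure Blob Storage, and the target folder path itself normally lives in the sink dataset's file path rather than in the activity):

```json
"sink": {
    "type": "DelimitedTextSink",
    "storeSettings": {
        "type": "AzureBlobStorageWriteSettings"
    },
    "formatSettings": {
        "type": "DelimitedTextWriteSettings",
        "fileExtension": ".csv"
    }
}
```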
Execute the pipeline, and your latest files will be copied, like mine were.