My question is quite complicated, and I suspect my approach isn't great, but you'll tell me, I'm sure. ;) I'm working in Azure Data Factory.
I have a Data Flow that generates several files, partitioned with "Name file as column data". These files are CSV and are stored in Azure Storage.
After that, in the pipeline, I get all files in the dataset by using a Get Metadata activity with the "Child Items" argument. "Resource" is the same dataset I used in the Data Flow.
Then, in a ForEach, I take every file in the dataset (that is, every file in a specific folder in Azure Storage) and send it to an API. The API receives the fileName, processes the file, and moves it to an Archive folder.
The problem is that the API may fail to move a file to the Archive folder if an error occurs during processing. When that happens, the next pipeline run picks up the same file again, because it is still sitting in the dataset folder.
I would like this to be more robust: the ForEach should only process the files that were just created by the Data Flow.
If you have an idea how to do that, I'd love to hear it. :)
To be honest, I am not sure the current approach is a good one.
I tried passing a filename to Get Metadata, intending to iterate over a list of filenames that way, but I don't think it's possible to pass a list.
You can achieve this by using the Filter by last modified option of the Get Metadata activity (credit to @Joel Cochran for the suggestion).
These are the files before the pipeline run:
Create a string variable and set it to @utcNow() before the Data Flow activity. After the Data Flow activity, use that variable as the start time of the Filter by last modified setting in the Get Metadata activity, and give @utcNow() as the end time, like below.
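As a rough sketch, this is roughly what the relevant pipeline JSON could look like, assuming a string pipeline variable named startTime, a Data Flow activity named Data flow1, a Get Metadata activity named Get Metadata1, and a delimited-text dataset on Azure Blob Storage (the storeSettings type and property names may differ for other storage types):

```json
[
  {
    "name": "Set start time",
    "type": "SetVariable",
    "typeProperties": {
      "variableName": "startTime",
      "value": { "value": "@utcNow()", "type": "Expression" }
    }
  },
  {
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "dependsOn": [
      { "activity": "Data flow1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
      "dataset": { "referenceName": "Resource", "type": "DatasetReference" },
      "fieldList": [ "childItems" ],
      "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "modifiedDatetimeStart": { "value": "@variables('startTime')", "type": "Expression" },
        "modifiedDatetimeEnd": { "value": "@utcNow()", "type": "Expression" }
      },
      "formatSettings": { "type": "DelimitedTextReadSettings" }
    }
  }
]
```

Because the variable is captured before the Data Flow runs and the end time is evaluated after it finishes, only files whose last-modified time falls inside that window are returned.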
This filters down to only the files created by that Data Flow run.
Files in my target location after the pipeline run:
Child Items array of the Get Metadata activity:
You can pass this list to your ForEach activity.
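For completeness, a minimal sketch of wiring that output into the ForEach, again assuming the Get Metadata activity is named Get Metadata1 (the inner activities are left empty here; inside the loop you would reference each file as @item().name when calling your API):

```json
{
  "name": "ForEach1",
  "type": "ForEach",
  "dependsOn": [
    { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
  ],
  "typeProperties": {
    "items": {
      "value": "@activity('Get Metadata1').output.childItems",
      "type": "Expression"
    },
    "activities": []
  }
}
```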