
Copy data every minute from Data Lake with Data Factory


I have a Data Lake storage with the following folder structure:

{YEAR}
 - {MONTH}
  - {DAY}
   - {HOUR}
     - {sometext}_{YEAR}_{MONTH}_{DAY}_{HOUR}_{Minute}_{someuuid}.json

Example: (screenshot of the folder structure omitted)

Could you please help me configure the Data Factory Copy data activity? I need a trigger that runs every minute and copies the previous minute's data from the Data Lake to Cosmos DB. I've tried this (screenshot omitted), where the first expression is

@formatDateTime(utcnow(),'yyyy/MM/dd/HH')

and the second one

@{formatDateTime(utcnow(),'yyyy')}_@{formatDateTime(utcnow(),'MM')}_@{formatDateTime(utcnow(),'dd')}_@{formatDateTime(utcnow(),'HH')}_@{formatDateTime(addMinutes(utcnow(), -1),'mm')}*.json

But this can skip some data, especially when the hour changes. I'm new to Data Factory and don't know the most efficient way to do this. Please help.


Solution

  • The Pipeline Expression Language has a number of Date functions built in. You can use the addMinutes function to add 1 minute.

    To avoid clock skew, I would capture the utcnow() value and store it without any formatting:

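    For example, a Set Variable activity on a String variable (I'll call it CapturedTime here; the name is my own) would use the bare expression:

        @utcnow()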

    In another variable, add a minute to the captured value rather than executing utcnow() again:

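    A minimal sketch, assuming a second String variable named PreviousMinuteTime (again my own name). The question targets the previous minute, so the offset is -1; use a positive value if you actually want to move forward:

        @addMinutes(variables('CapturedTime'), -1)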

    Once you have those variables, just use them to format the date string(s):
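
    For instance, the folder path from the question can be built entirely from the shifted value, so the year/month/day/hour and the minute always come from the same timestamp and roll over together at hour boundaries:

        @formatDateTime(variables('PreviousMinuteTime'), 'yyyy/MM/dd/HH')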

    Result: (screenshot of the evaluated expression omitted)

    NOTE: use concat with formatDateTime to get the wildcard value you want:
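
    A sketch mirroring the wildcard pattern from the question, again using the illustrative PreviousMinuteTime variable:

        @concat(formatDateTime(variables('PreviousMinuteTime'), 'yyyy_MM_dd_HH_mm'), '*.json')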

    Result: (screenshot of the evaluated wildcard omitted)