I am working on an ETL pipeline using Airflow running in Docker. I want to trigger my pipeline whenever a new file is uploaded to an S3 bucket. Is there an S3 sensor in Airflow that detects new files in a bucket? The sensor should ignore the files already in that location and trigger only when a new file is added to S3.
You have several options to achieve this goal:
One option is a DAG that runs on a schedule, lists the files in the bucket and records them in a state store with the state `to_process`. On each subsequent run, it compares the current file list with the files in the state store to find out whether there are new files. The DAG then processes the records in the state store whose state is not `done`, and when it finishes processing a file, it updates that file's state to `done`. You can add other metadata such as `created_at` and `processed_at`, and other states such as `error`, so that failed files are reprocessed in the next run or an alert is sent to your team.
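A minimal sketch of the state-store bookkeeping described above, assuming a SQLite table as the state store and assuming the list of object keys is fetched elsewhere (for example with the Amazon provider's `S3Hook.list_keys`, not shown here). The table name `file_state` and all helper names are hypothetical. Seeding the keys that already exist as `done` is one way to make the pipeline ignore files that were in the bucket before it started, as the question asks.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

# Hypothetical schema for the state store; the answer does not prescribe one.
SCHEMA = """
CREATE TABLE IF NOT EXISTS file_state (
    key TEXT PRIMARY KEY,
    state TEXT NOT NULL,
    created_at TEXT NOT NULL,
    processed_at TEXT
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def _insert_unknown(conn: sqlite3.Connection, keys: list[str], state: str) -> list[str]:
    """Insert keys not yet in the state store with the given state; return them."""
    known = {row[0] for row in conn.execute("SELECT key FROM file_state")}
    # dict.fromkeys removes duplicate keys while keeping their order.
    new_keys = [k for k in dict.fromkeys(keys) if k not in known]
    conn.executemany(
        "INSERT INTO file_state (key, state, created_at) VALUES (?, ?, ?)",
        [(k, state, _now()) for k in new_keys],
    )
    conn.commit()
    return new_keys


def seed_existing(conn: sqlite3.Connection, keys: list[str]) -> None:
    """Mark files already in the bucket as done so they are never processed."""
    _insert_unknown(conn, keys, "done")


def register_new_files(conn: sqlite3.Connection, keys: list[str]) -> list[str]:
    """Compare the current listing with the state store; new keys get 'to_process'."""
    return _insert_unknown(conn, keys, "to_process")


def pending_files(conn: sqlite3.Connection) -> list[str]:
    """Keys whose state is not 'done' (this includes 'to_process' and 'error')."""
    rows = conn.execute("SELECT key FROM file_state WHERE state != 'done' ORDER BY key")
    return [row[0] for row in rows]


def process_pending(conn: sqlite3.Connection, process: Callable[[str], object]) -> None:
    """Process every pending key, recording 'done' or 'error' for each one."""
    for key in pending_files(conn):
        try:
            process(key)
        except Exception:
            # Left pending, so the next run retries it (or an alert could be sent here).
            conn.execute("UPDATE file_state SET state = 'error' WHERE key = ?", (key,))
        else:
            conn.execute(
                "UPDATE file_state SET state = 'done', processed_at = ? WHERE key = ?",
                (_now(), key),
            )
        conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Files present before the pipeline starts are ignored.
    seed_existing(conn, ["raw/old.csv"])
    # A later run sees one new file; in practice this list would come from S3.
    register_new_files(conn, ["raw/old.csv", "raw/new.csv"])
    process_pending(conn, lambda key: print("processing", key))
```

In an actual DAG these functions would run inside scheduled tasks, and the in-memory SQLite database would presumably be replaced by a database that every Airflow worker can reach; that choice is left open here.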