Tags: python, airflow, scheduling

Why is the execution date in the past when running a DAG with Airflow?


I have something I don't understand about the execution date. I have the following DAG:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime

default_args = {
    'owner': 'me',
    'depends_on_past': True,
    'email': '[email protected]',
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
}

dag = DAG(
    'dag_test',
    default_args=default_args,
    description="DAG test",
    schedule_interval='0 15 * * *',
    concurrency=1,
    catchup=False,
    start_date=datetime(2024, 1, 1)
)

task = BashOperator(
    task_id='task',
    bash_command='echo 1',
    dag=dag,
)

When I enable the DAG, it runs every day at 3 PM, but the execution date is the day before. For example: when the DAG is triggered on 16 February, the execution date is 15 February.

Thanks for your help.

I expected the trigger date and the execution date to be the same.


Solution

  • You need to have a look at the data interval for DAG runs.

    A DAG run is usually scheduled after its associated data interval has ended, to ensure the run is able to collect all the data within the time period. In other words, a run covering the data period of 2020-01-01 generally does not start to run until 2020-01-01 has ended, i.e. after 2020-01-02 00:00:00.

    A DAG run is executed at the end of the period of time it covers, so that all the data for that period is available and the run remains idempotent.

    A best practice when designing DAGs is to process data using time partitioning; the sketch below shows how the schedule maps onto these dates.
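
Applied to your DAG: the run that starts on 16 February at 15:00 covers the data interval from 15 February 15:00 to 16 February 15:00, and the execution date (called the logical date since Airflow 2.2) is the start of that interval, i.e. 15 February. Below is a minimal sketch based on your DAG that prints these dates via Jinja templates; it assumes Airflow 2.2+ (which provides the data_interval_start / data_interval_end template variables) and uses the Airflow 2 import path for BashOperator. The DAG id is just an illustrative name.

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

dag = DAG(
    'dag_test_intervals',
    schedule_interval='0 15 * * *',
    start_date=datetime(2024, 1, 1),
    catchup=False,
)

# For the run triggered on 16 February at 15:00 (UTC), {{ ds }} renders as
# 2024-02-15, the interval start as 15 Feb 15:00, and the interval end as
# 16 Feb 15:00, which is the moment the run actually starts.
task = BashOperator(
    task_id='print_interval',
    bash_command=(
        'echo "logical date:   {{ ds }}" && '
        'echo "interval start: {{ data_interval_start }}" && '
        'echo "interval end:   {{ data_interval_end }}"'
    ),
    dag=dag,
)

So the date you see is not "in the past" by mistake: the logical/execution date marks the start of the data interval the run is responsible for, while the moment the run is triggered corresponds to data_interval_end. If you need the date the run actually fires, template on {{ data_interval_end }} rather than {{ ds }}.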