Understanding Airflow's execution_date and schedule


I'm new to Airflow, coming from cron, and trying to understand how the execution_date macro is applied by the scheduler and when a dag is triggered manually. I've read the FAQ and set up a schedule that I expected would execute with the correct execution_date macro filled in.

I would like to run my dag weekly, on Thursday at 10am UTC. Occasionally I would run it manually. My understanding was that the dag's start date should be one period behind the actual date I want the dag to start. So, in order to execute the dag today, on 4/9/2020, with a 4/9/2020 execution_date, I set up the following defaults:

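import datetime as dt

from airflow import DAG
# Airflow 1.x import path; newer releases expose BashOperator from airflow.operators.bash
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': dt.datetime(2020, 4, 2),  # one week before the first desired run
    'concurrency': 4,
    'retries': 0
}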

And the dag is defined as:

with DAG('my_dag',
         catchup=False,
         default_args=default_args,
         schedule_interval='0 10 * * 4',  # every Thursday at 10:00 UTC
         max_active_runs=1,
         concurrency=4,
         ) as dag:

    # The task must be indented under the `with` block to be attached to the dag.
    opr_exc = BashOperator(
        task_id='execute_dag',
        bash_command='/path/to/script.sh --dt {{ ds_nodash }}',
    )

While the dag executed on time today, 4/9, it ran with a ds_nodash of 20200402 instead of 20200409. I'm still confused: catchup was turned off and the start date was one week prior, so I was expecting 20200409.

I then found another answer explaining that execution_date marks the start of the schedule period, so it is always one period behind the time the run actually starts. So going forward, should I be using next_ds_nodash? Wouldn't this create a problem for manually triggered dags, since execution_date works as expected when run on demand? Or does next_ds_nodash translate to ds_nodash when manually triggered?
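
To make the "one period behind" rule concrete, here is a minimal sketch in plain Python (it does not import Airflow). It assumes a fixed seven-day interval as a stand-in for the cron schedule '0 10 * * 4', and the function name scheduled_run_time is made up for illustration.

```python
import datetime as dt

# Assumption: a fixed weekly interval standing in for the cron schedule '0 10 * * 4'.
INTERVAL = dt.timedelta(weeks=1)


def scheduled_run_time(execution_date):
    """Return when the scheduler starts the run labelled with execution_date.

    A scheduled run starts once its period has ended, i.e. one interval
    after the execution_date that labels it.
    """
    return execution_date + INTERVAL


# First schedule point on or after start_date (2020-04-02) is Thursday 10:00 UTC.
first_execution_date = dt.datetime(2020, 4, 2, 10, 0)
run_time = scheduled_run_time(first_execution_date)

print(first_execution_date.strftime('%Y%m%d'))  # value rendered for ds_nodash: 20200402
print(run_time.strftime('%Y%m%d'))              # day the run actually starts: 20200409
```

This matches what I observed: the run that started on 4/9 carried the 4/2 execution_date.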

Question: Is there a happy medium that allows me to correctly get the execution_date macro passed over to my weekly run dag when running scheduled AND when manually triggered? What's best practice here?


Solution

  • After a bit more research and testing, it does indeed appear that next_ds_nodash becomes equivalent to ds_nodash when manually triggering the dag.

    Thus, if you are in a similar situation, do the following to correctly schedule a weekly job (with optional manual triggers):

    1. Set the start_date one week prior to the date you actually want the first run to happen.
    2. Configure the schedule_interval for when you want the job to run.
    3. Use the next-execution-date macros (e.g. next_ds_nodash) wherever you need the date on which the job actually runs.

    This works for me, but I don't have to deal with any catchup/backfill options, so YMMV.
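
The behaviour described above can be modelled with a small sketch in plain Python. This is a simplified model of what I observed, not Airflow's own code: the helper template_dates and its manually_triggered flag are hypothetical, the weekly interval is hard-coded, and the behaviour may differ between Airflow versions, so verify it on your own installation.

```python
import datetime as dt

# Assumption: weekly schedule, as in the dag from the question.
WEEK = dt.timedelta(weeks=1)


def template_dates(execution_date, manually_triggered):
    """Model the two macros for a scheduled run versus a manual run.

    Scheduled run: the next execution date is one interval later.
    Manual run: the next execution date equals the execution date
    (the behaviour reported in this answer).
    """
    if manually_triggered:
        next_execution_date = execution_date
    else:
        next_execution_date = execution_date + WEEK
    return {
        'ds_nodash': execution_date.strftime('%Y%m%d'),
        'next_ds_nodash': next_execution_date.strftime('%Y%m%d'),
    }


# Scheduled run that starts on 4/9 is labelled with the 4/2 execution_date.
scheduled = template_dates(dt.datetime(2020, 4, 2, 10, 0), manually_triggered=False)
# Manual run triggered on 4/9 uses the trigger time as its execution_date.
manual = template_dates(dt.datetime(2020, 4, 9, 15, 30), manually_triggered=True)

print(scheduled)  # {'ds_nodash': '20200402', 'next_ds_nodash': '20200409'}
print(manual)     # {'ds_nodash': '20200409', 'next_ds_nodash': '20200409'}
```

In both cases next_ds_nodash comes out as 20200409, which is why it is the macro to pass to the script.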