I have tried viewing similar answers on stackoverflow to this problem, however my case is slightly different.
I am executing backfill jobs via Airflow CLI, and the backfilled dag runs get stuck in a running state, with the first task in the dag in a queued (grey) state.
The scheduler doesn't seem to ever kick off the first task.
I do not have depends_on_past=True
set as dag_defaults
dag_defaults = {
"start_date": datetime.today() - timedelta(days=2),
"on_failure_callback": on_failure_callback,
"provide_context": True
}
I am forced to Run every task manually. :( Rather than just letting the scheduler take its course and run them automatically.
Note: I am executing the backfill cli commands via Airflow worker pods on a K8S cluster.
Has anyone else faced a similar issue using the backfill cli commands?
UPDATE: I realised my backfill runs fall outside the total dag interval. I.e before the dag
start_date
causing a blocking schedule dependancy.
While you can still create the run, it will not run automatically, but you can manually run each task.
As a workaround would need to change the start_date
to be before or on my oldest backfill date.
Would be nice if there was a way to override the backfill cmd or provide a --force option that could mock the start_date in for that specific dag_run, rather than being bound to the total interval.
UPDATE: I realised my backfill runs fall outside the total dag interval. I.e before the dag start_date causing a blocking schedule dependancy.
While you can still create the run, it will not run automatically, but you can manually run each task.
As a workaround would need to change the start_date
to be before or on my oldest backfill date.
Would be nice if there was a way to override the backfill cmd or provide a --force option that could mock the start_date in for that specific dag_run, rather than being bound to the total interval.