
Airflow scheduler stuck


I'm testing Airflow, and after triggering a (seemingly) large number of DAGs at the same time, the scheduler seems to stop scheduling anything and just starts killing processes. These are the logs the scheduler prints:

[2019-08-29 11:17:13,542] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:13,544] {scheduler_job.py:214} WARNING - Killing PID 199809
[2019-08-29 11:17:44,614] {scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:17:44,614] {scheduler_job.py:214} WARNING - Killing PID 2992
[2019-08-29 11:18:15,692] {scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:15,693] {scheduler_job.py:214} WARNING - Killing PID 5174
[2019-08-29 11:18:46,765] {scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:18:46,766] {scheduler_job.py:214} WARNING - Killing PID 22410
[2019-08-29 11:19:17,845] {scheduler_job.py:214} WARNING - Killing PID 42177
[2019-08-29 11:19:17,846] {scheduler_job.py:214} WARNING - Killing PID 42177
...

I'm using a LocalExecutor with a PostgreSQL backend DB. It only seems to happen after I trigger a large number (>100) of DAGs at about the same time via external triggers, as in:

airflow trigger_dag DAG_NAME

Once it finishes killing whatever processes it is killing, it starts executing all of the tasks properly. I can't even tell what those processes were, since they are gone by the time I look...

Did anyone encounter this kind of behavior? Any idea why would that happen?


Solution

  • In my case, the cause was a DAG file that dynamically created a very large number of DAGs.

    The "dagbag_import_timeout" config option, which controls "How long before timing out a python file import while filling the DagBag", was left at its default of 30 seconds. The process filling the DagBag therefore kept timing out, and the scheduler killed it each time. Raising dagbag_import_timeout (it lives in the [core] section of airflow.cfg) fixed the problem.
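
    To see why a slow DAG file gets its process killed, here is a minimal stdlib sketch of a SIGALRM-based import timeout, similar in spirit to the guard Airflow wraps around each DAG file import while filling the DagBag. This is not Airflow's actual code; the function and exception names are illustrative, and `slow_dag_file` stands in for a file that generates 100+ DAGs.

    ```python
    import signal
    import time

    class DagFileImportTimeout(Exception):
        """Raised when a (simulated) DAG file import exceeds the limit."""

    def _on_alarm(signum, frame):
        raise DagFileImportTimeout("DAG file took too long to import")

    def import_with_timeout(import_fn, seconds):
        """Run import_fn, aborting with DagFileImportTimeout after `seconds`."""
        old_handler = signal.signal(signal.SIGALRM, _on_alarm)
        signal.alarm(seconds)                      # schedule the deadline
        try:
            return import_fn()
        finally:
            signal.alarm(0)                        # cancel any pending alarm
            signal.signal(signal.SIGALRM, old_handler)

    def slow_dag_file():
        time.sleep(2)                              # stand-in for building >100 DAGs
        return "dagbag filled"

    try:
        import_with_timeout(slow_dag_file, 1)      # like dagbag_import_timeout = 1
    except DagFileImportTimeout:
        print("import timed out")                  # the scheduler kills the process

    print(import_with_timeout(slow_dag_file, 5))   # a larger timeout succeeds
    ```

    With the default of 30 seconds, a file that spends longer than that constructing its DAGs trips the alarm on every scheduler loop, which matches the repeated "Killing PID" warnings above; raising the limit lets the import complete.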