I have a managed airflow cluster using cloud composer-1.17.6-airflow-2.0.2. The cluster is fairly small (4 worker pods, 1 scheduler pod) and has auto-scaling enabled.
However, I am experiencing airflow worker restarts very frequently even though only few jobs are running.
This is the message I saw before a restart happens on the worker pods using kubectl logs --previous
worker: Warm shutdown (MainProcess)
Any idea what could be causing it? I tried setting celery acks_late
to True and celery worker_max_tasks_per_child
to 500, however, the issue still persist.
thank you in advance.
To anyone encountering this issue, I've resolved this a couple of months ago by basically refactoring my dynamic DAG. This happens during the parsing of the DAG, which happens on cyclic, I had a couple of logic needed to construct the dag that basically performs backend calls (calling to BigQuery API, calling Xcom backend etc).
As a practice, heavy operations (like external calls) should be avoided when constructing the DAG logic.
I refactored the logic and removed those and the dag parsing improved exponentially from parsing 150 DAGs in 150 seconds to parsing 150 DAGs in 3 seconds. The worker restart never occurred again since.