
Airflow Worker - Warm Shutdown


I have a managed Airflow cluster running on Cloud Composer (composer-1.17.6-airflow-2.0.2). The cluster is fairly small (4 worker pods, 1 scheduler pod) and has auto-scaling enabled.

However, the Airflow workers restart very frequently, even though only a few jobs are running.

This is the message I see in the worker pod logs (via kubectl logs --previous) just before a restart:

worker: Warm shutdown (MainProcess)

Any idea what could be causing it? I tried setting Celery's acks_late to True and worker_max_tasks_per_child to 500, but the issue still persists.

Thank you in advance.


Solution

  • To anyone encountering this issue: I resolved it a couple of months ago by refactoring my dynamic DAG. The problem occurs during DAG parsing, which runs periodically. The logic I used to construct the DAG made several backend calls (to the BigQuery API, to the XCom backend, etc.).

    As a practice, heavy operations (like external calls) should be avoided when constructing the DAG logic.

    I refactored the logic to remove those calls, and DAG parsing improved dramatically, from 150 DAGs in 150 seconds to 150 DAGs in 3 seconds. The worker restarts have not occurred since.