airflow, airflow-scheduler, google-cloud-composer

The Airflow scheduler stops working after updating PyPI packages on Google Cloud Composer 2.0.1


I am trying to migrate from Google Cloud Composer composer-1.16.4-airflow-1.10.15 to composer-2.0.1-airflow-2.1.4. However, I am having trouble with the libraries: each time I upload them, the scheduler stops working.

Here is my requirements.txt:

flashtext
ftfy
fsspec==2021.11.1
fuzzywuzzy
gcsfs==2021.11.1
gitpython
google-api-core
google-api-python-client
google-cloud
google-cloud-bigquery-storage==1.1.0
google-cloud-storage
grpcio
sklearn
slackclient
tqdm
salesforce-api
pyjwt
google-cloud-secret-manager==1.0.0
pymysql
gspread
fasttext
spacy
click==7.1.2
papermill==2.1.1
tornado>=6.1
jupyter

Here is the command I use to update the libraries:

gcloud composer environments update "$AIRFLOW_ENV" \
    --update-pypi-packages-from-file requirements.txt \
    --location "$AIRFLOW_LOCATION"

The update succeeds, but afterwards the DAG tasks are no longer scheduled and the scheduler heartbeat turns red.

I tried removing all the libraries, and scheduling resumed some time afterwards. I also tried adding only simple libraries via the interface (pandas or flashtext), but right after the update the scheduler heartbeat turns red again and the tasks stay unscheduled.

I can't find any errors in the log interface. Do you have an idea of how I could see logs for these errors, or do you know why these libraries make my environment fail?

Thanks


Solution

  • We found out what was happening. The root cause was the performance of the workers. To work properly, Composer expects the scanning of the DAGs to take less than 15% of the CPU resources. If it exceeds this limit, it fails to schedule or update the DAGs. We switched to bigger workers and it has worked well since.
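
The answer does not say how the workers were resized. As a rough sketch only, on a Composer 2 environment the worker size can be changed with the `--worker-cpu` and `--worker-memory` flags of `gcloud composer environments update`. The `resize_workers` function and the `GCLOUD` override below are hypothetical helpers introduced for illustration, and depending on the gcloud version the command may need to be run as `gcloud beta`.

```shell
# Sketch, not the original poster's exact command.
# Assumes AIRFLOW_ENV and AIRFLOW_LOCATION are set, as in the question.
# resize_workers is a hypothetical helper name.
# Setting GCLOUD=echo prints the arguments instead of calling gcloud.
resize_workers() {
  # $1: vCPUs per worker (e.g. 2); $2: memory per worker (e.g. 7.5GB)
  "${GCLOUD:-gcloud}" composer environments update "$AIRFLOW_ENV" \
    --location "$AIRFLOW_LOCATION" \
    --worker-cpu "$1" \
    --worker-memory "$2"
}
```

For example, `GCLOUD=echo resize_workers 2 7.5GB` previews the arguments without touching the environment. The sizes are placeholders, not values from the answer. If the scheduler itself is the constrained component, the analogous `--scheduler-cpu` and `--scheduler-memory` flags may be worth trying.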