We run Airflow on GCP Cloud Composer, and all of a sudden the webserver and scheduler went down.
We tried to force a restart by updating some dummy variables or the worker nodes, but every attempt fails with the error below:
Error: UPDATE operation on this environment failed 1 hour ago with the following error message:
Operation failed. Couldn't start composer-agent, a GKE job that updates kubernetes resources. Please check if your GKE cluster exists, is healthy and contains non-empty 'default-pool' node pool.
Any suggestions? Our environment is completely stuck.
Based on my understanding, the composer-agent pod cannot pull its container images because your DNS records for Private Google Access do not cover *.pkg.dev. I suspect you only have records for *.gcr.io, which is where the images were previously hosted.
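One way to check this theory is to compare how the two domains resolve from inside the VPC. The following is a minimal sketch, not an official diagnostic: the helper names `classify` and `resolve` are my own, and it assumes your private zone points at the documented Private Google Access VIP ranges (199.36.153.8/30 for private.googleapis.com, 199.36.153.4/30 for restricted.googleapis.com).

```python
import ipaddress
import socket

# Documented VIP ranges used by Private Google Access.
PRIVATE_VIP = ipaddress.ip_network("199.36.153.8/30")     # private.googleapis.com
RESTRICTED_VIP = ipaddress.ip_network("199.36.153.4/30")  # restricted.googleapis.com


def classify(address):
    """Return 'private', 'restricted' or 'other' for a single IP address string."""
    ip = ipaddress.ip_address(address)
    if ip in PRIVATE_VIP:
        return "private"
    if ip in RESTRICTED_VIP:
        return "restricted"
    return "other"


def resolve(hostname):
    """Return the sorted, de-duplicated addresses a hostname resolves to.

    Returns an empty list if the lookup fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    except OSError:
        return []
    return sorted({info[4][0] for info in infos})
```

Run from a VM or pod in the same network, you could call `resolve("gcr.io")` and `resolve("us-docker.pkg.dev")` (the second hostname is only an example of a pkg.dev registry host) and pass each result through `classify`. If gcr.io comes back as "private" or "restricted" while the pkg.dev host comes back as "other" or does not resolve at all, that would be consistent with a missing *.pkg.dev record in the private DNS zone.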