google-app-engine google-app-engine-python app-engine-flexible

Diagnosing error in deploying GAE flex app


I've been using GAE flex for a while now, and all of a sudden my deploy fails on the command line with:

ERROR: (gcloud.app.deploy) Error Response: [4] Flex operation projects/MY-PROJECT/regions/us-central1/operations/xxx error [DEADLINE_EXCEEDED]: An internal error occurred while processing task /appengine-flex-v1/insert_flex_deployment/flex_create_resources>2019-09-04T21:29:03.412Z8424.ow.0: Gave up polling Deployment Manager operation MY-PROJECT/operation-xxx.

My logs don't have any helpful info. These are the relevant logs from the deployment (newest first):

2019-09-04T14:07:07Z [2019-09-04 14:07:07 +0000] [1] [INFO] Shutting down: Master
2019-09-04T14:07:06Z [2019-09-04 14:07:06 +0000] [16] [INFO] Worker exiting (pid: 16)
2019-09-04T14:07:06Z [2019-09-04 14:07:06 +0000] [14] [INFO] Worker exiting (pid: 14)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [13] [INFO] Worker exiting (pid: 13)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [11] [INFO] Worker exiting (pid: 11)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [10] [INFO] Worker exiting (pid: 10)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [9] [INFO] Worker exiting (pid: 9)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [8] [INFO] Worker exiting (pid: 8)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [1] [INFO] Handling signal: term
2019-09-04T14:03:04Z [2019-09-04 14:03:04 +0000] [16] [INFO] Booting worker with pid: 16
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [14] [INFO] Booting worker with pid: 14
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [13] [INFO] Booting worker with pid: 13
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [11] [INFO] Booting worker with pid: 11
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [10] [INFO] Booting worker with pid: 10
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [9] [INFO] Booting worker with pid: 9
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [8] [INFO] Booting worker with pid: 8
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [1] [INFO] Using worker: sync
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [1] [INFO] Starting gunicorn 19.9.0

The instance exists in the console and appears to be running, but it just returns a 404. The code runs fine locally.

Any ideas for how to diagnose what is going on?

I wonder if Google reduced a default deadline, since the current deadline appears to be 4 minutes and my build has always taken longer than that.


Solution

  • I figured this out, and it is kind of a crazy Google Cloud bug. TL;DR: Don't use Google Cloud Organization Policy Constraints.

    Here is what happened according to my best understanding:

    • For my Google Cloud project, I picked the us-central region.
    • About 6 months ago, I set a Google Cloud Organization Policy constraint so that my organization would use only US-based resources. The resulting policy allowed only the US locations that existed at that time.
    • My recent flex app deploys were going to the us-central1-f zone. I believe Google picks the zone, and I don't have control over that.
    • The us-central1-f zone was not allowed by my location policy because that zone did not exist at the time I set the policy.
    • This caused my deploy to fail with the unhelpful error message in my question.

    The way I figured this out was by deploying Google's Hello World Flask app. That deploy produced a more helpful error message, which let me understand the problem.
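
To illustrate the failure mode described above, here is a minimal Python sketch of an exact-match location allow-list. It is only a model of the idea, not Google's actual policy evaluation logic. The function name `blocked_locations` and the zones in `ALLOWED_AT_POLICY_CREATION` are hypothetical and chosen for illustration; they are not the real contents of my policy.

```python
from typing import Iterable, List, Set

# Hypothetical snapshot of an explicit allow-list (illustrative values only,
# not the actual contents of any real organization policy).
ALLOWED_AT_POLICY_CREATION: Set[str] = {
    "us-central1-a",
    "us-central1-b",
    "us-central1-c",
}


def blocked_locations(candidates: Iterable[str], allowed: Set[str]) -> List[str]:
    """Return the candidate locations that an exact-match allow-list rejects."""
    return sorted(loc for loc in candidates if loc not in allowed)


if __name__ == "__main__":
    # The platform, not the user, picks the zone; us-central1-f is the zone
    # mentioned in the answer above.
    print(blocked_locations(["us-central1-a", "us-central1-f"],
                            ALLOWED_AT_POLICY_CREATION))
```

The point of the sketch is that a list frozen at policy-creation time rejects any location that is not literally on it, even when that location is in the intended region.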