Tags: python, google-app-engine, google-cloud-platform, gcloud, google-cloud-build

Google Cloud Build via cloudbuild.yaml times out randomly


I have two Google App Engine instances, as well as the queue.yaml and cron.yaml files, deployed automatically on every push to the master branch via Google Cloud Build triggers. The trigger runs the following cloudbuild.yaml file:

steps:
- name: gcr.io/cloud-builders/gcloud
  dir: website
  args: ['app', 'deploy', 'app.yaml']
  waitFor: ["-"]
- name: gcr.io/cloud-builders/gcloud
  dir: support_backend
  args: ['app', 'deploy', 'support_backend.yaml']
  waitFor: ["-"]
- name: gcr.io/cloud-builders/gcloud
  dir: website
  args: ['app', 'deploy', 'queue.yaml']
  waitFor: ["-"]
- name: gcr.io/cloud-builders/gcloud
  dir: website
  args: ['app', 'deploy', 'cron.yaml']
  waitFor: ["-"]
timeout: 900s

The app.yaml configures a Python 3.7 Standard Environment running Django, and the support_backend.yaml configures the same environment running Flask.

I hadn't had any problems with the deployment until yesterday, when the build started timing out at random. The whole process used to take around five minutes; now the build times out after 10 minutes.

I haven't made any big code changes (I literally changed three lines), and I'm not using any new libraries. I've used the waitFor arguments so that none of the steps depend on each other. Sometimes the app.yaml deploy times out (and sometimes it takes 2 minutes); sometimes the support_backend.yaml deploy times out (and sometimes it takes 2 minutes). Which of the two instances fails seems random. Furthermore, I've successfully deployed both apps independently via the gcloud CLI (using gcloud app deploy app.yaml), and this worked every time, taking about 3 minutes each.

Edit: I've now had a deploy via the gcloud CLI time out on me as well.

I've tried setting timeout: 900s, but this doesn't seem to have any effect; the build still times out after 10 minutes. The Google Cloud Status Dashboard mentions an outage in build services yesterday, but only for Asia regions, and my apps run in Europe. Also, the issue has since been marked as resolved, yet the problem still persists for me.

This is the end of the log from the failing build:

...
File upload done.
Updating service [staging]...failed.

ERROR: (gcloud.app.deploy) Error Response: [4] Cloud build did not succeed within 10m.
Build error details: Build error details not available..
Check the build log for errors: https://console.cloud.google.com/gcr/builds/XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXX?project=XXXXXXXX

and this is the end of the build log it links to:

...
Step #1 - "builder": INFO     gzip_tar_runtime_package took 0 seconds
Step #1 - "builder": INFO     Finished gzipping tarfile.
Step #1 - "builder": INFO     Building app layer took 0 seconds
Step #1 - "builder": INFO     starting: Stitching layers into final image
Finished Step #1 - "builder"
TIMEOUT
ERROR: context deadline exceeded

Is there anything else I can try in order to identify why exactly my build is timing out?


Solution

  • The timeout in the cloudbuild.yaml file applies only to YOUR Cloud Build submission. However, when gcloud app deploy runs, it invokes a new, separate Cloud Build, and it is that one which failed, with its own 10-minute limit ("Cloud build did not succeed within 10m").

    We have also experienced variable build durations since last week. When we contacted Google Cloud support, they told us there is an ongoing issue with Cloud Build. No resolution date was provided.

    The only workaround for now: split your Cloud Build file so that it performs only small steps, and retry them several times; it will pass if you are lucky!
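A minimal sketch of how the points above might be expressed in cloudbuild.yaml, shown for the website service only. It assumes that the gcloud property app/cloud_build_timeout (set here through its CLOUDSDK_APP_CLOUD_BUILD_TIMEOUT environment variable) controls the limit of the inner build that gcloud app deploy starts, and that a bash retry loop is an acceptable way to "try several times". The attempt count and the timeout values are arbitrary choices for illustration, not recommendations.

```yaml
steps:
# Deploy the website service, retrying up to three times.
- name: gcr.io/cloud-builders/gcloud
  dir: website
  entrypoint: bash
  args:
  - -c
  - |
    for attempt in 1 2 3; do
      gcloud app deploy app.yaml --quiet && exit 0
      # $$ escapes the shell variable so Cloud Build does not treat it as a substitution.
      echo "Deploy attempt $$attempt failed"
    done
    exit 1
  # Assumption: raises the inner build's limit from the default 10 minutes
  # to 15 minutes (value in seconds).
  env: ['CLOUDSDK_APP_CLOUD_BUILD_TIMEOUT=900']
  waitFor: ["-"]
# The outer timeout must be long enough to cover every retry of the slowest step.
timeout: 3600s
```

The same pattern would be repeated for support_backend.yaml, queue.yaml and cron.yaml. Retrying only helps if the failures really are intermittent, as the answer suggests; it does not fix the underlying Cloud Build issue.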