docker, google-app-engine, google-cloud-platform, google-compute-engine, google-cloud-build

Docker image deployed to Google Compute Engine keeps restarting


I built an image with Google Cloud Build using Docker Compose. In my cloudbuild.yml file I have the following steps:

  1. Build the docker image using docker compose
  2. Tag the built image
  3. Create an instance template
  4. Create instance group

Here is the problem: every time a new instance is created, the container started from the image keeps restarting and never actually boots up. Despite this, I can build the image directly on the instance and start it as a container there, independently of the image produced by Cloud Build.

I managed to find some clues from the logs:

E1219 19:13:52 7f28dce6d700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected

oauth2.cc:289 Getting auth token from metadata server docker

I also got a clue by running the following on the instance:

docker start -a -i <container_id>

Output: Unrecognized input header: 99

The cloudbuild.yml file looks like (I've replaced some variables with ...):

#cloudbuild.yaml
steps:
  - name: 'docker/compose:1.22.0'
    args: ['-f', 'docker/docker-compose.tb.prod.yml', 'up', '-d']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['tag', 'tb:latest', '...']
  - name: 'gcr.io/cloud-builders/gcloud'
    args: [
      'beta', 'compute', '--project=...', 'instance-templates', 'create-with-container',
      'tb-app-staging-${COMMIT_SHA}',
      '--machine-type=n1-standard-2', '--network=...', '--network-tier=PREMIUM', '--metadata=google-logging-enabled=true',
      '--maintenance-policy=MIGRATE', '--service-account=...',
      '--scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append',
      '--tags=http-server,https-server', '--image=cos-stable-69-10895-62-0', '--image-project=cos-cloud', '--boot-disk-size=20GB', '--boot-disk-type=pd-standard',
      '--container-restart-policy=always', '--labels=container-vm=cos-stable-69-10895-62-0',
      '--boot-disk-device-name=...',
      '--container-image=...',
    ]   
  - name: 'gcr.io/cloud-builders/gcloud'
    args: [
      'beta', 'compute', '--project=...', 'instance-groups',
      'managed', 'rolling-action', 'start-update',
      'tb-app-staging',
      '--version',
      'template=...',
      '--zone=europe-west1-b',
      '--max-surge=20',
      '--max-unavailable=9999'
    ]   
images: ['...']
timeout: 1200s

Solution

  • I found the issue, and I'll answer this question myself in case someone else runs into the same problem.

    The problem was that my docker-compose.yml sets stdin_open and tty to true, but the create-with-container step in my cloudbuild.yml did not pass those settings on to the container, and it failed silently (annoying!).

    To fix the issue, add the flags --container-stdin and --container-tty to the create-with-container command.

    More details can be found in the Google Cloud documentation: https://cloud.google.com/compute/docs/containers/configuring-options-to-run-containers
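
For illustration, here is a rough sketch of how the instance-template step from the question might look with the two flags added. It is trimmed to the container-related arguments, so the remaining flags from the original step would still need to be carried over, and every '...' is left elided exactly as in the question. The comments mapping each flag to a docker-compose.yml key follow the description in the answer above.

```yaml
# cloudbuild.yml (fragment): instance-template step with the two extra flags.
# Trimmed sketch; the other flags from the original step are omitted here.
steps:
  - name: 'gcr.io/cloud-builders/gcloud'
    args: [
      'beta', 'compute', '--project=...', 'instance-templates', 'create-with-container',
      'tb-app-staging-${COMMIT_SHA}',
      '--container-image=...',
      '--container-restart-policy=always',
      '--container-stdin',  # counterpart of stdin_open: true in docker-compose.yml
      '--container-tty'     # counterpart of tty: true in docker-compose.yml
    ]
```

One way to check the result on a running instance is docker inspect --format '{{.Config.OpenStdin}} {{.Config.Tty}}' <container_id>, which should print true true if both settings reached the container.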