Tags: google-cloud-platform, deployment, containers, google-cloud-run

Google Cloud Run Issue: Revision Not Ready to Serve Traffic - Container Fails to Start on Specified Port


I'm trying to deploy a service on Google Cloud Run, but I'm encountering an issue during deployment. I'm receiving the following error:

Deployment failed
ERROR: (gcloud.run.services.replace) Revision 'example-on-docker-00001-nl7' is not ready and cannot serve traffic. The user-provided container failed to start and listen on the port defined provided by the PORT=8080 environment variable. Logs for this revision might contain more information.

The revision ('example-on-docker-00001-nl7') of my service is not ready to serve traffic, and it appears that the container fails to start on the specified port (PORT=8080).

I've checked the service.yaml file and the container configuration, and it seems to be correct. I've made sure that environment variables are set up correctly. I've also verified the container image, and it seems to be valid.

How can I diagnose and resolve this issue? What else should I check to understand why the container isn't starting correctly on the specified port? Any help would be appreciated.

My Dockerfile for nginx:

FROM nginx:stable-alpine AS base
WORKDIR /app/public

EXPOSE 8080

FROM base AS development

FROM base AS distribution
ARG WORDPRESS
ARG CACHE_CONTROL
ARG NGINX_LISTEN_PORT
ENV NGINX_ENVSUBST_OUTPUT_DIR /etc/nginx/conf.d-from-template/
ADD build/environment/docker/nginx/etc/ /etc/nginx/
RUN /docker-entrypoint.d/20-envsubst-on-templates.sh
RUN rm /etc/nginx/conf.d/default.conf
ADD build/environment/docker/nginx/dist.tar.gz /app

My service.yaml:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example
  generation: 1
  labels:
    cloud.googleapis.com/location: us-example
  annotations:
    run.googleapis.com/ingress: all
    run.googleapis.com/ingress-status: all
    run.googleapis.com/launch-stage: BETA
spec:
  template:
    metadata:
      labels:
        run.googleapis.com/startupProbeType: Default
      annotations:
        autoscaling.knative.dev/maxScale: '100'
        run.googleapis.com/startup-cpu-boost: 'true'
    spec:
      containerConcurrency: 80
      timeoutSeconds: 300
      serviceAccountName: example@developer.gserviceaccount.com
      containers:
      - name: nginx
        image: us-east1-docker.pkg.dev/project/repository/image:server-1
        ports:
          - name: http1
            containerPort: 8080
        env:
        - name: CACHE_CONTROL
          value: 'no-store, no-cache, must-revalidate, proxy-revalidate, max-age=0'
        - name: WORDPRESS
          value: wordpress
        resources:
          limits:
            memory: 512Mi
            cpu: 1000m
        startupProbe:
          timeoutSeconds: 240
          periodSeconds: 240
          failureThreshold: 1
          tcpSocket:
            port: 8080
      - name: wordpress
        image: us-east1-docker.pkg.dev/project/repository/image:fpm-1
        env:
        resources:
          limits:
            memory: 512Mi
            cpu: 1000m
        startupProbe:
          timeoutSeconds: 240
          periodSeconds: 240
          failureThreshold: 1
          tcpSocket:
            port: 9000
      
  traffic:
  - percent: 100
    latestRevision: true

I have reviewed the service.yaml configuration file to ensure that all container definitions, environment variables, and settings are correctly specified.

I expected the deployment to create a new revision of my service on Google Cloud Run and for that revision to be ready to serve traffic without any issues.

Instead, the deployment failed with the same error shown above: the container in the revision does not start and listen on the specified port (PORT=8080), so the revision cannot serve traffic.

I'm seeking assistance in diagnosing and resolving this issue, as I've exhausted my initial troubleshooting steps and need further guidance to identify the root cause and fix the problem.


Solution

  • To troubleshoot, take these steps:

    1. Try deploying each web sidecar separately on Cloud Run to confirm that it starts and serves traffic on its own.
    2. Check the logs produced for the revision; they usually contain a hint.
    3. When using sidecars, consider setting the depends-on parameter for the ingress container so that the containers start in the proper order.
    4. Make the startup probe window reasonably long so that it tolerates slow starts.
    5. Ensure the ingress container addresses requests to sidecars at localhost:sidecarPort.
    6. If nginx uses a domain-based upstream, you most probably need to add resolver 169.254.169.254 to the nginx config.
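
As a rough illustration of step 3, the start order can be declared in service.yaml with the run.googleapis.com/container-dependencies annotation, which is the YAML counterpart of the gcloud --depends-on flag. This is a minimal sketch, not a verified fix for the service above: it reuses the container names and images from the question and assumes that nginx should wait for the wordpress (PHP-FPM) container, and that the dependency is considered started once its startup probe passes.

```yaml
spec:
  template:
    metadata:
      annotations:
        # Assumption: "nginx" depends on "wordpress", so wordpress starts first.
        # The key is the dependent container; the list names what it waits for.
        run.googleapis.com/container-dependencies: '{"nginx": ["wordpress"]}'
    spec:
      containers:
      - name: nginx        # ingress container: the only one that declares a port
        image: us-east1-docker.pkg.dev/project/repository/image:server-1
        ports:
        - name: http1
          containerPort: 8080
      - name: wordpress    # sidecar: nginx would reach it at localhost:9000
        image: us-east1-docker.pkg.dev/project/repository/image:fpm-1
```

With this ordering, nginx only starts after the PHP-FPM sidecar is up, which avoids nginx failing at startup because an upstream on localhost:9000 is not yet reachable.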

    I would increase failureThreshold from 1 to something like 10 and set a 1-second timeout. failureThreshold is the number of times to retry the probe before marking the container as Unready.
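
The probe change suggested above might look like the following fragment for the nginx container. It is only a sketch: the periodSeconds value of 10 is an assumption chosen for illustration, and the right numbers depend on how long the container really takes to start.

```yaml
        startupProbe:
          tcpSocket:
            port: 8080
          timeoutSeconds: 1      # each attempt gives up after 1 second
          periodSeconds: 10      # assumed interval between attempts
          failureThreshold: 10   # allow up to 10 attempts before the container is marked Unready
```

Compared with a single 240-second attempt, several short attempts give the container repeated chances to be detected as listening, over a window of roughly failureThreshold times periodSeconds.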