kubernetes google-cloud-platform google-cloud-run knative

Google Cloud Run Internal error occurred while performing container health check


Occasionally, with no pattern I can identify, Cloud Run fails to deploy a new revision because of an error while performing the container health check:

X Deploying... Deploying Revision. Waiting on revision <my_app>-s7px8.
  - Creating Revision... Internal error occurred while performing container health check.
  . Routing traffic...
  1. I use this command to roll out: gcloud alpha run services replace ...

  2. Some Google-specific details from my cloudrun.yml:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: <my_app>
spec:
  traffic:
  - percent: 100
    latestRevision: true
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: '1'
        autoscaling.knative.dev/maxScale: '40'
        run.googleapis.com/vpc-access-connector: <my_app>-vpc-16gbps
        run.googleapis.com/sandbox: gvisor
    spec:
      timeoutSeconds: 50
      serviceAccountName: ...
      containerConcurrency: 100
      containers:
      - image: ...
        ports:
        - containerPort: 8080
          name: h2c
        resources:
          limits:
            cpu: '4'
            memory: 6Gi
  3. Application logs in Cloud Run look fine, nothing unusual.
  4. The only "bad" log entry in Cloud Run is:
{
  "protoPayload": {
    "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
    "status": {
      "code": 13,
      "message": "Ready condition status changed to False for Service <my_app> with message: Internal error occurred while performing container health check. Resource readiness deadline exceeded."
    },
    "serviceName": "run.googleapis.com",
    "response": {
      "apiVersion": "serving.knative.dev/v1",
      "kind": "Service",
      "status": {
        "observedGeneration": 1,
        "conditions": [
          {
            "type": "Ready",
            "status": "False",
            "message": "Internal error occurred while performing container health check. Resource readiness deadline exceeded.",
            "lastTransitionTime": "2022-10-05T14:11:19.884568Z"
          },
          {
            "type": "ConfigurationsReady",
            "status": "Unknown",
            "message": "Internal error occurred while performing container health check.",
            "lastTransitionTime": "2022-10-05T14:00:50.398740Z"
          },
          {
            "type": "RoutesReady",
            "status": "False",
            "reason": "RevisionFailed",
            "message": "Revision '<my_app>-s7px8' is not ready and cannot serve traffic. Internal error occurred while performing container health check.",
            "lastTransitionTime": "2022-10-05T14:11:19.884568Z"
          }
        ],
        "latestCreatedRevisionName": "<my_app>-s7px8"
      },
      "@type": "type.googleapis.com/google.cloud.run.v1.Service"
    }
  }
}
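
To read an entry like the one above at a glance, the conditions can be pulled out and filtered. This is a minimal sketch, assuming the entry has been saved or pasted as JSON with the same `protoPayload.response.status.conditions` layout; the function name `not_ready_conditions` and the trimmed `sample` are illustrative, not part of any Google tooling.

```python
import json


def not_ready_conditions(entry):
    """Return (type, status, message) for every condition whose status is not "True"."""
    conditions = (
        entry.get("protoPayload", {})
        .get("response", {})
        .get("status", {})
        .get("conditions", [])
    )
    return [
        (c.get("type", ""), c.get("status", ""), c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]


# Trimmed copy of the audit log entry from the question (only the fields used here).
sample = json.loads("""
{
  "protoPayload": {
    "response": {
      "status": {
        "conditions": [
          {"type": "Ready", "status": "False",
           "message": "Internal error occurred while performing container health check. Resource readiness deadline exceeded."},
          {"type": "ConfigurationsReady", "status": "Unknown",
           "message": "Internal error occurred while performing container health check."},
          {"type": "RoutesReady", "status": "False", "reason": "RevisionFailed",
           "message": "Revision '<my_app>-s7px8' is not ready and cannot serve traffic. Internal error occurred while performing container health check."}
        ]
      }
    }
  }
}
""")

for ctype, status, message in not_ready_conditions(sample):
    print(f"{ctype}: {status} - {message}")
```

In this entry all three conditions are reported, and only `ConfigurationsReady` is `Unknown` rather than `False`.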
  5. The command fails within ~15-20 s. I don't know which probes GCP runs (it looks like a TCP check with a 4 min timeout), but that seems too early for the probe to have timed out.

  6. I don't have any custom startup probes, health checks, or readiness/liveness probes.
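
One way to rule out the container itself is to run the image locally and check that it accepts TCP connections on port 8080 well within the deadline. The sketch below assumes the platform check is a plain TCP connect, which is only the guess made above; the helper name `wait_for_port` and the 240 s default (mirroring the "4 min" guess) are illustrative and not confirmed Cloud Run behaviour.

```python
import socket
import time


def wait_for_port(host, port, deadline_s=240.0, interval_s=1.0):
    """Return True as soon as a TCP connection succeeds, False once the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            # A successful connect is treated as "healthy"; nothing is sent.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: wait and retry.
            time.sleep(interval_s)
    return False
```

For example, `wait_for_port("127.0.0.1", 8080, deadline_s=20.0)` against a locally running container shows whether the port opens within the ~15-20 s window in which the deploy fails.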

Has anybody faced the same issue? Any ideas on where to look?


Solution

  • OK, here is what worked for me. I removed this line:

            autoscaling.knative.dev/minScale: '1'
    

    According to the docs, minScale defaults to 0.
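
    If the manifest is generated or templated, the same change can be applied programmatically. A minimal sketch, assuming the manifest is already loaded as a Python dict (loading YAML would need a third-party parser such as PyYAML); the helper name `drop_min_scale` and the shortened `manifest` are illustrative:

```python
import copy

MIN_SCALE = "autoscaling.knative.dev/minScale"


def drop_min_scale(service):
    """Return a copy of a Knative Service manifest without the minScale annotation."""
    result = copy.deepcopy(service)
    annotations = (
        result.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("annotations", {})
    )
    # pop() with a default is a no-op when the annotation is already absent.
    annotations.pop(MIN_SCALE, None)
    return result


# Shortened version of the cloudrun.yml from the question.
manifest = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    "autoscaling.knative.dev/minScale": "1",
                    "autoscaling.knative.dev/maxScale": "40",
                }
            }
        }
    },
}

print(drop_min_scale(manifest)["spec"]["template"]["metadata"]["annotations"])
```

    The original dict is left untouched, so the two versions can be compared before redeploying.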