Cloud Run sending SIGTERM with no visible scale down on container instances

I've deployed a Python FastAPI application on Cloud Run using Gunicorn + Uvicorn workers.

Cloud Run configuration:

Dockerfile


FROM python:3.8-slim

# Allow statements and log messages to immediately appear in the Knative logs
ENV PYTHONUNBUFFERED True

ENV PORT ${PORT}

ENV APP_HOME /app

ENV APP_MODULE myapp.main:app

ENV TIMEOUT 0

ENV WORKERS 4

WORKDIR $APP_HOME

COPY ./requirements.txt ./

# Install production dependencies.
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt

# Copy local code to the container image.
COPY . ./

# Run the web service on container startup. Here we use the gunicorn
# webserver with $WORKERS Uvicorn worker processes (4 as set above).
# For environments with multiple CPU cores, increase the number of workers
# to be equal to the cores available.
# Timeout is set to 0 to disable the timeouts of the workers to allow Cloud Run to handle instance scaling.

CMD exec gunicorn --bind :$PORT --workers $WORKERS --worker-class uvicorn.workers.UvicornWorker --timeout $TIMEOUT $APP_MODULE  --preload

My application receives a request and does the following:

1. Makes an async call to Cloud Firestore using firestore.AsyncClient.
2. Runs an algorithm using Google OR-Tools. I've profiled this with cProfile, and the task takes < 500 ms on average to complete.
3. Adds a FastAPI async background task to write to BigQuery. This is achieved as follows:

from fastapi.concurrency import run_in_threadpool

async def bg_task():
    # create json payload
    errors = await run_in_threadpool(lambda: client.insert_rows_json(table_id, rows_to_insert))  # Make an API request.

I have been noticing intermittent "Handling signal: term" log entries, after which Gunicorn shuts down its worker processes and restarts them. I can't work out why this is happening. The surprising part is that it sometimes happens at off-peak hours, when the API is receiving 0 requests. There is no apparent scale-down of Cloud Run instances that would explain it either.

The issue also occurs quite frequently under production load during peak hours, and it even causes Cloud Run to autoscale from 2 to 3 or 4 instances, which adds cold-start latency to my API. My API receives on average 1 request/minute.

[Image: Cloud Run metrics during random SIGTERM]

As clearly shown here, my API has not been receiving any requests in this period and Cloud Run has no business killing and restarting Gunicorn processes.

Another startling issue is that this seems to only happen in my production environment. In my development environment, I have the exact SAME setup but I don't see any of these issues there.

Why is Cloud Run sending SIGTERM and how do I avoid it?

Solution

Cloud Run is a serverless platform. That means the server management is done by Google Cloud, and it can choose to stop an instance from time to time (for maintenance, because of a technical issue, and so on).

But this changes nothing for you. A replacement instance does go through a cold start, but that should be invisible to your service, even under high load, because you have the min-instances parameter set to 2, which keeps instances up and ready to serve traffic without a cold start.

Can you have 3 or 4 instances running in parallel instead of 2 (the minimum)? Yes, but the billable instance count stays flat at 2. Cloud Run, again, is serverless: it can create backup instances to make sure that the upcoming shutdown of others won't impact your traffic. It's an internal optimization. No additional cost, it just works well!

Can you avoid that? No, because it's serverless, and also because there is no impact on your workloads.
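What you can influence is how Gunicorn reacts once the SIGTERM arrives. Cloud Run documents a 10-second period between SIGTERM and SIGKILL, while Gunicorn's default graceful_timeout is 30 seconds. Below is a minimal sketch of a Gunicorn config file that mirrors the flags from the question's CMD and shortens the graceful window. The file name and the value 8 are assumptions for illustration, not something taken from the question.

```python
# gunicorn.conf.py (hypothetical file; mirrors the flags from the question's CMD)
import os

# Cloud Run injects PORT at runtime; 8080 is only a local fallback.
bind = ":" + os.environ.get("PORT", "8080")

# Same values as the WORKERS and TIMEOUT variables in the Dockerfile.
workers = int(os.environ.get("WORKERS", "4"))
worker_class = "uvicorn.workers.UvicornWorker"

# 0 disables Gunicorn's worker timeout, as in the question.
timeout = int(os.environ.get("TIMEOUT", "0"))

# Same effect as the --preload flag.
preload_app = True

# Seconds workers get to finish in-flight requests after SIGTERM.
# Gunicorn's default is 30; 8 is an assumed value chosen to stay under
# Cloud Run's 10-second window before SIGKILL.
graceful_timeout = 8
```

It would be loaded with gunicorn -c gunicorn.conf.py $APP_MODULE in place of the individual flags. This does not stop the restarts; it only makes it more likely that workers exit cleanly within the window Cloud Run gives them.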

A last point about "environments": for Google Cloud, all projects are production projects. There is no difference. Google can't know what is critical and what is not, so everything is treated as critical.

If you notice a difference between your 2 projects, it's simply because they are deployed on different internal Google Cloud clusters. The status, performance, and maintenance operations differ between clusters. And again, there is nothing you can do about that.