
Readiness check for Google Cloud Run: how?


I've searched the documentation at https://cloud.google.com/run/docs/how-to quite extensively. I also found the YAML in console.cloud.google.com, but I can't edit it. Is there a way to set this up with a command I might have missed?

EDIT: I couldn't find anything in https://cloud.google.com/sdk/gcloud/reference/beta/container/clusters/create about it either.

EDIT2:

I'm looking for a way to give Google Cloud Run a readiness check for the app in my container, the same way Kubernetes does it (example here: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/). The problem is that I don't want my service to be down for 30-60 seconds while the app in the container is still spinning up. Google redirects the traffic instantly, so users have to wait a long time whenever I push a new build.

EDIT3: Here's the time it takes to make the first request after I've deployed a new version. (Image: Postman request)

EDIT4: The app I'm trying to start is in Python. It's a Flask app serving a TensorFlow model. It needs to load several files into memory. This takes only 5-10 seconds on my computer, but as you can see, it takes longer on Cloud Run.


Solution

  • Cloud Run does not have a readiness check other than confirming your service is listening on the specified port. Once that is done, traffic starts routing to the new revision, and previous serving revisions are scaled down as they wrap up in-progress requests.

    If your goal is to ensure the service is ready ASAP after deployment, you might make a heavier entrypoint that takes care of more setup tasks.

    A "heavier" entrypoint like this will help post-deploy responsiveness, at the cost of slower cold-starts.

    Examples of things you can front-load in the entrypoint (whether in Bash scripts or in your service code before it starts the HTTP server):

    • Perform all necessary setup tasks such as loading files into memory.
    • Establish and preserve in global state any clients or connections to backing services.
    • Use your service code to run health checks confirming that backing services and resources are available.
    • Warm up in-container caches to minimize the latency of the first response.

    Again, this optimizes for post-deploy response by penalizing all cold starts.
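
A minimal sketch of the front-loading idea above, assuming the only readiness signal is the open port. It uses Python's standard-library HTTP server rather than Flask to stay self-contained; `load_model`, `build_server`, and `main` are hypothetical names, and the short sleep stands in for the real file loading. The point is the ordering: the heavy work finishes before the port is bound, so no traffic can arrive while the app is still warming up.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_model():
    """Hypothetical stand-in for slow setup, e.g. reading model files into memory."""
    time.sleep(0.1)  # simulate the expensive load
    return {"ready": True}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The model was loaded before the server existed, so the first
        # request does not pay the loading cost.
        body = b"ok" if self.server.model["ready"] else b"not ready"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)


def build_server(port):
    """Do all setup first, then bind the port."""
    model = load_model()  # 1. front-load the heavy work
    # 2. HTTPServer binds and starts listening inside its constructor,
    #    so the port only opens once the model is in memory.
    server = HTTPServer(("0.0.0.0", port), Handler)
    server.model = model  # keep the loaded state for the handlers
    return server


def main():
    # Cloud Run passes the port to listen on in the PORT environment variable.
    server = build_server(int(os.environ.get("PORT", "8080")))
    server.serve_forever()
```

Calling `main()` from the container entrypoint would start the service. In a Flask app the same ordering would mean loading the model at module import time, before the web server begins accepting connections, rather than lazily inside the first request handler.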

    https://cloud.google.com/run/docs/tips#optimizing_performance