
Terraform GCP: Updating Cloud Run service without user downtime


Can I update the Docker image of an existing resource "google_cloud_run_service" without destroying and recreating the service? How can I avoid user downtime of my Cloud Run service when I need to update the Docker image?

I have created several Terraform (TF) files to use with my CircleCI builds. In my pipeline, I build a REST API, test it, build a Docker image, and use TF to deploy that image to Google Cloud Platform as a Cloud Run service. To get the TF apply to work, I currently need to destroy the Cloud Run resource first, then recreate it with the latest Docker image built in the previous step. This works for a dev/test environment, but it will not work in production, because there will be downtime.

I am looking for advice on an approach to update my resources without users experiencing service downtime.


Solution

  • You probably want to look at implementing a canary deployment strategy. The idea is that you create two services and promote from one to the other: you create a new service with your new image and gradually cut traffic over to it before tearing down the old service.

    By doing this and observing failure rates in the new service, you gain confidence in it before it has the opportunity to damage your customers' experience.

    Here are a few resources that might help:

    CloudRun Automated Canary Testing

    Cloud Run and Gradual Rollouts

    Cloud Run Service Traffic Split
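
As a rough illustration of the traffic-split idea, here is a minimal sketch. Note that it is a variation on the answer above: instead of two separate services, it splits traffic between two revisions of a single "google_cloud_run_service", which the provider's "traffic" block supports. The service name, region, and all three variables are hypothetical placeholders, and "stable_revision" is assumed to be the name of a revision that already exists and is currently serving traffic.

```hcl
# Hypothetical inputs; none of these names come from the original question.
variable "image" {
  type        = string
  description = "Container image to deploy, e.g. the one built by CI"
}

variable "stable_revision" {
  type        = string
  description = "Name of the existing revision that currently serves users"
}

variable "canary_percent" {
  type        = number
  default     = 10
  description = "Share of traffic sent to the newest revision"
}

resource "google_cloud_run_service" "api" {
  name     = "my-api"      # placeholder service name
  location = "us-central1" # placeholder region

  # Let Cloud Run generate a fresh revision name on each template change.
  autogenerate_revision_name = true

  template {
    spec {
      containers {
        # Changing the image here is applied as an in-place update
        # that creates a new revision; no destroy is needed.
        image = var.image
      }
    }
  }

  # Most traffic stays on the known-good revision.
  traffic {
    revision_name = var.stable_revision
    percent       = 100 - var.canary_percent
  }

  # A small share goes to the newest revision (the canary).
  traffic {
    latest_revision = true
    percent         = var.canary_percent
  }
}
```

Under these assumptions, you would raise "canary_percent" step by step while watching error rates, and once the new revision looks healthy, update "stable_revision" to point at it. If a canary is more than you need, a single "traffic" block with "latest_revision = true" and "percent = 100" should still avoid the destroy/recreate cycle, because a changed image is rolled out as a new revision of the same service.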