kubernetes, google-cloud-run, preemption

Preemptive Cloud Run on GKE


Is it possible to create a Cloud Run on GKE (Anthos) Kubernetes cluster with preemptible nodes? If so, can you also enable plugins such as gke-node-pool-shifter and gke-pvm-killer, or will they interfere with Cloud Run actions such as autoscaling pods?

https://hub.helm.sh/charts/rimusz/gke-node-pool-shifter

https://hub.helm.sh/charts/rimusz/gke-pvm-killer


Solution

  • Technically, a Cloud Run on GKE cluster is still a GKE cluster at the end of the day, so it can have preemptible node pools.

    However, some Knative Serving components, such as the activator and autoscaler, are in the hot path of serving requests. You need to make sure they don't end up in a preemptible pool. Similarly, the controller and webhook are central to the control plane lifecycle of Knative API objects, so you also need to make sure these pods end up in a non-preemptible node pool.

    Secondly, Knative (for now) does not support node selectors or taints/tolerations (see https://knative.tips/pod-config/node-affinity/). It simply doesn't give you a way to specify nodeSelector or other affinity fields in the Pod template of a Knative Service object.

    Therefore, you have to find a way (such as implementing your own mutating admission webhook for Knative-created pods) to add such node selectors to the Pods, which is quite tedious.

    However, by combining node taints and pod tolerations, I think you can have Knative system components end up in a non-preemptible pool, and everything else (i.e. Knative-created pods) on other nodes (i.e. preemptible nodes).
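To give a feel for the mutating admission webhook idea mentioned above, here is a minimal sketch of just the mutation logic (no HTTP server, no TLS, no webhook registration). It assumes Knative-created pods can be recognized by the `serving.knative.dev/service` label and that preemptible GKE nodes carry the `cloud.google.com/gke-preemptible: "true"` label; verify both on your own cluster. The function name `build_admission_response` is made up for illustration.

```python
import base64
import json

# Assumption: preemptible GKE nodes carry this label. Check with
# `kubectl get nodes --show-labels` before relying on it.
NODE_SELECTOR = {"cloud.google.com/gke-preemptible": "true"}

# Assumption: pods created by Knative Serving carry this label key.
KNATIVE_LABEL = "serving.knative.dev/service"


def escape_pointer(key: str) -> str:
    """Escape a key for use in a JSON Pointer path (RFC 6901)."""
    return key.replace("~", "~0").replace("/", "~1")


def build_admission_response(review: dict) -> dict:
    """Build an AdmissionReview response that adds NODE_SELECTOR to Knative pods.

    Pods without the Knative label are admitted unchanged.
    """
    request = review["request"]
    pod = request["object"]
    labels = pod.get("metadata", {}).get("labels", {})

    response = {"uid": request["uid"], "allowed": True}

    if KNATIVE_LABEL in labels:
        if pod.get("spec", {}).get("nodeSelector") is None:
            # No nodeSelector yet: add the whole map in one operation.
            patch = [{"op": "add", "path": "/spec/nodeSelector",
                      "value": NODE_SELECTOR}]
        else:
            # nodeSelector exists: add our keys without dropping the others.
            patch = [{"op": "add",
                      "path": "/spec/nodeSelector/" + escape_pointer(key),
                      "value": value}
                     for key, value in NODE_SELECTOR.items()]
        # The API server expects the JSON Patch base64-encoded.
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(
            json.dumps(patch).encode("utf-8")).decode("ascii")

    return {"apiVersion": "admission.k8s.io/v1",
            "kind": "AdmissionReview",
            "response": response}
```

A real webhook would still need an HTTPS endpoint, a certificate trusted by the API server, and a MutatingWebhookConfiguration scoped to pods, which is where most of the tedium comes from.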
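The taint-and-toleration split described above can be reasoned about with a small simulation. This is a simplified model of how the scheduler filters nodes, not the real scheduler, and every name in it is hypothetical: the `dedicated=knative-system:NoSchedule` taint, the `pool: stable` label, and the two pool names. It also assumes you edit the Knative system Deployments yourself to add both a toleration and a nodeSelector, since a toleration alone only permits scheduling onto the tainted pool and does not force it.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified Kubernetes rule for whether a toleration matches a taint."""
    # An empty or missing effect on the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # Exists with no key tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def schedulable_pools(pod: dict, pools: dict) -> list:
    """Return the names of pools where the pod could be scheduled."""
    result = []
    for name, pool in pools.items():
        # Only hard taints block scheduling; PreferNoSchedule is a soft hint.
        hard_taints = [t for t in pool.get("taints", [])
                       if t["effect"] in ("NoSchedule", "NoExecute")]
        taints_ok = all(
            any(tolerates(tol, taint) for tol in pod.get("tolerations", []))
            for taint in hard_taints)
        labels_ok = all(
            pool.get("labels", {}).get(key) == value
            for key, value in pod.get("nodeSelector", {}).items())
        if taints_ok and labels_ok:
            result.append(name)
    return result


# Hypothetical cluster layout: the stable pool is tainted, the preemptible
# pool is left untainted.
POOLS = {
    "stable": {
        "labels": {"pool": "stable"},
        "taints": [{"key": "dedicated", "value": "knative-system",
                    "effect": "NoSchedule"}],
    },
    "preemptible": {
        "labels": {"cloud.google.com/gke-preemptible": "true"},
        "taints": [],
    },
}

# Knative system pod (activator, autoscaler, controller, webhook) after
# hand-editing its Deployment: toleration plus nodeSelector.
SYSTEM_POD = {
    "tolerations": [{"key": "dedicated", "operator": "Equal",
                     "value": "knative-system", "effect": "NoSchedule"}],
    "nodeSelector": {"pool": "stable"},
}

# Knative-created application pod: no tolerations, no nodeSelector.
USER_POD = {}

print(schedulable_pools(SYSTEM_POD, POOLS))  # ['stable']
print(schedulable_pools(USER_POD, POOLS))    # ['preemptible']
```

In this model the application pods need no changes at all, which is the point of the approach: the taint keeps them off the stable pool without Knative having to expose any scheduling fields.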