kubernetes, google-kubernetes-engine, autoscaling

GKE Node Only Autoscale Up


How can I automatically trigger scaling up GKE nodes when resources are low without scaling back down?

Enabling node autoscaling in GKE doesn't have an option to disable scaling down. Metric-based node autoscaling also doesn't appear to be a feature.

I would like to scale the node pools up when resources are limited, but not scale down when resources are freed. Our architecture has stateful applications running on GKE nodes (in the process of fixing this inherited problem). As a result, nodes that scale down during work hours impact application availability.

Our current solution uses a log-based alert that notifies us when pods fail to start due to resource scarcity; we then manually increase the node pool size in the Google Cloud console.


Solution

  • On GKE, you can choose between the following autoscaling profiles for the cluster autoscaler:

    Balanced (default) - checks node utilization every 10 seconds and marks a node for removal once it has been observed as underutilized for more than 10 minutes. You can read more here about the node scale-down process.

    Optimize-utilization - prioritizes utilization over keeping spare resources available, and scales down aggressively; best suited to cost optimization.

    Also, as per the documentation, you should consider pod scheduling and PodDisruptionBudget settings together with your chosen autoscaling profile, since they can prevent pods from being rescheduled and therefore affect the scale-down and scale-up process.

    In your case, I think it is best to set a minimum number of nodes per node pool (if you have already projected your overall resource requests) and set the profile to balanced, so that nodes are provisioned automatically based on workload requests. Also add a PodDisruptionBudget to your workloads to maintain application availability.

    As aboitier also discussed, if you want a node to stay available and be excluded from the scale-down process, you can apply the required annotation with this command:

    kubectl annotate node <nodename> cluster-autoscaler.kubernetes.io/scale-down-disabled=true
    

    You can check this link about how to prevent Cluster Autoscaler from scaling down a particular node.
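
To tie the suggestions above together, here is a rough sketch of how they might look on the command line. It is not a tested recipe for your cluster: the cluster name, node pool name, region, label selector, PDB name and node counts are all hypothetical placeholders, and it assumes a regional cluster (use --zone instead of --region for a zonal one). By default the script only prints the commands (DRY_RUN=1) so you can review them before running anything.

```shell
#!/bin/sh
# Sketch only: every value below is a hypothetical placeholder, not something
# taken from the question. Override them through environment variables.
set -eu

CLUSTER="${CLUSTER:-my-cluster}"
POOL="${POOL:-my-pool}"
REGION="${REGION:-us-central1}"
MIN_NODES="${MIN_NODES:-3}"
MAX_NODES="${MAX_NODES:-10}"
APP_SELECTOR="${APP_SELECTOR:-app=my-stateful-app}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

protect_from_scale_down() {
  # 1. Use the balanced (default) autoscaling profile.
  run gcloud container clusters update "$CLUSTER" --region="$REGION" \
    --autoscaling-profile=balanced

  # 2. Enable autoscaling on the pool with a floor the autoscaler will not go below.
  run gcloud container clusters update "$CLUSTER" --region="$REGION" \
    --node-pool="$POOL" --enable-autoscaling \
    --min-nodes="$MIN_NODES" --max-nodes="$MAX_NODES"

  # 3. Keep at least one matching pod running during voluntary disruptions.
  run kubectl create poddisruptionbudget my-app-pdb \
    --selector="$APP_SELECTOR" --min-available=1

  # 4. Exclude the pool's current nodes from scale-down.
  #    cloud.google.com/gke-nodepool is the label GKE puts on its nodes.
  run kubectl annotate nodes -l "cloud.google.com/gke-nodepool=$POOL" \
    cluster-autoscaler.kubernetes.io/scale-down-disabled=true --overwrite
}

protect_from_scale_down
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 to apply them. Two caveats: on a regional cluster the --min-nodes and --max-nodes values apply per zone, and the annotation in step 4 only covers nodes that exist when the command runs, so nodes added later by a scale-up would need to be annotated again.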