
GKE node pool with Autoscaling does not scale down


I have a GKE cluster with two nodepools. I turned on autoscaling on one of my nodepools but it does not seem to automatically scale down.

autoscaling enabled

I have enabled HPA and that works fine. It scales the pods down to 1 when I don't see traffic.

The API is currently not getting any traffic so I would expect the nodes to scale down as well.

But it still runs the maximum 5 nodes despite some nodes using less than 50% of allocatable memory/CPU.

5 nodes

What did I miss here? I am planning to move these pods to bigger machines but to do that I need the node autoscaling to work to control the monthly cost.


Solution

  • There are many reasons why the Cluster Autoscaler (CA) may fail to scale down. To summarize how it normally works:

    • The cluster autoscaler checks node utilization periodically (every 10 seconds).
    • If a node's utilization factor is below 0.5, the node is considered underutilized.
    • The node is then marked for removal and monitored for the next 10 minutes to make sure its utilization factor stays below 0.5.
    • If the node is still underutilized after those 10 minutes, the cluster autoscaler removes it.
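The decision described above can be sketched in a few lines of Python. This is only an illustration, not the autoscaler's actual code. As I understand it, the utilization factor is computed from the resource requests of the pods on a node divided by the node's allocatable resources (not from live usage), taking the larger of the CPU and memory ratios. That definition, the `Node` class and all field names here are assumptions made for the example.

```python
from dataclasses import dataclass

UTILIZATION_THRESHOLD = 0.5  # utilization factor from the steps above
UNNEEDED_SECONDS = 10 * 60   # node must stay under the threshold this long


@dataclass
class Node:
    # Hypothetical container for the numbers the check needs.
    name: str
    allocatable_cpu_m: int   # allocatable CPU in millicores
    allocatable_mem_mi: int  # allocatable memory in MiB
    requested_cpu_m: int     # sum of pod CPU requests on the node
    requested_mem_mi: int    # sum of pod memory requests on the node


def utilization(node: Node) -> float:
    """Return the larger of the CPU and memory request ratios (assumed definition)."""
    cpu = node.requested_cpu_m / node.allocatable_cpu_m
    mem = node.requested_mem_mi / node.allocatable_mem_mi
    return max(cpu, mem)


def is_removal_candidate(node: Node, seconds_under_threshold: int) -> bool:
    """True if the node is underutilized and has stayed so for the full window."""
    return (utilization(node) < UTILIZATION_THRESHOLD
            and seconds_under_threshold >= UNNEEDED_SECONDS)
```

Under this assumption, a node whose pods use little CPU but request most of its memory would still count as utilized, so it is worth comparing requests, not just the usage graphs, against the 0.5 threshold.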

    If this is not happening, then something else is preventing your nodes from scaling down. In my experience, the most likely cause is a missing PDB for the kube-system pods, but there are several other possibilities. Here are the common reasons for scale-down issues:

    1. No PDB is applied to your kube-system pods. Kube-system pods prevent the Cluster Autoscaler from removing the nodes they run on. You can manually add a Pod Disruption Budget (PDB) for the kube-system pods that can safely be rescheduled elsewhere, using the following command:

    `kubectl create poddisruptionbudget PDB-NAME --namespace=kube-system --selector app=APP-NAME --max-unavailable 1`
    

    2. Pods using local storage (volumes), even emptyDir volumes. The Cluster Autoscaler does not scale down nodes that run pods using local storage. Look for this kind of configuration on the nodes that are not being removed.

    3. Pods annotated with cluster-autoscaler.kubernetes.io/safe-to-evict: false. Look for pods with this annotation, since they prevent the node they run on from being scaled down.

    4. Nodes annotated with cluster-autoscaler.kubernetes.io/scale-down-disabled: true. Look for nodes with this annotation, since the Cluster Autoscaler will not remove them.

    These are the configurations I suggest you check first to get your cluster to scale down underutilized nodes.
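To make the checks in the list above easier to run across a whole cluster, here is a rough Python sketch that scans pod and node objects for those conditions. It works on plain dictionaries in the shape returned by `kubectl get pods --all-namespaces -o json` and `kubectl get nodes -o json`. The function names are made up for this example, and the rules are simplified: for instance, it flags every kube-system pod, without checking whether a PDB already covers it or whether it belongs to a DaemonSet.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"
SCALE_DOWN_DISABLED = "cluster-autoscaler.kubernetes.io/scale-down-disabled"


def pod_blockers(pod: dict) -> list:
    """Return the reasons a pod (parsed JSON) might block scale-down of its node."""
    meta = pod.get("metadata", {})
    spec = pod.get("spec", {})
    annotations = meta.get("annotations") or {}
    reasons = []

    if annotations.get(SAFE_TO_EVICT) == "false":
        reasons.append("safe-to-evict=false annotation")
    elif annotations.get(SAFE_TO_EVICT) != "true":
        # Assumption: local storage only matters when the pod is not
        # explicitly marked as safe to evict.
        for vol in spec.get("volumes") or []:
            if "emptyDir" in vol or "hostPath" in vol:
                reasons.append("local storage volume: " + vol.get("name", "?"))

    if meta.get("namespace") == "kube-system":
        reasons.append("kube-system pod (check for a PDB)")
    return reasons


def node_blocked(node: dict) -> bool:
    """True if the node carries the scale-down-disabled=true annotation."""
    annotations = node.get("metadata", {}).get("annotations") or {}
    return annotations.get(SCALE_DOWN_DISABLED) == "true"


def report(pod_list: dict) -> dict:
    """Map 'namespace/name' to blocker reasons for every flagged pod."""
    flagged = {}
    for pod in pod_list.get("items", []):
        reasons = pod_blockers(pod)
        if reasons:
            meta = pod.get("metadata", {})
            key = meta.get("namespace", "?") + "/" + meta.get("name", "?")
            flagged[key] = reasons
    return flagged
```

To use it, save the kubectl JSON output to a file, load it with `json.load`, and pass the result to `report`. Any pod it lists is a candidate to inspect, not a confirmed cause.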

    You can also see this page, which explains the configurations that prevent scale-down and may describe what is happening in your case.