In Kubernetes, the desired replica count for a Horizontal Pod Autoscaler (HPA) is defined as the following:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
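The formula is easy to check with a few lines of Python. This is a minimal sketch of the documented formula only. The function name `desired_replicas` is hypothetical, and the real controller also applies a tolerance, min/max replica bounds, and any configured scaling policies.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, desired_metric: float) -> int:
    """desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]"""
    return math.ceil(current_replicas * (current_metric / desired_metric))


if __name__ == "__main__":
    # Metric at twice its target: 100 replicas become 200.
    print(desired_replicas(100, 200.0, 100.0))  # 200
    # Metric at 1.5x its target on 3 replicas: ceil(4.5) = 5.
    print(desired_replicas(3, 150.0, 100.0))  # 5
```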
In our clusters, we have some HPAs configured with a scale-up policy whose periodSeconds is 5:
```yaml
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100
      periodSeconds: 5
```
From my understanding, in English this means "double the number of pods every 5 seconds".
Now, let's imagine a scenario where the pod count needs to double from 100 to 200 replicas because a bunch of messages got dumped into a queue. During this scale-out, 10 of the new pods get scheduled immediately, but the remaining 90 get stuck in the Pending state for 1-2 minutes while cluster-autoscaler provisions new nodes.
When the next calculation is run in 5 seconds (because periodSeconds=5), will currentReplicas be:
a) 110 replicas
b) 200 replicas
c) Other
More specifically, how is currentReplicas defined? Are NotReady pods included in the currentReplicas count?
In Kubernetes, the currentReplicas value used by the Horizontal Pod Autoscaler (HPA) is the replica count already requested on the scale target, that is, the replicas field of the target's scale subresource (for a Deployment, its spec.replicas). It does not matter whether those pods are Running, Pending, or NotReady, so Pending and NotReady pods are included in the currentReplicas count.
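To make the distinction concrete, here is a small model of the scenario from the question. `ScaleTarget` is a hypothetical stand-in, not a Kubernetes client type: currentReplicas comes from the requested replica count, while pod phases only describe which state each existing pod is in.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ScaleTarget:
    """Hypothetical stand-in for a Deployment's scale subresource and its pods."""
    spec_replicas: int                                    # count the HPA last requested
    pod_phases: list[str] = field(default_factory=list)  # phase of each existing pod

    def current_replicas(self) -> int:
        # The HPA uses the requested count, not the number of Running pods.
        return self.spec_replicas

    def phase_counts(self) -> Counter:
        # How many pods are in each phase (Running, Pending, ...).
        return Counter(self.pod_phases)


if __name__ == "__main__":
    # 100 original pods + 10 new pods scheduled immediately + 90 new pods Pending.
    target = ScaleTarget(spec_replicas=200, pod_phases=["Running"] * 110 + ["Pending"] * 90)
    print(target.current_replicas())         # 200
    print(target.phase_counts()["Running"])  # 110
    print(target.phase_counts()["Pending"])  # 90
```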
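In your scenario, the HPA has just scaled the target from 100 to 200 replicas, which the Percent policy (value: 100, periodSeconds: 5) allows. When the next HPA calculation runs, currentReplicas will be 200 replicas: 110 Running pods (the original 100 plus the 10 that were scheduled immediately) and 90 Pending pods.
Note that the next calculation does not necessarily happen 5 seconds later. The HPA controller evaluates its targets on its own sync period (15 seconds by default, set with the kube-controller-manager flag --horizontal-pod-autoscaler-sync-period); periodSeconds only defines the time window over which a policy's limit applies.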
The HPA tracks the desired number of pods, which was set to 200 during the last scale-up.
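The Percent policy from the question acts as a cap on how far the replica count may grow within any periodSeconds window; it does not force a doubling, and the HPA only scales up when the metric calls for more replicas. Below is a simplified sketch of that cap, assuming a single Percent policy and no scale-down events. The function and parameter names are hypothetical, not the controller's API; the sketch grows the replica count at the start of the window by the percentage and rounds up.

```python
import math


def percent_scale_up_limit(current_replicas: int, added_in_window: int, percent: int) -> int:
    """Upper bound allowed by one Percent scale-up policy (simplified sketch).

    The replica count at the start of the window is the current count minus
    the replicas already added inside the window; it is grown by `percent`
    and rounded up.
    """
    period_start = current_replicas - added_in_window
    return math.ceil(period_start * (1 + percent / 100))


def next_replicas(current_replicas: int, recommended: int, added_in_window: int, percent: int) -> int:
    """Scale up toward the metric-based recommendation, never past the policy cap."""
    if recommended <= current_replicas:
        return current_replicas  # this sketch only models scale-up
    return min(recommended, percent_scale_up_limit(current_replicas, added_in_window, percent))


if __name__ == "__main__":
    print(next_replicas(100, 150, 0, 100))         # 150: the metric only asks for 150
    print(next_replicas(100, 1000, 0, 100))        # 200: capped at +100%
    print(percent_scale_up_limit(200, 100, 100))   # 200: window still contains the +100
    print(percent_scale_up_limit(200, 0, 100))     # 400: that scale-up left the window
```

So with 100 replicas and a recommendation of 1000, the result is 200, and further growth is held at 200 until the earlier scale-up event falls outside the 5-second window.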
Pods in the Pending or NotReady state are counted toward currentReplicas because the HPA takes this value from the number of pods requested, not just those in the Running state.
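Readiness still matters elsewhere, though. The Kubernetes documentation notes that for a scale-up on a resource metric such as CPU, pods that are not yet ready are conservatively assumed to use 0% of the target, which dampens the scale-up. The sketch below is a simplified model of that rule, not the controller's code. The function name, the hypothetical CPU numbers, and multiplying the recomputed ratio by the number of pods considered are assumptions; the 0.1 tolerance is the documented default. Missing metrics are not modeled.

```python
import math

TOLERANCE = 0.1  # documented default for --horizontal-pod-autoscaler-tolerance


def resource_desired_replicas(current_replicas: int, ready_usages: list[float],
                              unready_count: int, target: float) -> int:
    """Simplified sketch of the documented handling of not-yet-ready pods.

    ready_usages: per-pod metric values (e.g. CPU %) for ready pods.
    unready_count: pods that are Pending or not yet ready.
    """
    usage_ratio = (sum(ready_usages) / len(ready_usages)) / target
    if unready_count == 0 or usage_ratio <= 1.0:
        # No dampening: no unready pods, or not a scale-up.
        # Unready pods are simply left out of the calculation.
        if abs(1.0 - usage_ratio) <= TOLERANCE:
            return current_replicas
        return math.ceil(usage_ratio * len(ready_usages))

    # Scale-up with unready pods: assume each unready pod uses 0% of the target.
    usages = ready_usages + [0.0] * unready_count
    new_ratio = (sum(usages) / len(usages)) / target
    if abs(1.0 - new_ratio) <= TOLERANCE or new_ratio < 1.0:
        return current_replicas  # recomputed ratio no longer justifies scaling up
    return max(current_replicas, math.ceil(new_ratio * len(usages)))


if __name__ == "__main__":
    # Hypothetical numbers: 110 ready pods at 100% CPU, 90 Pending, target 40%.
    print(resource_desired_replicas(200, [100.0] * 110, 90, 40.0))  # 275, not ceil(2.5 * 200) = 500
    # Hypothetical numbers: 110 ready pods at 90% CPU, 90 Pending, target 50%.
    print(resource_desired_replicas(200, [90.0] * 110, 90, 50.0))   # 200: ratio now within tolerance
```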
In this case, the currentReplicas during the next HPA calculation would be b) 200 replicas.