Tags: kubernetes, kubernetes-helm, amazon-eks, hpa

How to implement Kubernetes horizontal pod autoscaling with scale up/down policies?


Kubernetes v1.19 in AWS EKS

I'm trying to implement horizontal pod autoscaling in my EKS cluster, mimicking what we do now with ECS. With ECS, we do something similar to the following:

  • scale up when CPU >= 90% after 3 consecutive 1-min periods of sampling
  • scale down when CPU <= 60% after 5 consecutive 1-min periods of sampling
  • scale up when memory >= 85% after 3 consecutive 1-min periods of sampling
  • scale down when memory <= 70% after 5 consecutive 1-min periods of sampling

I'm trying to use the HorizontalPodAutoscaler kind, and helm create gives me this template. (Note I modified it to suit my needs, but the metrics stanza remains.)

{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "microserviceChart.Name" . }}
  labels:
    {{- include "microserviceChart.Name" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "microserviceChart.Name" . }}
  minReplicas: {{ include "microserviceChart.minReplicas" . }}
  maxReplicas: {{ include "microserviceChart.maxReplicas" . }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        targetAverageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}

However, how do I fit the scale up/down information shown in Horizontal Pod Autoscaling into the above template, to match the behavior that I want?


Solution

  • The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed metrics (like CPU or Memory).

    There is an official walkthrough focusing on HPA and its scaling: Kubernetes.io: Docs: Tasks: Run application: Horizontal Pod Autoscaler Walkthrough.


    The algorithm that scales the number of replicas is the following:

    • desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
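
    For example (with hypothetical numbers), 4 replicas averaging 100% utilization against a 75% target would be scaled to:

    • desiredReplicas = ceil[4 * (100/75)] = ceil[5.3(3)] = 6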

    An example of (already rendered) autoscaling can be implemented with a YAML manifest like the one below:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: HPA-NAME
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: DEPLOYMENT-NAME
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 75
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 75
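
    Assuming the rendered manifest is saved as hpa.yaml (a hypothetical file name), you can apply it and watch the autoscaler with:

    • $ kubectl apply -f hpa.yaml
    • $ kubectl get hpa HPA-NAME --watch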
    

    A side note!

    HPA will calculate desiredReplicas for both metrics and choose the one with the bigger result! For example, if the CPU metric yields 4 replicas and the memory metric yields 6, the HPA scales to 6.

    Addressing a comment I wrote under the question:

    I think we misunderstood each other. It's perfectly okay to "scale up when CPU >= 90" but due to the logic behind the formula I don't think it will be possible to say "scale down when CPU <= 70". According to the formula it would be something along the lines of: scale up when CPU >= 90 and scale down when CPU <= 45.

    That comment can be misleading and is not 100% accurate in all scenarios. Take a look at the following example:

    • HPA set to averageUtilization of 75%.

    Quick calculations with some degree of approximation (default tolerance for HPA is 0.1):

    • 2 replicas:
      • scale-up (by 1) should happen when currentMetricValue is >=80%:
        • x = ceil[2 * (80/75)], x = ceil[2.1(3)], x = 3
      • scale-down (by 1) should happen when currentMetricValue is <=33%:
        • x = ceil[2 * (33/75)], x = ceil[0.88], x = 1
    • 8 replicas:
      • scale-up (by 1) should happen when currentMetricValue is >=76%:
        • x = ceil[8 * (76/75)], x = ceil[8.10(6)], x = 9
      • scale-down (by 1) should happen when currentMetricValue is <=64%:
        • x = ceil[8 * (64/75)], x = ceil[6.82(6)], x = 7

    Following this example, having 8 replicas with currentMetricValue at 55 (and desiredMetricValue set to 75) should scale down to 6 replicas.
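
    Applying the formula:

    • x = ceil[8 * (55/75)], x = ceil[5.8(6)], x = 6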

    More information describing the decision making of HPA (for example, why it doesn't scale) can be found by running:

    • $ kubectl describe hpa HPA-NAME
    Name:                                                     nginx-scaler
    Namespace:                                                default
    Labels:                                                   <none>
    Annotations:                                              <none>
    CreationTimestamp:                                        Sun, 07 Mar 2021 22:48:58 +0100
    Reference:                                                Deployment/nginx-scaling
    Metrics:                                                  ( current / target )
      resource memory on pods  (as a percentage of request):  5% (61903667200m) / 75%
      resource cpu on pods  (as a percentage of request):     79% (199m) / 75%
    Min replicas:                                             1
    Max replicas:                                             10
    Deployment pods:                                          5 current / 5 desired
    Conditions:
      Type            Status  Reason              Message
      ----            ------  ------              -------
      AbleToScale     True    ReadyForNewScale    recommended size matches current size
      ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
      ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
    Events:
      Type     Reason                   Age                   From                       Message
      ----     ------                   ----                  ----                       -------
      Warning  FailedGetResourceMetric  4m48s (x4 over 5m3s)  horizontal-pod-autoscaler  did not receive metrics for any ready pods
      Normal   SuccessfulRescale        103s                  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) above target
      Normal   SuccessfulRescale        71s                   horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
      Normal   SuccessfulRescale        71s                   horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
    

    HPA scaling procedures can be modified by the changes introduced in Kubernetes version 1.18 and newer, which added:

    Support for configurable scaling behavior

    Starting from v1.18 the v2beta2 API allows scaling behavior to be configured through the HPA behavior field. Behaviors are specified separately for scaling up and down in scaleUp or scaleDown section under the behavior field. A stabilization window can be specified for both directions which prevents the flapping of the number of the replicas in the scaling target. Similarly specifying scaling policies controls the rate of change of replicas while scaling.

    Kubernetes.io: Docs: Tasks: Run application: Horizontal pod autoscale: Support for configurable scaling behavior

    I reckon you could use the newly introduced fields like behavior and stabilizationWindowSeconds to tune your workload to your specific needs, as in the sketch below.
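
    As a sketch (not a drop-in answer), the manifest below reuses your ECS scale-up thresholds (CPU 90%, memory 85%) as targets, and approximates your 3-minute/5-minute sampling periods with stabilization windows. Note that the separate scale-down thresholds (60%/70%) cannot be expressed directly, as HPA derives scale-down from the same target via the formula above:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: HPA-NAME
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: DEPLOYMENT-NAME
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 90
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 85
      behavior:
        scaleUp:
          # act only on a recommendation that has held for ~3 minutes
          stabilizationWindowSeconds: 180
          policies:
          - type: Pods
            value: 1          # add at most 1 Pod...
            periodSeconds: 60 # ...per 60-second period
        scaleDown:
          # act only on a recommendation that has held for ~5 minutes
          stabilizationWindowSeconds: 300
          policies:
          - type: Pods
            value: 1
            periodSeconds: 60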

    I also recommend consulting the EKS documentation for more references, supported metrics, and examples.
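
    To wire this into the Helm template from the question, one approach (a sketch, assuming you add a matching autoscaling.behavior map to your values.yaml) is to pass the whole block through with toYaml. Note that the behavior field requires apiVersion: autoscaling/v2beta2; the template in the question currently uses v2beta1:

    {{- if .Values.autoscaling.behavior }}
      behavior:
        {{- toYaml .Values.autoscaling.behavior | nindent 4 }}
    {{- end }}

    This keeps the scaling policies configurable per environment without changing the template itself.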