kubernetes, azure-aks

Why do AKS nodes show less memory as allocatable when the actual memory is still available?


I would like to know which factors AKS nodes take into account when reserving memory, and how the allocatable memory is calculated.

In my cluster we have multiple nodes, each with 2 CPUs and 7 GB RAM.

What I observed is that all the nodes (18+) show only about 4 GB of allocatable memory out of 7 GB. Because of this, our cluster runs into resource contention for new deployments, and we have to increase the node count accordingly to meet the resource requirements.

Update: as I commented below, I am adding the kubectl top node output here. What is strange is how a node's memory consumption percentage can be more than 100%.

NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-nodepool1-xxxxxxxxx-vmssxxxx00   265m         13%    2429Mi          53%
aks-nodepool1-xxxxxxxxx-vmssxxxx01   239m         12%    3283Mi          71%
aks-nodepool1-xxxxxxxxx-vmssxxxx0g   465m         24%    4987Mi          109%
aks-nodepool2-xxxxxxxxx-vmssxxxx8i   64m          3%     3085Mi          67%
aks-nodepool2-xxxxxxxxx-vmssxxxx8p   114m         6%     5320Mi          116%
aks-nodepool2-xxxxxxxxx-vmssxxxx9n   105m         5%     2715Mi          59%
aks-nodepool2-xxxxxxxxx-vmssxxxxaa   134m         7%     5216Mi          114%
aks-nodepool2-xxxxxxxxx-vmssxxxxat   179m         9%     5498Mi          120%
aks-nodepool2-xxxxxxxxx-vmssxxxxaz   141m         7%     4769Mi          104%
aks-nodepool2-xxxxxxxxx-vmssxxxxb0   72m          3%     1972Mi          43%
aks-nodepool2-xxxxxxxxx-vmssxxxxb1   133m         7%     3684Mi          80%
aks-nodepool2-xxxxxxxxx-vmssxxxxb3   182m         9%     5294Mi          115%
aks-nodepool2-xxxxxxxxx-vmssxxxxb4   133m         7%     5009Mi          109%
aks-nodepool2-xxxxxxxxx-vmssxxxxbj   68m          3%     1783Mi          39%

So as an example I took the node aks-nodepool2-xxxxxxxxx-vmssxxxx8p (114m, 6%, 5320Mi, 116%).

I calculated the memory usage of each pod on that node; the total was around 4.1 GB, while the node's allocatable memory was 4.6 GB out of the 7 GB actual.

Here "why the top node" output is not same as the each pods "top pods output" in that node ?

Expected %: 4.1 GB / 4.6 GB ≈ 89%, but the top node command reports 116%.
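
One way to approximate such a per-node total (a rough sketch only; this pipeline is an illustration, not necessarily how the 4.1 GB figure above was obtained):

NODE=aks-nodepool2-xxxxxxxxx-vmssxxxx8p
# List every pod scheduled on the node, then sum the memory reported by "kubectl top pod"
kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODE \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name --no-headers |
while read ns name; do
  kubectl top pod "$name" -n "$ns" --no-headers
done | awk '{sum += $3} END {printf "total pod memory usage: %d Mi\n", sum}'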


Solution

  • This is expected behavior in AKS, intended to keep the cluster safe and functioning properly.

    When you create a Kubernetes cluster in AKS, it doesn't mean you get all the memory/CPU that your VMs have. Depending on the cluster configuration, even more can be consumed than in your case; e.g. if you enable the OMS agent to get insights into AKS, it also reserves some capacity.

    From the official documentation, see Kubernetes core concepts for Azure Kubernetes Service (AKS) → Resource reservations. For associated best practices, see Best practices for basic scheduler features in AKS.

    AKS uses node resources to help the node function as part of your cluster. This usage can create a discrepancy between your node's total resources and the allocatable resources in AKS. Remember this information when setting requests and limits for user deployed pods.
    
    To find a node's allocatable resources, run:
    kubectl describe node [NODE_NAME]
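    To compare capacity and allocatable memory across all nodes at once, a custom-columns variant of the same information can also be used (a convenience sketch; the field paths are the standard node status fields):

    kubectl get nodes -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.memory,ALLOCATABLE:.status.allocatable.memory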
    
    To maintain node performance and functionality, AKS reserves resources on each node. As a node grows larger in resources, the resource reservation grows due to a higher need for management of user-deployed pods.
    
    Two types of resources are reserved:
    - CPU
        Reserved CPU is dependent on node type and cluster configuration, which may cause less allocatable CPU due to running additional features.
    - Memory
        Memory utilized by AKS includes the sum of two values (both can be checked on a live node, as sketched after this list).
        
        - kubelet daemon
        The kubelet daemon is installed on all Kubernetes agent nodes to manage container creation and termination.
        By default on AKS, kubelet daemon has the memory.available<750Mi eviction rule, ensuring a node must always have at least 750 Mi allocatable at all times. When a host is below that available memory threshold, the kubelet will trigger to terminate one of the running pods and free up memory on the host machine.
        
        - A regressive rate of memory reservations for the kubelet daemon to properly function (kube-reserved).
        25% of the first 4 GB of memory
        20% of the next 4 GB of memory (up to 8 GB)
        10% of the next 8 GB of memory (up to 16 GB)
        6% of the next 112 GB of memory (up to 128 GB)
        2% of any memory above 128 GB
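
    Both of these values can be inspected on a running node through the kubelet's configz endpoint, for example (a sketch; it assumes jq is installed and you have sufficient permissions, and the exact numbers depend on the AKS version):

    NODE=aks-nodepool1-xxxxxxxxx-vmssxxxx00
    # Show the eviction threshold and kube-reserved values the kubelet is actually using
    kubectl get --raw "/api/v1/nodes/$NODE/proxy/configz" \
      | jq '{evictionHard: .kubeletconfig.evictionHard, kubeReserved: .kubeletconfig.kubeReserved}'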
        
    Memory and CPU allocation rules are designed to:
    - keep agent nodes healthy, including some hosting system pods critical to cluster health;
    - cause the node to report less allocatable memory and CPU than it would if it were not part of a Kubernetes cluster.
    These resource reservations can't be changed.
    
    For example, if a node offers 7 GB, it will report 34% of memory not allocatable including the 750Mi hard eviction threshold.
    
    0.75 GB + (0.25 × 4 GB) + (0.20 × 3 GB) = 0.75 GB + 1 GB + 0.6 GB = 2.35 GB reserved, and 2.35 GB / 7 GB = 33.57%
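
    As a rough sketch, the same calculation can be scripted for any node size (it only encodes the regressive rates and the 750 Mi eviction threshold quoted above; real allocatable values will be somewhat lower because of the OS reservation mentioned below):

    node_gb=7
    awk -v gb="$node_gb" 'BEGIN {
      reserved = 0.75                                # 750Mi hard eviction threshold
      split("4 4 8 112 1000000", size)               # tier sizes in GB
      split("0.25 0.20 0.10 0.06 0.02", rate)        # reservation rate per tier
      left = gb
      for (i = 1; i <= 5 && left > 0; i++) {
        take = (left < size[i]) ? left : size[i]     # memory falling into this tier
        reserved += rate[i] * take
        left -= take
      }
      printf "reserved: %.2f GB (%.1f%% of %d GB); allocatable: ~%.2f GB\n", reserved, 100*reserved/gb, gb, gb - reserved
    }'
    # For node_gb=7 this prints: reserved: 2.35 GB (33.6% of 7 GB); allocatable: ~4.65 GB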
    
    In addition to reservations for Kubernetes itself, the underlying node OS also reserves an amount of CPU and memory resources to maintain OS functions.