Tags: azure, azure-aks

Azure Kubernetes Service pod pending - 4 Insufficient cpu


I hope somebody can shed some light on this issue and how to solve it.

I have an Azure Kubernetes Service cluster running on the Free tier with two node pools:

  • System node pool (Standard_D2ds_v5): used for Helm charts, with autoscaling enabled up to a maximum of 2 instances
  • Linux node pool (Standard_D2ds_v5): the pool I use for all my deployments; it can scale up to 6 instances

I am releasing about 40 pods in total, and each pod has an Istio sidecar.

Today I started seeing that after each release the new pods stay in Pending status with the message: 0/5 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 4 Insufficient cpu. preemption: 0/5 nodes are available: 1 Preemption is not helpful for scheduling, 4 No preemption victims found for incoming pod..
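
For context, this is how I have been inspecting the pending pods (pod and namespace names are placeholders):

```
# The Events section at the bottom repeats the scheduling failure with per-node detail
kubectl describe pod <pod-name> -n <namespace>

# Recent scheduling events across the namespace
kubectl get events -n <namespace> --sort-by=.lastTimestamp

# CPU/memory already promised out in requests on each node
kubectl describe nodes | grep -A 7 "Allocated resources"
```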

I am a bit confused here because I cannot pinpoint the exact issue.

Reading the Microsoft docs, I can see there is a limit of 30 pods per node, which means I supposedly could have 180 pods in total: 30 per node across the 6 nodes of the Linux worker scale set (right?)

If the issue is not the limit of pods running on each node, that leaves me with resource limitation: not setting thresholds for how much CPU each container can use.
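
Back-of-the-envelope math suggests the requests may simply exceed capacity. A minimal sketch, assuming 250m per app container, 100m for each Istio sidecar (Istio's usual default), and roughly 1900m allocatable per Standard_D2ds_v5 node after system reservations; all three figures are assumptions to verify on your own cluster:

```python
# Rough capacity check: CPU the scheduler must find for the workload
# versus what the user node pool can offer. All inputs are assumptions.

def total_requested_mcpu(pods, app_request_mcpu, sidecar_request_mcpu):
    """CPU (millicores) requested by all pods, app container plus sidecar."""
    return pods * (app_request_mcpu + sidecar_request_mcpu)

def pool_allocatable_mcpu(nodes, allocatable_per_node_mcpu):
    """CPU (millicores) allocatable across the whole node pool."""
    return nodes * allocatable_per_node_mcpu

requested = total_requested_mcpu(pods=40, app_request_mcpu=250, sidecar_request_mcpu=100)
allocatable = pool_allocatable_mcpu(nodes=6, allocatable_per_node_mcpu=1900)

print(f"requested:   {requested}m")    # 14000m
print(f"allocatable: {allocatable}m")  # 11400m
print("fits" if requested <= allocatable else "Insufficient cpu")
```

If those defaults are anywhere near reality, 40 pods cannot fit even on a fully scaled-out pool, which would explain the scheduler message.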

I prepared this YAML file:

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range
  namespace: kube-system
spec:
  limits:
  - default:
      cpu: "500m"
    defaultRequest:
      cpu: "250m"
    type: Container
---
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range
  namespace: default
spec:
  limits:
  - default:
      cpu: "500m"
    defaultRequest:
      cpu: "250m"
    type: Container

I applied the resource limits, but nothing seems to change: I can see the limits when I describe the namespaces, but pods keep ending up in Pending status.
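
One thing I am also checking (my reading of the docs, so treat it as an assumption): a LimitRange only injects defaults into pods created after it exists, and it never overrides containers that already declare their own requests, as the Istio sidecar injector does. This shows what each container in a pending pod actually requests (pod name is a placeholder):

```
kubectl get pod <pod-name> -n default \
  -o jsonpath='{range .spec.containers[*]}{.name}: {.resources.requests.cpu}{"\n"}{end}'
```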

Just to clarify, as I believe it is important: the workers use Azure CNI networking, which limits them to 30 pods per node.

If anyone can help me understand what I am doing wrong here I will be grateful, and if you need more information, don't hesitate to ask.


Solution

  • To resolve the issue of pending pods due to insufficient CPU, you can lower the CPU requests (the scheduler places pods based on requests, not limits) and add more worker nodes to your cluster.

    You can refer to the example below and compare it with your setup.

    Here I have two node pools in my cluster navAKSCluster.

    System node pool:

    az aks nodepool add \
      --resource-group navrg \
      --cluster-name navAKSCluster \
      --name systempool \
      --node-count 1 \
      --enable-cluster-autoscaler \
      --min-count 1 \
      --max-count 2 \
      --node-vm-size Standard_D2ds_v5 \
      --mode System
    
    

    User node pool:

    az aks nodepool add \
      --resource-group navrg \
      --cluster-name navAKSCluster \
      --name userpool \
      --node-count 1 \
      --enable-cluster-autoscaler \
      --min-count 1 \
      --max-count 6 \
      --node-vm-size Standard_D2ds_v5 \
      --mode User
    
    

    Applied resource limits:

    apiVersion: v1
    kind: LimitRange
    metadata:
      name: cpu-limit-range
      namespace: kube-system
    spec:
      limits:
      - default:
          cpu: "500m"
        defaultRequest:
          cpu: "250m"
        type: Container
    ---
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: cpu-limit-range
      namespace: default
    spec:
      limits:
      - default:
          cpu: "500m"
        defaultRequest:
          cpu: "250m"
        type: Container
    
    


    and deployed an application:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      namespace: default
    spec:
      replicas: 10
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx:latest
            resources:
              requests:
                cpu: "100m"
                memory: "200Mi"
              limits:
                cpu: "500m"
                memory: "500Mi"
    
    


    Now check the pods:

    kubectl get pods -n default
    
    


    or

    kubectl top nodes
    
    


    If you still face issues with pending pods due to CPU constraints, scale out the workload and let the cluster autoscaler add nodes (note that kubectl scale changes the deployment's replica count; it does not scale the node pool directly):

    kubectl scale deployment nginx --replicas=20 -n default
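
    If the pool's autoscaler is disabled, or it already sits at its --max-count, the node pool itself can be scaled manually (names reuse the example above; Azure rejects manual scaling while the autoscaler is enabled on the pool):

    ```
    az aks nodepool scale \
      --resource-group navrg \
      --cluster-name navAKSCluster \
      --name userpool \
      --node-count 4
    ```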
    
    

    Now verify:

    kubectl get pods -n default
    
    


    You can even set up Horizontal Pod Autoscaling:

    kubectl autoscale deployment nginx --cpu-percent=50 --min=10 --max=50 -n default
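
    The imperative command corresponds roughly to this autoscaling/v2 manifest, should you prefer declarative config (field values mirror the flags above):

    ```
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 10
      maxReplicas: 50
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50
    ```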
    
    

