docker · kubernetes · rancher

K8s cannot schedule new pods to worker nodes even though there are enough resources


Currently, I am facing an issue where, when K8s scales up new pods on an old deployment, Rancher shows them stuck in scheduling onto the K8s worker nodes. They do eventually get scheduled, but it takes some time; as I understand it, that time is spent waiting for the scheduler to find a node that fits the resource request. In the Events section of that deployment, it shows:

Warning FailedScheduling 0/8 nodes are available: 5 Insufficient memory, 3 node(s) didn't match node selector.
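
For reference, the same warning can also be seen by describing one of the pending pods directly (the pod name and namespace below are placeholders):

kubectl describe pod <pending-pod-name> -n <namespace>
# or list only the scheduling failures across the cluster:
kubectl get events -A --field-selector reason=FailedScheduling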

Then I go to the Nodes tab to check if there is any lack of memory on the worker nodes, and it shows my worker nodes like this:

STATE   NAME        ROLES   VERSION     CPU             RAM             POD                         
Active  worker-01   Worker  v1.19.5     14/19 Cores     84.8/86.2 GiB   76/110
Active  worker-02   Worker  v1.19.5     10.3/19 Cores   83.2/86.2 GiB   51/110
Active  worker-03   Worker  v1.19.5     14/19 Cores     85.8/86.2 GiB   63/110
Active  worker-04   Worker  v1.19.5     13/19 Cores     84.4/86.2 GiB   53/110
Active  worker-05   Worker  v1.19.5     12/19 Cores     85.2/86.2 GiB   68/110

But when I go into each server and check memory with the top and free commands, they output similar results, like this one on the worker-01 node:

top:
Tasks: 827 total,   2 running, 825 sleeping,   0 stopped,   0 zombie
%Cpu(s): 34.9 us, 11.9 sy,  0.0 ni, 51.5 id,  0.0 wa,  0.0 hi,  1.7 si,  0.0 st
KiB Mem : 98833488 total,  2198412 free, 81151568 used, 15483504 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 17101808 avail Mem

free -g:
              total        used        free      shared  buff/cache   available
Mem:             94          77           1           0          14          16
Swap:             0           0           0

So the memory available on the nodes is about 16-17 GB, yet new pods still cannot be scheduled onto them. What I wonder is what causes this conflict in the memory numbers: is the difference between 86.2 GB (in the Rancher GUI) and 94 GB (on the server) the amount reserved for the OS and other processes? And why does Rancher show the K8s workload currently taking about 83-85 GB when, on the server, the available memory is about 16-17 GB? Is there any way to dig deeper into this?

I'm still learning K8s, so please explain in detail if you can, or point me to topics that cover this.

Thanks in advance!


Solution

  • It doesn't matter what the actual resource consumption on the worker nodes is. What really matters for scheduling is the resource requests.

    Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource.

    Read more about Resource Management for Pods and Containers
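
    As a sketch (the deployment name my-app is made up), requests can be set in the manifest under spec.containers[].resources.requests or imperatively as below; it is this requested amount, not actual usage, that the scheduler checks against each node's remaining allocatable memory:

    # "my-app" is a hypothetical deployment name; the request is what the scheduler
    # reserves per replica, regardless of what the containers actually consume
    kubectl set resources deployment my-app \
        --requests=cpu=500m,memory=2Gi \
        --limits=cpu=1,memory=2Gi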

    "But why does it show almost the full 86.2 GB when the actual memory is 94 GB?"

    Use kubectl describe node <node name> to see how much memory has been made available to the kubelet on a particular node. A node's Allocatable memory is its Capacity minus whatever is reserved for the OS, the kubelet, and the eviction threshold; the 86.2 GiB Rancher shows is most likely that allocatable figure, while the 94 GB you see on the server is the raw capacity. Likewise, the 84.8 GiB "used" is most likely the sum of pod memory requests, not actual consumption.

    You will see something like

    Capacity:
      cpu:                8
      ephemeral-storage:  457871560Ki
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             32626320Ki
      pods:               110
    Allocatable:
      cpu:                8
      ephemeral-storage:  457871560Ki
      hugepages-1Gi:      0
      hugepages-2Mi:      0
      memory:             32626320Ki
      pods:               110
    ......
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests   Limits
      --------           --------   ------
      cpu                100m (1%)  100m (1%)
      memory             50Mi (0%)  50Mi (0%)
      ephemeral-storage  0 (0%)     0 (0%)
      hugepages-1Gi      0 (0%)     0 (0%)
      hugepages-2Mi      0 (0%)     0 (0%)
    
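    To see where those requests come from on a busy node, you can compare capacity with allocatable and list the pods scheduled there together with their memory requests; a rough sketch, using worker-01 from the question:

    # capacity vs allocatable (the latter is what the scheduler can hand out)
    kubectl get node worker-01 \
        -o jsonpath='{.status.capacity.memory} {.status.allocatable.memory}{"\n"}'

    # every pod running on worker-01 with its per-container memory requests
    kubectl get pods -A --field-selector spec.nodeName=worker-01 \
        -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,MEM_REQ:.spec.containers[*].resources.requests.memory'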

    "Rancher shows the K8s workload currently taking about 83-85 GB, but on the server the available memory is about 16-17 GB."

    Looking at the output of free in the question, that is not quite what the node reports:

    KiB Mem : 98833488 total, 2198412 free, 81151568 used, 15483504 buff/cache

    Only 2198412 KiB (~2 GB) is actually free; the 16-17 GB that shows up as "available" is mostly the ~15 GB sitting in buff/cache, which the kernel can reclaim under pressure but which is not free right now.

    You can use cat /proc/meminfo for more detailed OS-level memory information.
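
    For example, something like this (just a sketch) shows the fields behind free's numbers:

    # MemAvailable is the kernel's estimate of memory that can still be handed
    # out without swapping; it includes reclaimable page cache, which is why it
    # is much larger than MemFree on this node
    grep -E 'MemTotal|MemFree|MemAvailable|Buffers|^Cached' /proc/meminfo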