I have a lightweight Kubernetes cluster deployed with Rancher's K3s.
Most of the time pods run fine in it, but I noticed that from time to time it runs into NodeDiskPressure, which causes existing Pods to be evicted.
Looking at the available disk space on the host, I found that the periods of higher cluster load that precede this issue coincide with high usage of the containerd runtime storage. Normally about 70% of the space on these volumes is used, but it goes above 90%, which might be causing the pod eviction.
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/3cd5b4cad915d0914436df95359e7685aa89fcd3f95f0b51e9a3d7db6f11d01b/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/fd2a513ce2736f10e98a203939aaa60bd28fbbb4f9ddbbd64a0aedbf75cae216/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/73865fcfa8b448d71b9b7c8297192b16612bd01732e3aa56d6e6a3936305b4a2/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/fc68e6653cec69361823068b3afa2ac51ecd6caf791bf4ae9a65305ec8126f37/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/7fcd3e8789f0ca7c8cabdc7522722697f76456607cbd0e179dd4826393c177ec/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/9334ed12649bcdb1d70f4b2e64c80168bdc86c897ddf699853daf9229516f5cf/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/de1c6f47cf82ff3362f0fc3ed4d4b7f5326a490d177513c76641e8f1a7e5eb1a/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/079c26817021c301cb516dab2ddcf31f4e224431d6555847eb76256369510482/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/d0da2f62430306d25565072edf478ad92752255a40830544101aeb576b862a5f/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6965b5a7e133c6d96be6764356b2ee427a9d895e09358098f5c9a5fde97e2144/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/2180b0c76ca7c8666acfd5338754a1c4a063a65e1d2e804af997b36bab1771e7/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/88caedc991159c3509da8c2a43619c0e904f9c1e17f36b5c5afd5268ef2e00b4/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/0a76f599cda9501d36dd4a2fe3526a85b6360f1132cff109906a8b2f5ce9b9b0/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/6005d872441aa87e64857b6b07ca03e2b0962b6f130a047a179f31d28afe4794/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/e1a76ec6ffc3eb2a2557e6269ec59155eb8cfbd941b6e206b9017d3775322a68/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/c72b1307d12ec39676eadb37b8c72b130f335f10eeceab92504892f80696a1ad/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ae4c5f3100f44ceae63da2afc6e72603baf2e08730e47e50ff3a78f7617e57cf/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/02672bd47cce3163cc31a9ac7fe524fc11d3736b90c2c3f6eb01572837574dd5/rootfs
overlay 6281216 4375116 1906100 70% /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/1c41e3c57a500a00e2cd1c399e386d05d0269588155f8a276a63febe697e855b/rootfs
I tried increasing the available RAM on the host, apparently to no effect.
As can be seen in the output above, the total size of the overlay is currently 6 GB. I've looked into both the K3s and containerd documentation to find out how to increase the size of the overlay filesystem, but unfortunately could not find anything.
At first I thought that remounting it with more space would help solve this, but I'm not sure which lower/upper directories I should use to set up the overlay.
Maybe cleaning up the directory manually could solve the issue? I noticed that some of the folders in /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ are a few days old. Looking inside them, it seems that they store Docker image layers but also Pod state? I'm still not sure whether removing these would break anything, so for now I have kept them.
Any hints?
After some research, I found a couple of things that helped me solve this issue:
K3s uses containerd as its container runtime. It comes with crictl, which provides some containerd functionality. The following command removes unused images that are stored in the cache (inside /var, at least for a vanilla installation of K3s):
crictl rmi --prune
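If you would rather prune only when the disk is actually filling up, something along these lines might work as a starting point. It is only a sketch: the function names usage_pct and prune_if_full, the CRICTL override variable, and the example path and threshold in the usage comment are my own assumptions, not anything K3s ships.

```shell
#!/bin/sh
# Print the used percentage (bare integer) of the filesystem backing a path.
# Relies on the POSIX "df -P" layout, where column 5 is the capacity, e.g. "70%".
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Prune unused images only when usage of the given path reaches the limit.
# CRICTL is a hypothetical override, e.g. CRICTL="k3s crictl".
prune_if_full() {
    dir=$1
    limit=$2
    used=$(usage_pct "$dir")
    if [ "$used" -ge "$limit" ]; then
        ${CRICTL:-crictl} rmi --prune
    fi
}

# Example (assumed path and threshold, adjust to your installation):
#   prune_if_full /var/lib/rancher 85
```

This only defines the two functions; nothing is removed until prune_if_full is called, so it can be sourced from a cron job or run by hand.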
It is also possible to change the k3s.service parameters so that image garbage collection is triggered once disk usage crosses a threshold. You just need to add the following parameters after k3s server, e.g.:
--kubelet-arg=image-gc-high-threshold=85 --kubelet-arg=image-gc-low-threshold=80
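To avoid typos when trying different thresholds, a tiny helper like the one below could generate the two flags. This is a sketch: gc_flags is a name I made up, and the sanity check assumes the kubelet wants the low threshold to be strictly below the high one. Also, as far as I can tell from the kubelet documentation, 85 and 80 are the upstream defaults, so you may need lower values to see a difference.

```shell
#!/bin/sh
# Build the two kubelet flags for "k3s server" from a high and a low
# image garbage collection threshold (both in percent of disk usage).
gc_flags() {
    high=$1
    low=$2
    # Assumed sanity check: 0 <= low < high <= 100.
    if [ "$low" -lt 0 ] || [ "$low" -ge "$high" ] || [ "$high" -gt 100 ]; then
        echo "expected 0 <= low < high <= 100" >&2
        return 1
    fi
    printf '%s\n' "--kubelet-arg=image-gc-high-threshold=$high --kubelet-arg=image-gc-low-threshold=$low"
}

# Example: print the flags to paste after "k3s server" in k3s.service.
#   gc_flags 85 80
```

After editing k3s.service, systemctl daemon-reload followed by systemctl restart k3s should make the new flags take effect.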