I tracked down the CPU usage. Even after increasing the number of nodes, I still got a persistent scheduling error containing the following terms: Insufficient cpu, MatchNodeSelector, PodToleratesNodeTaints.
My hint came from this article. It mentions:
Do not allow new pods to schedule onto the node unless they tolerate the taint, but allow all pods submitted to Kubelet without going through the scheduler to start, and allow all already-running pods to continue running. Enforced by the scheduler.
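In other words, a pod can only land on a tainted node if it declares a matching toleration. A minimal sketch of what that looks like in a pod spec, assuming a hypothetical taint of storage=true with the NoSchedule effect (check what Taints your nodes actually report with kubectl describe node):

```yaml
spec:
  tolerations:
    # Hypothetical taint key/value for illustration; replace with
    # whatever `kubectl describe node <node>` lists under Taints.
    - key: "storage"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```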
The configuration contains the following.
spec:
  replicas: 1
  template:
    metadata:
      name: ceph-mds
      namespace: ceph
      labels:
        app: ceph
        daemon: mds
    spec:
      nodeSelector:
        node-type: storage
      ... and more ...
Notice the node-type selector. I have to run kubectl label nodes node-type=storage --all so that every node carries the node-type=storage label. Alternatively, I could label only a subset of nodes (for example, kubectl label nodes <node-name> node-type=storage) to dedicate just those as storage nodes.
In kops edit ig nodes, according to this hint, you can add the label under spec as follows.
spec:
  nodeLabels:
    node-type: storage
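For context, a kops instance group manifest with this label in place might look roughly like the sketch below. The machineType, size values, and metadata are placeholders for illustration, not values from my setup:

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes          # placeholder instance group name
spec:
  role: Node
  machineType: t3.medium   # placeholder
  minSize: 3               # placeholder
  maxSize: 3               # placeholder
  nodeLabels:
    node-type: storage     # matched by the pod's nodeSelector above
```

After saving the edit, kops update cluster --yes applies the change, and newly provisioned nodes come up with the label that the ceph-mds pod's nodeSelector can match.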