I have created a k8s cluster with kops (1.21.4) on AWS and, following the cluster autoscaler docs, made the required changes to my cluster. However, when the cluster starts, the cluster-autoscaler pod cannot be scheduled on any node. When I describe the pod, I see the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m31s (x92 over 98m) default-scheduler 0/4 nodes are available: 1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector.
Looking at the cluster-autoscaler deployment, I see the following podAntiAffinity:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - cluster-autoscaler
        topologyKey: topology.kubernetes.io/zone
      weight: 100
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - cluster-autoscaler
      topologyKey: kubernetes.com/hostname
From this I understand that it wants to prevent a pod from running on a node that already has cluster-autoscaler running. But that doesn't seem to explain the error seen in the pod status.
Edit: The autoscaler pod has the following nodeSelectors and tolerations:
Node-Selectors: node-role.kubernetes.io/master=
Tolerations: node-role.kubernetes.io/master op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
So clearly, it should be able to schedule on the master node too.
I am not sure what else I need to do to get the pod up and running.
Posting this answer based on the comments.
There are podAntiAffinity rules in place, so the first thing to check is whether the pod reports any scheduling errors, which it does:
0/4 nodes are available: 1 Too many pods, 3 node(s) didn't match Pod's node affinity/selector.
Since there is 1 control plane node (on which the pod is supposed to be scheduled) and 3 worker nodes, the 1 Too many pods error relates to the control plane node, while the 3 worker nodes are rejected because they do not match the pod's node selector.
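To make the breakdown of the message concrete, here is a minimal Python sketch of the two checks involved. It is a simplified model, not the real kube-scheduler code, and the node names, the worker label, and the pod counts are hypothetical values chosen only to reproduce the 1 + 3 split seen in the event.

```python
from collections import Counter

def filter_reason(node, node_selector):
    """Return why a node is rejected, or None if the pod would fit."""
    labels = node["labels"]
    # nodeSelector: every key/value pair must be present on the node.
    if any(labels.get(k) != v for k, v in node_selector.items()):
        return "node(s) didn't match Pod's node affinity/selector"
    # Pod-count limit: the node is full once it runs as many pods as it allows.
    if node["pods_running"] >= node["pods_allocatable"]:
        return "Too many pods"
    return None

# Node selector taken from the pod description in the question.
selector = {"node-role.kubernetes.io/master": ""}

# Hypothetical cluster state (labels and counts are made up for illustration).
master = {"node-role.kubernetes.io/master": ""}
worker = {"node-role.kubernetes.io/node": ""}
nodes = [
    {"name": "control-plane", "labels": master, "pods_running": 12, "pods_allocatable": 12},
    {"name": "worker-1", "labels": worker, "pods_running": 5, "pods_allocatable": 12},
    {"name": "worker-2", "labels": worker, "pods_running": 5, "pods_allocatable": 12},
    {"name": "worker-3", "labels": worker, "pods_running": 5, "pods_allocatable": 12},
]

# Count rejection reasons across all nodes, skipping nodes that fit.
reasons = Counter(r for r in (filter_reason(n, selector) for n in nodes) if r)
for reason, count in reasons.items():
    print(count, reason)
```

In this model the workers never reach the pod-count check because the selector already rules them out, which is why freeing capacity on the workers would not help; only the control plane node matters for this pod.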
Since the cluster is running in AWS, there is a known limit on the number of network interfaces and private IP addresses per instance type (see IP addresses per network interface per instance type).
A t3.small instance was used, which has 3 interfaces and 4 IPs per interface = 12 in total, which was not enough. Scaling up to t3.medium resolved the issue.
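The arithmetic can be sketched as follows. The interface and IP limits are the values from the AWS table for these two instance types. The max-pods formula is an assumption: it is the one commonly used with the AWS VPC CNI (without prefix delegation), and the question does not say which CNI the cluster uses, so treat that part as illustrative only.

```python
# (max network interfaces, private IPv4 addresses per interface)
ENI_LIMITS = {
    "t3.small": (3, 4),
    "t3.medium": (3, 6),
}

def total_private_ips(instance_type):
    """Total private IPv4 addresses the instance can hold across all interfaces."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    return enis * ips_per_eni

def vpc_cni_max_pods(instance_type):
    """Assumed pod limit under the AWS VPC CNI.

    One IP per interface is reserved for the interface itself; the +2
    accounts for pods that use host networking.
    """
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    return enis * (ips_per_eni - 1) + 2

for itype in ENI_LIMITS:
    print(itype, total_private_ips(itype), vpc_cni_max_pods(itype))
```

Under these assumptions t3.small gives 12 addresses and t3.medium gives 18, so the larger instance leaves room for several more pods on the control plane node.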
Credit to Jonas's answer for identifying the root cause.