
Why does the Kubernetes scheduler ignore nodeAffinity?


I have a Kubernetes cluster (version 1.12) deployed to AWS with kops.

Several nodes in the cluster are marked with the label 'example.com/myLabel', which takes the values a, b, c, d.

For example:

Node name          example.com/myLabel
instance1          a
instance2          b
instance3          c
instance4          d
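
Labels like these are applied and inspected with kubectl; a quick sketch, using the node names from the table above:

# attach the label to a node
kubectl label nodes instance1 example.com/myLabel=a

# list all nodes with the label value shown as an extra column
kubectl get nodes -L example.com/myLabel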

And there is a test deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler
spec:
  replicas: 6
  selector:
    matchLabels:
      app: test-scheduler
  template:
   metadata:
     labels:
       app: test-scheduler
   spec:
     tolerations:
       - key: spot
         operator: Exists
     affinity:
       nodeAffinity:
         preferredDuringSchedulingIgnoredDuringExecution:
         - preference:
             matchExpressions:
             - key: example.com/myLabel
               operator: In
               values:
               - a
           weight: 40
         - preference:
             matchExpressions:
             - key: example.com/myLabel
               operator: In
               values:
               - b
           weight: 35
         - preference:
             matchExpressions:
             - key: example.com/myLabel
               operator: In
               values:
               - c
           weight: 30
         - preference:
             matchExpressions:
             - key: example.com/myLabel
               operator: In
               values:
               - d
           weight: 25
     containers:
     - name: a
       resources:
         requests:
           cpu: "100m"
           memory: "50Mi"
         limits:
           cpu: "100m"
           memory: "50Mi"
       image: busybox
       command:
         - 'sleep'
         - '99999'

According to the documentation, the scheduler computes a score for every candidate node by summing the weights of the preference terms that node matches, and the node with the biggest weight sum is favored.
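
Taking only this nodeAffinity into account (and ignoring every other scoring function the scheduler runs), the per-node contributions would be:

instance1 (a)  -> 40
instance2 (b)  -> 35
instance3 (c)  -> 30
instance4 (d)  -> 25
no label       ->  0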

I expect all pods to be scheduled to node instance1 with label 'a', but in my case the nodes are chosen seemingly at random.

For example, here are the 5 nodes chosen for the deployment's 6 pods, including the nodes another1 and another2, which do not carry my label at all (while another node labeled 'd' exists and was not chosen):

NODE        LABEL
another1    NONE
node1       a
node2       b
node3       c
another2    NONE

All nodes have spare capacity, are available, and can run pods.

I have two questions:

  1. Why does this happen?

  2. Where does the k8s scheduler log how it assigns a node to a pod? Events do not contain this information, and the scheduler logs on the masters are empty. (One way to surface the scores is sketched below.)
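
For reference, kube-scheduler only prints its per-node scores at high log verbosity, so they usually have to be enabled first; a sketch, assuming a kops-style control plane (pod names and file paths vary by version):

# read the scheduler's own logs (the pod name includes the master's hostname)
kubectl -n kube-system get pods | grep scheduler
kubectl -n kube-system logs kube-scheduler-<master-name>

# per-node scores typically show up only around --v=10; on a kops master the
# flag would be added to the scheduler's static pod manifest (path is an
# assumption and differs across setups):
#   /etc/kubernetes/manifests/kube-scheduler.manifest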

UPDATE:

My nodes do carry the correct labels:

example.com/myLabel=a
example.com/myLabel=b
example.com/myLabel=c
example.com/myLabel=d

Solution

  • preferredDuringSchedulingIgnoredDuringExecution just means that the scheduler adds the weight you set into the algorithm it uses to choose which node to schedule to. It is not a hard rule but a preference.

    With the weights you set, you will get a somewhat even spread. You would need to have a very large sample size before you would start to see the spread you are aiming for.

    Keep in mind that the weight is not determined solely by the affinity you set; other scoring functions contribute their own per-node scores as well. If you want to see the effect more clearly, use a much greater weight difference between the preference terms.
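
    If the goal is to force the pods onto the 'a' nodes rather than merely prefer them, the required variant of nodeAffinity turns the same match expression into a hard constraint (pods that fit no matching node stay Pending). A minimal sketch, adapted from the deployment above:

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: example.com/myLabel
                operator: In
                values:
                - a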