Tags: kubernetes, nodeselector

NodeSelector for Job - use it only if NodeSelector exists


I have a Job object that should use a node selector so that it only runs on nodes that have a GPU. I know how to set it (the manifest is built as a string in a Python program):

    job = f"""
    apiVersion: batch/v1
    kind: Job
    ....
          nodeSelector:
            sma-gpu-size: {gpu_size}
    """

Our ops team will add the matching labels to the nodes in the next few weeks, but until then, setting the node selector prevents the service from starting:

    2022-09-20T07:20:24Z [Warning] 0/35 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 30 node(s) didn't match Pod's node affinity/selector.

Is it possible to apply the node selector only when matching nodes are available, something like this (pseudo-YAML)?

    job = f"""
    apiVersion: batch/v1
    kind: Job
    ....
          nodeSelector:
            if_available:
                sma-gpu-size: {gpu_size}
            else:
                Any
    """

Solution

  • Is it somehow possible to use these node_selectors only if they are available

    Not with nodeSelector itself, because it is always a hard constraint. You can, however, replace the nodeSelector with a nodeAffinity that uses a preferred (soft) rule:

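    spec:  # the Job's pod template spec (spec.template.spec)
      [...]
      affinity:
        nodeAffinity:
          # preferred rules are a weighted list of terms, not nodeSelectorTerms
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: sma-gpu-size
                operator: In
                values:
                - "{gpu_size}"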
    

    From docs:

    preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.

    Once the label has been added to the GPU nodes, you can switch to requiredDuringSchedulingIgnoredDuringExecution:

    requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met. This functions like nodeSelector, but with a more expressive syntax.

    or back to nodeSelector.
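To make the difference between the two rule types concrete, here is a deliberately simplified Python model of the scheduling decision. It is an assumption-laden toy, not how kube-scheduler is implemented: it ignores taints (such as the infra/master taints in the warning above), resource requests and every other scheduler plugin, and the node names and the label value "16" are made up. A required rule acts as a filter, so with no labelled nodes the Pod stays Pending; a preferred rule only raises a matching node's score, so an unlabelled node is still chosen.

```python
# Toy model (NOT the real kube-scheduler) of how one node-label rule is applied
# in "required" vs "preferred" mode. Node names and label values are hypothetical.
from typing import Dict, List, Optional

Node = Dict[str, str]  # a node is represented only by its labels


def matches(labels: Node, key: str, values: List[str]) -> bool:
    """Equivalent of a single `operator: In` match expression."""
    return labels.get(key) in values


def pick_node(nodes: Dict[str, Node], key: str, values: List[str],
              required: bool) -> Optional[str]:
    """Return the chosen node name, or None if the Pod would stay Pending."""
    if required:
        # required...: non-matching nodes are filtered out entirely
        candidates = {n: l for n, l in nodes.items() if matches(l, key, values)}
    else:
        # preferred...: every node stays a candidate; matching ones score higher
        candidates = dict(nodes)
    if not candidates:
        return None
    # Highest score wins; ties go to the alphabetically first name (deterministic)
    return max(sorted(candidates),
               key=lambda n: 1 if matches(candidates[n], key, values) else 0)


cluster = {"worker-1": {}, "worker-2": {"kubernetes.io/os": "linux"}}
print(pick_node(cluster, "sma-gpu-size", ["16"], required=True))   # None
print(pick_node(cluster, "sma-gpu-size", ["16"], required=False))  # worker-1

cluster["worker-2"]["sma-gpu-size"] = "16"  # ops team labels a GPU node
print(pick_node(cluster, "sma-gpu-size", ["16"], required=True))   # worker-2
print(pick_node(cluster, "sma-gpu-size", ["16"], required=False))  # worker-2
```

Once a labelled node exists, both modes land on it, which is why switching to the required rule (or back to nodeSelector) later changes nothing for the GPU case and only stops the fallback onto unlabelled nodes.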
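Since the manifest is generated from Python, one way to make the later switch a one-argument change is to build the scheduling fields as a dict and serialize them, rather than hand-formatting YAML in an f-string. The helper names `gpu_scheduling` and `job_manifest`, the `mode` strings, and the Job name and image below are hypothetical; only the field names come from the Kubernetes Job/Pod API. The output is JSON, which `kubectl apply -f` accepts just like YAML. The weight of 1 is arbitrary (preferred terms take a weight from 1 to 100).

```python
# Sketch: generate the Job's scheduling fields from Python data structures.
# `gpu_scheduling`, `job_manifest` and the mode names are hypothetical helpers.
import json


def gpu_scheduling(gpu_size, mode="preferred"):
    """Return the pod-spec fields that steer the Pod onto GPU nodes.

    mode="preferred"    -> nodeAffinity preferred rule: runs anywhere if no match
    mode="required"     -> nodeAffinity required rule: Pending until a node matches
    mode="nodeSelector" -> plain nodeSelector, same hard effect as "required"
    """
    value = str(gpu_size)  # label values are strings in the Kubernetes API
    expr = {"key": "sma-gpu-size", "operator": "In", "values": [value]}
    if mode == "preferred":
        return {"affinity": {"nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 1, "preference": {"matchExpressions": [expr]}}
            ]}}}
    if mode == "required":
        return {"affinity": {"nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [expr]}]
            }}}}
    if mode == "nodeSelector":
        return {"nodeSelector": {"sma-gpu-size": value}}
    raise ValueError(f"unknown mode: {mode!r}")


def job_manifest(name, image, gpu_size, mode="preferred"):
    """Assemble a minimal batch/v1 Job with the chosen scheduling fields."""
    pod_spec = {
        "restartPolicy": "Never",  # Job pods must use Never or OnFailure
        "containers": [{"name": name, "image": image}],
        **gpu_scheduling(gpu_size, mode),
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"template": {"spec": pod_spec}},
    }


print(json.dumps(job_manifest("gpu-job", "my-image:latest", 16), indent=2))
```

When the labels are in place, passing `mode="required"` or `mode="nodeSelector"` produces the strict variant without touching the rest of the manifest.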