I have a Job object that should use a node selector so that it only runs on nodes that have a GPU. I know how to set it (the manifest gets converted from a string in a Python program).
job = f"""
apiVersion: batch/v1
kind: Job
....
nodeSelector:
  sma-gpu-size: {gpu_size}
"""
Our ops team will add these labels to the nodes in the next few weeks, but currently, when I set the node selector, the service is not able to start:
2022-09-20T07:20:24Z [Warning] 0/35 nodes are available: 2 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 30 node(s) didn't match Pod's node affinity/selector.
Is it somehow possible to use these node_selectors only if they are available, something like this (pseudo yaml)?
job = f"""
apiVersion: batch/v1
kind: Job
....
nodeSelector:
  if_available:
    sma-gpu-size: {gpu_size}
  else:
    Any
"""
Is it somehow possible to use these node_selectors only if they are available
It's not, but you can replace the nodeSelector with a nodeAffinity to achieve that.
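spec:
  [...]
  affinity:
    nodeAffinity:
      # The preferred form takes a list of weighted preferences;
      # nodeSelectorTerms is only valid under the required form.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: sma-gpu-size
            operator: In
            values:
            - "{gpu_size}"  # label values are strings, so quote numeric sizes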
From the docs:
preferredDuringSchedulingIgnoredDuringExecution: The scheduler tries to find a node that meets the rule. If a matching node is not available, the scheduler still schedules the Pod.
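Since the manifest in the question is built from a Python f-string, here is a minimal sketch of how the preferred affinity could be templated the same way. In a Job, the affinity block belongs to the pod template (spec.template.spec), and each preferred entry needs a weight (1 to 100) and a preference. The function name, the job name and the image are hypothetical placeholders, not taken from the question.

```python
def build_job_manifest(gpu_size, name="gpu-job", image="busybox"):
    """Return a Job manifest that prefers, but does not require, labelled GPU nodes.

    `name` and `image` are hypothetical placeholders for illustration only.
    """
    return f"""\
apiVersion: batch/v1
kind: Job
metadata:
  name: {name}
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: {image}
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: sma-gpu-size
                operator: In
                values:
                - "{gpu_size}"
"""


# The value is quoted in the template because label values must be strings.
manifest = build_job_manifest(16)
print(manifest)
```

With this form the pod can still land on an unlabelled node while the ops team has not rolled out the sma-gpu-size label yet.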
After the label has been added, you can switch to requiredDuringSchedulingIgnoredDuringExecution:
requiredDuringSchedulingIgnoredDuringExecution: The scheduler can't schedule the Pod unless the rule is met. This functions like nodeSelector, but with a more expressive syntax.
or back to nodeSelector.
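If the switch from the soft rule to the hard rule should be a one-line change in the Python program, the affinity could be built as a dict instead of a string. This is only a sketch: the helper name node_affinity and the label_ready flag are assumptions made for illustration. It also shows that the two forms are shaped differently: the preferred form is a list of weighted preferences, while the required form wraps nodeSelectorTerms.

```python
import json


def node_affinity(gpu_size, label_ready):
    """Build the nodeAffinity block for the pod spec (hypothetical helper).

    label_ready=False -> soft preference, pod schedules even without the label.
    label_ready=True  -> hard requirement, behaves like nodeSelector.
    """
    expression = {
        "key": "sma-gpu-size",
        "operator": "In",
        "values": [str(gpu_size)],  # label values must be strings
    }
    if label_ready:
        rule = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [expression]}]
            }
        }
    else:
        rule = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "preference": {"matchExpressions": [expression]}}
            ]
        }
    return {"nodeAffinity": rule}


# JSON is valid YAML, so this output can be placed under the pod spec's `affinity` key.
soft = node_affinity(16, label_ready=False)
hard = node_affinity(16, label_ready=True)
print(json.dumps(soft, indent=2))
```

Once the ops team has labelled the nodes, flipping label_ready to True would turn the preference into a requirement without touching the rest of the manifest.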