docker kubernetes gpu resource-scheduling

Kubernetes scheduling for expensive resources


We have a Kubernetes cluster.

Now we want to expand it with GPU nodes (those would be the only nodes in the Kubernetes cluster that have GPUs).

We'd like to prevent Kubernetes from scheduling pods on those nodes unless they require GPUs.

Not all of our pipelines can use GPUs. The vast majority are still CPU-only workloads.

The servers with GPUs can be very expensive (for example, an Nvidia DGX could cost as much as $150k per server).

If we just add DGX nodes to the Kubernetes cluster, Kubernetes would schedule non-GPU workloads there too, which would be a waste of resources (e.g. jobs scheduled later that do need GPUs may find the other resources on those nodes, like CPU and memory, already exhausted, so they would have to wait for the non-GPU jobs/containers to finish).

Is there a way to customize GPU resource scheduling in Kubernetes so that it only schedules pods on those expensive nodes if they require GPUs? If they don't, they may have to wait for other non-GPU resources like CPU and memory to become available on the non-GPU servers...

Thanks.


Solution

  • Using labels and label selectors for your nodes is the right approach, but you also need to set nodeAffinity on your pods.

    Something like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: run-with-gpu
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/node-type
                operator: In
                values:
                - gpu
      containers:
      - name: your-gpu-workload
        image: mygpuimage
    

    Also, attach the label to your GPU nodes:

    $ kubectl label nodes <node-name> kubernetes.io/node-type=gpu
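
    Note that node affinity only pulls the GPU pods onto the labeled nodes; on its own it does not stop other pods from landing there. A common way to keep non-GPU pods off is to taint the GPU nodes and give only the GPU pods a matching toleration. A rough sketch, not a definitive setup: the taint key/value dedicated=gpu is a made-up name (pick your own), and the nvidia.com/gpu limit assumes the NVIDIA device plugin is installed on those nodes.

    $ kubectl taint nodes <node-name> dedicated=gpu:NoSchedule

    apiVersion: v1
    kind: Pod
    metadata:
      name: run-with-gpu
    spec:
      # Tolerate the (assumed) dedicated=gpu taint so this pod is allowed on GPU nodes
      tolerations:
      - key: dedicated
        operator: Equal
        value: gpu
        effect: NoSchedule
      # Same affinity as above: require a node labeled as a GPU node
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/node-type
                operator: In
                values:
                - gpu
      containers:
      - name: your-gpu-workload
        image: mygpuimage
        resources:
          limits:
            # Assumes the NVIDIA device plugin exposes this resource
            nvidia.com/gpu: 1

    Pods without the toleration are not scheduled onto the tainted nodes, so CPU-only workloads stay Pending until capacity frees up on the non-GPU servers. The toleration only permits scheduling on the GPU nodes; it is the affinity that actually forces the GPU pods there.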