kubernetes, google-kubernetes-engine, devops, kubernetes-pod, cost-management

How to configure K8s cluster to utilize spare CPU capacity for ML training jobs (or other low-priority CPU-intensive work)


I'd like to use spare CPU capacity in our Kubernetes cluster for low-priority jobs -- specifically ML training with TensorFlow in this case -- without depriving the higher-priority services on our cluster of CPU when they suddenly spike, akin to how one would do it with OS process priority. Currently we configure our autoscaler to add more nodes if CPU usage exceeds 60%, which means as much as 40% of our CPU capacity sits idle at all times.
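
(For reference, a node autoscaling policy like the one described above can be expressed on GKE's underlying GCE managed instance group roughly as follows; the group name, zone, and replica cap are illustrative, not from the original question:)

    gcloud compute instance-groups managed set-autoscaling my-node-group \
        --zone us-central1-a \
        --max-num-replicas 20 \
        --target-cpu-utilization 0.6   # add nodes above 60% average CPU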

Questions: (1) Is this possible with K8s? After some experimentation it seems that Pod priority is not quite the same thing, as my lower-priority deployment does not instantly yield CPU back to my higher-priority deployment. (2) If it isn't possible, is there another generally used strategy for utilizing intentionally over-provisioned CPU capacity that still yields immediately to higher-priority services?


Solution

  • According to https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md#qos-classes

    In an overcommitted system (where sum of limits > machine capacity) containers might eventually have to be killed, for example if the system runs out of CPU or memory resources. Ideally, we should kill containers that are less important. For each resource, we divide containers into 3 QoS classes: Guaranteed, Burstable, and Best-Effort, in decreasing order of priority.

You can do something like this:

Set high to Guaranteed by specifying limits (when requests are omitted, they default to the limits, which gives the pod the Guaranteed class):

containers:
    - name: high
      resources:
        limits:
          cpu: 8000m
          memory: 8Gi
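
    For a self-contained test, that fragment sits inside a full Pod manifest, roughly like this minimal sketch (the image is a placeholder I've added, not from the original answer):

    apiVersion: v1
    kind: Pod
    metadata:
      name: high
    spec:
      containers:
      - name: high
        image: nginx   # placeholder image; substitute your real service image
        resources:
          limits:        # limits only: requests default to these values,
            cpu: 8000m   # so the pod gets the Guaranteed QoS class
            memory: 8Gi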
    

Set ml-job to Best-Effort by specifying no resource requests or limits at all:

containers:
    - name: ml-job
    

I'm not sure whether your ml-job can tolerate being killed: Best-Effort pods are the first to be evicted when a node runs out of resources. If it can't, then this strategy might not be suitable for you.
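
    For CPU specifically, this should behave close to the OS-level process priority you described: CPU is a compressible resource, so containers are throttled rather than killed under CPU contention, and the kubelet maps CPU requests to cgroup CPU shares, leaving a Best-Effort pod with only a tiny share that yields the CPU as soon as the Guaranteed pods spike. You can check which class Kubernetes actually assigned via the pod status (pod names here are the illustrative high and ml-job from above):

    kubectl get pod high -o jsonpath='{.status.qosClass}'     # prints Guaranteed
    kubectl get pod ml-job -o jsonpath='{.status.qosClass}'   # prints BestEffort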