The use case I'm trying to get my head around arises when you have several Burstable pods scheduled on the same node. How can you ensure that the workload in one pod takes priority over the workload in another when the node's kernel is scheduling CPU and the CPU is fully saturated? On a typical Linux host, my thoughts on contention between processes immediately go to the 'niceness' of the processes, but I don't see any equivalent k8s mechanism for specifying CPU scheduling priority between the processes within pods on a node.
I've read about the newest capabilities provided by k8s, which (if I interpret the documentation correctly) just provide a mechanism for pinning CPUs to pods, and that doesn't really scratch my itch. I'd still like the "second class" pods to maximize CPU utilization when the higher-priority pods don't have an active workload, while allowing the higher-priority workload to take CPU scheduling priority should the need arise.
So far, having not found a satisfactory answer, I'm thinking that the community will opt for an architectural solution, like auto-scaling or segregating the workloads between nodes. I don't consider these to truly address the issue; they just throw more CPUs at it, which is what I'd like to avoid. Why spin up more nodes when you've got idle CPU?
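The CPU request maps to the pod's cgroup CPU weight (cpu.shares on cgroup v1). If Pod A has a request of 100m CPU and Pod B has 200m, then even in a starvation situation B will get twice as much CPU time as A. The weighting only matters under contention: when B is idle, A is free to use the spare cycles, up to its limit if one is set.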
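To make the 2:1 ratio concrete, here is a minimal sketch of the arithmetic. It assumes the cgroup v1 conversion the kubelet uses (1024 shares per full CPU, clamped to a minimum and maximum), and that both pods are runnable the whole time. The function names `milli_cpu_to_shares` and `contended_split` are hypothetical and chosen only for illustration; on cgroup v2 the value is further translated into cpu.weight, but the proportions between pods stay the same.

```python
# Constants assumed to match the kubelet's request-to-shares conversion (cgroup v1).
SHARES_PER_CPU = 1024
MILLI_CPU_PER_CPU = 1000
MIN_SHARES = 2
MAX_SHARES = 262144


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to a cpu.shares value."""
    shares = milli_cpu * SHARES_PER_CPU // MILLI_CPU_PER_CPU
    # Clamp to the range the kernel accepts for cpu.shares.
    return max(MIN_SHARES, min(MAX_SHARES, shares))


def contended_split(requests_milli: dict[str, int]) -> dict[str, float]:
    """Fraction of CPU time each pod receives when every pod is CPU-bound.

    Hypothetical helper: models the scheduler's proportional sharing as
    shares / sum(shares) and ignores limits and idle periods.
    """
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


if __name__ == "__main__":
    for pod, fraction in contended_split({"pod-a": 100, "pod-b": 200}).items():
        print(f"{pod}: {fraction:.1%} of contended CPU")
```

With requests of 100m and 200m the shares come out as 102 and 204, so under full contention Pod B gets about two thirds of the CPU and Pod A about one third. Note that this is proportional weighting rather than strict priority: raising the ratio between the requests skews the split further, but the lower-weighted pod is never starved completely.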