Tags: linux, x86-64, cpu, amd-processor, affinity

Force Linux to schedule processes on CPU cores that share CPU cache


Modern AMD CPUs consist of multiple CCXs (core complexes). Each CCX has its own L3 cache.

It's possible to set process affinity to limit a process to certain CPU cores.

Is there a way to force Linux to schedule two processes (a parent process's thread and a child process) on two cores that share an L3 cache, while still leaving the scheduler free to choose which two cores?


Solution

  • Newer Linux kernels may do this for you: "Cluster-Aware Scheduling Lands In Linux 5.16" describes support for letting scheduling decisions take into account the fact that some cores share resources.

    If you pick a CCX manually, you can give both processes the same affinity mask, allowing either of them to be scheduled on any of the cores in that CCX.

    An affinity mask can have multiple bits set.
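    As a minimal sketch (Linux only, Python standard library): os.sched_setaffinity() accepts a set of several CPUs, and because the affinity mask is inherited across fork(), a child forked afterwards gets the same set of allowed cores. The CPU list "0-3" standing in for one CCX is a hypothetical example; real core numbering depends on the CPU and kernel. parse_cpu_list() and pin_self_to() are illustrative helper names, not an existing API.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list string such as "0-3,8-11" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_self_to(cpus):
    """Restrict the calling process to `cpus`, keeping only CPUs it may already use."""
    allowed = os.sched_getaffinity(0) & set(cpus)
    if not allowed:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, allowed)  # multi-bit mask: any of these cores is OK
    return allowed


if __name__ == "__main__":
    # Hypothetical CCX: cores 0-3. Check your own topology before relying on this.
    mask = pin_self_to(parse_cpu_list("0-3"))
    pid = os.fork()
    if pid == 0:
        # The child inherits the parent's affinity mask across fork().
        print("child may run on", sorted(os.sched_getaffinity(0)))
        os._exit(0)
    os.waitpid(pid, 0)
    print("parent may run on", sorted(mask))
```

    The scheduler is still free to move either process between the cores in the mask; it just can't leave that set.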


    I don't know of a way to let the kernel choose the CCX and then schedule both tasks on cores within it. If the parent checks which core it's currently running on, it could set a mask that includes all cores in the CCX containing that core, assuming you have a way to detect how core numbers are grouped and a function to apply the mask.
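    A sketch of the "check which core I'm on, then widen the mask to its CCX" idea, assuming each CCX shows up in sysfs as one L3 cache domain (the cache/indexN directory whose level file reads 3, with its shared_cpu_list). The current CPU is read from field 39 ("processor") of /proc/thread-self/stat (Linux 3.17 or later) rather than calling sched_getcpu(3) through ctypes. The helper names are hypothetical. There is an inherent race: the thread may migrate between reading its CPU and setting the mask, in which case it ends up pinned to the CCX it was on a moment earlier.

```python
import glob
import os

SYSFS_CPU = "/sys/devices/system/cpu"


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0-5,12-17" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def current_cpu():
    """CPU this thread last ran on: field 39 ("processor") of its stat file."""
    with open("/proc/thread-self/stat") as f:
        stat = f.read()
    # The command name (field 2) may contain spaces, so split after its closing ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[39 - 3])  # fields[0] is field 3 ("state")


def l3_siblings(cpu, root=SYSFS_CPU):
    """CPUs sharing the L3 cache with `cpu`, according to sysfs cache info."""
    for index_dir in glob.glob(f"{root}/cpu{cpu}/cache/index*"):
        with open(os.path.join(index_dir, "level")) as f:
            if f.read().strip() != "3":
                continue
        with open(os.path.join(index_dir, "shared_cpu_list")) as f:
            return parse_cpu_list(f.read())
    raise LookupError(f"no L3 cache info found for cpu{cpu}")


def pin_to_my_ccx():
    """Restrict the calling thread to the L3 domain (CCX) it is running on now."""
    cpus = l3_siblings(current_cpu()) & os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    return cpus


if __name__ == "__main__":
    mask = pin_to_my_ccx()
    print("pinned to L3 domain:", sorted(mask))
    # A child created with os.fork() after this point inherits the same mask.
```

    The parent would call pin_to_my_ccx() just before forking the child, so the kernel still picks the starting core (and therefore the CCX) and the scheduler still picks cores within it.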

    You'd want to be careful not to leave some CCXs totally unused if you start multiple processes that each do this, though. Maybe every second, do whatever top or htop does to check per-core utilization, and rebalance if needed (i.e. change the affinity masks of both processes to the cores of a different CCX)? Or put this functionality outside the processes being scheduled, so there's one "master control program" that looks at (and possibly modifies) the affinity masks of a set of tasks it should control. (Not all tasks on the system; that would be wasted work.)
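    One possible shape for the periodic rebalancing step, as a hedged sketch: sample /proc/stat twice (roughly how top gets per-core usage), compute each CPU's busy fraction, average it per CPU group (for example the L3 groups from sysfs), and move the managed PIDs to the least-loaded group. The function names and the one-second interval are assumptions. Note that os.sched_setaffinity(pid, ...) changes only the thread with that ID; a multithreaded process would need the same call for each TID under /proc/PID/task. A real controller would also want some hysteresis, because the managed processes' own load makes their current CCX look busier than it otherwise would.

```python
import os
import time


def read_cpu_times(path="/proc/stat"):
    """Return {cpu: (busy_jiffies, total_jiffies)} for each 'cpuN' line."""
    times = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
                continue
            # user nice system idle iowait irq softirq steal (guest is already in user)
            nums = [int(v) for v in parts[1:9]]
            idle = nums[3] + nums[4]  # idle + iowait
            times[int(parts[0][3:])] = (sum(nums) - idle, sum(nums))
    return times


def utilization(before, after):
    """Busy fraction per CPU between two read_cpu_times() samples."""
    util = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (busy1, total1))
        dt = total1 - total0
        util[cpu] = (busy1 - busy0) / dt if dt > 0 else 0.0
    return util


def least_loaded_group(util, groups):
    """Pick the CPU group (e.g. one CCX) with the lowest mean utilization."""
    return min(groups, key=lambda g: sum(util.get(c, 0.0) for c in g) / len(g))


def rebalance(pids, groups, interval=1.0):
    """Hypothetical controller step: move all `pids` onto the least-loaded group."""
    before = read_cpu_times()
    time.sleep(interval)
    target = least_loaded_group(utilization(before, read_cpu_times()), groups)
    for pid in pids:
        os.sched_setaffinity(pid, target)
    return target
```

    For example, a controller loop might call rebalance([parent_pid, child_pid], [{0, 1, 2, 3}, {4, 5, 6, 7}]), where the group contents are hypothetical and would really come from the L3 sharing info.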

    Or, if it's looking at everything, it doesn't need to check the current load average as much; it can just count what's scheduled where. (And assume that tasks it doesn't know about, like daemons or the occasional compile job, can pick any free core on any CCX, or at least compete fairly if all cores are busy with jobs it's managing.)
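    When the controller knows about every job it manages, placement can be pure bookkeeping. A minimal sketch with a hypothetical CcxPlacer class: each new parent/child pair is assigned the CCX currently hosting the fewest managed pairs (ties go to the lowest-numbered group), both PIDs get that CCX's mask, and the count drops again when the job finishes. Unmanaged tasks are ignored, matching the assumption that they can run anywhere.

```python
import os


class CcxPlacer:
    """Hypothetical controller: put each job on the CCX with the fewest managed jobs."""

    def __init__(self, groups):
        self.groups = [frozenset(g) for g in groups]  # one CPU set per CCX
        self.count = [0] * len(self.groups)           # managed jobs per CCX
        self.where = {}                               # job id -> group index

    def place(self, job, pids):
        """Choose a CCX by count (not by load) and apply its mask to all `pids`."""
        idx = min(range(len(self.groups)), key=self.count.__getitem__)
        self.count[idx] += 1
        self.where[job] = idx
        for pid in pids:
            os.sched_setaffinity(pid, self.groups[idx])
        return self.groups[idx]

    def finished(self, job):
        """Release the slot held by `job`."""
        self.count[self.where.pop(job)] -= 1
```

    Because it only counts, this never reads /proc/stat; the trade-off is that it can't react if unmanaged work happens to pile up on one CCX.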


    Obviously this isn't helpful for most parent/child process pairs, only ones that communicate heavily via shared memory (or perhaps pipes, since kernel pipe buffers are effectively shared memory).

    Zen CPUs do have varying inter-core latency within vs. across CCXs, in addition to the cache-hit effects of sharing L3. https://www.anandtech.com/show/16529/amd-epyc-milan-review/4 did some microbenchmarking comparing Zen 3 vs. 2-socket Xeon Platinum vs. 2-socket ARM Ampere.
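    To check whether the within-CCX vs. cross-CCX difference matters for a particular parent/child pair, a crude pipe ping-pong can be timed with the two ends pinned to chosen cores. This is a sketch, not a rigorous microbenchmark: Python interpreter and syscall overhead are included in every round trip, there is no warm-up, and the core numbers in the example are hypothetical (check which CPUs share an L3 in sysfs first).

```python
import os
import time


def pingpong_ns(cpu_a, cpu_b, rounds=10000):
    """Mean round trip (ns) of a 1-byte ping-pong over two pipes, with the
    parent pinned to cpu_a and the child pinned to cpu_b."""
    saved = os.sched_getaffinity(0)
    p2c_r, p2c_w = os.pipe()
    c2p_r, c2p_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: echo every byte back to the parent
        try:
            os.sched_setaffinity(0, {cpu_b})
            os.close(p2c_w)
            os.close(c2p_r)
            for _ in range(rounds):
                os.read(p2c_r, 1)
                os.write(c2p_w, b"x")
        finally:
            os._exit(0)
    try:
        os.sched_setaffinity(0, {cpu_a})
        os.close(p2c_r)
        os.close(c2p_w)
        start = time.perf_counter_ns()
        for _ in range(rounds):
            os.write(p2c_w, b"x")
            os.read(c2p_r, 1)
        elapsed = time.perf_counter_ns() - start
    finally:
        os.waitpid(pid, 0)
        os.close(p2c_w)
        os.close(c2p_r)
        os.sched_setaffinity(0, saved)  # undo the parent's temporary pinning
    return elapsed / rounds


if __name__ == "__main__":
    # Hypothetical core numbers: 0 and 1 share a CCX, 0 and 8 do not.
    print("same CCX:  %.0f ns" % pingpong_ns(0, 1))
    print("cross CCX: %.0f ns" % pingpong_ns(0, 8))
```

    If the two numbers are close, the pair probably doesn't communicate enough for CCX-aware placement to be worth the trouble.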