I know I can use os.sched_setaffinity to set affinity, but it seems that I can't use it to change affinity in real time. Below is my code.
First, I have a C++ program:
// test.cpp
#include <iostream>
#include <thread>
#include <vector>

void workload() {
    unsigned long long int sum = 0;
    for (long long int i = 0; i < 50000000000; ++i) {
        sum += i;
    }
    std::cout << "Sum: " << sum << std::endl;
}

int main() {
    unsigned int num_threads = std::thread::hardware_concurrency();
    std::cout << "Creating " << num_threads << " threads." << std::endl;
    std::vector<std::thread> threads;
    for (unsigned int i = 0; i < num_threads; ++i) {
        threads.push_back(std::thread(workload));
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return 0;
}
Then I compile it:
g++ test.cpp -O0
and I get an a.out file in the same directory.
Then, still in the same directory, I have a Python file:
# test.py
from subprocess import Popen
import os
import time

a = set(range(8, 16))
b = set(range(4, 12))

if __name__ == "__main__":
    proc = Popen("./a.out", shell=True)
    pid = proc.pid
    print("pid", pid)
    tic = time.time()
    while True:
        if time.time() - tic < 10:
            os.sched_setaffinity(pid, a)
            print("a", os.sched_getaffinity(pid))
        else:
            os.sched_setaffinity(pid, b)
            print("b", os.sched_getaffinity(pid))
        res = proc.poll()
        if res is None:
            time.sleep(1)
        else:
            break
a.out runs for a long time, and my expectation for test.py is: in the first 10 seconds, I would see CPUs 8~15 busy while 0~7 are idle; after 10 seconds, I would see CPUs 4~11 busy while the others are idle. Observing with htop, I found that in the first 10 seconds the behavior indeed matched my expectation. After 10 seconds, however, b {4, 5, 6, 7, 8, 9, 10, 11} was printed every second, as if I had successfully set the affinity, but in htop CPUs 8~15 were still busy while 0~7 stayed idle until the program finished normally, which means I failed to set the affinity.
Why does this happen? I read the manual but didn't find anything that mentions it. Also, Python's os.sched_setaffinity doesn't return anything, so I can't check the result.
I'm using an AMD CPU, but I don't think that matters.
The root of the problem is that on Linux, sched_setaffinity() affects a thread, not a process. The main thread has the same id as the process, but subsequent threads have different ids, and they inherit the CPU affinity from the parent thread.
Apparently Python was quick enough to set the CPU affinity of the main thread before the C++ program spawned its worker threads, so the child threads inherited the initial CPU mask from the main thread. Changing the CPU affinity of the main thread later doesn't affect threads that are already running (though it would, in theory, affect new threads if they are spawned from the main thread later).
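To see this per-thread behavior directly, here is a minimal sketch that reads the affinity mask of every thread listed under /proc/<pid>/task/. The helper name thread_affinities is hypothetical, and the code assumes Linux with /proc mounted.

```python
import os

def thread_affinities(pid):
    """Return {tid: set_of_cpus} for every thread of process `pid` (Linux only)."""
    result = {}
    # Each directory entry under /proc/<pid>/task is one thread ID.
    for tid in map(int, os.listdir(f"/proc/{pid}/task")):
        try:
            # On Linux the "pid" argument is really a thread ID.
            result[tid] = os.sched_getaffinity(tid)
        except OSError:
            # The thread exited between listing and querying; skip it.
            continue
    return result
```

Printing thread_affinities(pid) inside the question's loop should show the entry whose thread ID equals pid switching to the new mask after 10 seconds, while the worker threads keep the old one.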
To work around it, you can introduce a helper function like this:
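def ChangeProcessAffinity(pid, cpus):
    # Each entry in /proc/<pid>/task/ is the ID of one thread of the process.
    for tid in map(int, os.listdir(f'/proc/{pid}/task/')):
        try:
            os.sched_setaffinity(tid, cpus)
        except OSError:
            pass  # the thread may have exited; maybe log the error instead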
...then call that in your while loop. Note that there are two problems with this:
I don't think there is an atomic way to set the CPU affinity for all threads in a process; the taskset command line utility (which you already found), with its -a option, does approximately the same as I suggested above.
If you want a more robust solution, I think it's best to implement it inside the C++ binary itself (if you have control over it), since it knows when threads are started or stopped.
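If the workaround has to stay on the Python side, the window for missing a thread can be narrowed (though the update is still not atomic) by re-scanning /proc/<pid>/task/ until no unseen thread IDs appear. The following is only a sketch under that assumption; set_process_affinity and max_passes are hypothetical names, not part of any library.

```python
import os

def set_process_affinity(pid, cpus, max_passes=5):
    """Apply `cpus` to every thread of `pid`; return the thread IDs updated.

    Not atomic: threads run with mixed masks while the passes are in
    progress, and `max_passes` bounds the number of re-scans.
    """
    done = set()
    for _ in range(max_passes):
        try:
            tids = {int(name) for name in os.listdir(f"/proc/{pid}/task")}
        except (FileNotFoundError, ProcessLookupError):
            break  # the process has exited
        new = tids - done
        if not new:
            break  # no unseen threads since the previous pass
        for tid in new:
            try:
                os.sched_setaffinity(tid, cpus)
                done.add(tid)
            except ProcessLookupError:
                pass  # the thread exited before we reached it
    return done
```

In the question's loop, os.sched_setaffinity(pid, b) would then become set_process_affinity(pid, b). Other errors, such as PermissionError, are deliberately left to propagate so they are not silently ignored.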