Trivia
Usually, when I want to write a multi-threaded program in C++, I query the hardware for the number of supported concurrent threads, as follows:
unsigned int numThreads = std::thread::hardware_concurrency();
This returns the total number of concurrent threads the hardware supports. Hence, if we have 2 CPUs, each of which supports 12 threads, numThreads will be equal to 24.
Problem
Recently I used numactl to force a program to run on ONE CPU ONLY:
numactl -N 1 ./a.out
The problem is that std::thread::hardware_concurrency() returns 24 even when I run the program with numactl -N 1, whereas under the same settings nproc prints 12:
numactl -N 1 nproc --> output = 12
Question
Perhaps std::thread::hardware_concurrency() is not designed to support such a scenario. That's not my concern. My question is: what is the best practice for getting the supported number of threads when I want to run my program with numactl?
Further information
In case you haven't dealt with numactl, it runs a process under a given NUMA policy. For example, you can use it to force your program to run on one CPU only; the usage for that case is shown above.
You'll have to use OS-specific calls to find out what limits the OS (or a tool like numactl) imposes on your process.
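On Linux with glibc, one such call is sched_getaffinity, which reports the set of CPUs the calling thread is allowed to run on; counting that set with CPU_COUNT should match what nproc prints, since GNU nproc also honours the process's affinity mask. The following is a minimal sketch assuming Linux/glibc; the function name available_cpus is hypothetical, and on other systems it simply falls back to the hardware_concurrency() hint.

```cpp
#if defined(__linux__) && !defined(_GNU_SOURCE)
#define _GNU_SOURCE  // exposes sched_getaffinity and the CPU_* macros in glibc
#endif

#include <thread>

#if defined(__linux__)
#include <sched.h>
#endif

// Hypothetical helper: number of CPUs this process may currently run on.
// On Linux this reflects the affinity mask set by e.g. numactl -N or
// taskset; elsewhere it falls back to the hardware_concurrency() hint.
unsigned int available_cpus() {
#if defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    // pid 0 means the calling thread. A fixed-size cpu_set_t covers up to
    // CPU_SETSIZE (1024) CPUs; larger machines need CPU_ALLOC instead.
    if (sched_getaffinity(0, sizeof(set), &set) == 0) {
        const int n = CPU_COUNT(&set);
        if (n > 0) {
            return static_cast<unsigned int>(n);
        }
    }
#endif
    const unsigned int hint = std::thread::hardware_concurrency();
    return hint != 0 ? hint : 1u;
}
```

Only CPU binding (-N/--cpunodebind or -C/--physcpubind) changes the affinity mask that this reads; a memory-only policy such as -m leaves it untouched, so the count would not drop in that case.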
hardware_concurrency only returns a hint for the number of threads your hardware supports, and may return 0 if that value is not computable. The OS can restrict your process to fewer CPUs than this number (and nothing stops a process from running more threads than it reports), whether through tools like numactl, normal scheduling, or some other means. There is always the possibility that some process or user will change the allowed CPU set, which can affect the available concurrency. A typical C++ program is not expected to concern itself with these details, particularly since changes in the number of available threads are often transient.
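To illustrate that the allowed CPU set can change while a program runs, the sketch below (Linux/glibc only; both helper names are hypothetical) temporarily pins the calling thread to a single CPU with sched_setaffinity, re-queries the count, and then restores the original mask. It is meant only to show why a count read once at start-up can go stale, not as something a typical program needs to do.

```cpp
#if !defined(_GNU_SOURCE)
#define _GNU_SOURCE  // sched_setaffinity and the CPU_* macros in glibc
#endif
#include <sched.h>

// Hypothetical helper: CPUs the calling thread may currently run on,
// or -1 if the query fails.
int current_cpu_count() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    return CPU_COUNT(&set);
}

// Hypothetical helper: pin the calling thread to the first CPU it is allowed
// to use, re-query the count while pinned, then restore the original mask.
// Returns the count observed while pinned, or -1 on failure.
int count_while_pinned_to_one_cpu() {
    cpu_set_t original;
    CPU_ZERO(&original);
    if (sched_getaffinity(0, sizeof(original), &original) != 0) {
        return -1;
    }
    int first = -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &original)) {
            first = cpu;
            break;
        }
    }
    if (first < 0) {
        return -1;
    }
    cpu_set_t single;
    CPU_ZERO(&single);
    CPU_SET(first, &single);
    if (sched_setaffinity(0, sizeof(single), &single) != 0) {
        return -1;
    }
    const int pinned = current_cpu_count();
    // Restore whatever mask was in effect before (e.g. the one numactl set).
    if (sched_setaffinity(0, sizeof(original), &original) != 0) {
        return -1;
    }
    return pinned;
}
```

While pinned, the count drops to 1 and returns to its previous value afterwards, which is the kind of transient change described above.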