In OpenBLAS, if you call openblas_set_num_threads and ask for more threads than your CPU has, the number of threads actually used is capped at the number of CPU threads.
This can be seen in the source code.
I am wondering whether MKL has the same behavior. The docs do not explicitly mention it, but they do say:
The number specified is a hint, and Intel® MKL may actually use a smaller number.
MKL's behavior is different: as a matter of fact, you can have more threads than there are cores.
The reason @Kristoffer doesn't see this in his answer is that dynamic adjustment is enabled by default:
By default, Intel® MKL can adjust the specified number of threads dynamically. [...] If dynamic adjustment of the number of threads is disabled, Intel® MKL attempts to use the specified number of threads in internal parallel regions (for more information, see the Intel® MKL Developer Guide). Use the mkl_set_dynamic function to control dynamic adjustment of the number of threads.
So if we use mkl_set_dynamic(0) to switch the dynamic adjustment off, we will see the following:
>>> set_max_threads(44)
>>> get_max_threads()
6
>>> mkl_set_dynamic(0)
>>> get_max_threads()
44
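For reference, here is a minimal sketch of how the session above could be reproduced with ctypes. The helper names load_mkl and max_threads_with_and_without_dynamic are my own and purely illustrative, and the list of library file names is an assumption that depends on platform and MKL version. The symbols MKL_Set_Num_Threads, MKL_Get_Max_Threads, MKL_Set_Dynamic and MKL_Get_Dynamic are, as far as I know, the C names behind the lowercase mkl_* functions. If no MKL runtime is found, the function simply returns None.

```python
import ctypes
import ctypes.util

# Assumed candidate file names; adjust for your platform / MKL version.
_MKL_CANDIDATES = (
    "libmkl_rt.so", "libmkl_rt.so.2", "libmkl_rt.so.1",
    "libmkl_rt.dylib", "mkl_rt.dll", "mkl_rt.2.dll",
)


def load_mkl():
    """Return a ctypes handle to the MKL runtime, or None if it is not found."""
    found = ctypes.util.find_library("mkl_rt")
    for name in ((found,) if found else ()) + _MKL_CANDIDATES:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def max_threads_with_and_without_dynamic(requested):
    """Request `requested` threads and report what mkl_get_max_threads()
    returns with dynamic adjustment switched on and switched off."""
    mkl = load_mkl()
    if mkl is None:
        return None  # MKL is not available on this machine
    old_dynamic = mkl.MKL_Get_Dynamic()
    old_threads = mkl.MKL_Get_Max_Threads()
    try:
        mkl.MKL_Set_Num_Threads(requested)
        mkl.MKL_Set_Dynamic(1)
        dynamic_on = mkl.MKL_Get_Max_Threads()
        mkl.MKL_Set_Dynamic(0)
        dynamic_off = mkl.MKL_Get_Max_Threads()
    finally:
        # Best-effort restore of the previous settings.
        mkl.MKL_Set_Dynamic(old_dynamic)
        mkl.MKL_Set_Num_Threads(old_threads)
    return {
        "requested": requested,
        "dynamic_on": dynamic_on,
        "dynamic_off": dynamic_off,
    }


if __name__ == "__main__":
    print(max_threads_with_and_without_dynamic(44))
```

The exact numbers depend on the machine and the threading layer in use, so I would not expect everybody to get the same values as in my session.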
So we see that without dynamic adjustment MKL could use 44 threads. Whether it really does is another question. The documentation of mkl_get_dynamic explains (even if the information seems a little outdated to me, since the dynamic setting is already taken into account in get_max_threads):
Suppose that the mkl_get_max_threads function returns the number of threads equal to N. [...] If dynamic adjustment is disabled, Intel® MKL requests exactly N threads for internal parallel regions ([...]). However, the OpenMP* run-time library may be configured to supply fewer threads than Intel® MKL requests, depending on the OpenMP* setting of dynamic adjustment.
How OpenMP determines the number of threads is given in Algorithm 2.1 of the OpenMP 5.0 specification (which I don't pretend to understand).
On my machine the relevant values are omp_get_thread_limit()=2147483647 and omp_get_dynamic()=0, so after disabling MKL_DYNAMIC and setting the maximal number of threads higher, I really do see a decrease in performance due to the additional overhead.
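If you want to look up the OpenMP values on your own machine, something along these lines might work. This is only a sketch: load_openmp and openmp_settings are hypothetical helper names, and the library names are guesses for the common runtimes (Intel's libiomp5, GNU's libgomp, LLVM's libomp). The numbers are only meaningful if the runtime loaded here is the same one MKL is actually using. The functions omp_get_thread_limit, omp_get_dynamic and omp_get_max_threads are part of the standard OpenMP API.

```python
import ctypes
import ctypes.util


def load_openmp():
    """Return a ctypes handle to some OpenMP runtime, or None if none is found."""
    names = [ctypes.util.find_library(n) for n in ("iomp5", "gomp", "omp")]
    # Assumed fallback file names for the usual runtimes.
    names += ["libiomp5.so", "libgomp.so.1", "libomp.so",
              "libomp.dylib", "libiomp5md.dll"]
    for name in names:
        if not name:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def openmp_settings():
    """Query the OpenMP settings that influence how many threads are supplied."""
    omp = load_openmp()
    if omp is None:
        return None  # no OpenMP runtime could be loaded
    return {
        "thread_limit": omp.omp_get_thread_limit(),
        "dynamic": omp.omp_get_dynamic(),
        "max_threads": omp.omp_get_max_threads(),
    }


if __name__ == "__main__":
    print(openmp_settings())
```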