If I run the following Python code:
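import multiprocessing
import numpy as np

def dummy(t):
    # Invert a large random matrix and return the norm of the inverse.
    A = np.random.rand(10000, 10000)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(20)))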
more than the specified 2 processes seem to be spawned. More specifically, when I use htop
to monitor the system, it shows all hardware threads as busy, i.e. at 100% CPU usage.
I would expect only 2 threads to show 100% usage, but perhaps that assumption is wrong.
Curiously, if the matrix size is increased by a factor of 10, only the 2 specified threads are busy.
Python versions used: 3.6.9 / 3.8.5. Machine: Skylake server with 40 cores.
As the comment from @Booboo suggests, the example contains additional parallelism that is not accounted for. Most likely the numpy.linalg.inv
call is multithreaded under the hood. Therefore the assumption that only as many hardware threads are used as the number of processes specified in the Pool
constructor is invalid. If the source of the additional parallelism is known and can be disabled, the expected behavior can be achieved.
This answer contains instructions on how to limit the number of threads available to numpy. This might give performance benefits if you have a higher-level source of parallelism. Note that with environment variables this can only be done globally, before importing numpy, not on a per-function basis.
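
A minimal sketch of the environment-variable approach, applied to the example from the question. Which variable actually takes effect depends on the BLAS/OpenMP backend numpy was built against, so setting all three is an assumption made here to cover the common cases (OpenMP, OpenBLAS, MKL). The `n` parameter is hypothetical and only added so the sketch runs quickly.

```python
import os

# These must be set BEFORE numpy is imported, because the threading
# backends typically read them once when the library is loaded.
# Assumption: numpy is backed by OpenMP, OpenBLAS, or MKL.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import multiprocessing
import numpy as np


def dummy(t, n=200):
    # Same work as in the question, with a smaller (hypothetical) default size.
    A = np.random.rand(n, n)
    inv = np.linalg.inv(A)
    return float(np.linalg.norm(inv))


if __name__ == "__main__":
    # With one BLAS thread per worker, roughly 2 hardware threads
    # should be busy instead of all of them.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(4)))
```

Because the variables are set at the top of the module, worker processes see them as well, whether they are forked from the parent or started fresh and re-import the module.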