python numpy multiprocessing threadpool python-multiprocessing

multiprocessing.Pool spawns too many threads

If I run the following Python code:

import multiprocessing

import numpy as np


def dummy(t):
    A = np.random.rand(10000, 10000)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(20)))

more than the specified 2 processes seem to be spawned. More specifically, when I monitor the system with htop, all hardware threads show as busy, i.e. 100% CPU usage. I would expect only 2 of them to show 100% usage, but perhaps that assumption is wrong.

Curiously enough, if the matrix size is increased (by a factor of 10), only the 2 specified threads are busy.

Python versions used: 3.6.9 / 3.8.5. Machine: Skylake server with 40 cores.

Solution

As the comment from @Booboo suggests, the example contains additional parallelism that is not accounted for. Most likely the numpy.linalg.inv call uses some sort of multithreading under the hood, so each of the 2 worker processes can itself keep several hardware threads busy. The assumption that only as many hardware threads are used as the number of processes specified in the Pool constructor is therefore invalid. If the source of the additional parallelism is known and can be disabled, the expected behavior can be achieved.

This answer contains instructions about how to limit the number of threads available to numpy. This might give performance benefits if you have a higher-level source of parallelism, such as a process pool. Note that with environment variables this is done globally and before importing numpy, not on a per-function basis.
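
A minimal sketch of that approach, applied to the example from the question. It assumes numpy is linked against OpenBLAS, MKL, or another OpenMP-based BLAS; which variable actually takes effect depends on the build, so all three are set here. The smaller matrix size and the main() wrapper are only there to keep the example quick and are not part of the original code.

```python
import os

# Assumption: the BLAS/LAPACK backend behind numpy honors one of these
# variables. They must be set before numpy is imported for the first time.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import multiprocessing  # noqa: E402

import numpy as np  # noqa: E402  (imported only after the environment is set)


def dummy(t):
    # Smaller matrix than in the question so the sketch finishes quickly.
    A = np.random.rand(500, 500)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


def main():
    # Worker processes inherit the environment of the parent, so each
    # worker should run its linear algebra on a single thread.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(4)))


if __name__ == "__main__":
    main()
```

With the thread count limited to 1 per process, htop should show roughly 2 busy hardware threads for Pool(2). The same variables can instead be set in the shell before starting the interpreter, which avoids depending on import order inside the script.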