Tags: python, multithreading, tensorflow, reinforcement-learning, openblas

Why would setting "export OPENBLAS_NUM_THREADS=1" impair the performance?


I tried setting "export OPENBLAS_NUM_THREADS=1" as this document suggests, but I found a strange phenomenon: doing so significantly impairs the performance of my RL algorithms (I ran tests with TD3 and SAC, and all results consistently indicate that "export OPENBLAS_NUM_THREADS=1" hurts performance). Why would this cause such a big problem?

By the way, the algorithms are implemented in TensorFlow 1.13, and data are fed into the neural network through tf.data.Dataset. All tests are run on the BipedalWalker-v2 environment from OpenAI Gym.
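
For reference, a minimal sketch of what I mean. The in-Python form below is just an equivalent of the shell export; the important detail, as far as I understand, is that the variable must be set before anything loads OpenBLAS, or it has no effect:

    # Equivalent of `export OPENBLAS_NUM_THREADS=1`, done from inside Python.
    # It must run before numpy / TensorFlow is imported for the first time,
    # because OpenBLAS reads the variable when the library is loaded.
    import os
    os.environ["OPENBLAS_NUM_THREADS"] = "1"

    import numpy as np  # OpenBLAS now initialises with a single thread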


Solution

  • The linked guide suggests setting this variable specifically when using ray, not always.

    AFAICS, that's because that framework itself spawns many processes (one for each actor, or something like that), so each of them also using multiple OpenBLAS threads would oversubscribe the CPU and bring no speedup; see the first sketch below. This is not the case when there is only one, or only a few, processes.


    On a general note, the OpenBLAS FAQ says that OpenBLAS's multithreading might "conflict" with the main program's multithreading and recommends setting OPENBLAS_NUM_THREADS=1 in such a case. The FAQ entry fails to provide any details to back up this claim, however, so it may very well be obsolete. As per https://github.com/obspy/obspy/wiki/Notes-on-Parallel-Processing-with-Python-and-ObsPy, the symptoms of such a "conflict" are rampant deadlocks and segfaults. So if you see nothing of the kind, you are in the clear.

    Major Python libraries are very responsible about dealing with such problems themselves rather than dumping them on the user, so I'm pretty sure that if OpenBLAS has any usage restrictions, numpy and scipy enforce them internally and automatically when you are using OpenBLAS through them. You can check which BLAS your numpy links against with the second sketch below.
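
    A back-of-the-envelope illustration of the oversubscription point (the actor count here is made up for illustration; ray's actual defaults may differ):

        # Sketch: why per-process BLAS threading oversubscribes under ray.
        import multiprocessing

        num_cores = multiprocessing.cpu_count()
        num_actors = num_cores              # say, one actor process per core
        blas_threads_per_actor = num_cores  # OpenBLAS default: one thread per core

        total = num_actors * blas_threads_per_actor
        print(f"{total} BLAS threads competing for {num_cores} cores")
        # With OPENBLAS_NUM_THREADS=1 the total collapses back to num_cores.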
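
    And to verify which BLAS your numpy actually links against, and how many threads it is currently using, something like the following works (it assumes the optional threadpoolctl package is installed; it is not part of numpy itself):

        # Sketch: inspect the BLAS backend and its current thread count.
        import numpy as np
        np.show_config()  # prints the BLAS/LAPACK build info (e.g. openblas)

        from threadpoolctl import threadpool_info
        for pool in threadpool_info():  # one entry per native thread pool
            print(pool["internal_api"], pool["num_threads"])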