I heard that setting OMP_NUM_THREADS=1 before calling a Python script that uses multiprocessing makes the script faster.
Is that true? If so, why?
Since you said in a comment that your Python program is calling a C module that uses OpenMP:
OpenMP does multi-threading within a process, and the default number of threads is typically the number that the CPU can actually run simultaneously. (This is generally the number of CPU cores, or a multiple of that number if the CPU has an SMT feature such as Intel's Hyper-Threading.) So if you have, for example, a quad-core non-hyperthreaded CPU, OpenMP will want to run 4 threads by default.
When you use Python's multiprocessing module, your program starts multiple Python processes which can run simultaneously. You can control the number of processes, but often you'll want it to be the number of CPU cores/threads, e.g. the value returned by multiprocessing.cpu_count().
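As a rough illustration of sizing the pool that way, here is a minimal sketch. The square function is just a hypothetical stand-in for whatever work your processes actually do:

```python
import multiprocessing


def square(x):
    # Placeholder workload; in the real program this would call the C module.
    return x * x


if __name__ == "__main__":
    # One worker process per logical CPU reported by the OS.
    workers = multiprocessing.cpu_count()
    with multiprocessing.Pool(processes=workers) as pool:
        print(pool.map(square, range(8)))
```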
So, what happens on that quad-core CPU if you run a multiprocessing program that starts 4 Python processes, and each one calls an OpenMP function that runs 4 threads? You end up running 16 threads on 4 cores. That'll work, but not at peak efficiency, since each core will have to spend some time switching between tasks.
Setting OMP_NUM_THREADS=1 basically turns off the OpenMP multi-threading, so each of your Python processes remains single-threaded.
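If you would rather set the variable from inside the script than on the command line, something like the following sketch should work. It assumes the OpenMP runtime reads OMP_NUM_THREADS when it initializes (the usual behavior), so the assignment has to come before the C module is imported. The name my_openmp_module is hypothetical and stands in for your own module:

```python
import os

# Set this before the C extension (and its OpenMP runtime) is loaded;
# changing it afterwards typically has no effect on a runtime that has
# already initialized.
os.environ["OMP_NUM_THREADS"] = "1"

# import my_openmp_module  # hypothetical: your OpenMP-based C module

print(os.environ["OMP_NUM_THREADS"])
```

Child processes started by multiprocessing inherit the parent's environment, so the setting should carry over to the workers as well.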
Make sure you're starting enough Python processes if you do this, though! If you have 4 CPU cores and you only run 2 single-threaded Python processes, you'll have 2 cores utilized and the other 2 sitting idle. (In this case you might want to set OMP_NUM_THREADS=2.)
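The arithmetic behind that choice can be sketched as follows. The helper omp_threads_per_process is a hypothetical name, and an even split of cores across processes is only one reasonable policy, not a rule:

```python
import multiprocessing


def omp_threads_per_process(num_processes, num_cores=None):
    """Split the available cores evenly among the worker processes."""
    if num_cores is None:
        num_cores = multiprocessing.cpu_count()
    # Never go below 1 thread, even with more processes than cores.
    return max(1, num_cores // num_processes)


print(omp_threads_per_process(2, num_cores=4))  # 2 processes on 4 cores
print(omp_threads_per_process(4, num_cores=4))  # 4 processes on 4 cores
```

With 2 processes on 4 cores this gives 2 threads each, and with 4 processes it gives 1, so the total thread count matches the core count in both cases.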