I have 2 OpenMP parallel regions (I am using C++ under gcc under Linux) with different numbers of threads - let's say 4 in one and 8 in the other. Then, if I run ps -T $(pidof name_of_process), 4 SPIDs stay the same all the time, but the remaining 4 change with every invocation. A sample output:
The first output
PID SPID TTY STAT TIME COMMAND
7578 7578 pts/1 Rl+ 1:18 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 7579 pts/1 Rl+ 0:57 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 7580 pts/1 Rl+ 0:57 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 7581 pts/1 Rl+ 0:57 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 19381 pts/1 Rl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 19382 pts/1 Rl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 19383 pts/1 Rl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 19384 pts/1 Rl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
The second output
PID SPID TTY STAT TIME COMMAND
7578 7578 pts/1 Rl+ 1:23 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 7579 pts/1 Rl+ 1:01 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 7580 pts/1 Rl+ 1:01 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 7581 pts/1 Rl+ 1:01 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 22314 pts/1 Rl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 22315 pts/1 Rl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 22316 pts/1 Rl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
7578 22317 pts/1 Sl+ 0:00 ./rampack casino -i banana_cube.ini --start-from=0p5 --continue=100000
Does this mean that OpenMP is constantly creating 4 new threads when entering the 8-threaded section and destroying them afterwards (or when entering the 4-threaded section)? I would assume so, but many places, such as here, suggest that the threads should persist and wait for their turn. I wouldn't be bothered about the internal workings of OpenMP, but I have a problem where memory mysteriously leaks and I am starting to suspect that some thread resources are not released (or maybe the memory becomes increasingly fragmented?).
So is this correct behavior? I am using gcc; gcc --version reports: gcc-8 (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0.
Moreover, if that is the case, is it possible to force OpenMP not to constantly destroy and spawn new threads, without making the 2 sections use the same number of threads?
Yes, it probably is new threads. This is totally dependent on the platform and the OpenMP implementation. Moreover, this is unspecified by the OpenMP specification, so it is compliant behavior. However, the GCC runtime (GOMP) and the Intel/Clang one (IOMP) tend to reuse threads as much as possible in practice. On my machine (with 6 cores), I am not able to reproduce your issue with either GOMP with GCC-10.2 or IOMP with Clang-11.0. Moreover, the following program shows the same thread IDs, which likely means they are reused:
#include <cstdio>
#include <unistd.h>
#include <sys/types.h>

// Note: gettid() is only declared by glibc 2.30+; on older systems,
// use syscall(SYS_gettid) from <sys/syscall.h> instead.
int main() {
    #pragma omp parallel num_threads(4)
    printf("%d\n", gettid());

    printf("----------\n");

    #pragma omp parallel num_threads(8)
    printf("%d\n", gettid());

    printf("----------\n");

    #pragma omp parallel num_threads(4)
    printf("%d\n", gettid());

    /*
    // Update n°1
    printf("----------\n");

    #pragma omp parallel num_threads(8)
    printf("%d\n", gettid());
    */
}
You should check the result of this program. If you cannot reproduce the behavior of your program on this simple example, it means the problem is specific to your application. It could be a sign that you are using multiple conflicting OpenMP runtimes. To check this hypothesis, set the environment variable OMP_DISPLAY_ENV=TRUE and look at the result. This behavior also often appears when you use nested regions.
UPDATE n°1: With another section of 8 threads, GOMP on GCC-10.2 creates new unneeded threads, while IOMP on Clang-11.0 does not create additional threads. This might be a bug (or a very surprising behavior of GOMP).
UPDATE n°2:
While the behavior of a runtime is implementation defined, you can give some hints to the runtime using the environment variable OMP_DYNAMIC. Here is what the OpenMP specification states:

The OMP_DYNAMIC environment variable controls dynamic adjustment of the number of threads to use for executing parallel regions by setting the initial value of the dyn-var ICV. The value of this environment variable must be one of the following: true | false. If the environment variable is set to true, the OpenMP implementation may adjust the number of threads to use for executing parallel regions in order to optimize the use of system resources. If the environment variable is set to false, the dynamic adjustment of the number of threads is disabled. The behavior of the program is implementation defined if the value of OMP_DYNAMIC is neither true nor false.
However, using OMP_DYNAMIC=TRUE does not fix the problem on GOMP/GCC. Moreover, on both GOMP/GCC and IOMP/Clang, it limits the number of created threads to the number of available hardware threads (at least on my machine).
Keep in mind that the observed behavior of the OpenMP runtimes is compliant with the specification, and your program should not assume that no new threads are created (although you may want to tune the behavior for the sake of performance).