I'm attempting to optimise the use of threads in a complex for loop with OpenMP. The basic code looks like this:
for (...) //loop1
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            //section that needs to be executed only once
        }
        #pragma omp for
        for (...) //loop2
        {
            ...
        }
        #pragma omp single
        {
            //section that needs to be executed only once
        }
        #pragma omp for
        for (...) //loop3
        {
            ...
        }
        ...
    }
}
My problem is about thread creation/destruction as this code implies that every iteration creates and destroys N threads. Is there a way to tell the runtime to reuse the same threads (something like a thread pool) or is this something left to the implementation?
I need to pay attention to this constraint:
EDIT: Each iteration of loop1 must be executed only once (so I can't make the whole loop parallel).
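First, threads will typically not be created and destroyed at each loop1 iteration. Most OpenMP runtimes keep their worker threads alive between parallel regions and reuse them, although the specification leaves this to the implementation. What idle threads do while they wait (spin or sleep) is governed by a wait policy, which you can control using the OMP_WAIT_POLICY environment variable: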
https://www.openmp.org/spec-html/5.0/openmpse55.html#x294-20640006.7
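In addition, nothing (at least in your pseudo-code) prevents you from wrapping all of the code in a single parallel region. Every thread then executes loop1 itself, but each single block still runs only once per iteration and each omp for still divides its loop among the threads. The code could simply be: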
#pragma omp parallel
for (...) //loop1
{
    #pragma omp single
    {
        //section that needs to be executed only once
    }
    #pragma omp for
    for (...) //loop2
    {
        ...
    }
    #pragma omp single
    {
        //section that needs to be executed only once
    }
    #pragma omp for
    for (...) //loop3
    {
        ...
    }
    ...
}
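To make the structure concrete, here is a minimal sketch with the placeholders filled in. The function name run_steps, the arrays a and b, and the arithmetic are all hypothetical and stand in for your real work. The loop1 counter is declared inside the parallel region, so each thread gets its own private copy.

```c
/* Hypothetical workload: names and arithmetic are illustrative only.
   Requires n >= 1. Compile with your compiler's OpenMP flag (e.g. -fopenmp). */
double run_steps(int steps, double *a, double *b, int n)
{
    double scale = 0.0;  /* shared: written in a single, read in loop2 */
    double total = 0.0;  /* shared: written in a single, read in loop3 */

    #pragma omp parallel
    for (int step = 0; step < steps; ++step)  /* loop1: run by every thread, step is private */
    {
        #pragma omp single
        {
            scale = step + 1.0;  /* executed by one thread per iteration */
        }   /* implicit barrier: all threads now see the new scale */

        #pragma omp for
        for (int i = 0; i < n; ++i)  /* loop2: iterations split across the team */
            a[i] = scale * i;
        /* implicit barrier: a[] is complete */

        #pragma omp single
        {
            total += a[n - 1];  /* executed by one thread per iteration */
        }   /* implicit barrier */

        #pragma omp for
        for (int i = 0; i < n; ++i)  /* loop3 */
            b[i] = a[i] + total;
        /* implicit barrier: the next loop1 iteration starts in step */
    }
    return total;
}
```

Calling run_steps(3, a, b, 4) on two arrays of four doubles opens one parallel region for all three loop1 iterations instead of one region per iteration.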
Implicit barriers at the end of each worksharing construct (i.e., the first single, loop2, the second single, and loop3) guarantee the ordering between iterations. Where those barriers are not needed, you can remove them with the nowait clause.
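As a small illustration of nowait, the sketch below assumes two loops that touch different arrays and do not depend on each other. The function name fill_independent and the arrays x and y are hypothetical. A thread that finishes its share of the first loop can start on the second one right away, because nothing in the second loop reads what the first loop writes.

```c
/* Hypothetical example: the two loops are independent, so the barrier
   after the first one is not needed. */
void fill_independent(double *x, double *y, int n)
{
    #pragma omp parallel
    {
        #pragma omp for nowait       /* no barrier at the end of this loop */
        for (int i = 0; i < n; ++i)
            x[i] = 2.0 * i;

        #pragma omp for              /* implicit barrier kept here */
        for (int i = 0; i < n; ++i)
            y[i] = i + 1.0;
    }   /* end of the parallel region: both arrays are complete */
}
```

Only drop a barrier when the following code really does not read data produced before it. In your loop, the single blocks and the loops that consume their results would normally keep theirs.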