I'm trying to run the following code to see how OpenMP threads are managed over a nested loop, where the inner and outer loops are implemented in a member function and its caller, respectively.
Each loop is parallelized with the statement
#pragma omp parallel for
and I'm assuming the pragma on the inner loop is simply ignored.
To verify this, I printed the thread number in each loop.
What I see instead is the output below: the thread id in the inner loop is always zero, rather than matching the thread number of the caller. Why does this happen?
Calling 0 from 0
Calling 2 from 1
Calling 6 from 4
Calling 8 from 6
Calling 4 from 2
Calling 7 from 5
Calling 5 from 3
Calling 0 from 0 // Expecting 3
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 0 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 9 from 7
Calling 1 from 0 // Expecting 7
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 3 from 1
Calling 0 from 0 // Expecting 1
Calling 1 from 0
Calling 2 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 3 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
Calling 1 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 0
Calling 3 from 0
#include <vector>
#include <omp.h>
#include <iostream>
#include <cstdio>
#include <limits>
#include <cstdint>
#include <cinttypes>

using namespace std;

const size_t kM = 4;

struct Mat
{
    int elem[kM];

    Mat(const Mat& copy)
    {
        for (size_t i = 0; i < kM; ++i)
            this->elem[i] = copy.elem[i];
    }

    Mat()
    {
        for (size_t i = 0; i < kM; ++i)
            elem[i] = 0;
    }

    void do_mat(Mat& m)
    {
        #pragma omp parallel for
        for (int i = 0; i < kM; ++i)
        {
            printf(" \tCalling %d from %d\n", i, omp_get_thread_num());
            elem[i] += m.elem[i];
        }
    }
};

int main()
{
    const int kN = 10;
    vector<Mat> matrices(kN);
    Mat m;

    #pragma omp parallel for
    for (int i = 0; i < kN; i++)
    {
        int tid = omp_get_thread_num();
        printf("Calling %d from %d\n", i, tid);
        matrices[i].do_mat(m);
    }

    return 0;
}
I'm not sure I understand what it is that you expected, but the result you get is perfectly normal.
By default, OpenMP nested parallelism is disabled, meaning that any nested parallel
region will create as many 1-thread teams as there are threads from the outer level encountering it.
In your case, your outermost parallel
region creates a team of 8 threads. Each of these reaches the innermost parallel
region and creates a second-level, 1-thread team. Each of these second-level threads is ranked 0 within its own team, hence the printed 0s you get.
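If what you actually want is for the inner loop to report which outer thread reached it (the numbers you annotated as "Expecting"), you can query the ancestor thread instead of the rank inside the current (1-thread) team. Here is a minimal sketch of the idea; the hard-coded num_threads(4) values are just for illustration:
#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel num_threads(4)
    {
        #pragma omp parallel num_threads(4) // nested region: inactive by default
        {
            // With nesting disabled, each inner team has exactly 1 thread,
            // so omp_get_thread_num() is always 0 here. The id of the outer
            // thread that spawned this team is still available through
            // omp_get_ancestor_thread_num(1).
            printf("inner id %d of %d, spawned by outer thread %d\n",
                   omp_get_thread_num(),
                   omp_get_num_threads(),
                   omp_get_ancestor_thread_num(1));
        }
    }
    return 0;
}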
With the very same code, compiled with g++ 9.3.0, setting the two environment variables OMP_NUM_THREADS and OMP_NESTED gives me the following:
OMP_NUM_THREADS="2,3" OMP_NESTED=true ./a.out
Calling 0 from 0
Calling 5 from 1
Calling 0 from 0
Calling 1 from 0
Calling 2 from 1
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 3 from 2
Calling 2 from 1
Calling 6 from 1
Calling 1 from 0
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 2 from 1
Calling 2 from 0
Calling 0 from 0
Calling 1 from 0
Calling 2 from 1
Calling 3 from 2
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 2 from 1
Calling 3 from 0
Calling 7 from 1
Calling 0 from 0
Calling 3 from 2
Calling 2 from 1
Calling 3 from 2
Calling 0 from 0
Calling 1 from 0
Calling 1 from 0
Calling 2 from 1
Calling 4 from 0
Calling 8 from 1
Calling 0 from 0
Calling 3 from 2
Calling 2 from 1
Calling 2 from 1
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Calling 1 from 0
Calling 9 from 1
Calling 2 from 1
Calling 0 from 0
Calling 1 from 0
Calling 3 from 2
Maybe that corresponds better to what you expected to see?
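For completeness, a roughly equivalent setup can also be requested from within the code rather than through environment variables. This is only a sketch, using the same 2 outer / 3 inner thread counts and loop bounds mirroring your kN = 10 and kM = 4:
#include <cstdio>
#include <omp.h>

int main()
{
    // Rough in-code equivalent of OMP_NESTED=true OMP_NUM_THREADS="2,3":
    // enable nested parallelism, then request 2 outer and 3 inner threads.
    omp_set_nested(1);               // deprecated since OpenMP 5.0 ...
    // omp_set_max_active_levels(2); // ... in favour of this call

    #pragma omp parallel for num_threads(2)
    for (int i = 0; i < 10; ++i)
    {
        #pragma omp parallel for num_threads(3)
        for (int j = 0; j < 4; ++j)
            printf("outer %d (thread %d) -> inner %d (thread %d)\n",
                   i, omp_get_ancestor_thread_num(1),
                   j, omp_get_thread_num());
    }
    return 0;
}
Note that num_threads is only a request: the runtime may still hand out fewer threads depending on limits such as the thread limit or dynamic adjustment.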