I have a 2-D grid of n x n elements. In one iteration, I'm calculating the value of one element by averaging the values of its neighbors. That is:
// interior points only; looping i,j from 0 to n-1 would index out of range at the edges
for (int i = 1; i < n-1; i++)
    for (int j = 1; j < n-1; j++)
        grid[i][j] = (grid[i-1][j] + grid[i][j-1] + grid[i+1][j] + grid[i][j+1]) / 4.0;
And I need to run the above nested loop for iter number of iterations.
What I need is the following: the iter iterations will run sequentially, but during every iteration the value of grid[i][j] for every i and j should be calculated in parallel. In order to do that, I have the following ideas and questions:
1. Can I parallelize the calculation of each grid[i][j] by making only those 4 neighboring elements private to the thread? (Basically, grid is shared by all threads, but every thread also keeps a local copy of the 4 iteration-specific elements.) Is this possible?
2. Will a barrier in fact be needed for all the threads to finish an iteration before starting the next one?

I'm very new to the OpenMP way of thinking and I'm utterly lost in this simple problem. I'd be grateful if somebody could help resolve my confusion.
In practice, you'd want to have (much) fewer threads than grid points, so each thread will calculate a whole bunch of points (for example, one row). There is a certain overhead associated with starting OpenMP (or any other kind of) threads, and your program will be memory-bound rather than CPU-bound anyway. So starting a thread per grid point would defeat the whole purpose of parallelizing the computation. Hence, your idea #1 is not recommended (I'm not quite sure I understood it correctly, though; maybe this is not what you were proposing).
I would recommend (as others have also pointed out in the comments) that you allocate twice the memory needed to store the grid values and use two pointers that are swapped between iterations: one points to memory holding the previous iteration's values, which are read-only; the other points to the new iteration's values, which are write-only. This also removes the race in your current code, where an element can be computed from neighbors that have already been updated within the same iteration. Note that you will only swap the pointers, not actually copy the memory. After the final iteration is done, you can copy the result into the desired location.
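For instance, assuming the grid is stored as one contiguous n*n array of doubles (a minimal sketch; the names buf, src and dst are mine, not from your code):

#include <stdlib.h>

// one allocation holding both copies of the grid
double *buf = malloc(2 * (size_t)n * n * sizeof *buf);
double *src = buf;                  // previous iteration's values, read-only
double *dst = buf + (size_t)n * n;  // new iteration's values, write-only

// after each iteration, exchange the pointers; no memory is copied:
double *tmp = src; src = dst; dst = tmp;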
Yes, you need to synchronize threads between iterations; however, in OpenMP this is usually done implicitly, simply by opening a parallel region within the iteration loop (there is an implicit barrier at the end of a parallel region):
for (int iter = 0; iter < niter; ++iter)
{
    #pragma omp parallel
    {
        // get the range of points for the current thread
        // loop over the thread's points and apply the stencil
    }
}
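For example, the per-thread range can be computed from the thread number using the standard OpenMP runtime calls (a sketch; partitioning into contiguous blocks of rows is my choice, and n, niter are assumed declared as above):

#include <omp.h>

for (int iter = 0; iter < niter; ++iter)
{
    #pragma omp parallel
    {
        const int nt  = omp_get_num_threads();
        const int tid = omp_get_thread_num();
        // split the n rows into nt nearly equal contiguous blocks
        const int chunk   = (n + nt - 1) / nt;
        const int i_begin = tid * chunk;
        const int i_end   = (i_begin + chunk < n) ? i_begin + chunk : n;

        for (int i = i_begin; i < i_end; ++i)
            for (int j = 0; j < n; ++j)
            {
                // apply the stencil to [i][j]
            }
    } // implicit barrier: every thread finishes before the next iteration starts
}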
or, using a parallel for construct:
const int np = n*n;
for (int iter = 0; iter < niter; ++iter)
{
    #pragma omp parallel for
    for (int ip = 0; ip < np; ++ip)
    {
        const int i = ip / n;
        const int j = ip % n;
        // apply the stencil to [i][j]
    }
}
The second version will automatically distribute the work evenly between the available threads, which is most likely what you want; in the first version you have to distribute it manually.
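Putting the pointer swap and the parallel loop together, a complete iteration driver might look like the following sketch. It parallelizes the outer row loop rather than a flattened index, updates interior points only, and assumes both buffers start out with identical (fixed) boundary values; the function name smooth and the parameter names are illustrative:

void smooth(double *a, double *b, int n, int niter)
{
    double *src = a;  // previous iteration's values (read-only)
    double *dst = b;  // current iteration's values (write-only)

    for (int iter = 0; iter < niter; ++iter)
    {
        #pragma omp parallel for
        for (int i = 1; i < n - 1; ++i)      // interior rows
            for (int j = 1; j < n - 1; ++j)  // interior columns
                dst[i*n + j] = 0.25 * (src[(i-1)*n + j] + src[i*n + j-1]
                                     + src[(i+1)*n + j] + src[i*n + j+1]);

        // swap the pointers; no memory is copied
        double *tmp = src;
        src = dst;
        dst = tmp;
    }
    // after the final swap, src points to the newest values; copy them
    // back to the caller's buffer here if required
}

Because the swap happens outside the parallel region, the implicit barrier at the end of the parallel for guarantees that every thread has finished writing dst before the pointers are exchanged; that is exactly the synchronization asked about in question #2.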