Search code examples
copenmpnested-loops

OpenMP parallelization of nested loops


I am practicing for an exam and I would like to know how to solve correctly the following C program using OpenMP.

The following program searches the biggest element in positions i..N-1 from vector values[] and the results are stored in a second vector (maxs[]). Parallelize this program by adding the appropriate OpenMP statements. Provide two different solutions:

  1. One based on the parallelization of the outer loop
  2. The second one based on the parallelization ofthe inner loop.

Add variables and changes in the code in order to achieve a more efficient solution (the score of the exercise will take into account this).

Provide explanations that describe how the program is parallelized and how threads are going to collaborate in each case, assuming the application will run with T threads (T<<N).

int values[N], sums[N]
int i , j, s;

for (i = 0; i < N ; i ++) {
    s = values[i];

    for (j = i+1; j < N ; j ++)
        if (s < values[j])
            s = value[j];
 
    maxs[i] = s;
} 

For the first case, the answer I would provide is the following:

int values[N], sums[N]
int s;

#pragma omp parallel for schedule(dynamic, 1)
for (int i = 0; i < N ; i ++) {
    s = values[i];

    for (int j = i+1; j < N ; j ++)
        if (s < values[j])
             s = value[j];

    maxs[i] = s;
} 

I am simply parallelizing the outer loop, using a dynamic schedule. Thus, every time a thread is available it will handle an iteration. I do not use an ordered clause because the order of the execution does not really matter in this case. Moreover, I have not used the collapse(2) clause because the statement specifies that I only need to parallelize the outer loop (not the inner loop).

In the second scenario, I would answer the following:

int values[N], sums[N]
int s;

for (int i = 0; i < N ; i ++) {
    s = values[i];
    
    #pragma omp parallel for schedule(dynamic, 1)
    for (int j = i+1; j < N ; j ++)
        if (s < values[j])
             s = value[j];

    maxs[i] = s;
} 

How could I optimize the code in order to obtain the highest grade possible?

Thanks! :)


Solution

  • Ok, this is not strictly following your instructions but maybe it will benefit someone:

    data:  1  1  4  3  3  2  1  4  4  5  2  3  5  1  1  3  4  1  2  1  3
    maxs:  5  5  5  5  5  5  5  5  5  5  5  5  5  4  4  4  4  3  3  3  3
    

    This was generated (minus boilerplate and printing) by:

      int partial_max;
      partial_max=0;
    #pragma omp parallel for reduction(inscan,max:partial_max)
      for ( int i=values.size()-1; i>=0; i-- ) {
        auto v = values[i];
        partial_max = std::max( partial_max, v );
    #   pragma omp scan inclusive(partial_max)
        max_so_far[i] = partial_max;
      }
    

    Note that scan is fairly new and not every compiler may support it. I'm using gcc12.