Consider the following code construct:
int n = 0;
#pragma omp parallel for collapse(2)
for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
        n++;
Now the above is the simplest possible demo of a similar thing I am trying to implement in a code that takes a large amount of time. So, the main objective is to parallelize the loops such that the runtime reduces.
I am new to OpenMP, just know some commands and that's all. Now in the code I have written above, the final result comes out wrong (n = 9 is the right answer). I guess the loops are trying to access the same memory location simultaneously.
Now can someone give the simplest possible solution to this? Please note that I am very much a noob regarding this. Any reading material regarding this will also be helpful. Thank you.
I guess the loops are trying to access the same memory location simultaneously.
TL;DR: Yes, you have a race condition during the updates of the variable n. One way of solving that is using the OpenMP reduction clause.
I am new to OpenMP, just know some commands and that's all. Now in the code I have written above, the final result comes out wrong (n = 9 is the right answer).
The longer answer:
The #pragma omp parallel for will create a parallel region, and the iterations of the loop that it encloses will be assigned to the threads of that region, using the default chunk size and the default schedule, which is typically static. Bear in mind, however, that the default schedule might differ among different concrete implementations of the OpenMP standard.
From the OpenMP 5.1 standard you can read a more formal description:
The worksharing-loop construct specifies that the iterations of one or more associated loops will be executed in parallel by threads in the team in the context of their implicit tasks. The iterations are distributed across threads that already exist in the team that is executing the parallel region to which the worksharing-loop region binds.
The parallel loop construct is a shortcut for specifying a parallel construct containing a loop construct with one or more associated loops and no other statements.
Or informally, #pragma omp parallel for is a combination of the construct #pragma omp parallel with the construct #pragma omp for.
Therefore, what is happening in your code is that multiple threads are concurrently modifying the value of n. To solve this problem you should use the OpenMP reduction clause, which the OpenMP standard describes as follows:
The reduction clause can be used to perform some forms of recurrence calculations (...) in parallel. For parallel and work-sharing constructs, a private copy of each list item is created, one for each implicit task, as if the private clause had been used. (...) The private copy is then initialized as specified above. At the end of the region for which the reduction clause was specified, the original list item is updated by combining its original value with the final value of each of the private copies, using the combiner of the specified reduction-identifier.
For a more detailed explanation of how the reduction clause works, have a look at this SO Thread.
So to solve the race condition in your code, just change it to:
int n = 0;
#pragma omp parallel for collapse(2) reduction(+:n)
for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
        n++;