c++, multithreading, parallel-processing, openmp, openmpi

How to parallelize two serial 'for' loops so that the work of both loops is distributed over the threads


I have written the code below to parallelize two 'for' loops.

    #include <iostream>
    #include <omp.h>
    #define SIZE 100

    int main()
    {
        int arr[SIZE];
        int sum = 0;
        int i, tid, numt, prod;
        double t1, t2;

        for (i = 0; i < SIZE; i++)
            arr[i] = 0;

        t1 = omp_get_wtime();

    #pragma omp parallel private(tid, prod)
        {
            tid = omp_get_thread_num();
            numt = omp_get_num_threads();
            std::cout << "Tid: " << tid << " Thread: " << numt << std::endl;

    #pragma omp for reduction(+: sum)
            for (i = 0; i < 50; i++) {
                prod = arr[i]+1;
                sum += prod;
            }

    #pragma omp for reduction(+: sum)
            for (i = 50; i < SIZE; i++) {
                prod = arr[i]+1;
                sum += prod;
            }
        }

        t2 = omp_get_wtime();
        std::cout << "Time taken: " << (t2 - t1) << ", Parallel sum: " << sum << std::endl;

        return 0;
    }

In this case the first 'for' loop is executed in parallel by all the threads and the result is accumulated in the sum variable. Once the first 'for' loop has finished, the threads start executing the second 'for' loop in parallel, again accumulating the result in sum. Clearly, execution of the second 'for' loop waits for the first one to finish (there is an implicit barrier at the end of the first 'omp for').

I want the two 'for' loops to be processed simultaneously across the threads. How can I do that? Is there any other way to write this code more efficiently? Ignore the dummy work I am doing inside the 'for' loops.


Solution

  • You can add a nowait clause to the loops and move the reduction clause up to the parallel directive, so the per-thread sums are combined once, at the end of the parallel region. Something like this:

    #pragma omp parallel private(tid, prod) reduction(+: sum)
    {
    #pragma omp for nowait
        // no barrier here, so threads that finish early can start the second loop
        for (i = 0; i < 50; i++) {
            prod = arr[i]+1;
            sum += prod;
        }

    #pragma omp for nowait
        // nowait is safe here too: the parallel region ends with its own barrier
        for (i = 50; i < SIZE; i++) {
            prod = arr[i]+1;
            sum += prod;
        }
    }   // per-thread copies of sum are combined here by the reduction
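
    For reference, here is a minimal compilable sketch of how that snippet could slot into the original program. It reuses the variable names and SIZE from the question; dropping the per-thread printout and declaring the loop indices inside the loops are stylistic choices of this sketch, not part of the answer above:

        #include <iostream>
        #include <omp.h>
        #define SIZE 100

        int main()
        {
            int arr[SIZE] = {};          // all elements zero, as in the question
            int sum = 0;
            int prod;

            double t1 = omp_get_wtime();

        #pragma omp parallel private(prod) reduction(+: sum)
            {
        #pragma omp for nowait           // first half; threads do not wait here
                for (int i = 0; i < 50; i++) {
                    prod = arr[i] + 1;
                    sum += prod;
                }
        #pragma omp for nowait           // second half; may overlap with stragglers from the first loop
                for (int i = 50; i < SIZE; i++) {
                    prod = arr[i] + 1;
                    sum += prod;
                }
            }                            // per-thread sums are combined here

            double t2 = omp_get_wtime();
            std::cout << "Time taken: " << (t2 - t1)
                      << ", Parallel sum: " << sum << std::endl;
            return 0;
        }

    As for writing it more efficiently: since both loops do identical work on disjoint halves of arr, they could just as well be fused into a single #pragma omp parallel for reduction(+: sum) over the whole 0..SIZE range. The nowait pattern above is mainly useful when the two loop bodies genuinely differ.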