Tags: c, multithreading, for-loop, parallel-processing, openmp

How to use OpenMP on nested for loops in a while loop?


I have recently been introduced to OpenMP and parallel programming, and am having some trouble using it properly.

I want to implement OpenMP on the following code to make it run faster.

int m = 101;
double e = 10;

double A[m][m], B[m][m];
for (int x=0; x<m; x++){
    for (int y=0; y<m; y++){
        A[x][y] = 0;
        B[x][y] = 1;
    }
}

while (e >= 0.0001){
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            A[x][y] = 0.25*(B[x][y] - 0.2);
        }
    }
    e = 0;
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            e = e + abs(A[x][y] - B[x][y]);
        }
    }    
}

I would like the loop iterations to run simultaneously rather than one after another to speed up the run time. I believe the following code should work, but I am not sure whether I am using OpenMP correctly.

int m = 101;
double e = 10;

double A[m][m], B[m][m];
#pragma omp parallel for private(x,y) shared(A,B) num_threads(2)
for (int x=0; x<m; x++){
    for (int y=0; y<m; y++){
        A[x][y] = 0;
        B[x][y] = 1;
    }
}

while (e >= 0.0001){
    #pragma omp parallel for private(x,y) shared(A,B) num_threads(2)
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            A[x][y] = 0.25*(B[x][y] - 0.2);
        }
    }
    // I want to wait for the above loop to finish computing before starting the next
    #pragma omp barrier  
    e = 0;
    #pragma omp parallel for private(x,y) shared(A,B,e) num_threads(2)
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            e = e + abs(A[x][y] - B[x][y]);
        }
    }    
}

Am I using OpenMP effectively and correctly? Also, I am not sure if I can use OpenMP for my while loop, as it requires the inner loops to finish before it can determine whether it needs to run again.


Solution

  • Assuming that the code works, here are some improvements you can make:

    int m = 101;
    double e = 10;
    
    double A[m][m], B[m][m];
    
    #pragma omp parallel num_threads(2) shared(A, B)
    {
        #pragma omp for
        for (int x=0; x<m; x++){
            for (int y=0; y<m; y++){
                A[x][y] = 0;
                B[x][y] = 1;
            }
        }

        while (e >= 0.0001){
            #pragma omp for
            for (int x=0; x<m; x++){
                for (int y=0; y<m; y++){
                    A[x][y] = 0.25*(B[x][y] - 0.2);
                }
            }

            #pragma omp single
            e = 0;

            #pragma omp for reduction(+:e)
            for (int x=0; x<m; x++){
                for (int y=0; y<m; y++){
                    e = e + fabs(A[x][y] - B[x][y]);  /* fabs from <math.h>; abs() truncates doubles to int */
                }
            }
        }
    }
    

    Instead of creating a new parallel region for every loop, you can create a single one for the entire code. Furthermore, since you are using only 2 threads there are not many load-balancing problems, but if you were to increase the number of threads you might get better performance by using static scheduling with a chunk size of 1, as sketched below.
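
    For illustration, the scheduling policy is selected with the schedule clause on the worksharing directive. The snippet below is only a sketch of how the first loop inside the while loop would look with that clause added; everything else stays as in the code above.

    #pragma omp for schedule(static, 1)  /* hand out iterations round-robin, one at a time */
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            A[x][y] = 0.25*(B[x][y] - 0.2);
        }
    }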

    You do not need to make the loop variables x and y private: OpenMP automatically makes the variable of a parallelized for loop private, and y is declared inside the loop body anyway. In your last nested loops you have e = e + abs(A[x][y] - B[x][y]);, and you want the threads to combine their partial sums of e, so you should use reduction(+:e) to reduce the variable e across the threads; otherwise the concurrent updates to the shared e would be a data race. Note also that abs() works on integers, so for doubles you should call fabs() from <math.h>, as in the code above. A standalone sketch of the reduction loop follows.
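
    As an illustration of the reduction on its own, here is a minimal, compilable sketch of just the error-summing loop. The main function, headers, printf and the initial values of A and B are additions for this example; compile with something like gcc -fopenmp.

    #include <math.h>   /* fabs */
    #include <stdio.h>

    int main(void)
    {
        enum { m = 101 };
        static double A[m][m], B[m][m];

        /* Initial values mimicking one update step of the original code. */
        for (int x = 0; x < m; x++)
            for (int y = 0; y < m; y++){
                A[x][y] = 0.2;
                B[x][y] = 1.0;
            }

        double e = 0;
        /* Each thread accumulates its own private copy of e; the partial
           sums are combined when the loop ends. Without reduction(+:e),
           the concurrent updates to the shared e would be a data race. */
        #pragma omp parallel for reduction(+:e) num_threads(2)
        for (int x = 0; x < m; x++)
            for (int y = 0; y < m; y++)
                e += fabs(A[x][y] - B[x][y]);

        printf("e = %f\n", e);   /* 101 * 101 * 0.8 = 8160.8 */
        return 0;
    }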