I have recently been introduced to OpenMP and parallel programming, and am having some trouble using it properly.
I want to use OpenMP to parallelize the following code so that it runs faster:
```c
int m = 101;
double e = 10;
double A[m][m], B[m][m];
for (int x=0; x<m; x++){
    for (int y=0; y<m; y++){
        A[x][y] = 0;
        B[x][y] = 1;
    }
}
while (e >= 0.0001){
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            A[x][y] = 0.25*(B[x][y] - 0.2);
        }
    }
    e = 0;
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            e = e + abs(A[x][y] - B[x][y]);
        }
    }
}
```
I would like to split the work of each loop across threads instead of running the iterations one after another, to reduce the run time. I believe the following code should work, but I am not sure whether I am using OpenMP correctly:
```c
int m = 101;
double e = 10;
double A[m][m], B[m][m];
#pragma omp parallel for private(x,y) shared(A,B) num_threads(2)
for (int x=0; x<m; x++){
    for (int y=0; y<m; y++){
        A[x][y] = 0;
        B[x][y] = 1;
    }
}
while (e >= 0.0001){
    #pragma omp parallel for private(x,y) shared(A,B) num_threads(2)
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            A[x][y] = 0.25*(B[x][y] - 0.2);
        }
    }
    // I want to wait for the above loop to finish computing before starting the next
    #pragma omp barrier
    e = 0;
    #pragma omp parallel for private(x,y) shared(A,B,e) num_threads(2)
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            e = e + abs(A[x][y] - B[x][y]);
        }
    }
}
```
Am I using OpenMP effectively and correctly? Also, I am not sure whether I can use OpenMP for my `while` loop, since the inner loops must finish before it can determine whether it needs to run again.
Assuming the algorithm itself does what you intend, here are some improvements you can make:
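```c
#include <math.h>   /* for fabs (at file scope) */

int m = 101;
double e = 10;
double A[m][m], B[m][m];
/* One parallel region for the whole computation */
#pragma omp parallel num_threads(2) shared(A, B)
{
    #pragma omp for
    for (int x=0; x<m; x++){
        for (int y=0; y<m; y++){
            A[x][y] = 0;
            B[x][y] = 1;
        }
    }
    /* Every thread runs the while loop and tests the same shared e */
    while (e >= 0.0001){
        #pragma omp for
        for (int x=0; x<m; x++){
            for (int y=0; y<m; y++){
                A[x][y] = 0.25*(B[x][y] - 0.2);
            }
        }   /* implicit barrier: all threads have tested e before it is reset */
        #pragma omp single
        e = 0;   /* one thread resets e; implicit barrier at the end of single */
        #pragma omp for reduction (+:e)
        for (int x=0; x<m; x++){
            for (int y=0; y<m; y++){
                e = e + fabs(A[x][y] - B[x][y]);   /* fabs: abs() takes an int */
            }
        }   /* implicit barrier: every thread now sees the summed e */
    }
}
```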
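For reference, here is a self-contained sketch of the same single-region structure, wrapped in a function so it can be compiled and called. It differs from the snippet above in a few assumed details: the arrays have a fixed size `M` and live at file scope instead of being VLAs on the stack, the function name `relax` is made up for illustration, and a hypothetical `max_iter` cap is added. The cap is needed because with this update rule `B` never changes, so `e` is the same on every sweep (0.8 per cell, about 8160.8 in total with `fabs`) and the `while` condition alone would never end the loop.

```c
/* Compile with -fopenmp; without it the pragmas are ignored and the code runs serially. */
#include <math.h>

enum { M = 101 };                 /* same size as m in the question */
static double A[M][M], B[M][M];   /* file-scope arrays instead of VLAs on the stack */

/* Runs the question's iteration inside ONE parallel region.
 * max_iter is a hypothetical safety cap (B never changes, so e never decreases).
 * Returns the final e and stores the number of sweeps in *iters_out. */
double relax(int max_iter, int *iters_out)
{
    const double tol = 0.0001;
    double e = 10;   /* shared: declared outside the parallel region */
    int iters = 0;

    #pragma omp parallel num_threads(2) shared(A, B, e, iters)
    {
        #pragma omp for
        for (int x = 0; x < M; x++)
            for (int y = 0; y < M; y++) {
                A[x][y] = 0;
                B[x][y] = 1;
            }

        int it = 0;   /* private: each thread counts its own sweeps (all counts agree) */
        while (e >= tol && it < max_iter) {
            #pragma omp for
            for (int x = 0; x < M; x++)
                for (int y = 0; y < M; y++)
                    A[x][y] = 0.25 * (B[x][y] - 0.2);
            /* implicit barrier: every thread has finished the while test */

            #pragma omp single
            e = 0;    /* implicit barrier at the end of single */

            #pragma omp for reduction(+:e)
            for (int x = 0; x < M; x++)
                for (int y = 0; y < M; y++)
                    e += fabs(A[x][y] - B[x][y]);
            /* implicit barrier: all threads see the same summed e */
            it++;
        }

        #pragma omp single
        iters = it;
    }

    *iters_out = iters;
    return e;
}
```

Calling `relax(1000, &iters)` runs all 1000 sweeps and returns `e` close to 8160.8. Compile with `-fopenmp` (GCC or Clang) to enable the pragmas.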
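Instead of creating a new `parallel` region for every loop (which in your version also happens on every iteration of the `while` loop), you can create a single one for the entire code; entering and leaving a parallel region has some overhead, and this way you pay it only once. Inside the region each `#pragma omp for` ends with an implicit barrier, so the threads still wait for one loop to finish before starting the next. The same is true of `#pragma omp parallel for`, so the explicit `#pragma omp barrier` in your version is unnecessary (and, placed outside any parallel region, it has no effect anyway). The `while` loop itself is not split among the threads: every thread executes it, and because `e` is shared and the implicit barriers ensure nobody modifies it while the threads are testing it, all threads see the same value and leave the loop together.

Furthermore, since you are using only `2` threads there are not many load-balancing problems, but if you were to increase the number of threads you may get better performance by using `static` scheduling with a chunk size of 1 (`schedule(static, 1)`).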
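To see what `schedule(static, 1)` does, the sketch below records which thread handles each row `x`. With a chunk size of 1, OpenMP deals the rows out round-robin in thread-number order, so with two threads the rows alternate between thread 0 and thread 1; with the default `static` schedule each thread instead gets at most one contiguous block of rows (the exact split is implementation-defined). The helper `row_owners` and the constant `ROWS` are hypothetical names for this illustration, and the `#ifdef _OPENMP` fallback only exists so the sketch also builds without `-fopenmp`.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback: without OpenMP there is a single thread, number 0. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

enum { ROWS = 101 };

/* Fills owner[x] with the id of the thread that processed row x
 * and returns the size of the team that was actually created. */
int row_owners(int nthreads, int owner[ROWS])
{
    int team = 1;
    #pragma omp parallel num_threads(nthreads)
    {
        #pragma omp single
        team = omp_get_num_threads();

        /* chunk size 1: row x goes to thread x % team */
        #pragma omp for schedule(static, 1)
        for (int x = 0; x < ROWS; x++)
            owner[x] = omp_get_thread_num();
    }
    return team;
}
```

`row_owners(2, owner)` returns the team size actually obtained (normally 2), after which `owner[x] == x % team` for every row. In this problem every row costs the same amount of work, so it is worth measuring both schedules before settling on one.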
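You do not need to make the loop variables `x` and `y` `private`; OpenMP does that for you. The iteration variable of a loop associated with `omp for` is private automatically, and `y` is declared inside the loop body, so each thread has its own copy anyway. (As posted, `private(x,y)` is actually a compile error: `x` and `y` are declared in the `for` statements that follow the pragma, so they are not in scope at the `private` clause.)

In your last nested loops you have `e = e + abs(A[x][y] - B[x][y]);`. Without a reduction, all threads update the shared `e` at the same time, which is a data race and can lose updates. What you want is for the threads' partial sums to be added together into `e`, so use `reduction(+:e)`: each thread accumulates into its own private copy of `e`, and OpenMP combines the copies into the shared `e` at the end of the loop. Also note that in C `abs` takes an `int`, so the `double` difference is truncated before the absolute value is taken; use `fabs` from `<math.h>` instead.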
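A minimal sketch of the reduction, using two hypothetical helpers: `diff_serial` is the plain serial sum and `diff_parallel` is the same loop with `reduction(+:e)`. With the values from the question (`A` = 0.2 and `B` = 1 everywhere) both return about 8160.8. Their results can differ in the last few bits because the additions happen in a different order, so they should be compared with a tolerance. A version without `reduction` has no reliable result to compare against, since how many updates get lost depends on timing.

```c
#include <math.h>

enum { N = 101 };

/* Serial reference: e = sum of |A - B| over the whole grid. */
double diff_serial(double A[N][N], double B[N][N])
{
    double e = 0;
    for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++)
            e += fabs(A[x][y] - B[x][y]);   /* fabs, not abs: abs() takes an int */
    return e;
}

/* Parallel version: each thread sums into a private copy of e (initialized to 0),
 * and OpenMP adds the copies into the shared e at the end of the loop. */
double diff_parallel(double A[N][N], double B[N][N])
{
    double e = 0;
    #pragma omp parallel for reduction(+:e) num_threads(2)
    for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++)
            e += fabs(A[x][y] - B[x][y]);
    return e;
}
```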