I have the following serial code that I would like to make parallel. I understand when using the collapse clause for nested loops, it's important to not have code before and after the for(i) loop since is not allowed. Then how do I parallel a nested for loop with if statements like this:
void foo2D(double Lx, double Ly, double KX[], double KY[], double ksqu[], double ninvksqu[], int FourierMeshType){
// Make a variable to account for way wavenumbers are set in FFTW3.
int k_counter = 0;
// kx corresponds to modes [0:n/2-1 , -n/2:-1]. This is why there is an additional step, just due to FFTW3's structuring
#pragma omp parallel for collapse(2)
for(int i = 0; i < nx ; i++){
for(int j = 0; j < nyk; j++){
if (i < nx/2){ // i<nx/2 --> 0 to 127
KX[j + nyk*i] =2.*M_PI*i/Lx; //K = 2pi/L* [0:nx/2-1 , -n/2:-1]' : creates a column vector with nx/2 elements starting from 0 ( from 0 to 127 then -128 to -1) 256/2 is 128
}
if( i >= nx/2){ // i >= nx/2 --> from -128 to -1
KX[j + nyk*i] = 2.* M_PI * (-i + 2.*k_counter) / Lx;
}
}
if( i >= nx/2){ // error here
k_counter++;
}
}
}
Where nx, ny, nyk
defined as:
static const int nx = 256;
static const int ny = 256;
static const int nyk = ny/2 + 1;
Is there a better way to rewrite this?
As pointed out in the comments by 1201ProgramAlarm, you can get rid of the error by eliminating the if
branch that exists between the two loops:
#pragma omp parallel for collapse(2)
for(int i = 0; i < nx ; i++){
for(int j = 0; j < nyk; j++){
...
}
if( i >= nx/2){ <== remove this
k_counter++;
}
}
This :
if( i >= nx/2){ <== remove this
k_counter++;
}
can be statically calculated beforehand. So your code would look like the following:
void makeFourierMesh2D(double Lx, double Ly, double KX[], double KY[], double ksqu[], double ninvksqu[], int FourierMeshType){
#pragma omp parallel for collapse(2)
for(int i = 0; i < nx ; i++){
for(int j = 0; j < nyk; j++){
if (i < nx/2){
KX[j + nyk*i] =2.*M_PI*i/Lx;
}
if( i >= nx/2){
int k_counter = i - nx/2;
KX[j + nyk*i] = 2.* M_PI * (-i + 2.*k_counter) / Lx;
}
}
}
}
You can then simplify (-i + 2.*k_counter)
to (-i + 2 * ( i - nx/2))
and then (-i + 2*i - nx)
, which is i - nx
.
To further remove the other loops you can just compute each branch in a separate pair of loops as follows:
#pragma omp parallel for collapse(2)
for(int i = 0; i < nx/2 ; i++){
for(int j = 0; j < nyk; j++){
KX[j + nyk*i] =2.*M_PI*i/Lx;
}
}
#pragma omp parallel for collapse(2)
for(int i = nx/2; i < nx ; i++){
for(int j = 0; j < nyk; j++){
KX[j + nyk*i] = 2.* M_PI * ((i - nx) / Lx);
}
}
as previously you can also simplify (-i + 2.*k_counter)
into i - nx
.