How can i parallelize this code using openmp: xp, yp, zp, gpx, gpy, and gpz are known 1D vectors.
for (ies = 0; ies < 1000000; ies++){
for (jes = ies+1; jes < 1000000; jes++){
double dxp = xp[ies] - xp[jes];
double dyp = yp[ies] - yp[jes];
double dzp = zp[ies] - zp[jes];
double distance = sqrt( dxp * dxp + dyp * dyp + dzp * dzp );
double gpspec = gpx[ies] * gpx[jes] + gpy[ies] * gpy[jes] + gpz[ies] * gpz[jes];
#pragma omp parallel for
for (kes = 1; kes <= 100; kes++){
double distan = kes * distance;
E1[kes] = E1[kes] + gpspec * sin(distan) / distan;
}
}
}
Here is a possibility (not tested)
#pragma omp parallel for reduction(+: E1) private(jes, kes) schedule(dynamic)
for (ies = 0; ies < 1000000; ies++){
for (jes = ies+1; jes < 1000000; jes++){
double dxp = xp[ies] - xp[jes];
double dyp = yp[ies] - yp[jes];
double dzp = zp[ies] - zp[jes];
double distance = sqrt( dxp * dxp + dyp * dyp + dzp * dzp );
double gpspec = gpx[ies] * gpx[jes] + gpy[ies] * gpy[jes] + gpz[ies] * gpz[jes];
for (kes = 1; kes <= 100; kes++){
double distan = kes * distance;
E1[kes] = E1[kes] + gpspec * sin(distan) / distan;
}
}
}
I've put a schedule(dynamic)
to try to compensate the workload imbalance between threads introduced by the triangular aspect of the index domain ies * jes
that the loops cover.
Also depending on the way E1
is defined, this may or may not be accepted by your compiler. But in any case, if the reduction(+: E1)
isn't accepted, there's always the possibility to do the reduction by hand with a critical
construct.