I try to parallelize the following code
double a = 1.5;
double b = 2.5;
double *arr = ...;
for (i = 0; i < N; ++i)
{
arr[i] = b;
b *= a;
}
I have tried different constellations of private
/ lastprivate
/ firstprivate
e.g.
#pragma omp parallel for shared(arr) firstprivate(a, b)
but I don't get the right result (comparing to the sequential run)
I assume first you want to avoid the obvious solution where the recurence on b
is removed:
#pragma omp parallel for private(c)
for (int i = 0; i < N; ++i) {
c = b * pow(a,(double)i);
arr[i] = c;
}
Because of the dependency, your initial code can not be parallelized like this. You have to determine the initial value of b
for each thread according to the first iteration it handles (this requires a test inside the loop, not very good for the performances) AND assume that a given thread will execute a contiguous range of iterations (probably a schedule(static,M)
clause can do, where M = (N-1)/(nbthreads)+1
)
int nbthreads, M;
#pragma omp parallel firstprivate(b)
{
#pragma omp single
{
nbthreads = omp_get_num_threads()
M = (N-1)/nbthreads + 1
}
bool first=true;
#pragma omp for schedule(static,M)
for (int i = 0; i < N; ++i) {
if (first) {
b *= pow(a,(double)i);
first = false;
}
arr[i] = c;
b *= a;
}
}
The right solution actually consists in manually distributing the iterations to the threads, so that you know in advance what is the first iteration of each thread:
int N=1234, M, nbthreads;
double b=2.5, a=1.5;
#pragma omp parallel
{
#pragma omp single
{
nbthreads = omp_get_num_threads();
M = (N-1)/nbthreads + 1;
}
int it = omp_get_thread_num();
int lb = it*M; // lower bound for the iteration in this thread
int ub = (it+1)*M; // upper bound ...
if (ub > N) ub = N;
b = b * pow(a,(double)lb);
for (int i = lb; i < ub; ++i) {
arr[i] = b;
b *= a;
}
}