Search code examples
c++parallel-processingdependenciesopenmp

How to remove dependencies in this OpenMP summation exemple


I'm starting with OpenMP now, and need to achieve parallelism in this task with OpenMP:

for (i = 0; i < N; i++){
 index += i*alpha + 4;
 sum += array[index];
}

I intend to use reduction operator to get the correct value of sum variable , but in order to do that I need to solve the dependency of the index variable first. How can I do that?

Thanks for all the help!


Solution

  • The loop cannot be straightforwardly parallelized because of the sequential dependency. However, it can be first split in two parts:

    for (i = 0; i < N; i++) {
     index += i*alpha + 4;
     tmp[i] = index;
    }
    for (i = 0; i < N; i++) {
     sum += array[tmp[i]];
    }
    

    Once split, the first loop exhibits a scan pattern and can be parallelized although the parallel version may not be faster than the sequential one. The second loop is a simple reduction that can be quite easily parallelized.

    Performing a scan patterns with OpenMP is usually a bit tricky. Hopefully, because alpha seems to be a loop constant, all the value of index can easily predicted thanks to basic math properties:

    index_i = index_init + 0*alpha+4 + 1*alpha+4 + 2*alpha+4 + ... + i*alpha+4
            = index_init + (0+1+2+...+i)*alpha + 4*(i+1)
            = index_init + (i*(i+1)/2)*alpha + 4*(i+1)
            = index_init + (i+1)*(i*alpha + 8)/2
    

    Thus we can write the final resulting code:

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
     sum += array[index + (i*(i+1)/2)*alpha + 4*(i+1)];
    }
    

    Moreover, if alpha is an integer and index starts from zero, the array indexing can be made slightly faster:

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
     sum += array[(i+1)*(i*alpha+8)/2];
    }