Search code examples
c++vectorparallel-processingopenmpreduction

Openmp and reduction on std::vector?


I want to make this code parallel:

std::vector<float> res(n,0);
std::vector<float> vals(m);
std::vector<float> indexes(m);
// fill indexes with values in range [0,n)
// fill vals and indexes
for(size_t i=0; i<m; i++){
  res[indexes[i]] += //something using vas[i];
}

In this article it's suggested to use:

#pragma omp parallel for reduction(+:myArray[:6])

In this question the same approach is proposed in the comments section.

I have two questions:

  1. I don't know m at compile time, and from these two examples it seems that's required. Is it so? Or if I can use it for this case, what do I have to replace ? with in the following command #pragma omp parallel for reduction(+:res[:?]) ? m or n?
  2. Is it relevant that the indexes of the for are relative to indexes and vals and not to res, especially considering that reduction is done on the latter one?

However, If so, how can I solve this problem?


Solution

  • It is fairly straight forward to do a user declared reduction for C++ vectors of a specific type:

    #include <algorithm>
    #include <vector>
    
    #pragma omp declare reduction(vec_float_plus : std::vector<float> : \
                                  std::transform(omp_out.begin(), omp_out.end(), omp_in.begin(), omp_out.begin(), std::plus<float>())) \
                        initializer(omp_priv = decltype(omp_orig)(omp_orig.size()))
    
    std::vector<float> res(n,0);
    #pragma omp parallel for reduction(vec_float_plus : res)
    for(size_t i=0; i<m; i++){
        res[...] += ...;
    }
    

    1a) Not knowing m at compile time is not a requirement.

    1b) You cannot use the array section reduction on std::vectors, because they are not arrays (and std::vector::data is not an identifier). If it were possible, you'd have to use n, as this is the number of elements in the array section.

    2) As long as you are only reading indexes and vals, there is no issue.

    Edit: The original initializer caluse was simpler: initializer(omp_priv = omp_orig). However, if the original copy is then not full of zeroes, the result will be wrong. Therefore, I suggest the more complicated initializer which always creates zero-element vectors.