OpenMP with C++: Internal Compiler Error with parallelized Lambda in Template Function


I have code that calls a template function, which defines a lambda and then calls it. When I try to parallelize the loop inside the lambda using a custom OpenMP reduction, I get an internal compiler error. I'm using GCC 12.2.1 (gcc (GCC) 12.2.1 20230201) inside a zsh shell.

Here is an MWE of the error:

#include <iostream>

template <class T>
T add(std::size_t const &maxs) {
  auto step = [&](auto const &maxs) {
    T a = T(0);
#pragma omp declare reduction(MyRed:T                                     \
                              : omp_out += omp_in)                             \
    initializer(omp_priv(omp_orig))
#pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
    for (std::size_t s = 0; s <= maxs; ++s) {
      a += T(1);
    }
    return a;
  };

  return step(maxs);
}

int main() {
  auto a = add<double>(100);

  std::cout << "a=" << a << std::endl;
}

The compiler stumbles on line 6: #pragma omp declare reduction(MyRed:double.

It compiles fine if add() is not a template, if the OpenMP directives are not inside a lambda, or if the lambda takes no arguments.

Question: I'm aware this is a compiler bug which should be reported, and I'll do so (edit: I did). But is the error caused by me writing bad code?

If not, is there a way to keep the same code structure and still get the compiler to accept it? I need main to call a template function, which in turn calls a parallelized lambda.

Thanks for any help!


Solution

  • I found a workaround that worked in my case and thought I'd share it in case someone runs into a similar problem (unlikely, I know...):

    Specifying the type of the lambda's argument explicitly (std::size_t const & instead of auto const &) made the compiler accept the code again:

    #include <iostream>
    
    template <class T>
    T add(std::size_t const &maxs) {
      // Workaround: explicit parameter type instead of auto const &
      auto step = [&](std::size_t const &maxs) {
        T a = T(0);
    #pragma omp declare reduction(MyRed:T                                     \
                                  : omp_out += omp_in)                             \
        initializer(omp_priv(omp_orig))
    #pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
        for (std::size_t s = 0; s <= maxs; ++s) {
          a += T(1);
        }
        return a;
      };
    
      return step(maxs);
    }
    
    int main() {
      auto a = add<double>(100);
    
      std::cout << "a=" << a << std::endl;
    }
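
The question also notes that the code compiles when the lambda takes no arguments. A minimal sketch of that variant, assuming the same compiler behavior as described above, lets the lambda read maxs through its by-reference capture instead of receiving it as a parameter. The name add_no_arg is hypothetical and only used for illustration; compile with -fopenmp so the pragmas take effect.

```cpp
#include <cstddef>

// Hypothetical variant of add(): the lambda takes no parameters and
// reads maxs through its by-reference capture instead.
template <class T>
T add_no_arg(std::size_t const &maxs) {
  auto step = [&]() {
    T a = T(0);
    // Custom reduction: partial results are combined with +=, and each
    // private copy starts from the original value of a (zero here).
#pragma omp declare reduction(MyRed : T : omp_out += omp_in) \
    initializer(omp_priv(omp_orig))
#pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
    for (std::size_t s = 0; s <= maxs; ++s) {
      a += T(1);
    }
    return a;
  };

  return step();
}
```

With T = double and maxs = 100, the loop runs for s = 0 through 100 inclusive, so add_no_arg<double>(100) returns 101, the same result as the original add<double>(100). This keeps the structure of main calling a template function that calls a parallelized lambda, but it only fits when the lambda does not need its own parameter.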