OpenMP with C++: Internal Compiler Error with parallelized Lambda in Template Function

I have code that calls a function template, which defines a lambda and then calls it. When I try to parallelize the loop inside the lambda using a custom OpenMP reduction, I get an internal compiler error (ICE). I'm using GCC 12.2.1 (20230201) in a zsh shell.

Here is a minimal working example (MWE) that reproduces the error:

#include <cstddef>
#include <iostream>

template <class T>
T add(std::size_t const &maxs) {
  auto step = [&](auto const &maxs) {
    T a = T(0);
#pragma omp declare reduction(MyRed:T                                     \
                              : omp_out += omp_in)                             \
    initializer(omp_priv(omp_orig))
#pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
    for (std::size_t s = 0; s <= maxs; ++s) {
      a += T(1);
    }
    return a;
  };

  return step(maxs);
}

int main() {
  auto a = add<double>(100);

  std::cout << "a=" << a << std::endl;
}

The compiler fails at the declare reduction directive. The error message shows it as "#pragma omp declare reduction(MyRed:double", i.e. with T already substituted by double.

It compiles fine if any one of these changes is made: add() is not a template, the OpenMP directives are not inside a lambda, or the lambda takes no arguments.
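A minimal sketch of the third case, assuming the only change from the MWE is that the lambda takes no arguments and reads maxs through its [&] capture instead. A lambda without an auto parameter is not a generic lambda, so its call operator is not itself a template.

```cpp
#include <cstddef>

// Variant of add() in which the lambda takes no arguments:
// maxs is captured by reference instead of being passed in.
template <class T>
T add(std::size_t const &maxs) {
  auto step = [&]() {
    T a = T(0);
#pragma omp declare reduction(MyRed:T : omp_out += omp_in) \
    initializer(omp_priv(omp_orig))
#pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
    for (std::size_t s = 0; s <= maxs; ++s) {
      a += T(1);
    }
    return a;
  };

  return step();
}
```

Called as add<double>(100) from main, as in the MWE, it returns 101, since the loop runs for s = 0 through 100.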

Question: I'm aware this is a compiler bug that should be reported, and I'll do so (edit: I did). But is the error caused by invalid code on my part?

If not, is there a way to keep the same code structure and still get the compiler to accept it? I need main to call a function template, which in turn calls a parallelized lambda.

Thanks for any help!

Solution

I found a workaround that works in my case and thought I'd share it in case someone runs into a similar problem (unlikely, I know):

Giving the lambda parameter an explicit type (std::size_t const & instead of auto const &), so that the lambda is no longer generic, made the compiler accept the code again:

#include <cstddef>
#include <iostream>

template <class T>
T add(std::size_t const &maxs) {
  auto step = [&](std::size_t const &maxs) {
    T a = T(0);
#pragma omp declare reduction(MyRed:T                                     \
                              : omp_out += omp_in)                             \
    initializer(omp_priv(omp_orig))
#pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
    for (std::size_t s = 0; s <= maxs; ++s) {
      a += T(1);
    }
    return a;
  };

  return step(maxs);
}

int main() {
  auto a = add<double>(100);

  std::cout << "a=" << a << std::endl;
}
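If the generic (auto) parameter has to stay, another option that may be worth trying is to avoid declare reduction altogether. MyRed only does omp_out += omp_in, which is what OpenMP's built-in + reduction does for arithmetic types. The sketch below assumes T is an arithmetic type; I have not verified it against GCC 12.2.1, it only removes the directive the ICE was reported on. A side benefit: the built-in + reduction initializes each thread's private copy to 0, whereas initializer(omp_priv(omp_orig)) copies the original value of a into every private copy, which gives the correct sum here only because a starts at T(0).

```cpp
#include <cstddef>

// Keeps the generic lambda (auto parameter) but replaces the user-defined
// MyRed reduction with OpenMP's built-in '+' reduction, so no
// '#pragma omp declare reduction' appears inside the template.
// Assumes T is an arithmetic type, as the built-in '+' reduction requires.
template <class T>
T add(std::size_t const &maxs) {
  auto step = [&](auto const &maxs) {
    T a = T(0);
    // Each thread's private copy of a starts at 0 (the identity of '+');
    // the partial sums are added into a when the loop construct ends.
#pragma omp parallel for schedule(dynamic, 10) reduction(+ : a)
    for (std::size_t s = 0; s <= maxs; ++s) {
      a += T(1);
    }
    return a;
  };

  return step(maxs);
}
```

It is called exactly like the original; add<double>(100) returns 101. Without -fopenmp the pragma is ignored and the loop runs serially with the same result.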