I have code that calls a template function which defines a lambda and then calls it. When I try to parallelize the loop inside the lambda using a custom OpenMP reduction, I get an internal compiler error. I'm using gcc (GCC) 12.2.1 20230201 from a zsh shell.
Here is an MWE of the error:
#include <iostream>
template <class T>
T add(std::size_t const &maxs) {
    auto step = [&](auto const &maxs) {
        T a = T(0);
        #pragma omp declare reduction(MyRed:T \
            : omp_out += omp_in) \
            initializer(omp_priv(omp_orig))
        #pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
        for (std::size_t s = 0; s <= maxs; ++s) {
            a += T(1);
        }
        return a;
    };
    return step(maxs);
}
int main() {
    auto a = add<double>(100);
    std::cout << "a=" << a << std::endl;
}
The compiler stumbles on line 6: #pragma omp declare reduction(MyRed:double.
It compiles fine if add() is not templated, the OpenMP directives are not in a lambda, or the lambda takes no arguments.
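For illustration, here is a minimal sketch of the last of those variants, where the lambda takes no arguments and reads maxs through the [&] capture instead. The name add_no_arg is made up for this example and is not part of the original code. Going by the observation above, this form should get past the compiler, though I can only vouch for what is stated there:

```cpp
#include <cstddef>

// Hypothetical variant of add() from the MWE: the lambda takes no
// parameters and reads maxs through the [&] capture instead.
template <class T>
T add_no_arg(std::size_t const &maxs) {
    auto step = [&]() {
        T a = T(0);
        // Same custom reduction as in the MWE; each private copy is
        // initialized from the original value of a (here T(0)).
#pragma omp declare reduction(MyRed : T : omp_out += omp_in) \
    initializer(omp_priv(omp_orig))
#pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
        for (std::size_t s = 0; s <= maxs; ++s) {
            a += T(1);
        }
        return a;
    };
    return step();
}
```

Calling add_no_arg<double>(100) from main runs the loop for s = 0 through 100, i.e. 101 iterations, so it should return 101 whether or not the file is built with -fopenmp.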
Question: I'm aware this is a compiler bug which should be reported, and I'll do so (edit: I did). But is the error caused by me writing bad code?
If not, is there a way I can keep the same code structure and still get the compiler to accept it? I need to keep the structure of main calling a template function that calls a parallelized lambda.
Thanks for any help!
I found a workaround that worked for my case and thought I'd share it in case someone has a similar problem (unlikely, I know...):
Specifying the type of the lambda's argument explicitly (std::size_t const & instead of auto const &, so the lambda is no longer generic) made the compiler accept the code again:
#include <iostream>
template <class T>
T add(std::size_t const &maxs) {
    auto step = [&](std::size_t const &maxs) {
        T a = T(0);
        #pragma omp declare reduction(MyRed:T \
            : omp_out += omp_in) \
            initializer(omp_priv(omp_orig))
        #pragma omp parallel for schedule(dynamic, 10) reduction(MyRed : a)
        for (std::size_t s = 0; s <= maxs; ++s) {
            a += T(1);
        }
        return a;
    };
    return step(maxs);
}
int main() {
    auto a = add<double>(100);
    std::cout << "a=" << a << std::endl;
}