c++ for-loop templates openmp constexpr

omp for loop for constexpr indexes


Suppose I have a function that depends on a non-type template parameter of type std::size_t, which can take the values 0,...,N-1, with N known at compile time. Iterating over all values can be done with a std::index_sequence or with template recursion. E.g.:

#include <utility>

template <std::size_t I>
void f() {
//...
}

template <std::size_t... I>
void loop_f_impl(std::index_sequence<I...>) {
   (f<I>(),...);
}

template <std::size_t N>
void loop_f() {
   loop_f_impl(std::make_index_sequence<N>{});
}

int main() {
   constexpr std::size_t N = 4;
   loop_f<N>();
}

How can I convert this "unrolled loop" into a standard for loop that I can parallelize with OpenMP? Something like this (which obviously does not compile):

#pragma omp for
for (std::size_t i = 0; i < N; ++i)
     f<i>();

Clearly, if, say, N=3, I could implement that with

#pragma omp for
for (std::size_t i = 0; i < N; ++i)
    switch (i) {
        case 0:
             f<0>();
             break;
        case 1:
             f<1>();
             break;
        case 2:
             f<2>();
             break;
    }

However, I am interested in generic code that works for every N.


Solution

    The i in your for loop is not a constant expression, so it can't be used as a template argument. If f does not need I at compile time, you could change f to take it as a function argument instead:

    void f(std::size_t I) {
    }
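As a minimal sketch of that first option, assuming the body of f only needs the index at run time, the loop can then be parallelized directly. The `out` parameter and the squaring body are hypothetical placeholders, not part of the original f:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical body: the index is now an ordinary run-time argument.
void f(std::size_t i, std::vector<int>& out) {
    out[i] = static_cast<int>(i * i);  // each call writes a distinct element
}

std::vector<int> loop_f(std::size_t n) {
    std::vector<int> out(n);
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        f(i, out);
    }
    return out;
}
```

Calling `loop_f(4)` fills the result with 0, 1, 4, 9. This only helps when nothing inside f requires I as a constant expression (array sizes, other template arguments, `if constexpr`).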
    

    Another option, without using OpenMP, could be to launch every f<I>() asynchronously:

    #include <future>
    #include <tuple>
    
    template <std::size_t... I>
    void loop_f_impl(std::index_sequence<I...>) {
        std::tuple all{ std::async(std::launch::async, f<I>)... };
    } // here all futures stored in the `tuple` wait until done
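A possible variation on the asynchronous version, sketched here under the assumption that you also want exceptions thrown by any f<I>() to reach the caller: the destructor of a future only waits, while get() waits and rethrows. The `hits` counter and the body of f are hypothetical stand-ins used to make the example observable.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <future>
#include <tuple>
#include <utility>

// Hypothetical stand-in for f<I>(): counts how often each index ran.
// Only valid for I < 8 in this sketch.
inline std::array<std::atomic<int>, 8> hits{};

template <std::size_t I>
void f() {
    hits[I].fetch_add(1);
}

template <std::size_t... I>
void loop_f_impl(std::index_sequence<I...>) {
    // One task per index; std::launch::async forces a separate thread each.
    std::tuple all{std::async(std::launch::async, f<I>)...};
    // Wait for every task explicitly; get() rethrows a stored exception.
    std::apply([](auto&... fut) { (fut.get(), ...); }, all);
}

template <std::size_t N>
void loop_f() {
    loop_f_impl(std::make_index_sequence<N>{});
}
```

Note that this starts N threads at once, so it suits a small N; an OpenMP loop or a parallel algorithm schedules work onto a bounded pool instead. On some toolchains it needs to be built with `-pthread`.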
    

    An alternative could be to use one of the standard (since C++17) execution policies in a std::for_each called directly from loop_f. Example:

    #include <algorithm>
    #include <array>
    #include <execution>
    
    template <std::size_t N>
    void loop_f() {
        // C++20 lambda template:
        constexpr auto funcs = []<std::size_t... Is>(std::index_sequence<Is...>) {
            return std::array{f<Is>...};
        }(std::make_index_sequence<N>{});
    
        std::for_each(std::execution::par_unseq, funcs.begin(), funcs.end(),
                      [](auto func) { func(); });
    }
    

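    This will make use of Intel® oneAPI Threading Building Blocks (as libstdc++ does, for example) or whatever your implementation uses as a backend. Note that std::execution::par_unseq also permits vectorized, interleaved execution on a single thread, so the f<I> bodies must not take locks or otherwise synchronize; use std::execution::par if they do.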
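The same table-of-function-pointers idea can be combined with the OpenMP loop from the question. The following is a sketch, not a definitive implementation: `make_table`, `loop_f_omp`, the `hits` counter, and the body of f are hypothetical names introduced for illustration, and each f<I>() is assumed to be safe to call concurrently with the others.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for f<I>(): counts how often each index ran.
// Only valid for I < 8 in this sketch.
inline std::array<std::atomic<int>, 8> hits{};

template <std::size_t I>
void f() {
    hits[I].fetch_add(1, std::memory_order_relaxed);
}

// Build a compile-time table of pointers to f<0>, ..., f<N-1>.
template <std::size_t... Is>
constexpr std::array<void (*)(), sizeof...(Is)>
make_table(std::index_sequence<Is...>) {
    return {&f<Is>...};
}

template <std::size_t N>
void loop_f_omp() {
    static constexpr auto table = make_table(std::make_index_sequence<N>{});
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    // An unsigned loop index requires OpenMP 3.0 or later.
    #pragma omp parallel for
    for (std::size_t i = 0; i < N; ++i) {
        table[i]();  // run-time index selects the compile-time instantiation
    }
}
```

`loop_f_omp<4>()` then calls f<0>() through f<3>() exactly once each, in an unspecified order. The indirect call through the table usually prevents inlining of f<I>, which is the price for turning the unrolled fold expression into an ordinary loop.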