c++ clang compiler-optimization template-meta-programming loop-unrolling

How can I stop Clang from overexpanding nested loops via templates?


Consider this code:

#include <iostream>
typedef long xint;
template<int N>
struct foz {
    template<int i=0>
    static void foo(xint t) {
        for (int j=0; j<10; ++j) {
            foo<i+1> (t+j);
        }
    }
    template<>
    static void foo<N>(xint t) {
        std::cout << t;
    }

};

int main() {
    foz<8>::foo<0>(0);
}

Compiled with clang++ -O0, it builds in seconds and then runs for 4 seconds.

However, with clang++ -O2, compilation takes a long time and uses a lot of memory. On Compiler Explorer, with 8 changed to a smaller value, you can see that Clang fully expands the loops.

I don't want to turn optimization off entirely. I just want the generated code to be non-recursive and to behave the way an ordinary nested loop would. Is there anything I can do?


Solution

  • Loop unrolling can be disabled (see Compiler Explorer). The generated code is then non-recursive and expressed as nested loops.

    #pragma nounroll
    for (int j=0; j<10; ++j) {
        foo<i+1> (t+j);
    }
    

    You can also tune the unrolling manually instead of disabling it. Unrolling by 8 generates code similar to that of a loop that iterates 8 times. (Compiler Explorer)

    #pragma unroll 8
    for (int j=0; j<10; ++j) {
        foo<i+1> (t+j);
    }
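
For context, here is a minimal sketch of how the `#pragma nounroll` hint might sit inside the complete template. It makes a few assumptions that are not in the original post: it uses C++17 `if constexpr` in place of the in-class explicit specialization (which some compilers reject), it adds a hypothetical `out` stream parameter so the output can be captured, and it adds a hypothetical `collect` helper. The pragma is a Clang loop hint; other compilers may ignore it or warn about it.

```cpp
#include <iostream>
#include <sstream>
#include <string>

typedef long xint;

template <int N>
struct foz {
    // Recursion over the nesting depth i; stops when i reaches N.
    // The `out` parameter is an illustrative addition, not part of the original code.
    template <int i = 0>
    static void foo(xint t, std::ostream& out) {
        if constexpr (i == N) {
            // Innermost level: emit the accumulated value.
            out << t;
        } else {
            // Ask Clang not to unroll this loop, so each nesting level
            // stays a real 10-iteration loop in the optimized code.
#pragma nounroll
            for (int j = 0; j < 10; ++j) {
                foo<i + 1>(t + j, out);
            }
        }
    }
};

// Hypothetical helper: runs N nested loops and returns everything printed.
template <int N>
std::string collect() {
    std::ostringstream out;
    foz<N>::template foo<0>(0, out);
    return out.str();
}
```

Calling `foz<8>::foo<0>(0, std::cout)` from `main` would correspond to the original program. Smaller depths such as `collect<1>()` or `collect<2>()` are handier for checking the behaviour quickly.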