c++ compiler-optimization loop-unrolling

Will a C++ compiler unroll a for-loop with a small number of iterations?


Suppose I have a class Matrix5x5 (with suitably overloaded index operators) and I write a method trace for calculating the sum of its diagonal elements:

double Matrix5x5::trace(void){
    double t(0.0);
    for(int i(0); i <= 4; ++i){
        t += (*this)[i][i];
    }
    return t;
}

Of course, if I instead wrote:

return (*this)[0][0]+(*this)[1][1]+(*this)[2][2]+(*this)[3][3]+(*this)[4][4];

then I would be sure to avoid the overhead of declaring and incrementing my i variable. But it feels quite stupid to write out all those terms!

Since my loop has a small number of iterations that is known at compile time, would a compiler unroll it for me?


Solution

  • Yes! In my tests, GCC unrolls the loop at optimization level -O1 and above, and Clang does so at -O2 and above.

    I tested it using this code:

    struct Matrix5x5 {
        double values[5][5];
        Matrix5x5() : values() {}
    
        double trace() {
            double sum = 0.0;
            for(int i = 0; i < 5; i++) {
                sum += values[i][i]; 
            }
            return sum; 
        }
    };
    
    double trace_of(Matrix5x5& m) {
        return m.trace(); 
    }
    

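    And this is the x86-64 assembly produced by both GCC and Clang. The loop and its counter are gone; what remains is one addsd per diagonal element, 48 bytes apart (element [i][i] sits 6*i doubles from the start of the array):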

    trace_of(Matrix5x5&):
        pxor    xmm0, xmm0
        addsd   xmm0, QWORD PTR [rdi]
        addsd   xmm0, QWORD PTR [rdi+48]
        addsd   xmm0, QWORD PTR [rdi+96]
        addsd   xmm0, QWORD PTR [rdi+144]
        addsd   xmm0, QWORD PTR [rdi+192]
        ret
    

    You can play around with the code, and look at the corresponding assembly here: https://godbolt.org/z/p2uF0E.

    If you overload operator[], then you have to raise the optimization level to -O3, but the compiler will still unroll the loop: https://godbolt.org/z/JInIME
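
To try the overloaded operator[] case yourself, here is a minimal sketch. The row-pointer operator[] and the names trace_unrolled, trace_of and trace_unrolled_of are assumptions made for illustration, not necessarily what the linked example uses.

```cpp
#include <cstddef>

// Hypothetical stand-in for the question's Matrix5x5; the row-pointer
// operator[] is an assumption, not the asker's actual class.
struct Matrix5x5 {
    double values[5][5] = {};

    // Returning a pointer to row i lets callers write m[i][j].
    double* operator[](std::size_t i) { return values[i]; }
    const double* operator[](std::size_t i) const { return values[i]; }

    // Loop version: the trip count (5) is a compile-time constant.
    double trace() const {
        double t = 0.0;
        for (std::size_t i = 0; i < 5; ++i) {
            t += (*this)[i][i];
        }
        return t;
    }

    // Hand-written version from the question, for comparison.
    double trace_unrolled() const {
        return (*this)[0][0] + (*this)[1][1] + (*this)[2][2]
             + (*this)[3][3] + (*this)[4][4];
    }
};

// Free-function wrappers so each version appears as its own symbol
// in the generated assembly.
double trace_of(const Matrix5x5& m) { return m.trace(); }
double trace_unrolled_of(const Matrix5x5& m) { return m.trace_unrolled(); }
```

Compile it with optimization enabled (for example -O3) and compare the assembly of the two wrappers. If the loop was unrolled, trace_of should contain no counter or branch. The exact output may differ between compiler versions, so it is worth checking with the compiler you actually use.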