Suppose I have a class Matrix5x5 (with suitably overloaded index operators) and I write a method trace for calculating the sum of its diagonal elements:
double Matrix5x5::trace(void){
    double t(0.0);
    for(int i(0); i <= 4; ++i){
        t += (*this)[i][i];
    }
    return t;
}
Of course, if I instead wrote:
return (*this)[0][0]+(*this)[1][1]+(*this)[2][2]+(*this)[3][3]+(*this)[4][4];
then I would be sure to avoid the overhead of declaring and incrementing my i variable. But it feels quite stupid to write out all those terms!
Since my loop has a small number of iterations that is known at compile time, would a compiler unroll it for me?
Yes! GCC fully unrolls the loop at optimization level -O1 and above, and Clang does so at optimization level -O2 and above.
I tested it using this code:
struct Matrix5x5 {
    double values[5][5];
    Matrix5x5() : values() {}
    double trace() {
        double sum = 0.0;
        for(int i = 0; i < 5; i++) {
            sum += values[i][i];
        }
        return sum;
    }
};
double trace_of(Matrix5x5& m) {
    return m.trace();
}
And this is the assembly produced by both gcc and clang:
trace_of(Matrix5x5&):
    pxor    xmm0, xmm0
    addsd   xmm0, QWORD PTR [rdi]
    addsd   xmm0, QWORD PTR [rdi+48]
    addsd   xmm0, QWORD PTR [rdi+96]
    addsd   xmm0, QWORD PTR [rdi+144]
    addsd   xmm0, QWORD PTR [rdi+192]
    ret
You can play around with the code, and look at the corresponding assembly here: https://godbolt.org/z/p2uF0E.
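The displacements in that listing (0, 48, 96, 144, 192) are the byte offsets of the diagonal elements: values[i][i] sits 5*i + i doubles past the start of the array, which is 48*i bytes if a double is 8 bytes. That is why no loop counter survives in the output. Here is a small sketch that checks this arithmetic at compile time. The helper name diagonal_offset is hypothetical and not part of the tested code, and the 8-byte double is an assumption that holds on x86-64:

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of values[i][i] inside a
// row-major double[5][5], measured from the start of the array.
constexpr std::size_t diagonal_offset(std::size_t i) {
    return (5 * i + i) * sizeof(double);
}

// Assumption: sizeof(double) == 8, as on x86-64.
static_assert(diagonal_offset(0) == 0,   "element [0][0]");
static_assert(diagonal_offset(1) == 48,  "element [1][1]");
static_assert(diagonal_offset(2) == 96,  "element [2][2]");
static_assert(diagonal_offset(3) == 144, "element [3][3]");
static_assert(diagonal_offset(4) == 192, "element [4][4]");
```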
If you overload operator[], then you have to raise the optimization level to -O3, but the compiler will still do it: https://godbolt.org/z/JInIME
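For reference, here is a minimal sketch of what an overloaded-operator[] version might look like, so that the (*this)[i][i] syntax from the question compiles. It assumes operator[] simply returns a pointer to the requested row. The code behind the link may define the overload differently, so treat this as an illustration rather than the exact code that was tested:

```cpp
#include <cstddef>

struct Matrix5x5 {
    double values[5][5];
    Matrix5x5() : values() {}  // value-initializes every element to 0.0

    // Assumed overload: return a pointer to row i, so m[i][j] works.
    double* operator[](std::size_t i) { return values[i]; }
    const double* operator[](std::size_t i) const { return values[i]; }

    double trace() const {
        double sum = 0.0;
        // Fixed trip count of 5, known at compile time.
        for (std::size_t i = 0; i < 5; ++i) {
            sum += (*this)[i][i];
        }
        return sum;
    }
};

double trace_of(const Matrix5x5& m) {
    return m.trace();
}
```

Because the overloads are defined inside the class, they are implicitly inline, so the compiler can see through them when it processes the loop.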