Here is some simple C code that contains only one double-precision addition:
    void test(double *a, double *b, long n) {
        for (long j = 0; j < n; j++)
            for (long i = 0; i < n; i++) {
                b[i] = b[i] + a[j];
            }
    }
The assembly output in Compiler Explorer: https://godbolt.org/z/tJ-d39
It contains one addpd and two addsd instructions. Both are double-precision additions.
Similar Rust code produces even more double-precision add instructions: https://godbolt.org/z/c49Wuh
    pub unsafe fn test(a: &mut [f64], b: &mut [f64], n: usize) {
        for j in 0..n {
            for i in 0..n {
                *b.get_unchecked_mut(i) = *b.get_unchecked_mut(i) + *a.get_unchecked_mut(j);
            }
        }
    }
Try compiling without optimizations and you will get only one addsd instruction. The two extra additions in the C code are due to auto-vectorization. In particular, if you look at lines 34 and 37 of the disassembly, you will see vector memory accesses. The addpd is the main addition in the vectorized loop, and the two addsd instructions are there to handle boundary conditions, such as elements that do not fill a whole vector.
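To make the split between the packed and scalar additions concrete, here is a rough hand-written sketch of the shape an auto-vectorized loop takes. It is not the compiler's actual output. The name test_manual is hypothetical, the vector width of two doubles matches a 128-bit SSE register, and the sketch assumes a and b do not overlap (the compiler has to check or prove that itself).

```c
/* Hypothetical illustration only: models the structure of a vectorized loop.
 * Assumes a and b do not overlap, so a[j] can be read once per outer iteration. */
void test_manual(double *a, double *b, long n) {
    for (long j = 0; j < n; j++) {
        double aj = a[j];
        long i = 0;

        /* Main body: two doubles per iteration, the work one addpd would do. */
        for (; i + 1 < n; i += 2) {
            b[i]     += aj;
            b[i + 1] += aj;
        }

        /* Tail: at most one leftover element, handled by a scalar addsd. */
        for (; i < n; i++) {
            b[i] += aj;
        }
    }
}
```

The source still has a single logical addition, but it now appears once in the packed main loop and once in the scalar tail, which is why the instruction count in the disassembly is higher than the count in the source.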
The extra instructions in the Rust code are due to loop unrolling.
As pointed out by @Peter Cordes, gcc doesn't do loop unrolling by default at -O3 optimization, whereas LLVM (on which the Rust compiler is based) does. Hence the difference between the C code and the Rust code.
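As a rough illustration of why unrolling multiplies the add instructions, here is a manual unroll of the inner loop in C. The factor of 4 and the name test_unrolled are assumptions for the example; LLVM chooses its own factor and combines unrolling with vectorization. With gcc, the -funroll-loops flag requests similar behavior.

```c
/* Hypothetical illustration only: inner loop unrolled by an assumed factor of 4.
 * Assumes a and b do not overlap. */
void test_unrolled(double *a, double *b, long n) {
    for (long j = 0; j < n; j++) {
        double aj = a[j];
        long i = 0;

        /* Unrolled body: four additions per iteration instead of one. */
        for (; i + 3 < n; i += 4) {
            b[i]     += aj;
            b[i + 1] += aj;
            b[i + 2] += aj;
            b[i + 3] += aj;
        }

        /* Remainder loop for the last n % 4 elements. */
        for (; i < n; i++) {
            b[i] += aj;
        }
    }
}
```

Each copy of the statement in the unrolled body becomes its own add instruction, so the disassembly shows several additions for one addition in the source.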