Tags: c, assembly, rust, simd, auto-vectorization

Why does a single addition in C/Rust produce three double-precision floating-point add instructions in the resulting assembly?


Simple C code with only one double-precision addition:

void test(double *a, double *b, long n) {
    for (long j = 0; j < n; j++)
    for (long i = 0; i < n; i++) {
        b[i] = b[i] + a[j];
    }
}

Assembly output on Compiler Explorer: https://godbolt.org/z/tJ-d39

The output contains one addpd and two addsd instructions, all of which are double-precision additions.
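To see how the two instructions differ, here is a small sketch using SSE2 intrinsics, assuming an x86-64 target where `<emmintrin.h>` is available. `_mm_add_pd` normally compiles to addpd and adds both 64-bit lanes ("packed"), while `_mm_add_sd` normally compiles to addsd and adds only the low lane ("scalar"), copying the upper lane from the first operand. The function names `add_packed` and `add_scalar` are hypothetical, chosen only for illustration.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86-64 target */

/* addpd: both lanes are added, out = {x[0] + y[0], x[1] + y[1]} */
void add_packed(const double x[2], const double y[2], double out[2]) {
    __m128d v = _mm_add_pd(_mm_loadu_pd(x), _mm_loadu_pd(y));
    _mm_storeu_pd(out, v);
}

/* addsd: only the low lane is added, out = {x[0] + y[0], x[1]} */
void add_scalar(const double x[2], const double y[2], double out[2]) {
    __m128d v = _mm_add_sd(_mm_loadu_pd(x), _mm_loadu_pd(y));
    _mm_storeu_pd(out, v);
}
```

With x = {1.0, 2.0} and y = {10.0, 20.0}, `add_packed` yields {11.0, 22.0} and `add_scalar` yields {11.0, 2.0}: one addpd does the work of two scalar additions.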

Similar Rust code produces even more double-precision add instructions: https://godbolt.org/z/c49Wuh

pub unsafe fn test(a: &mut [f64], b: &mut [f64], n: usize) {
    for j in 0..n {
        for i in 0..n {
            *b.get_unchecked_mut(i) = *b.get_unchecked_mut(i) + *a.get_unchecked_mut(j);
        }
    }
}

Solution

  • Try compiling without optimizations and you will get only one addsd instruction. The two extra additions in the C code are due to auto-vectorization. In particular, if you look at lines 34 and 37 of the disassembly, you will see vector memory accesses. The addpd is the main addition in the vectorized loop, and the two addsd instructions handle boundary conditions, such as a leftover element when n is odd.

    The extra instructions in the Rust code are due to loop unrolling.

    As pointed out by @Peter Cordes, gcc doesn't unroll loops by default at -O3 (only with -funroll-loops, which profile-guided optimization with -fprofile-use also enables), whereas LLVM, which the Rust compiler uses as its backend, does. Hence the difference between the C and the Rust output.
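The vectorization explanation above can be sketched as a hand-written equivalent of the inner loop. This is not the compiler's actual output, only an illustration assuming x86-64 with SSE2 and assuming `a` and `b` do not overlap (expressed with `restrict`; the compiler itself cannot assume this for the original signature and must handle possible overlap separately). The main loop adds two doubles per `_mm_add_pd` (addpd), and the tail handles the single leftover element when n is odd with a plain scalar addition (typically addsd). The name `test_vectorized` is hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86-64 target */

/* Hand-vectorized sketch of test(); assumes a and b do not overlap. */
void test_vectorized(const double *restrict a, double *restrict b, long n) {
    for (long j = 0; j < n; j++) {
        __m128d aj = _mm_set1_pd(a[j]); /* broadcast a[j] into both lanes */
        long i = 0;
        /* main loop: two doubles per packed add (addpd) */
        for (; i + 1 < n; i += 2) {
            __m128d v = _mm_loadu_pd(&b[i]);
            _mm_storeu_pd(&b[i], _mm_add_pd(v, aj));
        }
        /* boundary condition: one leftover element when n is odd */
        if (i < n)
            b[i] = b[i] + a[j]; /* scalar add, typically addsd */
    }
}
```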
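The loop unrolling mentioned above can be illustrated by taking the vectorized inner loop and unrolling its body by a factor of 2, so each iteration performs two packed additions on four elements, with a scalar loop for the remaining up to three elements. The unroll factor of 2 is an assumption chosen for readability; the factor LLVM actually picks depends on the target and the loop, so the Compiler Explorer output is the authority on that. As before, this sketch assumes x86-64 with SSE2 and non-overlapping `a` and `b`, and the name `test_unrolled` is hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics; assumes an x86-64 target */

/* Sketch of test() with the vector loop unrolled 2x; assumes a and b do not overlap. */
void test_unrolled(const double *restrict a, double *restrict b, long n) {
    for (long j = 0; j < n; j++) {
        __m128d aj = _mm_set1_pd(a[j]); /* broadcast a[j] into both lanes */
        long i = 0;
        /* unrolled body: two independent packed adds, four doubles per iteration */
        for (; i + 3 < n; i += 4) {
            __m128d v0 = _mm_loadu_pd(&b[i]);
            __m128d v1 = _mm_loadu_pd(&b[i + 2]);
            _mm_storeu_pd(&b[i], _mm_add_pd(v0, aj));
            _mm_storeu_pd(&b[i + 2], _mm_add_pd(v1, aj));
        }
        /* remainder: up to three elements handled one at a time */
        for (; i < n; i++)
            b[i] = b[i] + a[j];
    }
}
```

Unrolling trades larger code for less loop-counter overhead per element and more independent additions the CPU can overlap, which is why the unrolled output contains more add instructions for the same single `+` in the source.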