Tags: c, gcc, avx2, auto-vectorization

gcc auto-vectorization: control flow in loop


In the code below, why can the second loop be auto-vectorized but not the first? How can I modify the code so that the first loop auto-vectorizes too? For the first loop, gcc says:

note: not vectorized: control flow in loop.

I am using gcc 8.2 with -O3 -fopt-info-vec-all, compiling for x86-64 with AVX2.

#include <stdlib.h>
#include <math.h>

void foo(const float * x, const float * y, const int * v, float * vec, float * novec, size_t size) {
    size_t i;
    float bar;
    /* gcc 8.2: "not vectorized: control flow in loop" */
    for (i=0 ; i<size ; ++i){
        bar = x[i] - y[i];
        novec[i] = v[i] ? bar : NAN;
    }
    /* this loop is vectorized */
    for (i=0 ; i<size ; ++i){
        bar = x[i];
        vec[i] = v[i] ? bar : NAN;
    }
}

Update: this version of the first loop does auto-vectorize:

for (i=0 ; i<size ; ++i){
    bar = x[i];
    novec[i] = v[i] ? bar : NAN;
    novec[i] -= y[i];   /* NAN - y[i] is still NaN */
}

I would still like to know why gcc reports "control flow in loop" for the first loop.
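The rewrite in the update produces the same values as the original first loop. This works because NAN - y[i] is still a NaN. The sketch below checks this on a few inputs. The function names (sub_then_select, select_then_sub, same_values, demo_equivalence) are hypothetical. NaNs are compared only as "is a NaN", because their sign or payload is not guaranteed to match between the two versions.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* The original first loop: subtract, then select. */
void sub_then_select(const float *x, const float *y, const int *v,
                     float *out, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        float bar = x[i] - y[i];
        out[i] = v[i] ? bar : NAN;
    }
}

/* The rewrite from the update: select, then subtract.
   If v[i] == 0 the result is NAN - y[i], which is still NaN. */
void select_then_sub(const float *x, const float *y, const int *v,
                     float *out, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        out[i] = v[i] ? x[i] : NAN;
        out[i] -= y[i];
    }
}

/* Returns 1 if a and b agree element-wise, treating any NaN as equal
   to any other NaN (sign and payload are ignored). */
int same_values(const float *a, const float *b, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        int a_nan = isnan(a[i]) != 0;
        int b_nan = isnan(b[i]) != 0;
        if (a_nan != b_nan) return 0;
        if (!a_nan && a[i] != b[i]) return 0;
    }
    return 1;
}

/* Usage: both versions produce NaN exactly where v[i] == 0. */
void demo_equivalence(void) {
    const float x[4] = {1.0f, 2.5f, -3.0f, 4.0f};
    const float y[4] = {0.5f, 1.0f, 1.0f, 8.0f};
    const int v[4] = {1, 0, 1, 0};
    float a[4], b[4];
    sub_then_select(x, y, v, a, 4);
    select_then_sub(x, y, v, b, 4);
    assert(same_values(a, b, 4));
    assert(a[0] == 0.5f && a[2] == -4.0f);
    assert(isnan(a[1]) && isnan(a[3]));
}
```

Whenever v[i] is nonzero, both versions perform the same single subtraction, so those lanes match exactly.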


Solution

  • clang auto-vectorizes even the first loop, but gcc 8.2 doesn't. (https://godbolt.org/z/cnlwuO)

    gcc does vectorize it with -ffast-math. Perhaps it's worried about preserving the FP exception flags raised by the subtraction?

    -fno-trapping-math alone is sufficient for gcc to auto-vectorize it (without the rest of what -ffast-math enables), so apparently gcc is worried about FP exceptions. (https://godbolt.org/z/804ykV) I think it's being over-cautious. The C source computes bar on every iteration whether or not it's used, so the subtraction's exceptions would be raised either way.

    gcc will auto-vectorize simple FP a[i] = b[i]+c[i] loops without any FP math options.
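You may not want -fno-trapping-math to apply to the whole translation unit. In that case, GCC's optimize function attribute can request it for a single function. This is a GCC-specific sketch that the answer above did not test. The macro NO_TRAPPING_MATH and the function names are hypothetical. The GCC manual describes the optimize attribute as intended mainly for debugging. Whether gcc 8.2 then vectorizes the loop should be confirmed with -fopt-info-vec-all.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* GCC only: behave as if -fno-trapping-math were given for this function.
   Clang does not implement GCC's optimize attribute, so expand to nothing. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TRAPPING_MATH __attribute__((optimize("no-trapping-math")))
#else
#define NO_TRAPPING_MATH
#endif

/* Same computation as the question's first loop. */
NO_TRAPPING_MATH
void sub_or_nan(const float *x, const float *y, const int *v,
                float *out, size_t size) {
    for (size_t i = 0; i < size; ++i) {
        float bar = x[i] - y[i];   /* evaluated on every iteration */
        out[i] = v[i] ? bar : NAN;
    }
}

/* Usage: lanes with v[i] == 0 come out as NaN. */
void demo_sub_or_nan(void) {
    const float x[3] = {3.0f, 1.0f, 2.0f};
    const float y[3] = {1.0f, 1.0f, 0.5f};
    const int v[3] = {1, 0, 1};
    float out[3];
    sub_or_nan(x, y, v, out, 3);
    assert(out[0] == 2.0f && isnan(out[1]) && out[2] == 1.5f);
}
```

The computed values are the same with or without the attribute. Only the compiler's freedom to reorder or speculate the subtraction changes.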