c++ optimization g++ vectorization auto-vectorization

Why is GCC not autovectorising this code unless I explicitly specify one of the possible cost models?


I have a minimal sample algorithm (please ignore whether the algorithm itself makes sense or could be written differently; it's just a contrived sample to demonstrate what I'm seeing).

When I compile it on any recent GCC with appropriate flags it doesn't autovectorise (https://godbolt.org/z/KvhKP9bsE).

If I add any of the following flags, it does vectorise:

-fvect-cost-model=dynamic
-fvect-cost-model=cheap
-fvect-cost-model=very-cheap
-fvect-cost-model=unlimited

This doesn't make sense to me, because -fvect-cost-model=dynamic is the default and should already be implied, so I don't understand why setting it explicitly would change anything.

Why isn't this autovectorising without this flag and why does this flag change that? Please help me understand!

Compile flags:

-std=c++20 -O3 -fopt-info-all-vec -ffast-math -march=core-avx2

Algorithm:

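#include <algorithm>  // std::max
#include <cstddef>    // std::size_t
#include <iostream>   // std::cout

void foo(const std::size_t n, float * __restrict__ a, float * __restrict__ b, float * __restrict__ c)
{
    float total = 0.0f;
    float sum = 0.0f;
    float max = 0.0f;
    std::size_t count = 0;

    for (std::size_t i = 0; i < n; ++i) {
        float temp = *b;
        const bool not_zero = temp != 0.0;
        // Only every fourth element contributes to sum and count.
        if (i % 4 == 0) {
            sum += *b * not_zero;
            count += not_zero;
        }
        max = std::max(temp, max);
        *b = temp * *c;
        total += *b;
        *a *= *b;
        a += 1;
        b += 1;
        c += 1;
    }

    // Print the results so the loop is not optimised away.
    std::cout << total << sum << max << count;
}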

Solution

  • It turns out I was mistaken: only three of the four cost models vectorise; -fvect-cost-model=very-cheap does not. very-cheap happens to be the default on Godbolt, which I discovered by using -Q --help=optimizers to check.

    Godbolt is, for some reason, configured to default to very-cheap as the cost model. This differs from the standard GCC settings as well as the settings on every machine I've checked.

    I'm not sure why this is; perhaps it's because they want to make it more obvious if any parts of a loop still remain scalar (though I'm speculating here).
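For background on why very-cheap might reject a loop like foo: the GCC documentation describes the very-cheap model as only allowing vectorisation when the vector code would entirely replace the scalar code, for example when the iteration count is known to be a multiple of the vector width. The sketch below is a rough illustration of that distinction, not taken from the question. The names scale_runtime, scale_fixed and kFixedCount are hypothetical, and the expectation that only the fixed-count loop vectorises under very-cheap is an assumption that may vary between GCC versions and targets.

```cpp
#include <cstddef>

// Hypothetical fixed trip count, chosen as a multiple of 8 (the number of
// floats that fit in one 256-bit AVX2 register).
constexpr std::size_t kFixedCount = 64;

// Trip count only known at run time: vector code would need a scalar
// epilogue for any leftover elements, so it cannot entirely replace the
// scalar loop. Under very-cheap this loop would be expected to stay scalar.
void scale_runtime(std::size_t n, const float* __restrict__ in,
                   float* __restrict__ out)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * 2.0f;
    }
}

// Trip count known at compile time and divisible by 8: there are no
// leftover elements, so the vector code can replace the scalar loop.
void scale_fixed(const float* __restrict__ in, float* __restrict__ out)
{
    for (std::size_t i = 0; i < kFixedCount; ++i) {
        out[i] = in[i] * 2.0f;
    }
}
```

Compiling this with -O3 -march=core-avx2 -fopt-info-vec, once with -fvect-cost-model=very-cheap and once with -fvect-cost-model=dynamic, and comparing the vectoriser reports should show whether the two loops are treated differently on a given compiler.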