Disable unrolling of a particular loop in GCC

I have the following 4x4 matrix-vector multiply code:

double const* __restrict__ a;  // 16 elements (4x4 matrix)
double const* __restrict__ x;  // 4 elements
double*       __restrict__ y;  // 4 elements

//#pragma GCC unroll 1 - does not work either
#pragma GCC nounroll
for ( int j = 0; j < 4; ++j )
{
    double const* __restrict__ aj = a + j * 4;
    double const xj = x[j];

    #pragma GCC ivdep
    for ( int i = 0; i < 4; ++i )
    {
        y[i] += aj[i] * xj;
    }
}

I compile with -O3 -mavx. The inner loop is vectorized (a single FMAD). However, GCC 7.2 still unrolls the outer loop 4 times unless I use -O2 or a lower optimization level.

Is there a way to override -O3 unrolling of a particular loop?

NB: A similar #pragma nounroll works with Intel icc.

Solution

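According to the GCC documentation, #pragma GCC unroll 1 is supposed to block unrolling, provided it is placed immediately before the loop it applies to (or before that loop's #pragma GCC ivdep). Note that this pragma was only added in GCC 8, so GCC 7.2 does not recognize it. If it has no effect on a version that supports it, you should submit a bug report.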
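For illustration, here is a minimal sketch of the question's loop nest wrapped in a function, with the pragma directly in front of the outer loop. The function name matvec4 and the pointer-parameter signature are assumptions made for the example, and the pragma needs GCC 8 or later. The sketch has not been checked against the generated assembly.

```c
/* Sketch: y += A * x for a 4x4 matrix stored as four contiguous
   4-element columns in a. matvec4 is a hypothetical name. */
void matvec4(const double *__restrict__ a,
             const double *__restrict__ x,
             double *__restrict__ y)
{
    /* Requires GCC 8+. Must come immediately before the loop it targets;
       a factor of 0 or 1 asks GCC not to unroll that loop. */
    #pragma GCC unroll 1
    for (int j = 0; j < 4; ++j)
    {
        const double *aj = a + j * 4;
        const double xj = x[j];

        /* Tells the vectorizer to assume no loop-carried dependences. */
        #pragma GCC ivdep
        for (int i = 0; i < 4; ++i)
        {
            y[i] += aj[i] * xj;
        }
    }
}
```

Compiling with -O3 -mavx -S and inspecting the assembly should show whether the outer loop is kept as a loop.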

Alternatively, I think you can use a function attribute to set optimization options for a single function:

void myfn () __attribute__((optimize("no-unroll-loops")));
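As a rough sketch, the attribute can also be attached directly to a function definition. The name matvec4_no_unroll is hypothetical. The extra "no-peel-loops" string is an assumption on my part: GCC's documentation describes -fpeel-loops (enabled at -O3) as the option that turns on complete peeling of loops with a small constant trip count, so disabling it may be what is actually needed here. I have not verified that either option changes the generated code for this loop.

```c
/* Sketch: per-function optimization options via the optimize attribute.
   Strings not starting with 'O' are treated as -f options, so these map
   to -fno-unroll-loops and -fno-peel-loops for this function only. */
__attribute__((optimize("no-unroll-loops", "no-peel-loops")))
void matvec4_no_unroll(const double *__restrict__ a,
                       const double *__restrict__ x,
                       double *__restrict__ y)
{
    for (int j = 0; j < 4; ++j)
    {
        const double *aj = a + j * 4;
        const double xj = x[j];

        #pragma GCC ivdep
        for (int i = 0; i < 4; ++i)
        {
            y[i] += aj[i] * xj;   /* y += column j of A scaled by x[j] */
        }
    }
}
```

The GCC manual notes that the optimize attribute is intended mainly for debugging, so the pragma is probably the better choice where the compiler version supports it.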