I want to optimize a single function with -mavx -mprefer-avx128. Basically none of the code should use AVX, except for one function, which should use AVX128 (AVX with 128-bit vectors).
I tried these things:
__attribute__((target("avx")))
void f() { ... }
=> seems to use avx2
__attribute__((target("prefer-avx128")))
void f() { ... }
=> does not compile
__attribute__((target("avx")))
__attribute__((optimize("prefer-avx128")))
void f() { ... }
=> does not compile
Does anyone know how this can be done?
-mprefer-avx128, and its modern replacement -mprefer-vector-width=128, are -m options, not -f options, so they can only possibly work with target("string") rather than optimize("string") attributes.
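A minimal sketch of that split, assuming GCC: strings in optimize() are -f options with the -f dropped, and strings in target() are -m options with the -m dropped. The function name scale and the choice of unroll-loops are only for illustration. Note that the GCC manual describes the optimize attribute as intended for debugging rather than production code.

```c
/* "unroll-loops" is read as -funroll-loops: an -f option, so it goes in optimize().
   "avx" is read as -mavx: an -m option, so it goes in target().
   -mprefer-vector-width=128 would likewise go in target(), never optimize(). */
__attribute__((optimize("unroll-loops"), target("avx")))
void scale(float *restrict dst, const float *restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * 2.0f;
}
```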
But actually only some -m options work as attributes; the GCC manual's list of x86 target attributes is mostly ISA extensions, arch= and tune=, but also includes prefer-vector-width=OPT. There isn't one based on the older option -mprefer-avx128; probably support for an attribute was added after -mprefer-avx128 was obsoleted in favour of the -mprefer-vector-width option.
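If several functions need these settings, the same string should also work with #pragma GCC target, assuming GCC 8 or later and assuming the pragma accepts prefer-vector-width= the same way the attribute does. push_options / pop_options restrict it to the functions in between, so the rest of the file keeps the command-line options. saxpy is a hypothetical example function.

```c
/* Functions between push_options and pop_options are compiled as if
   -mavx -mprefer-vector-width=128 had been added; everything after
   pop_options goes back to the command-line options. */
#pragma GCC push_options
#pragma GCC target("avx,prefer-vector-width=128")

void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

#pragma GCC pop_options
```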
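So the attribute you want is:
__attribute__((target("avx,prefer-vector-width=128")))
That enables AVX (AVX1 only, not AVX2) and tunes for 128-bit auto-vectorization. Since integer code is easier to auto-vectorize (GCC won't vectorize an FP sum without -ffast-math, because that reorders the additions), and AVX1 has no 256-bit integer instructions, I actually tested with AVX2: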
__attribute__((target("avx2,prefer-vector-width=128")))
unsigned foo(unsigned *arr){
    unsigned sum=0;
    for(int i=0 ; i<10240; i++) {
        sum += arr[i];
    }
    return sum;
}

__attribute__((target("avx2")))
unsigned bar(unsigned *arr){
    unsigned sum=0;
    for(int i=0 ; i<10240; i++) {
        sum += arr[i];
    }
    return sum;
}
Compiled with gcc -O3 -mtune=haswell (Godbolt), the first version uses vpaddd xmm and the second uses vpaddd ymm. (tune=haswell sets the normal vector-width preference to 256.)
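For the AVX1-only case from the question, a floating-point analogue of foo / bar could look like this (function names hypothetical). It uses an element-wise add rather than a sum, because an element-wise FP loop vectorizes without -ffast-math while an FP sum reduction does not. I haven't checked the asm for this exact code; with gcc -O3 -mtune=haswell I'd expect the same xmm-vs-ymm difference (vaddps xmm vs. vaddps ymm), but verify it on Godbolt.

```c
/* AVX1 only: 256-bit FP ops are available, but we ask for 128-bit vectors. */
__attribute__((target("avx,prefer-vector-width=128")))
void add_f128(float *restrict dst, const float *restrict a,
              const float *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Same loop, vector width left to the tuning (256 for tune=haswell). */
__attribute__((target("avx")))
void add_f256(float *restrict dst, const float *restrict a,
              const float *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```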
Terminology: AVX1 supports 256-bit vector width for FP operations like vaddps.
AVX2 adds 256-bit integer operations like vpaddb ymm, and lane-crossing shuffles with granularity finer than 128 bits, like vpermps / vpermq.
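To make that concrete, here is a sketch with intrinsics, assuming GCC or clang on x86-64 (function names hypothetical). The 256-bit FP add only needs target("avx"), while the 256-bit integer add and the lane-crossing vpermps need target("avx2"); with GCC, calling _mm256_add_epi32 from a function that only has target("avx") fails to compile with a "target specific option mismatch" inlining error.

```c
#include <immintrin.h>

/* vaddps ymm: 256-bit FP add, AVX1 is enough. */
__attribute__((target("avx")))
void add8_ps(float *dst, const float *a, const float *b)
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(dst, _mm256_add_ps(va, vb));
}

/* vpaddd ymm: 256-bit integer add, requires AVX2. */
__attribute__((target("avx2")))
void add8_epi32(int *dst, const int *a, const int *b)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)dst, _mm256_add_epi32(va, vb));
}

/* vpermps: lane-crossing shuffle of 32-bit elements, requires AVX2.
   Reverses all 8 floats, moving data across the two 128-bit halves. */
__attribute__((target("avx2")))
void reverse8_ps(float *dst, const float *src)
{
    __m256i idx = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    _mm256_storeu_ps(dst, _mm256_permutevar8x32_ps(_mm256_loadu_ps(src), idx));
}
```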
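__attribute__((target("avx"))) will definitely not use AVX2 instructions unless you already enabled them on the command line or with an earlier #pragma GCC target. If you saw ymm registers, those were most likely 256-bit AVX1 FP instructions like vaddps ymm, not AVX2.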
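If the reason for keeping the rest of the program AVX-free is to run on CPUs without AVX, the AVX128 function should only be called after a runtime check. A minimal sketch, assuming GCC or clang, using __builtin_cpu_supports; sum_avx128, sum_generic and sum_dispatch are hypothetical names.

```c
#include <stddef.h>

/* Only this function may contain AVX instructions (128-bit preferred). */
__attribute__((target("avx,prefer-vector-width=128")))
static unsigned sum_avx128(const unsigned *arr, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += arr[i];
    return total;
}

/* Compiled with the command-line options only, so no AVX. */
static unsigned sum_generic(const unsigned *arr, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += arr[i];
    return total;
}

/* Pick the AVX version only when the CPU (and OS) support AVX. */
unsigned sum_dispatch(const unsigned *arr, size_t n)
{
    if (__builtin_cpu_supports("avx"))
        return sum_avx128(arr, n);
    return sum_generic(arr, n);
}
```

GCC can also generate this kind of dispatch itself via function multiversioning, but an explicit check keeps it obvious which code may execute AVX instructions.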