I'm trying to run code similar to the following:
#include <immintrin.h>
#include <iostream>  // needed for std::cout

void foo() {
    __m128i a = _mm_set_epi8 (0,0,6,5,4,3,2,1,8,7,6,5,4,3,2,1);
    __m128i b = _mm_set_epi8 (0,0,0,0,0,0,0,1,8,7,6,5,4,3,2,1);
    __mmask16 m = _mm_cmpeq_epi8_mask(a,b); // supposedly requires avx512vl and avx512bw
    std::cout<<m<<std::endl;
}

void bar() {
    int dataa[8] = {1,0,1,0,1,0,1,0};
    __m256i points = _mm256_lddqu_si256((__m256i *)&dataa[0]); // requires just mavx
    (void)points;
}
However, I keep running into the error Illegal instruction (core dumped).
I compile the code with:
g++ -std=c++11 -march=broadwell -mavx -mavx512vl -mavx512bw tests.cpp
According to Intel's intrinsics documentation, these flags should be sufficient to run both foo and bar. However, when either foo or bar is run, I get the same error message.
If I remove foo, however, and compile WITHOUT -mavx512vl, I can run bar smoothly.
I already checked that my cpu supports the mno-avx512vl and mno-avx512bw flags so it should support mavx512vl and mavx512bw right?
What flags must I include to run both functions? Or am I missing something else?
Compile with gcc -march=native. If you get compile errors, your source tried to use something your CPU doesn't support.
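To check at run time what the CPU actually supports, rather than what the compiler was told, something like the following sketch could be used. It assumes GCC or Clang on x86, where the __builtin_cpu_supports builtin is available; the function name cpu_feature_report is made up for this example.

```cpp
#include <string>

// Reports whether the CPU this program is running on supports the ISA
// extensions used by foo() and bar() in the question.
// __builtin_cpu_supports is a GCC/Clang builtin for x86 targets; any other
// compiler or target takes the #else branch. cpu_feature_report is a
// hypothetical helper name, not part of any library.
std::string cpu_feature_report() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // initialise the CPU feature data before querying it
    std::string out;
    out += std::string("avx: ") +
           (__builtin_cpu_supports("avx") ? "yes" : "no") + "\n";
    out += std::string("avx512vl: ") +
           (__builtin_cpu_supports("avx512vl") ? "yes" : "no") + "\n";
    out += std::string("avx512bw: ") +
           (__builtin_cpu_supports("avx512bw") ? "yes" : "no") + "\n";
    return out;
#else
    return "feature detection not available on this compiler/target\n";
#endif
}
```

Printing the returned string (for example with std::cout) on a Broadwell machine would be expected to show "no" for both AVX512 entries, which matches the Illegal instruction fault.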
Related: Getting Illegal Instruction while running a basic Avx512 code
I already checked that my cpu supports the mno-avx512vl and mno-avx512bw flags so it should support mavx512vl and mavx512bw right?
That's the opposite of how GCC options work. -mno-avx512vl disables -mavx512vl if any earlier option (like -march=skylake-avx512 or -mavx512vl on its own) had set it.
-march=broadwell doesn't enable AVX512 instructions because Broadwell CPUs can't run them natively. So -mno-avx512vl has exactly zero effect at the end of g++ -std=c++11 -march=broadwell -mavx ...
Many options have long names starting with ‘-f’ or with ‘-W’—for example, -fmove-loop-invariants, -Wformat and so on. Most of these have both positive and negative forms; the negative form of -ffoo is -fno-foo. This manual documents only one of these two forms, whichever one is not the default.
from the GCC manual, intro part of section 3: Invoking GCC
(-m options follow the same convention as -f and -W long options.)
This style of foo vs. no-foo is not unique to GCC; it's pretty common.
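One way to see which of these -m options are actually in effect for a given command line is to look at the feature macros that GCC and Clang predefine. The sketch below only reports what the compiler was allowed to use, not what the CPU can run; the function name enabled_target_features is made up for this example.

```cpp
#include <string>

// Lists which AVX-related target features the compiler was told it may use
// for this translation unit. GCC and Clang predefine these macros when the
// matching -m option (or a -march= value that implies it) is in effect.
// enabled_target_features is a hypothetical helper name.
std::string enabled_target_features() {
    std::string out;
#ifdef __AVX__
    out += "__AVX__ defined\n";
#else
    out += "__AVX__ not defined\n";
#endif
#ifdef __AVX512VL__
    out += "__AVX512VL__ defined\n";
#else
    out += "__AVX512VL__ not defined\n";
#endif
#ifdef __AVX512BW__
    out += "__AVX512BW__ defined\n";
#else
    out += "__AVX512BW__ not defined\n";
#endif
    return out;
}
```

With -march=broadwell alone, the two AVX512 macros should come out as not defined. Adding -mavx512vl -mavx512bw should define them, and a later -mno-avx512vl should turn the first one off again.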
Faulting on _mm256_lddqu_si256 after compiling with -mavx512vl
GCC is dumb and uses an EVEX encoding for the load (probably vmovdqu64) instead of a shorter VEX encoding. But you told it AVX512VL was available, so this is only a missed optimization, not a correctness problem.
If you did compile the function with only AVX enabled, it would of course only use AVX instructions.
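As a rough sketch of what enabling features per function can look like, GCC and Clang accept a target attribute on individual functions, so the AVX512 code and the AVX-only code can live in one file built without any -mavx512* options. This is an illustration under those assumptions, not something from the question; foo_avx512, bar_avx, run_foo and run_bar are made-up names, and the functions return values instead of printing so they are easy to check.

```cpp
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <immintrin.h>

// Only this function is allowed to use AVX512VL + AVX512BW instructions.
// Bit i of the result is set when byte i of a equals byte i of b.
__attribute__((target("avx512vl,avx512bw")))
static int foo_avx512() {
    __m128i a = _mm_set_epi8(0,0,6,5,4,3,2,1,8,7,6,5,4,3,2,1);
    __m128i b = _mm_set_epi8(0,0,0,0,0,0,0,1,8,7,6,5,4,3,2,1);
    return static_cast<int>(_mm_cmpeq_epi8_mask(a, b));
}

// With no AVX-512 options on the command line, this function may only use
// AVX (VEX-encoded) instructions.
__attribute__((target("avx")))
static int bar_avx() {
    int dataa[8] = {1,0,1,0,1,0,1,0};
    __m256i points =
        _mm256_lddqu_si256(reinterpret_cast<const __m256i*>(dataa));
    int out[8];
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), points);
    int sum = 0;
    for (int v : out) sum += v;  // scalar sum of the loaded elements
    return sum;
}

// Returns the compare mask, or -1 if the running CPU lacks AVX512VL/BW.
int run_foo() {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512vl") &&
        __builtin_cpu_supports("avx512bw"))
        return foo_avx512();
    return -1;
}

// Returns the sum of the loaded ints, or -1 if the running CPU lacks AVX.
int run_bar() {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? bar_avx() : -1;
}
#else
// Fallback for other compilers/targets: nothing can be dispatched.
int run_foo() { return -1; }
int run_bar() { return -1; }
#endif
```

On a CPU without AVX512, run_foo() returns -1 instead of faulting, while run_bar() still runs the AVX load. The price is that the runtime check has to guard every call into a function built with extra target features.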