Is it my imagination, or is a PNOT instruction missing from SSE and AVX? That is, an instruction that flips every bit in the vector.
If so, is there a better way of emulating it than PXOR with a vector of all 1s? It is quite annoying, since I need to set up a vector of all 1s to use that approach.
For cases such as this it can be instructive to see what a compiler would generate.
E.g. for the following function:
#include <immintrin.h>

// Relies on the GNU vector extension (gcc/clang): ~ applies bitwise NOT to every element.
__m256i test(const __m256i v)
{
    return ~v;
}
both gcc and clang seem to generate much the same code when AVX2 is enabled (e.g. with -mavx2):
test(long long __vector(4)):
        vpcmpeqd ymm1, ymm1, ymm1
        vpxor    ymm0, ymm0, ymm1
        ret
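In other words, the compiler builds the all-ones vector by comparing a register with itself, so no constant has to be loaded from memory, and then XORs with it. The same idiom can be written by hand with intrinsics. Below is a minimal sketch for 128-bit SSE2 vectors; the helper name `vec_not_si128` is made up for illustration, and the 256-bit AVX2 analogue would presumably use `_mm256_cmpeq_epi32` and `_mm256_xor_si256` in the same way.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper (not a real intrinsic): flips every bit of v. */
static inline __m128i vec_not_si128(__m128i v)
{
    /* Any register compared for equality with itself yields all 1s,
       which typically compiles to a single pcmpeqd. */
    const __m128i all_ones = _mm_cmpeq_epi32(v, v);

    /* x XOR all-ones == NOT x */
    return _mm_xor_si128(v, all_ones);
}
```

If the NOT is used inside a loop, the compiler will usually hoist the all-ones register out of it, so the per-iteration cost is normally just the XOR.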