What I want is to extract a value from a vector using a variable scalar index, like _mm_extract_epi8 / _mm256_extract_epi8 but with a non-immediate index.
(The vector holds several candidate results; the one at the given index turns out to be the true result, and the rest are discarded.)
In particular, if index is in a GPR, the easiest way is probably to store val to memory and then movzx the selected byte into another GPR. Sample implementation in C:
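#include <immintrin.h>
#include <stdint.h>

uint8_t extract_epu8var(__m256i val, int index) {
    union {
        __m256i m256;
        uint8_t array[32];
    } tmp;
    tmp.m256 = val;           /* store the whole vector to the stack */
    return tmp.array[index];  /* reload one byte; index must be in 0..31 */
}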
Godbolt translation (note that much of the overhead comes from stack alignment; if you don't have an aligned temporary storage area, you could just use vmovdqu instead of vmovdqa): https://godbolt.org/z/Gj6Eadq9r
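
For the 128-bit case mentioned in the question (_mm_extract_epi8), the same store-and-reload idea can be sketched with an explicit unaligned store, so no aligned temporary is required. This is only a sketch: the name extract_epu8var_128 and the masking of the index with 15 are assumptions added for illustration and are not part of the code above. The mask simply keeps an out-of-range index from reading outside the buffer.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_storeu_si128 */
#include <stdint.h>

/* Hypothetical helper (name chosen for illustration only).
 * Stores the vector to a 16-byte stack buffer with an unaligned store,
 * then reloads the single byte selected by a run-time index. */
static uint8_t extract_epu8var_128(__m128i val, int index) {
    uint8_t tmp[16];
    _mm_storeu_si128((__m128i *)tmp, val);  /* unaligned store, no alignment requirement */
    return tmp[index & 15];                 /* assumption: wrap the index into 0..15 */
}
```

With a vector built by _mm_setr_epi8(100, 101, ..., 115), calling extract_epu8var_128(v, 5) returns 105. A compiler is free to align the buffer and use an aligned store anyway, so the generated code may not differ much from the union version.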