Tags: c, sse, intrinsics, sign-extension

Why doesn't _mm_extract_epi16 return the expected result?


I found a bug in my program caused by misusing the SSE intrinsic '_mm_extract_epi16', as in the code below:

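#include <smmintrin.h>
#include <cstdint>
#include <iostream>

int main(int argc, const char * argv[]) {
    // _mm_load_si128 requires a 16-byte-aligned address
    alignas(16) int16_t test_input[8] = {-1, 2, -3, -4, -5, -6, -7, -8};
    __m128i v_input = _mm_load_si128((__m128i *)test_input);
    // Index 1 holds 2; extracting a negative element (e.g. index 2, which holds -3) yields 65533
    int32_t extract = (int32_t)(_mm_extract_epi16(v_input, 1));

    return 0;
}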

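If the extracted element is positive (index 1 here), I get the correct value 2. If it is negative (for example index 2, which holds -3), I get the wrong value '65533'. If I assign the result to an int16_t first, as in the code below, I get the correct value.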

#include <smmintrin.h>
#include <cstdint>
#include <iostream>

int main(int argc, const char * argv[]) {
    // _mm_load_si128 requires a 16-byte-aligned address
    alignas(16) int16_t test_input[8] = {-1, 2, -3, -4, -5, -6, -7, -8};
    __m128i v_input = _mm_load_si128((__m128i *)test_input);
    // Narrow to 16 bits first, then widen to 32 bits
    int16_t extract = (_mm_extract_epi16(v_input, 1));
    int32_t result = extract;

    return 0;
}

I don't understand why this happens.


Solution

  • int _mm_extract_epi16(__m128i a, int imm) matches the asm behaviour of the pextrw instruction, which zero-extends the 16-bit element into a 32-bit register.

    Intel's intrinsics API uses int all over the place even when an unsigned type would be more appropriate.

    If you want 16-bit sign-extension of the result, cast it:
    (int16_t)_mm_extract_epi16(v,1). Or assign it to an int16_t variable, so the upper bytes of the result are discarded before the value is widened.

    unsigned 65533 = 2's complement -3. This is normal. (2^16 - 3 = 65533 = 0xfffd)
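
To make the difference concrete, here is a minimal sketch that extracts the -3 element (index 2) both ways. It assumes an x86 target with SSE2. The helper names (load_test_vector, extract_zero_extended, extract_sign_extended, check_extraction) are made up for illustration and are not part of any API. The unaligned load is used only so the array needs no alignment attribute.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_extract_epi16

// Hypothetical helper: loads the eight int16_t values from the question.
inline __m128i load_test_vector() {
    const int16_t test_input[8] = {-1, 2, -3, -4, -5, -6, -7, -8};
    // Unaligned load, so the array does not have to be 16-byte aligned.
    return _mm_loadu_si128(reinterpret_cast<const __m128i *>(test_input));
}

// Hypothetical helper: element 2 exactly as the intrinsic returns it,
// i.e. the 16-bit pattern 0xfffd zero-extended into an int.
inline int extract_zero_extended(__m128i v) {
    return _mm_extract_epi16(v, 2);
}

// Hypothetical helper: narrow to int16_t first, so the later widening
// to int32_t sign-extends from bit 15.
inline int32_t extract_sign_extended(__m128i v) {
    return static_cast<int16_t>(_mm_extract_epi16(v, 2));
}

// Hypothetical self-check covering both behaviours.
inline void check_extraction() {
    const __m128i v = load_test_vector();
    assert(extract_zero_extended(v) == 65533);  // 0xfffd, zero-extended
    assert(extract_sign_extended(v) == -3);     // same bits, sign-extended
}
```

Both helpers read the same 16 bits. Only the narrowing cast to int16_t before widening changes how those bits are interpreted.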