
SSE intrinsic over int16[8] to extract the sign of each element


I'm working with SSE intrinsic functions. I have an __m128i representing an array of 8 signed short (16 bit) values.

Is there a function to get the sign of each element?

EDIT1: something that can be used like this:

short tmpVec[8];
__m128i tmp, sgn;
int i;

for (i = 0; i < 8; i++)
    tmp.m128i_i16[i] = tmpVec[i];  // m128i_i16 is an MSVC-specific member

sgn = _mm_sign_epi16(tmp);

Of course "_mm_sign_epi16" doesn't exist, so that's what I'm looking for.

How slow is it to do it element by element?

EDIT2: desired behaviour: 1 for positive values, 0 for zero, and -1 for negative values.

thanks


Solution

  • You can use min/max operations to clamp each element to the range [-1, 1], which gives the desired result, e.g.

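    #include <emmintrin.h>  // SSE2

    static inline __m128i _mm_sgn_epi16(__m128i v)
    {
        v = _mm_min_epi16(v, _mm_set1_epi16(1));   // positive values become 1
        v = _mm_max_epi16(v, _mm_set1_epi16(-1));  // negative values become -1
        return v;                                  // zero is left unchanged
    }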
    

    This is probably a little more efficient than explicitly comparing with zero + shifting + combining results.
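    For comparison, the "compare with zero and combine" approach might look roughly like the sketch below. The names sgn_epi16_cmp and sgn_epi16_cmp_array are hypothetical, chosen only for illustration, and only SSE2 is assumed. This variant combines the two compare masks with a subtraction rather than a shift.

    ```c
    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* Hypothetical name: sign of each 16-bit lane via two compares and a subtract. */
    static inline __m128i sgn_epi16_cmp(__m128i v)
    {
        const __m128i zero = _mm_setzero_si128();
        __m128i neg = _mm_cmplt_epi16(v, zero);  /* -1 (0xFFFF) where v < 0, else 0 */
        __m128i pos = _mm_cmpgt_epi16(v, zero);  /* -1 (0xFFFF) where v > 0, else 0 */
        return _mm_sub_epi16(neg, pos);          /* -1 - 0 = -1, 0 - (-1) = 1, 0 - 0 = 0 */
    }

    /* Hypothetical helper: apply it to 8 shorts in memory (no alignment required). */
    static void sgn_epi16_cmp_array(const short in[8], short out[8])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        _mm_storeu_si128((__m128i *)out, sgn_epi16_cmp(v));
    }
    ```

    That is three operations plus a zeroed register, against two operations for the min/max version.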

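    Note that SSSE3 already provides an _mm_sign_epi16 intrinsic (PSIGNW, see tmmintrin.h), which behaves somewhat differently: it takes two operands, a and b, and for each element returns a negated where b is negative, zero where b is zero, and a unchanged where b is positive. I therefore changed the name of the required function to _mm_sgn_epi16. Using _mm_sign_epi16 might be more efficient when SSSE3 is available, however, so you could do something like this: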

    #include <emmintrin.h>  // SSE2
    #ifdef __SSSE3__
    #include <tmmintrin.h>  // SSSE3
    #endif

    static inline __m128i _mm_sgn_epi16(__m128i v)
    {
    #ifdef __SSSE3__
        v = _mm_sign_epi16(_mm_set1_epi16(1), v); // use PSIGNW on SSSE3 and later
    #else
        v = _mm_min_epi16(v, _mm_set1_epi16(1));  // use PMINSW/PMAXSW on SSE2/SSE3
        v = _mm_max_epi16(v, _mm_set1_epi16(-1));
    #endif
        return v;
    }
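
    The question fills the vector one element at a time; a single unaligned load is usually the simpler route. The sketch below loads the 8 shorts at once, applies the min/max clamp, and checks the result against a scalar reference. All names here (sgn_scalar, sgn_epi16_minmax, sgn_array8, sgn_array8_matches_scalar) are hypothetical, and only SSE2 is assumed, so it does not exercise the PSIGNW path.

    ```c
    #include <emmintrin.h>  /* SSE2 intrinsics */

    /* Scalar reference: 1 for positive, 0 for zero, -1 for negative. */
    static short sgn_scalar(short x)
    {
        return (short)((x > 0) - (x < 0));
    }

    /* Same clamp-to-[-1, 1] idea as above, under a hypothetical name. */
    static inline __m128i sgn_epi16_minmax(__m128i v)
    {
        v = _mm_min_epi16(v, _mm_set1_epi16(1));
        v = _mm_max_epi16(v, _mm_set1_epi16(-1));
        return v;
    }

    /* Hypothetical helper: load 8 shorts with one unaligned load, take the
       sign of each lane, and store the 8 results. */
    static void sgn_array8(const short in[8], short out[8])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        _mm_storeu_si128((__m128i *)out, sgn_epi16_minmax(v));
    }

    /* Returns 1 if the SIMD result matches the scalar reference in every lane. */
    static int sgn_array8_matches_scalar(const short in[8])
    {
        short out[8];
        int i;
        sgn_array8(in, out);
        for (i = 0; i < 8; i++)
            if (out[i] != sgn_scalar(in[i]))
                return 0;
        return 1;
    }
    ```

    Calling sgn_array8_matches_scalar on an array that includes -32768, 0, and 32767 is a quick way to confirm the edge cases.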