I'm working with SSE intrinsic functions. I have an __m128i representing an array of 8 signed short (16 bit) values.
Is there a function to get the sign of each element?
EDIT1: something that can be used like this:
short tmpVec[8];
__m128i tmp, sgn;
for (int i = 0; i < 8; i++)
    tmp.m128i_i16[i] = tmpVec[i];
sgn = _mm_sign_epi16(tmp);
Of course, "_mm_sign_epi16" with this behaviour doesn't exist, so that's what I'm looking for.
How slow is it to do this element by element?
EDIT2: desired behaviour: 1 for positive values, 0 for zero, and -1 for negative values.
thanks
You can use min/max operations to get the desired result, e.g.
inline __m128i _mm_sgn_epi16(__m128i v)
{
    v = _mm_min_epi16(v, _mm_set1_epi16(1));
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
    return v;
}
This is probably a little more efficient than explicitly comparing with zero + shifting + combining results.
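For comparison, here is a minimal sketch of one comparison-based formulation using only SSE2. The name sgn_epi16_cmp is hypothetical, and this variant subtracts the two comparison masks rather than shifting them, so it is only one of several ways to write the "compare with zero" approach:

```c
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: sign of each of the 8 signed 16-bit lanes,
   built from two compares and one subtract. */
static inline __m128i sgn_epi16_cmp(__m128i v)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i pos = _mm_cmpgt_epi16(v, zero);  /* 0xFFFF (-1) where v > 0, else 0 */
    __m128i neg = _mm_cmpgt_epi16(zero, v);  /* 0xFFFF (-1) where v < 0, else 0 */
    /* v > 0: 0 - (-1) = 1;  v < 0: -1 - 0 = -1;  v == 0: 0 - 0 = 0 */
    return _mm_sub_epi16(neg, pos);
}
```

This needs three instructions plus a zeroed register, against two instructions plus two constants for the min/max version, so any difference is likely to be small and worth measuring on the target CPU.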
Note that there is already an _mm_sign_epi16 intrinsic in SSSE3 (PSIGNW, see tmmintrin.h), which behaves somewhat differently: it takes two operands and negates, zeroes, or keeps each element of the first according to the sign of the corresponding element of the second. I therefore named the function above _mm_sgn_epi16. Using _mm_sign_epi16 might be more efficient when SSSE3 is available, however, so you could do something like this:
inline __m128i _mm_sgn_epi16(__m128i v)
{
#ifdef __SSSE3__
    v = _mm_sign_epi16(_mm_set1_epi16(1), v); // use PSIGNW on SSSE3 and later
#else
    v = _mm_min_epi16(v, _mm_set1_epi16(1)); // use PMINSW/PMAXSW on SSE2/SSE3
    v = _mm_max_epi16(v, _mm_set1_epi16(-1));
#endif
    return v;
}
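To connect this back to the question's short tmpVec[8], the sketch below loads the array with _mm_loadu_si128 and stores the result with _mm_storeu_si128, instead of filling the vector element by element through the MSVC-specific m128i_i16 member. The names sgn_epi16 and sgn8 are hypothetical; the function is renamed here only to avoid the _mm_ prefix used by the official intrinsics.

```c
#include <emmintrin.h>      /* SSE2: load/store, PMINSW/PMAXSW */
#ifdef __SSSE3__
#include <tmmintrin.h>      /* SSSE3: _mm_sign_epi16 (PSIGNW) */
#endif

/* Same logic as the function above, under a hypothetical name. */
static inline __m128i sgn_epi16(__m128i v)
{
#ifdef __SSSE3__
    /* Apply the sign of v to a vector of ones: 1, 0 or -1 per lane. */
    return _mm_sign_epi16(_mm_set1_epi16(1), v);
#else
    /* Clamp every lane to the range [-1, 1]. */
    v = _mm_min_epi16(v, _mm_set1_epi16(1));
    return _mm_max_epi16(v, _mm_set1_epi16(-1));
#endif
}

/* Hypothetical helper: out[i] = sign of in[i] for 8 shorts.
   Unaligned load/store, so the arrays need no special alignment. */
static void sgn8(const short in[8], short out[8])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, sgn_epi16(v));
}
```

Calling sgn8(tmpVec, result) processes all eight values with one load, one or two arithmetic instructions, and one store, which should generally beat a scalar loop of eight compares.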