I am converting vectorized code from SSE2 intrinsics to AVX2 intrinsics, and would like to know how to check whether an __m256i comparison of 16-bit elements has any true element, i.e. whether any element of vector1 is greater than the corresponding element of vector2. Below is the code I use with SSE2:
int check2(__m128i vector1, __m128i vector2)
{
    __m128i vcmp = _mm_cmplt_epi16(vector2, vector1);
    int cmp = _mm_movemask_epi8(vcmp);
    return (cmp > 0) ? 1 : 0;
}
I thought that the following code would work, but it didn't:
int check2(__m256i vector1, __m256i vector2)
{
    __m256i vcmp = _mm256_cmpgt_epi16(vector1, vector2);
    int cmp = _mm256_movemask_epi8(vcmp);
    return (cmp > 0) ? 1 : 0;
}
I would be grateful if somebody could advise.
I think you just have a trivial bug; your function should be:
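int check2(__m256i vector1, __m256i vector2)
{
    // each 16-bit lane becomes 0xFFFF where vector1 > vector2, else 0
    __m256i vcmp = _mm256_cmpgt_epi16(vector1, vector2);
    // one mask bit per byte (32 bits in total); bit 31 may be set
    int cmp = _mm256_movemask_epi8(vcmp);
    // any nonzero bit means at least one lane compared true
    return cmp != 0;
}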
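As an alternative to movemask, the comparison result can be tested directly with _mm256_testz_si256 (an AVX intrinsic that returns 1 when the bitwise AND of its two operands is all zeros), which sidesteps the signed-int question entirely. This is a sketch, assuming GCC or Clang (for the `target` attribute, which lets the file compile without -mavx2) and a CPU that supports AVX2 at run time; `check2_testz` and `any_greater_epi16` are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Same test as check2, but without movemask: testz(a, b) returns 1 when
   (a & b) is all zeros, so !testz(vcmp, vcmp) is 1 exactly when at least
   one 16-bit lane of vector1 is greater than the same lane of vector2. */
__attribute__((target("avx2")))
static int check2_testz(__m256i vector1, __m256i vector2)
{
    __m256i vcmp = _mm256_cmpgt_epi16(vector1, vector2);
    return !_mm256_testz_si256(vcmp, vcmp);
}

/* Hypothetical wrapper taking plain arrays of 16 int16_t values, so the
   caller does not need AVX2 enabled for the whole translation unit. */
__attribute__((target("avx2")))
int any_greater_epi16(const int16_t a[16], const int16_t b[16])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    return check2_testz(va, vb);
}
```

Because the compare is signed, setting only a[15] = 1 (with b all zeros) makes the wrapper return 1, while a[15] = -1 makes it return 0.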
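The problem is that _mm256_movemask_epi8 returns 32 bits of flags (one per byte) as a signed int, and you were testing this for > 0. If the most significant bit (bit 31, which comes from the top byte, i.e. the highest 16-bit element) is set, the result is negative, so the > 0 test fails even though at least one element compared true. You did not see this problem with the SSE version because _mm_movemask_epi8 only produces 16 bits, so its result is always in the range 0 to 0xFFFF and can never be negative.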
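To see the sign problem without AVX2 hardware, here is a hypothetical scalar model (not an intrinsic) of the mask that _mm256_movemask_epi8 produces after _mm256_cmpgt_epi16. It assumes the usual lane layout: 16-bit lane i occupies bytes 2i and 2i+1, so a true lane sets mask bits 2i and 2i+1. `model_movemask_from_epi16`, `check_buggy` and `check_fixed` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of _mm256_movemask_epi8(_mm256_cmpgt_epi16(...)):
   lane_true[i] != 0 means 16-bit lane i compared true, which makes both of
   its bytes 0xFF and therefore sets mask bits 2*i and 2*i+1. */
int32_t model_movemask_from_epi16(const int lane_true[16])
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        if (lane_true[i]) {
            mask |= UINT32_C(3) << (2 * i);
        }
    }
    /* Reinterpret the 32 bits as a signed value, like the intrinsic's int
       return value: if bit 31 is set, the result is negative. */
    int32_t result;
    memcpy(&result, &mask, sizeof result);
    return result;
}

/* The test from the question and the test from the answer. */
int check_buggy(int32_t cmp) { return (cmp > 0) ? 1 : 0; }
int check_fixed(int32_t cmp) { return cmp != 0; }
```

With only lane 15 true the model yields the bit pattern 0xC0000000, which as a signed 32-bit value is -1073741824, so check_buggy reports no match while check_fixed reports one. With only lane 0 true the mask is 3 and both checks agree.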