Is there an Intel SIMD comparison function that returns 0 or 1 instead of 0 or 0xFFFFFFFF?


I'm currently using the Intel SIMD intrinsic _mm_cmplt_ps( V1, V2 ). It returns a vector holding the result of each per-component test, depending on whether each component of V1 is less than the corresponding component of V2. For example:

XMVECTOR Result;

Result.x = (V1.x < V2.x) ? 0xFFFFFFFF : 0;
Result.y = (V1.y < V2.y) ? 0xFFFFFFFF : 0;
Result.z = (V1.z < V2.z) ? 0xFFFFFFFF : 0;
Result.w = (V1.w < V2.w) ? 0xFFFFFFFF : 0;

return Result;

However, is there a function like this that returns 1 or 0 instead? I'd like one that uses SIMD directly, with no workarounds, because the code is supposed to be optimized and vectorized.


Solution

  • You can write that function yourself. It’s only 2 instructions:

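    #include <xmmintrin.h> // SSE: _mm_cmplt_ps, _mm_and_ps, _mm_set1_ps

    // 1.0 for lanes where a < b, zero otherwise
    inline __m128 compareLessThan_01( __m128 a, __m128 b )
    {
        // All bits set in lanes where a < b, all bits clear otherwise
        const __m128 cmp = _mm_cmplt_ps( a, b );
        // Masking the bit pattern of 1.0f keeps it only in the "true" lanes
        return _mm_and_ps( cmp, _mm_set1_ps( 1.0f ) );
    }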
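
    The function above yields the float 1.0f. If an integer 1 is what's needed, a possible variation is to shift the comparison mask right by 31 bits, so only the lowest bit of each "true" lane survives. This is a minimal sketch assuming SSE2 is available; the name compareLessThan_int01 is made up for illustration.

    ```cpp
    #include <emmintrin.h> // SSE2: _mm_castps_si128, _mm_srli_epi32

    // Integer 1 for lanes where a < b, integer 0 otherwise
    // (hypothetical helper, not part of any library)
    inline __m128i compareLessThan_int01( __m128 a, __m128 b )
    {
        // 0xFFFFFFFF in lanes where a < b, 0 otherwise
        const __m128 cmp = _mm_cmplt_ps( a, b );
        // Logical shift right by 31 turns 0xFFFFFFFF into 1 and leaves 0 as 0
        return _mm_srli_epi32( _mm_castps_si128( cmp ), 31 );
    }
    ```

    This is still 2 instructions, and the cast is free because it only reinterprets the register.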
    

    Here’s a more generic version that returns either of 2 values. It requires SSE 4.1, which is almost universally available by now (97.94% of users). If you have to support SSE2-only hardware, emulate the blend with _mm_and_ps, _mm_andnot_ps, and _mm_or_ps.

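    #include <smmintrin.h> // SSE 4.1: _mm_blendv_ps

    // y for lanes where a < b, x otherwise
    inline __m128 compareLessThan_xy( __m128 a, __m128 b, float x, float y )
    {
        const __m128 cmp = _mm_cmplt_ps( a, b );
        // blendv picks the second operand in lanes where the mask's sign bit is set
        return _mm_blendv_ps( _mm_set1_ps( x ), _mm_set1_ps( y ), cmp );
    }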
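
    For the SSE2-only case, the emulation with _mm_and_ps, _mm_andnot_ps, and _mm_or_ps could look roughly like the sketch below. The name compareLessThan_xy_sse2 is hypothetical; the approach relies on the comparison mask being all ones or all zeros in every lane.

    ```cpp
    #include <xmmintrin.h> // _mm_cmplt_ps, _mm_and_ps, _mm_andnot_ps, _mm_or_ps

    // y for lanes where a < b, x otherwise, without SSE 4.1
    // (hypothetical helper, not part of any library)
    inline __m128 compareLessThan_xy_sse2( __m128 a, __m128 b, float x, float y )
    {
        const __m128 cmp = _mm_cmplt_ps( a, b );
        // Keep y only in lanes where the mask is set
        const __m128 whenTrue = _mm_and_ps( cmp, _mm_set1_ps( y ) );
        // _mm_andnot_ps computes (~cmp) & x, keeping x in the remaining lanes
        const __m128 whenFalse = _mm_andnot_ps( cmp, _mm_set1_ps( x ) );
        // The two halves never overlap, so OR merges them
        return _mm_or_ps( whenTrue, whenFalse );
    }
    ```

    This costs 3 bitwise instructions instead of a single blend, but it runs on any x86 CPU with SSE.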