Tags: assembly, sse, simd, sse2, sse4

How to simulate PCMPGTQ on SSE2?


PCMPGTQ was introduced in SSE4.2. It performs a signed greater-than comparison of packed 64-bit integers and yields a mask: each 64-bit lane is set to all ones where a > b and to all zeros otherwise.
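For reference when testing any emulation, here is a minimal scalar model of the PCMPGTQ lane rule. It is a sketch, not Intel's documented pseudocode; the function names and the two-element array standing in for a 128-bit register are hypothetical choices for illustration.

```c
#include <stdint.h>

/* Scalar model of one PCMPGTQ lane: all ones when a > b as signed
   64-bit integers, all zeros otherwise. */
static uint64_t pcmpgtq_lane_ref(int64_t a, int64_t b) {
    return a > b ? UINT64_MAX : 0;
}

/* Apply the lane rule to both elements of a 128-bit vector, modelled
   here (hypothetically) as an array of two int64_t values. */
static void pcmpgtq_ref(const int64_t a[2], const int64_t b[2], uint64_t out[2]) {
    for (int i = 0; i < 2; ++i)
        out[i] = pcmpgtq_lane_ref(a[i], b[i]);
}
```

A model like this makes a convenient oracle when checking a SIMD emulation against edge cases such as INT64_MIN and values that differ only in the low 32 bits.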

How does one support this functionality on instruction sets predating SSE4.2?

Update: The same question applies to ARMv7 with Neon, which also lacks a 64-bit signed greater-than comparison. The sister question is: What is the most efficient way to support CMGT with 64bit signed comparisons on ARMv7a with Neon?


Solution

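    #include <emmintrin.h>  // SSE2 intrinsics

    // Signed 64-bit a > b per lane, returning an all-ones/all-zeros mask.
    __m128i pcmpgtq_sse2 (__m128i a, __m128i b) {
        // High dwords equal: the borrow from (b - a) sets the high dword iff a.lo > b.lo (unsigned).
        __m128i r = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));
        // High dwords differ: the signed 32-bit compare decides.
        r = _mm_or_si128(r, _mm_cmpgt_epi32(a, b));
        // Copy each qword's high dword (elements 3 and 1) into both of its dwords.
        return _mm_shuffle_epi32(r, _MM_SHUFFLE(3,3,1,1));
    }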
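One way to gain confidence in the SSE2 routine is to compare it against plain C on tricky inputs. The sketch below repeats the function so the file compiles alone; `matches_scalar` and `run_edge_cases` are hypothetical helper names, and the edge values are my own selection (sign boundaries and values that differ only across the 32-bit split).

```c
#include <emmintrin.h>
#include <stdint.h>

/* The SSE2 emulation from the answer, repeated so this file is self-contained. */
static __m128i pcmpgtq_sse2(__m128i a, __m128i b) {
    __m128i r = _mm_and_si128(_mm_cmpeq_epi32(a, b), _mm_sub_epi64(b, a));
    r = _mm_or_si128(r, _mm_cmpgt_epi32(a, b));
    return _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 3, 1, 1));
}

/* Hypothetical helper: lane 0 compares x > y and lane 1 compares y > x,
   so both argument orders are checked in one call. */
static int matches_scalar(int64_t x, int64_t y) {
    __m128i a = _mm_set_epi64x(y, x);   /* lane 1 = y, lane 0 = x */
    __m128i b = _mm_set_epi64x(x, y);   /* lane 1 = x, lane 0 = y */
    uint64_t got[2];
    _mm_storeu_si128((__m128i *)got, pcmpgtq_sse2(a, b));
    uint64_t want0 = x > y ? UINT64_MAX : 0;
    uint64_t want1 = y > x ? UINT64_MAX : 0;
    return got[0] == want0 && got[1] == want1;
}

/* Hypothetical helper: try every ordered pair of a small set of edge values. */
static int run_edge_cases(void) {
    static const int64_t v[] = {
        INT64_MIN, INT64_MIN + 1, -4294967296LL, -1, 0, 1,
        0x7FFFFFFFLL, 0x80000000LL, 0xFFFFFFFFLL, 0x100000000LL, INT64_MAX
    };
    const int n = (int)(sizeof v / sizeof v[0]);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (!matches_scalar(v[i], v[j]))
                return 0;
    return 1;
}
```

Pairs such as -4294967296 and -1 share a high dword, so they exercise the subtraction path rather than the signed 32-bit compare.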
    

    SSE2 provides 32-bit signed comparison intrinsics, so split each packed qword into a pair of dwords.

    If the high dword in a is greater than the high dword in b, the result is already known and there is no need to compare the low dwords. Likewise, if it is less, the result is zero.

    if (a.hi > b.hi) { r.hi = 0xFFFFFFFF; }
    if (a.hi < b.hi) { r.hi = 0x00000000; }
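The high-dword decision can be modelled in scalar C as a three-way result. This is an illustrative sketch: `hi32`, `compare_high`, and the `hi_result` enum are hypothetical names, and it assumes an arithmetic right shift for negative values, which C leaves implementation-defined (GCC and Clang shift arithmetically).

```c
#include <stdint.h>

/* Signed high dword of a 64-bit value. Assumes arithmetic right shift,
   as GCC and Clang implement it (implementation-defined in C). */
static int32_t hi32(int64_t x) { return (int32_t)(x >> 32); }

/* Outcome of comparing only the high dwords: GREATER and LESS settle the
   64-bit result; EQUAL means the low dwords must decide. */
enum hi_result { HI_LESS, HI_EQUAL, HI_GREATER };

static enum hi_result compare_high(int64_t a, int64_t b) {
    if (hi32(a) > hi32(b)) return HI_GREATER;
    if (hi32(a) < hi32(b)) return HI_LESS;
    return HI_EQUAL;
}
```

In the SIMD version, `_mm_cmpgt_epi32` supplies the GREATER case directly, and the LESS case is simply the zero left in the mask.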
    

    If the high dword in a equals the high dword in b, the high dwords "cancel" in a 64-bit subtraction, so the high 32 bits of (b - a) are 0 minus the borrow out of the low dwords: all ones if a borrow occurs, all zeros otherwise. A borrow occurs exactly when a.lo > b.lo as unsigned values, so this is effectively an unsigned comparison of the low dwords with the result placed in the high dword.

    if (a.hi == b.hi) { r = (b - a) & 0xFFFFFFFF00000000; }
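The borrow trick for the tie case can be checked in scalar C. This sketch uses unsigned 64-bit arithmetic so that the wrap-around is well defined; `tie_mask` is a hypothetical name, and the function is only meaningful when the high dwords of a and b are equal.

```c
#include <stdint.h>

/* Tie case only: a and b must share the same high dword. The high dword
   of (b - a) is then 0 - borrow, i.e. 0xFFFFFFFF exactly when
   a.lo > b.lo as unsigned values, and 0 otherwise. */
static uint32_t tie_mask(uint64_t a, uint64_t b) {
    uint64_t diff = b - a;          /* wraps modulo 2^64 */
    return (uint32_t)(diff >> 32);  /* the high 32 bits hold the mask */
}
```

The case 0x80000000 versus 0x7FFFFFFF in the low dwords shows why the comparison must be unsigned: as signed 32-bit values the order would be reversed.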
    

    Finally, copy the comparison mask in the high 32 bits of each qword to its low 32 bits; the shuffle with _MM_SHUFFLE(3,3,1,1) does this for both qwords at once.

    r.lo = r.hi
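The final shuffle can be observed in isolation with a short SSE2 sketch; `broadcast_high_dwords` is a hypothetical wrapper around `_mm_shuffle_epi32`, and arrays are used only to make the dword positions visible.

```c
#include <emmintrin.h>
#include <stdint.h>

/* _MM_SHUFFLE(3,3,1,1) writes source dword 1 into destination dwords 0 and 1,
   and source dword 3 into destination dwords 2 and 3, so each qword ends up
   filled with its own high-dword mask. */
static void broadcast_high_dwords(const uint32_t in[4], uint32_t out[4]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i r = _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 3, 1, 1));
    _mm_storeu_si128((__m128i *)out, r);
}
```

Whatever garbage sits in the low dwords before the shuffle (here 0x11111111 and 0x22222222 in the test) is discarded, which is why the earlier steps only need to get the high dwords right.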
    
