Tags: assembly, arm, simd, webassembly, neon

What is the most efficient way to support CMGT with 64-bit signed comparisons on ARMv7-A with NEON?


This question was originally posed for SSE2 here. Because every algorithm proposed there relied on operations that ARMv7-A with NEON also supports, that question was updated to include ARMv7-A + NEON versions. At the request of a commenter, it is asked separately here, both to show that it is indeed a distinct topic and to collect alternative solutions that might be more practical on ARMv7-A + NEON. The overall purpose of these questions is to find good implementations for consideration in WebAssembly SIMD.


Solution

  • Signed 64-bit saturating subtract.

    Assuming my tests with _mm_subs_epi16 (SSE2's 16-bit saturating subtract, standing in for the 64-bit case) are correct and translate 1:1 to NEON's vqsubq_s64:

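    #include <arm_neon.h>

    // Per lane: (a > b) ? all ones : 0. The saturated b - a is negative exactly when b < a.
    uint64x2_t pcmpgtq_armv7 (int64x2_t a, int64x2_t b) {
        return vreinterpretq_u64_s64(vshrq_n_s64(vqsubq_s64(b, a), 63));
    }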
    

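    This would seem to be the fastest way to emulate pcmpgtq on ARMv7-A: it needs only two NEON instructions (VQSUB.S64 and VSHR.S64), and the reinterpret cast generates no code.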


    The free chapter of Hacker's Delight gives the following formulas:

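    // return (a > b) ? -1LL : 0LL;
    // Note: b - a can overflow, which is undefined for signed integers in C.
    // Both formulas assume two's-complement wraparound and an arithmetic right shift.
    int64_t cmpgt(int64_t a, int64_t b) {
        return ((b & ~a) | ((b - a) & ~(b ^ a))) >> 63;
    }

    // Alternative formulation with the same result.
    int64_t cmpgt_alt(int64_t a, int64_t b) {
        return ((b - a) ^ ((b ^ a) & ((b - a) ^ b))) >> 63;
    }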
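
To check the approaches above without ARM hardware, the following is a minimal scalar sketch in portable C, not a tuned implementation. It models one lane of the saturating subtract and both Hacker's Delight formulas, and compares each against the plain `a > b` on boundary values. All names here (`sat_sub_s64`, `sign_mask`, `cmpgt_satsub`, `cmpgt_wrap`, `cmpgt_hd1`, `cmpgt_hd2`, `count_mismatches`, `self_check`) are hypothetical helpers made up for illustration. The bitwise formulas are evaluated on `uint64_t` so that wraparound is well defined. `cmpgt_wrap` is included only to show why the subtract must saturate: a plain wrapping `b - a` gets the sign wrong when the true difference overflows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Turn the sign bit of a 64-bit pattern into an all-ones (-1) or all-zeros mask. */
static int64_t sign_mask(uint64_t bits) {
    return (bits >> 63) ? -1 : 0;
}

/* Scalar model of one lane of vqsubq_s64: x - y clamped to the int64_t range.
   The range checks come first, so the final subtraction never overflows. */
static int64_t sat_sub_s64(int64_t x, int64_t y) {
    if (y > 0 && x < INT64_MIN + y) return INT64_MIN;
    if (y < 0 && x > INT64_MAX + y) return INT64_MAX;
    return x - y;
}

/* Saturating-subtract approach: sign of sat(b - a). */
static int64_t cmpgt_satsub(int64_t a, int64_t b) {
    return sign_mask((uint64_t)sat_sub_s64(b, a));
}

/* Counter-example: wrapping subtract without saturation (incorrect on overflow). */
static int64_t cmpgt_wrap(int64_t a, int64_t b) {
    return sign_mask((uint64_t)b - (uint64_t)a);
}

/* First Hacker's Delight formula, evaluated on unsigned bit patterns. */
static int64_t cmpgt_hd1(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    return sign_mask((ub & ~ua) | ((ub - ua) & ~(ub ^ ua)));
}

/* Second Hacker's Delight formula, evaluated on unsigned bit patterns. */
static int64_t cmpgt_hd2(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t d = ub - ua;
    return sign_mask(d ^ ((ub ^ ua) & (d ^ ub)));
}

typedef int64_t (*cmpgt_fn)(int64_t, int64_t);

/* Boundary values where signed overflow is most likely to expose a bug. */
static const int64_t kEdge[] = {
    INT64_MIN, INT64_MIN + 1, -2, -1, 0, 1, 2, INT64_MAX - 1, INT64_MAX
};

/* Count the (a, b) pairs from kEdge where f disagrees with the reference a > b. */
static int count_mismatches(cmpgt_fn f) {
    size_t n = sizeof kEdge / sizeof kEdge[0];
    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int64_t want = (kEdge[i] > kEdge[j]) ? -1 : 0;
            if (f(kEdge[i], kEdge[j]) != want) bad++;
        }
    }
    return bad;
}

/* Aborts via assert if any of the three candidate approaches is wrong. */
static void self_check(void) {
    assert(count_mismatches(cmpgt_satsub) == 0);
    assert(count_mismatches(cmpgt_hd1) == 0);
    assert(count_mismatches(cmpgt_hd2) == 0);
}
```

Calling `self_check()` from a test harness exercises all three variants. For `a = INT64_MAX` and `b = INT64_MIN`, the true difference `b - a` does not fit in 64 bits. The wrapping subtract yields 1, so `cmpgt_wrap` reports "not greater", while the saturated result `INT64_MIN` keeps the correct sign.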