This question was originally posed for SSE2 here. Since every algorithm proposed there relied on operations that ARMv7a+NEON also supports, the question was updated to include ARMv7+NEON versions. At the request of a commenter, it is asked separately here, both to show that it is a distinct topic and to collect alternative solutions that might be more practical for ARMv7+NEON. The overall purpose of these questions is to find ideal implementations for consideration in WebAssembly SIMD.
Signed 64-bit saturating subtract.
Assuming my tests using _mm_subs_epi16 are correct and translate 1:1 to NEON...
uint64x2_t pcmpgtq_armv7 (int64x2_t a, int64x2_t b) {
    return vreinterpretq_u64_s64(vshrq_n_s64(vqsubq_s64(b, a), 63));
}
Would certainly seem to be the fastest achievable way to emulate pcmpgtq.
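To illustrate why the subtract has to saturate, here is a minimal scalar sketch of what one 64-bit lane of the NEON sequence computes. The helper names (`qsub_s64`, `cmpgt_lane`, `cmpgt_lane_wrapping`) are hypothetical and only model the behaviour in portable C; they are not NEON intrinsics.

```c
#include <stdint.h>

/* Scalar model of one lane of a signed saturating subtract (hypothetical
   helper): the result clamps to INT64_MIN / INT64_MAX instead of wrapping. */
static int64_t qsub_s64(int64_t x, int64_t y) {
    /* Compute the wrapped difference in unsigned arithmetic so that
       detecting overflow does not itself rely on signed overflow. */
    uint64_t ux = (uint64_t)x, uy = (uint64_t)y;
    uint64_t diff = ux - uy;
    /* Overflow occurred iff x and y have different signs and the wrapped
       result has a different sign than x. */
    if (((ux ^ uy) & (ux ^ diff)) >> 63) {
        return (x < 0) ? INT64_MIN : INT64_MAX;
    }
    return x - y; /* safe: no overflow in this branch */
}

/* Model of one pcmpgtq lane: all-ones if a > b, otherwise 0. */
static uint64_t cmpgt_lane(int64_t a, int64_t b) {
    int64_t d = qsub_s64(b, a);       /* negative exactly when b < a */
    return (d < 0) ? UINT64_MAX : 0;  /* what broadcasting the sign bit yields */
}

/* Same idea with a wrapping subtract, to show the case saturation fixes. */
static uint64_t cmpgt_lane_wrapping(int64_t a, int64_t b) {
    uint64_t d = (uint64_t)b - (uint64_t)a;
    return (d >> 63) ? UINT64_MAX : 0;
}
```

With `a = INT64_MAX` and `b = INT64_MIN`, the wrapped difference is 1, so the wrapping variant reports "not greater", while the saturating variant clamps to `INT64_MIN` and gives the correct all-ones mask.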
The free chapter of Hacker's Delight gives the following formulas:
// return (a > b) ? -1LL : 0LL;
int64_t cmpgt(int64_t a, int64_t b) {
    return ((b & ~a) | ((b - a) & ~(b ^ a))) >> 63;
}

int64_t cmpgt(int64_t a, int64_t b) {
    return ((b - a) ^ ((b ^ a) & ((b - a) ^ b))) >> 63;
}
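As written, these formulas rely on `b - a` wrapping and on `>>` being an arithmetic shift for negative values, neither of which standard C guarantees for `int64_t`. A minimal sketch of the same two formulas using unsigned arithmetic, which wraps by definition, might look like the following. The names `cmpgt_hd1` and `cmpgt_hd2` are hypothetical, and the mask is returned as `uint64_t` rather than `int64_t`.

```c
#include <stdint.h>

/* First formula: (b & ~a) | ((b - a) & ~(b ^ a)), sign bit broadcast.
   Returns all-ones if a > b (signed), otherwise 0. */
static uint64_t cmpgt_hd1(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t t = (ub & ~ua) | ((ub - ua) & ~(ub ^ ua));
    /* Negating the 0/1 top bit gives 0 or all-ones without relying on an
       arithmetic right shift. */
    return (uint64_t)0 - (t >> 63);
}

/* Second formula: (b - a) ^ ((b ^ a) & ((b - a) ^ b)), sign bit broadcast. */
static uint64_t cmpgt_hd2(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t d = ub - ua; /* wrapping difference, well defined for unsigned */
    uint64_t t = d ^ ((ub ^ ua) & (d ^ ub));
    return (uint64_t)0 - (t >> 63);
}
```

Both variants handle the cases where the plain difference overflows, for example `cmpgt_hd1(INT64_MAX, INT64_MIN)` yields all-ones and `cmpgt_hd1(INT64_MIN, INT64_MAX)` yields 0.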