I'm using NEON intrinsics with clang.
I want to test two uint32x4_t SIMD values for equality over all lanes.
So not four per-lane test results, but one single result that tells me whether A and B are equal in all lanes.
On Intel AVX, I would use something like:
_mm256_testz_si256( _mm256_xor_si256( A, B ), _mm256_set1_epi64x( -1 ) )
What would be a good way to perform an all-lane equality test for NEON SIMD?
I am assuming I will need intrinsics that operate across lanes. Does ARM Neon have those features?
Try this:
uint16x4_t t = vqmovn_u32(veorq_u32(a, b));  // non-zero in every lane where a and b differ
bool all_equal = vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;
I expect the compiler to find target-specific optimizations when implementing that test.
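For reference, here is a minimal sketch of that test wrapped in a function. The name all_lanes_equal_u32 and the array-based signature are my own choices for illustration, not part of any NEON API, and the scalar branch is only there so the same file also builds on targets without NEON.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Returns true when all four 32-bit lanes of a and b match.
 * Hypothetical helper name; not a NEON intrinsic. */
bool all_lanes_equal_u32(const uint32_t a[4], const uint32_t b[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t va = vld1q_u32(a);
    uint32x4_t vb = vld1q_u32(b);
    /* XOR is zero in every lane where a and b agree. */
    uint32x4_t diff = veorq_u32(va, vb);
    /* Saturating narrow: a non-zero 32-bit lane stays non-zero in 16 bits. */
    uint16x4_t t = vqmovn_u32(diff);
    /* View the four 16-bit lanes as one 64-bit value and test it once. */
    return vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;
#else
    /* Scalar fallback giving the same answer on non-NEON builds. */
    uint32_t acc = 0;
    for (int i = 0; i < 4; ++i)
        acc |= a[i] ^ b[i];
    return acc == 0;
#endif
}
```

The saturating narrow matters here: a plain truncating narrow such as vmovn_u32() would miss lanes that differ only in their upper 16 bits.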
I just realised something handy...
If you want to test that all lanes are less than some power of two, you can do this by replacing vqmovn_u32() with vqshrn_n_u32(). I believe this can be extended to testing that signed lanes lie within +/- a power of two (including the lower bound, excluding the upper bound) using vqrshrn_n_s32(): with a shift of n, the result is zero exactly when -2^(n-1) <= x < 2^(n-1). For example, you should be able to accept both -1 and 0 in a single test using vqrshrn_n_s32(x, 1).
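A rough sketch of both range tests, under the same assumptions as the equality test: narrow with saturation, then check the 64-bit view against zero. The function names and the choice of 2^8 as the bound are made up for illustration, and the scalar branches exist only so the file compiles without NEON.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* True when every unsigned lane is below 2^8 (hypothetical helper). */
bool all_lanes_below_256(const uint32_t x[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Shift right by 8, then saturating narrow: zero only if x < 256. */
    uint16x4_t t = vqshrn_n_u32(vld1q_u32(x), 8);
    return vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;
#else
    for (int i = 0; i < 4; ++i)
        if (x[i] >= 256u)
            return false;
    return true;
#endif
}

/* True when every signed lane is -1 or 0 (hypothetical helper). */
bool all_lanes_minus1_or_0(const int32_t x[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Rounding shift by 1 computes (x + 1) >> 1, which is 0 for -1 and 0. */
    int16x4_t t = vqrshrn_n_s32(vld1q_s32(x), 1);
    return vget_lane_u64(vreinterpret_u64_s16(t), 0) == 0;
#else
    for (int i = 0; i < 4; ++i)
        if (x[i] < -1 || x[i] > 0)
            return false;
    return true;
#endif
}
```

Note that the shift count must be a compile-time constant, and for these 32-to-16-bit narrowing forms it has to be in the range 1 to 16.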