Tags: arm, simd, intrinsics, neon

Testing NEON SIMD registers for equality over all lanes


I'm using NEON intrinsics with clang.

I want to test two uint32x4_t SIMD values for equality across all lanes. I don't want four per-lane results, but a single result that tells me whether A and B are equal in every lane.

On Intel AVX, I would use something like:

_mm256_testz_si256( _mm256_xor_si256( A, B ), _mm256_set1_epi64x( -1 ) )

What would be a good way to perform an all-lane equality test for NEON SIMD?

I assume I will need intrinsics that operate across lanes. Does ARM NEON have such features?


Solution

  • Try this:

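    #include <arm_neon.h>
    uint16x4_t t = vqmovn_u32(veorq_u32(a, b));  // saturating narrow: a nonzero lane stays nonzero
    int all_equal = vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;  // all four 16-bit lanes zero?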
    

    I expect the compiler to find target-specific optimizations when implementing that test.


    I just realised something handy...

    If you want to test that all lanes are less than some power of two, you can replace vqmovn_u32() with vqshrn_n_u32(x, n): each lane of the result is zero exactly when that lane of x is below 2^n (n must be a compile-time constant from 1 to 16). I believe this extends to a signed range around zero using vqrshrn_n_s32(x, n): the rounding shift computes (x + 2^(n-1)) >> n, which is zero exactly when -2^(n-1) <= x < 2^(n-1), so the lower bound is included and the upper bound excluded. For example, vqrshrn_n_s32(x, 1) should accept both -1 and 0 in a single test.
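Below is a self-contained sketch of the three tests described above (equality, "below a power of two", and "-1 or 0"), assuming a C99 compiler. The function names are hypothetical. Inputs are plain arrays loaded with vld1q_u32/vld1q_s32, and each function has a scalar reference path (an addition, not part of the answer) so the example also builds on non-NEON targets. The NEON paths follow the answer: narrow with saturation, then reinterpret the 64-bit result and compare it to zero.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* True if all four 32-bit lanes of a and b are equal. */
static bool all_lanes_equal_u32(const uint32_t a[4], const uint32_t b[4])
{
#if defined(__ARM_NEON)
    uint32x4_t diff = veorq_u32(vld1q_u32(a), vld1q_u32(b));
    /* Saturating narrow: a nonzero 32-bit lane can never become 0. */
    uint16x4_t t = vqmovn_u32(diff);
    return vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;
#else
    uint32_t acc = 0; /* scalar reference for non-NEON builds */
    for (int i = 0; i < 4; i++)
        acc |= a[i] ^ b[i];
    return acc == 0;
#endif
}

/* True if every lane is < 256 (2^8), using vqshrn_n_u32 with n = 8. */
static bool all_lanes_below_256_u32(const uint32_t x[4])
{
#if defined(__ARM_NEON)
    /* Each result lane is zero iff (x >> 8) == 0 for that lane. */
    uint16x4_t t = vqshrn_n_u32(vld1q_u32(x), 8);
    return vget_lane_u64(vreinterpret_u64_u16(t), 0) == 0;
#else
    uint32_t acc = 0; /* scalar reference for non-NEON builds */
    for (int i = 0; i < 4; i++)
        acc |= x[i] >> 8;
    return acc == 0;
#endif
}

/* True if every lane is -1 or 0. vqrshrn_n_s32 with n = 1 computes
   (x + 1) >> 1 per lane, which is zero exactly for x in [-1, 1). */
static bool all_lanes_minus1_or_0_s32(const int32_t x[4])
{
#if defined(__ARM_NEON)
    int16x4_t t = vqrshrn_n_s32(vld1q_s32(x), 1);
    return vget_lane_u64(vreinterpret_u64_s16(t), 0) == 0;
#else
    for (int i = 0; i < 4; i++) /* scalar reference for non-NEON builds */
        if (x[i] < -1 || x[i] > 0)
            return false;
    return true;
#endif
}
```

The saturation matters: a lane difference of 0x10000 narrows to 0xFFFF under vqmovn_u32 and is reported as unequal, whereas a plain truncating vmovn_u32 would narrow it to 0 and miss it. On AArch64 only, NEON also has across-lane reductions such as vmaxvq_u32, so vmaxvq_u32(veorq_u32(a, b)) == 0 is another way to write the equality test; those reductions are not available on 32-bit ARMv7 NEON.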