arm neon

NEON intrinsic for sum of two subparts of a Q register


I have a value in a uint16x8_t (a Q register). In assembly, I would simply add the two halves of the register: for Q0, the result I need is the equivalent of vadd_u16(d0, d1). The problem is that I don't see how to get this with NEON intrinsics, since there is no conversion from uint16x8_t to uint16x4x2_t that would let me pass the low and high halves to vadd_u16.

There are lots of vreinterpret_x_y macros, but not a single one converts from uint16x8_t to uint16x4x2_t. Am I missing something? How should this operation be done with ARM NEON intrinsics?


Solution

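  • You can use vget_low_u16 and vget_high_u16 to extract the two halves of the Q register as uint16x4_t values, then pass them to vadd_u16.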

    The problem, however, is that the compiler can make a total mess of it, resulting in a terrible performance hit.

    The Clang bundled with Android Studio is especially bad at dealing with these intrinsics, as are GCC versions older than 6.x.

    Your only options are to update the toolchain to the most recent version or to stick with assembly.
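
A minimal sketch of the vget_low_u16 / vget_high_u16 approach, for illustration only. The names add_halves_u16 and demo are hypothetical, and the scalar fallback is an assumption added so the snippet also compiles on non-ARM hosts; it is not part of the original answer. Whether the compiler turns this into a single vadd on d0 and d1 depends on the toolchain, as noted above, so it is worth checking the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: adds the low and high halves of a Q register. */
static void add_halves_u16(const uint16_t in[8], uint16_t out[4])
{
    uint16x8_t q  = vld1q_u16(in);        /* load 8 lanes into a Q register */
    uint16x4_t lo = vget_low_u16(q);      /* lanes 0..3 (low D register)    */
    uint16x4_t hi = vget_high_u16(q);     /* lanes 4..7 (high D register)   */
    vst1_u16(out, vadd_u16(lo, hi));      /* lane-wise add, wraps mod 2^16  */
}
#else
/* Assumed scalar fallback with the same lane-wise, wrapping semantics. */
static void add_halves_u16(const uint16_t in[8], uint16_t out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = (uint16_t)(in[i] + in[i + 4]);
}
#endif

/* Hypothetical usage example with known inputs. */
static void demo(void)
{
    const uint16_t in[8] = {1, 2, 3, 65535, 10, 20, 30, 1};
    uint16_t out[4];

    add_halves_u16(in, out);

    assert(out[0] == 11);
    assert(out[1] == 22);
    assert(out[2] == 33);
    assert(out[3] == 0);   /* 65535 + 1 wraps around to 0 */
}
```

No reinterpret cast is needed here: vget_low_u16 and vget_high_u16 each return a plain uint16x4_t, which is the type vadd_u16 expects.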