c, simd, intrinsics, neon

How can I convert a u8 mask to u32 masks with ARM NEON intrinsics?


There is a uint8x8_t mask, obtained from intrinsics like vcgt_u8(), with values like:

0, 0, 0, 0, 255, 0, 255, 255

I would like to convert this mask into two uint32x4_t masks. It seems vmovl_u8() and vmovl_u16() keep 255 as 255 instead of widening it to 65535 and then 4294967295. How can I do this conversion?


Solution

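  • A signed widening operation such as vmovl_s8 sign-extends each lane, so an all-ones pattern like 255 (which is -1 when read as signed) becomes 65535, and then 4294967295 after a second widening with vmovl_s16. You therefore need to vreinterpret your unsigned vector as signed, widen it, and reinterpret the result back: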

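        uint8x8_t v = ...;
        // Sign-extend 8 -> 16 bits: 0xFF becomes 0xFFFF, 0x00 stays 0.
        int16x8_t i = vmovl_s8(vreinterpret_s8_u8(v));
        // Sign-extend each half 16 -> 32 bits, then view the lanes as unsigned.
        uint32x4_t low = vreinterpretq_u32_s32(vmovl_s16(vget_low_s16(i)));
        uint32x4_t high = vreinterpretq_u32_s32(vmovl_s16(vget_high_s16(i)));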
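
For anyone who wants to try the approach on concrete data, here is a minimal sketch that wraps the same intrinsics in a function. The name widen_mask_u8_to_u32 and the array-based interface are assumptions made for illustration and do not come from the answer. The scalar branch is likewise an assumed fallback, so the snippet still compiles on targets without NEON; it performs the same sign extension lane by lane.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (not from the original answer): widens 8 byte lanes
 * to 8 32-bit lanes by sign extension, so 0xFF becomes 0xFFFFFFFF and
 * 0x00 stays 0x00000000. */
void widen_mask_u8_to_u32(const uint8_t in[8], uint32_t out[8])
{
    assert(in != 0 && out != 0);
#if HAVE_NEON
    uint8x8_t v = vld1_u8(in);
    /* Reinterpret as signed so the widening move sign-extends. */
    int16x8_t i = vmovl_s8(vreinterpret_s8_u8(v));
    uint32x4_t low  = vreinterpretq_u32_s32(vmovl_s16(vget_low_s16(i)));
    uint32x4_t high = vreinterpretq_u32_s32(vmovl_s16(vget_high_s16(i)));
    vst1q_u32(out, low);       /* lanes 0..3 */
    vst1q_u32(out + 4, high);  /* lanes 4..7 */
#else
    /* Assumed scalar fallback: replicate bit 7 into the upper 24 bits. */
    for (int k = 0; k < 8; ++k) {
        uint32_t b = in[k];
        out[k] = (b & 0x80u) ? (0xFFFFFF00u | b) : b;
    }
#endif
}
```

Called with the mask from the question (0, 0, 0, 0, 255, 0, 255, 255), the first four output lanes hold lanes 0 to 3 of the input and the last four hold lanes 4 to 7, matching the low and high vectors above.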