c++ arm simd intrinsics neon

ARMv7 NEON: Unpack 32 bit mask to 64 bit mask


I have a NEON mask with 32-bit lanes that I need to unpack to 64-bit lanes, like so:

uint32x4_t mask = { 0xFFFFFFFF, 0xFFFFFFFF, 0, 0 };
uint64x2_t mask_lo = { 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF };
uint64x2_t mask_hi = { 0, 0 };

What I came up with so far is this:

uint64x2_t mask_lo = vmovl_u32(vget_low_u32(mask));    // { 0x00000000FFFFFFFF, 0x00000000FFFFFFFF }
uint64x2_t mask_hi = vmovl_u32(vget_high_u32(mask));   // { 0, 0 }

The problem is that the upper four bytes of each 64-bit lane are zero instead of all ones. I think it could be solved with vtstq_u64, but I am working with ARMv7, so it is sadly not available to me.

Thanks!

EDIT: My bit mask elements are either all ones or all zeros!

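EDIT 2: I solved it by using vmovl_s32 instead of vmovl_u32, which sign-extends each lane (the result is int64x2_t, so it is reinterpreted back to uint64x2_t):

mask_lo = vreinterpretq_u64_s64(vmovl_s32(vget_low_s32(vreinterpretq_s32_u32(mask))));
mask_hi = vreinterpretq_u64_s64(vmovl_s32(vget_high_s32(vreinterpretq_s32_u32(mask))));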

Solution

  • It’s unclear why you want 0xFFFFFFFF to unpack into 0xFFFFFFFFFFFFFFFF, because two different operations produce that result.

    If you want sign extension, reinterpret the vector as signed and use vmovl_s32 for the unpacking. This unpacks 0x80000000 into 0xFFFFFFFF80000000.

    If instead you want to duplicate the uint32_t lanes, use the vzipq_u32 intrinsic with your source vector as both arguments. This unpacks 0x80000000 into 0x8000000080000000. The two halves of the returned uint32x4x2_t (val[0] and val[1]), reinterpreted as uint64x2_t, are the low and high 64-bit masks.
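
To make the difference between the two options concrete, here is a minimal sketch of both. The helper names (widen_sign_extend, widen_duplicate, self_check) are made up for illustration and are not from the question. The scalar branch is only an assumption-level stand-in so that the snippet also compiles on a non-ARM host; on ARMv7 with NEON enabled the intrinsic branch is used.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

// Hypothetical helper: sign-extend each 32-bit lane to 64 bits.
// out[0..1] corresponds to mask_lo, out[2..3] to mask_hi.
void widen_sign_extend(const uint32_t in[4], uint64_t out[4]) {
#if HAVE_NEON
    int32x4_t m = vreinterpretq_s32_u32(vld1q_u32(in));
    vst1q_u64(out,     vreinterpretq_u64_s64(vmovl_s32(vget_low_s32(m))));
    vst1q_u64(out + 2, vreinterpretq_u64_s64(vmovl_s32(vget_high_s32(m))));
#else
    // Scalar stand-in: fill the upper 32 bits with copies of the sign bit.
    for (int i = 0; i < 4; ++i)
        out[i] = (in[i] & 0x80000000u) ? (0xFFFFFFFF00000000ull | in[i])
                                       : static_cast<uint64_t>(in[i]);
#endif
}

// Hypothetical helper: duplicate each 32-bit lane into both halves
// of a 64-bit lane by zipping the vector with itself.
void widen_duplicate(const uint32_t in[4], uint64_t out[4]) {
#if HAVE_NEON
    uint32x4_t m = vld1q_u32(in);
    uint32x4x2_t z = vzipq_u32(m, m);  // val[0] = {m0,m0,m1,m1}, val[1] = {m2,m2,m3,m3}
    vst1q_u64(out,     vreinterpretq_u64_u32(z.val[0]));
    vst1q_u64(out + 2, vreinterpretq_u64_u32(z.val[1]));
#else
    // Scalar stand-in: copy the lane into the upper and lower halves.
    for (int i = 0; i < 4; ++i)
        out[i] = (static_cast<uint64_t>(in[i]) << 32) | in[i];
#endif
}

// Usage: both helpers agree on all-ones / all-zeros masks and differ otherwise.
void self_check() {
    const uint32_t mask[4] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0u, 0u};
    uint64_t a[4], b[4];
    widen_sign_extend(mask, a);
    widen_duplicate(mask, b);
    for (int i = 0; i < 4; ++i) assert(a[i] == b[i]);
    assert(a[0] == 0xFFFFFFFFFFFFFFFFull && a[2] == 0);

    const uint32_t other[4] = {0x80000000u, 1u, 0u, 0u};
    widen_sign_extend(other, a);
    widen_duplicate(other, b);
    assert(a[0] == 0xFFFFFFFF80000000ull);
    assert(b[0] == 0x8000000080000000ull);
    assert(a[1] == 1 && b[1] == 0x0000000100000001ull);
}
```

Since the mask lanes in the question are always all ones or all zeros, either variant should give the wanted mask_lo and mask_hi; the choice only matters if the same code is later fed lanes with mixed bits.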