I have a NEON mask with 32-bit lanes that I need to unpack to 64-bit lanes, like so:
uint32x4_t mask = { 0xFFFFFFFF, 0xFFFFFFFF, 0, 0 };
uint64x2_t mask_lo = { 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF };
uint64x2_t mask_hi = { 0, 0 };
What I came up with so far is this:
uint64x2_t mask_lo = vmovl_u32(vget_low_u32(mask)); // { 0x00000000FFFFFFFF, 0x00000000FFFFFFFF }
uint64x2_t mask_hi = vmovl_u32(vget_high_u32(mask)); // { 0, 0 }
The problem is that the upper 32 bits of each 64-bit lane come out as zeros instead of ones. I think it could be solved with vtstq_u64, but I am working with ARMv7, so it is sadly not available to me.
Thanks!
EDIT: My bit mask elements are either all ones or all zeros!
EDIT 2: I just used vmovl_s32 instead of vmovl_u32:
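// vmovl_s32 returns int64x2_t, so reinterpret back to uint64x2_t
mask_lo = vreinterpretq_u64_s64(vmovl_s32(vget_low_s32(vreinterpretq_s32_u32(mask))));
mask_hi = vreinterpretq_u64_s64(vmovl_s32(vget_high_s32(vreinterpretq_s32_u32(mask))));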
It’s unclear why you want 0xFFFFFFFF to unpack into 0xFFFFFFFFFFFFFFFF.
If you want to sign-extend, use the reinterpret intrinsics and vmovl_s32 for the unpacking. This will unpack 0x80000000 into 0xFFFFFFFF80000000.
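A minimal sketch of the sign-extending variant, in case it helps. The helper name widen_mask_s32 and the array-based interface are made up for illustration; the scalar branch is only there so the snippet also builds where arm_neon.h is unavailable, and it assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of any library): widens four 32-bit lanes
 * to four 64-bit lanes by sign extension.
 * out[0..1] play the role of mask_lo, out[2..3] the role of mask_hi. */
void widen_mask_s32(const uint32_t in[4], uint64_t out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Treat the lanes as signed so that vmovl_s32 copies bit 31 upward. */
    int32x4_t m = vreinterpretq_s32_u32(vld1q_u32(in));
    uint64x2_t lo = vreinterpretq_u64_s64(vmovl_s32(vget_low_s32(m)));
    uint64x2_t hi = vreinterpretq_u64_s64(vmovl_s32(vget_high_s32(m)));
    vst1q_u64(out, lo);
    vst1q_u64(out + 2, hi);
#else
    /* Scalar model of the same operation, used when NEON is not available. */
    for (int i = 0; i < 4; ++i)
        out[i] = (uint64_t)(int64_t)(int32_t)in[i];
#endif
}
```

For a mask whose lanes are all ones or all zeros, this gives all-ones or all-zeros 64-bit lanes; any other lane value with bit 31 set gets its upper half filled with ones.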
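If instead you want to duplicate the uint32_t lanes, use the vzipq_u32 intrinsic with your source vector in both arguments. The result is a uint32x4x2_t; reinterpret val[0] and val[1] as uint64x2_t to get the low and high halves. This will unpack 0x80000000 into 0x8000000080000000.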
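The lane-duplicating variant could look roughly like this. Again, dup_lanes_u32 and its array interface are hypothetical names chosen for the example, and the scalar branch is just a fallback model for builds without arm_neon.h.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of any library): widens four 32-bit lanes
 * to four 64-bit lanes by repeating each lane in both halves.
 * out[0..1] play the role of mask_lo, out[2..3] the role of mask_hi. */
void dup_lanes_u32(const uint32_t in[4], uint64_t out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32x4_t m = vld1q_u32(in);
    /* Zipping a vector with itself interleaves each lane with its copy:
     * z.val[0] = { m0, m0, m1, m1 }, z.val[1] = { m2, m2, m3, m3 }. */
    uint32x4x2_t z = vzipq_u32(m, m);
    vst1q_u64(out, vreinterpretq_u64_u32(z.val[0]));
    vst1q_u64(out + 2, vreinterpretq_u64_u32(z.val[1]));
#else
    /* Scalar model of the same operation, used when NEON is not available. */
    for (int i = 0; i < 4; ++i)
        out[i] = ((uint64_t)in[i] << 32) | (uint64_t)in[i];
#endif
}
```

For all-ones or all-zeros lanes both variants give the same result; they only differ for other lane values, so the choice depends on what the widened mask is supposed to mean.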