I'm trying to do something relatively simple using NEON vector instructions: given a uint64x2_t, I want to swap the positions of its two 64-bit members.
In plain scalar code, this would be:
typedef struct {
U64 u[2];
} u64x2;
u64x2 swap(u64x2 in)
{
u64x2 out;
out.u[0] = in.u[1];
out.u[1] = in.u[0];
return out;
}
Surprisingly enough, I can't find an intrinsic for that. There is apparently an assembler instruction for it (VSWP), but no corresponding intrinsic.
This is weird. It's about as trivial an operation as can be, so it must be possible. The question is: how?
Edit: for reference, the godbolt outcome using @Jake's answer: https://godbolt.org/z/ueJ6nB .
No vswp, but vext works great.
You are right, NEON intrinsics don't support the VSWP instruction.
However, you can resort to the VEXT instruction instead, which is also available as an intrinsic:
out = vextq_u64(in, in, 1);
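As a minimal sketch of how that one-liner could be wrapped up: vextq_u64(a, b, 1) takes the upper lane of a followed by the lower lane of b, so passing the same vector twice yields the swapped pair. The function names swap_lanes_vext and swap_pair_vext are made up for illustration, and the scalar fallback is only an assumption added so the snippet also builds on targets without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Swap the two 64-bit lanes of a 128-bit vector.
   vextq_u64(in, in, 1) extracts two lanes starting at lane 1 of the
   concatenation in:in, i.e. { in[1], in[0] }. */
static inline uint64x2_t swap_lanes_vext(uint64x2_t in)
{
    return vextq_u64(in, in, 1);
}

/* Hypothetical helper: load a pair from memory, swap it, store it. */
void swap_pair_vext(const uint64_t src[2], uint64_t dst[2])
{
    vst1q_u64(dst, swap_lanes_vext(vld1q_u64(src)));
}
#else
/* Scalar fallback (assumption: only here so the sketch compiles without NEON). */
void swap_pair_vext(const uint64_t src[2], uint64_t dst[2])
{
    uint64_t lo = src[0], hi = src[1];
    dst[0] = hi;
    dst[1] = lo;
}
#endif
```

Calling swap_pair_vext on the pair {a, b} should leave {b, a} in dst.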
Alternatively, you can make use of vcombine (and pray that the compiler doesn't mess it up):
out = vcombine_u64(vget_high_u64(in), vget_low_u64(in));
But beware, compilers tend to generate FUBAR machine code when they see vcombine and/or vget.
Stay with the former, that's my advice.
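For anyone who wants to compare the code generation themselves, here is a rough sketch of the vcombine variant in the same shape, suitable for pasting into godbolt next to the vext version. The names swap_lanes_vcombine and swap_pair_vcombine are hypothetical, and the scalar branch is again just an assumption so that it builds on non-NEON targets.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Swap lanes by splitting the vector into its two 64-bit halves and
   recombining them in the opposite order: new low = old high,
   new high = old low. */
static inline uint64x2_t swap_lanes_vcombine(uint64x2_t in)
{
    return vcombine_u64(vget_high_u64(in), vget_low_u64(in));
}

/* Hypothetical helper: load a pair from memory, swap it, store it. */
void swap_pair_vcombine(const uint64_t src[2], uint64_t dst[2])
{
    vst1q_u64(dst, swap_lanes_vcombine(vld1q_u64(src)));
}
#else
/* Scalar fallback (assumption: only here so the sketch compiles without NEON). */
void swap_pair_vcombine(const uint64_t src[2], uint64_t dst[2])
{
    uint64_t lo = src[0], hi = src[1];
    dst[0] = hi;
    dst[1] = lo;
}
#endif
```

Both variants should produce the same result; what differs, depending on compiler and version, is the instruction sequence emitted.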