Tags: c, gcc, arm, intrinsics, neon

Swap halves of a NEON vector with C/gcc intrinsics: no intrinsic for VSWP?


I'm trying to do something relatively simple with NEON vector instructions: given a uint64x2_t, I want to swap the positions of its two 64-bit members.

In plain scalar code, that would be:

#include <stdint.h>

typedef struct {
    uint64_t u[2];
} u64x2;

/* Return a copy of `in` with its two 64-bit members exchanged. */
u64x2 swap(u64x2 in)
{
    u64x2 out;
    out.u[0] = in.u[1];
    out.u[1] = in.u[0];
    return out;
}

Surprisingly enough, I can't find an intrinsic for that. There is apparently an assembler instruction for it (VSWP) but no corresponding intrinsic.

This is weird. It's about as trivial an operation as it can be, so it must be possible. The question is: how?

Edit: for reference, here is the Godbolt output using @Jake's answer: https://godbolt.org/z/ueJ6nB (no vswp, but vext works great).


Solution

  • You are right, the NEON intrinsics don't expose the VSWP instruction.

    However, you can use the VEXT instruction instead, which is available as an intrinsic.

    out = vextq_u64(in, in, 1);


    Alternatively, you can use vcombine (and pray that the compiler doesn't mess it up):

    out = vcombine_u64(vget_high_u64(in), vget_low_u64(in));

    But beware: compilers tend to generate FUBAR machine code when they see vcombine and/or vget.

    Stick with the former; that's my advice.
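For completeness, here is a minimal, self-contained sketch of the vext approach. The helper name `swap_halves_u64` and the array-based interface are assumptions made for illustration and are not part of the original answer. The scalar fallback is there only so the sketch also builds on hosts without NEON.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper (name and signature are illustrative):
 * writes {in[1], in[0]} to out. */
static inline void swap_halves_u64(const uint64_t in[2], uint64_t out[2])
{
    assert(in && out);
#if HAVE_NEON
    uint64x2_t v = vld1q_u64(in);            /* v = {in[0], in[1]} */
    /* vextq_u64(a, b, 1) yields {a[1], b[0]}; with a == b this swaps the lanes. */
    uint64x2_t swapped = vextq_u64(v, v, 1);
    vst1q_u64(out, swapped);
#else
    /* Scalar fallback for non-NEON targets; same result as the vext path. */
    uint64_t lo = in[0];
    uint64_t hi = in[1];
    out[0] = hi;
    out[1] = lo;
#endif
}
```

Calling it with `in = {1, 2}` should leave `out = {2, 1}`. Whether the compiler emits a single vext for the NEON path depends on the compiler version and flags, so it is worth checking the generated assembly, as the Godbolt link in the question does.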