Tags: c, arm, simd, neon, intrinsics

NEON simple vector assignment intrinsic?


Having r1, r3 and r4 of type uint32x4_t loaded into NEON registers, I have the following code:

r3 = veorq_u32(r0,r3);   
r4 = r1;    
r1 = vandq_u32(r1,r3);   
r4 = veorq_u32(r4,r2);   
r1 = veorq_u32(r1,r0);

I was wondering whether GCC actually translates r4 = r1 into a vmov instruction. Looking at the disassembly, I wasn't surprised that it didn't. (Moreover, I can't figure out what the generated assembly actually does.)

Skimming through ARM's NEON intrinsics reference, I couldn't find any simple vector-to-vector assignment intrinsic.

What's the easiest way to achieve this? I'm not sure what inline assembly would look like, since I don't know which registers r1 and r4 were assigned to by vld1q_u32. I don't need an actual swap, just an assignment.


Solution

  • C has a concept of an abstract machine. Assignments and other operations are described in terms of this abstract machine. The assignment r4 = r1; says to assign r4 the value of r1 in the abstract machine.

    When the compiler generates instructions for a program, it generally does not exactly mimic everything that occurs in the abstract machine. It translates the operations that occur in the abstract machine into processor instructions that get the same results. The compiler will skip things like move instructions if it can figure out that it can get the same results without them.

    In particular, the compiler might not keep r1 in the same place every time. It might load it from memory into some register, call it R7, the first time you need it. But then it might implement your statement r1 = vandq_u32(r1,r3); by putting the result in a different register, R8, while keeping the original value of r1 in R7. Then, when you later have r4 = veorq_u32(r4,r2);, the compiler can use the value in R7, because R7 still holds the value that r4 would have (from the r4 = r1; statement) in the abstract machine.

    Even if you explicitly wrote a vmov intrinsic, the compiler might not issue an instruction for it, as long as it issues instructions that get the same result in the end.
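To make the point above concrete, here is a minimal sketch (not taken from the original post) that writes the question's sequence twice: once with the explicit r4 = r1 copy, and once with the copy folded away by reading the original r1 directly, which is roughly the rewrite a compiler is free to make. The function names with_copy, without_copy and check_versions_agree, the in/out array layout, and the scalar stand-ins used when NEON is not available are all assumptions added for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#else
/* Assumption: scalar stand-ins so the sketch also builds off ARM.
   They mimic only the four intrinsics used below. */
typedef struct { uint32_t lane[4]; } uint32x4_t;
static inline uint32x4_t vld1q_u32(const uint32_t *p) {
    uint32x4_t v;
    memcpy(v.lane, p, sizeof v.lane);
    return v;
}
static inline void vst1q_u32(uint32_t *p, uint32x4_t v) {
    memcpy(p, v.lane, sizeof v.lane);
}
static inline uint32x4_t vandq_u32(uint32x4_t a, uint32x4_t b) {
    for (int i = 0; i < 4; i++) a.lane[i] &= b.lane[i];
    return a;
}
static inline uint32x4_t veorq_u32(uint32x4_t a, uint32x4_t b) {
    for (int i = 0; i < 4; i++) a.lane[i] ^= b.lane[i];
    return a;
}
#endif

/* The sequence from the question, including the plain assignment r4 = r1.
   in holds r0..r3 (4 lanes each); out receives the final r1, r3, r4. */
void with_copy(const uint32_t in[16], uint32_t out[12]) {
    uint32x4_t r0 = vld1q_u32(in);
    uint32x4_t r1 = vld1q_u32(in + 4);
    uint32x4_t r2 = vld1q_u32(in + 8);
    uint32x4_t r3 = vld1q_u32(in + 12);
    uint32x4_t r4;

    r3 = veorq_u32(r0, r3);
    r4 = r1;                 /* ordinary C assignment, no intrinsic needed */
    r1 = vandq_u32(r1, r3);
    r4 = veorq_u32(r4, r2);
    r1 = veorq_u32(r1, r0);

    vst1q_u32(out, r1);
    vst1q_u32(out + 4, r3);
    vst1q_u32(out + 8, r4);
}

/* Same results with no copy: the original r1 is simply read twice and
   every new value gets its own name. */
void without_copy(const uint32_t in[16], uint32_t out[12]) {
    uint32x4_t r0 = vld1q_u32(in);
    uint32x4_t r1 = vld1q_u32(in + 4);
    uint32x4_t r2 = vld1q_u32(in + 8);
    uint32x4_t r3 = vld1q_u32(in + 12);

    uint32x4_t new_r3 = veorq_u32(r0, r3);
    uint32x4_t new_r4 = veorq_u32(r1, r2);   /* uses the old r1 directly */
    uint32x4_t new_r1 = veorq_u32(vandq_u32(r1, new_r3), r0);

    vst1q_u32(out, new_r1);
    vst1q_u32(out + 4, new_r3);
    vst1q_u32(out + 8, new_r4);
}

/* Both versions must agree on every output lane. */
void check_versions_agree(const uint32_t in[16]) {
    uint32_t a[12], b[12];
    with_copy(in, a);
    without_copy(in, b);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Since both functions produce the same observable results, an optimizing compiler may well emit similar instructions for them, with or without a vmov. On an ARM toolchain, comparing the assembly generated for the two functions at your usual optimization level is one way to check what your compiler actually did.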