c++ c simd arm64 armv7

Is this a proper way to extract a byte from a NEON uint8x16_t vector?


I am new to NEON intrinsics and have been working with uint8x16_t and uint8x16x4_t. At one point I needed to extract a single byte from a uint8x16_t. Not knowing any better, I indexed the vector with the [] operator, using an index known only at runtime. My compiler (Clang) compiled the code without errors or warnings, and I got the expected output.

I searched the ARM reference guides but could not find any mention of using the [] operator on a uint8x16_t vector. After all, it is a 128-bit register, not an array (please correct me if I'm wrong).

To get to the bottom of this, I traced the definition of uint8x16_t to the header file arm_neon.h and found this:

typedef __attribute__((neon_vector_type(16))) uint8_t uint8x16_t;
  • How is this stored in computer memory?

  • Why am I able to use the [] operator on it directly, when I expected to have to write:

    uint8_t fetch(uint8x16_t *r, int index) { unsigned char u[16]; vst1q_u8(u, *r); return u[index]; }

    instead of:

    uint8_t fetch(uint8x16_t *r, int index) { return (*r)[index]; } // This is much faster!

Any help would be greatly appreciated!


Solution

  • Why am I able to use the [] operator on it directly

    Because GCC and Clang define it as a GNU C native vector (https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html), and those have well-defined rules for operators, including [].

    ARM's docs probably don't guarantee that [] works, and there are probably some ARM compilers where it won't work.
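
As a rough illustration of the point above, here is a minimal sketch that assumes a GCC- or Clang-compatible compiler. The names u8x16, fetch_subscript, fetch_store and fetch_lane3 are hypothetical and exist only for this example. The first function uses the generic vector_size attribute rather than arm_neon.h's neon_vector_type, so it is an analogy to uint8x16_t, not the same type. The NEON-only part is guarded so the file still builds on targets without NEON.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 16-byte GNU C vector of uint8_t, declared with the
   generic vector_size attribute (not arm_neon.h's neon_vector_type). */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* GNU C vectors can be subscripted like arrays, with a runtime index. */
uint8_t fetch_subscript(const u8x16 *v, unsigned index)
{
    assert(index < 16);            /* out-of-range subscripts are undefined */
    return (*v)[index];
}

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Intrinsics-only fallback for a runtime index: store the vector to
   memory, then index the plain array. */
uint8_t fetch_store(const uint8x16_t *v, unsigned index)
{
    uint8_t tmp[16];
    assert(index < 16);
    vst1q_u8(tmp, *v);
    return tmp[index];
}

/* With a compile-time-constant lane, vgetq_lane_u8 is the documented way. */
uint8_t fetch_lane3(const uint8x16_t *v)
{
    return vgetq_lane_u8(*v, 3);
}
#endif
```

If the code must build with compilers that do not implement GNU C vector extensions, the intrinsics-only variants are the safer choice. With GCC or Clang, the subscript form leaves the code generation up to the compiler.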


  • How is this stored in computer memory?

    It's stored in memory (or not, if it lives only in a register or is optimized away) the same as any other type. The object representation has the lowest-numbered element at the lowest address. In most respects a uint8x16_t object is like an int object: the compiler decides where to keep it and can optimize it away entirely.
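
One way to check the layout claim is to copy a vector's object representation into a byte array and compare it lane by lane. The sketch below again assumes GCC or Clang and uses a generic GNU C vector as a stand-in for uint8x16_t. The names u8x16, vector_bytes and lanes_match_memory_order are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint8x16_t, using the generic GNU C attribute. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Copy the object representation of a vector into a plain byte array. */
void vector_bytes(const u8x16 *v, uint8_t out[16])
{
    assert(v != NULL && out != NULL);
    memcpy(out, v, sizeof *v);
}

/* Returns 1 if byte i in memory equals element i for every lane, else 0. */
int lanes_match_memory_order(const u8x16 *v)
{
    uint8_t bytes[16];
    vector_bytes(v, bytes);
    for (unsigned i = 0; i < 16; ++i) {
        if (bytes[i] != (*v)[i])
            return 0;
    }
    return 1;
}
```

Because the copy goes through memcpy, the compiler has to produce the in-memory representation here, even if the vector otherwise lives only in a register.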