I am a beginner to NEON intrinsics, and I wanted to work with uint8x16_t and also uint8x16x4_t.
While working with them I came across a situation where I wanted to extract a byte from a uint8x16_t. Not knowing the details, I began extracting bytes from it using the [] operator with a runtime index. My compiler, Clang, happily compiled the code with no errors or warnings, and I got the desired output.
I searched through the ARM reference guides and could not find any mention of using the [] operator on a uint8x16_t vector. After all, it's a 128-bit register and not an array, isn't it? (Please correct me if I'm wrong.)
To shed light on the issue, I tracked down the definition of uint8x16_t in the header file arm_neon.h and found this:
typedef __attribute__((neon_vector_type(16))) uint8_t uint8x16_t;
How is this stored in computer memory?
Why am I able to use the [] operator on it directly, when I thought I should be using:
uint8_t fetch(uint8x16_t *r, int index) { unsigned char u[16]; vst1q_u8(u, *r); return u[index]; }
instead of:
uint8_t fetch(uint8x16_t *r, int index){ return (*r)[index]; } // This is much faster in performance!
Any help will be greatly appreciated!
Why am I able to use the [] operator on it directly
Because gcc / clang define it in terms of GNU C native vectors (https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html), which do have well-defined rules for operators.
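As a rough illustration of those rules, here is a minimal sketch that uses the GNU C vector extension directly rather than arm_neon.h, so it should build with GCC or Clang on non-ARM targets as well. The type name u8x16 and the helpers fetch and iota16 are made up for this example; u8x16 only stands in for uint8x16_t and is not how any particular arm_neon.h defines it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte vector type declared with the GNU C vector extension.
   It stands in for uint8x16_t so the sketch does not need arm_neon.h. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Read one element with a runtime index.
   This is GNU C, not ISO C: the [] operator is defined for vector types. */
uint8_t fetch(const u8x16 *r, int index) {
    assert(index >= 0 && index < 16);
    return (*r)[index];
}

/* Build a vector whose element i holds the value i.
   Subscripting works for writes as well as reads. */
u8x16 iota16(void) {
    u8x16 v = {0};
    for (int i = 0; i < 16; i++)
        v[i] = (uint8_t)i;
    return v;
}
```

With this, fetch(&v, 7) on a vector returned by iota16() reads element 7 without any explicit store to a temporary array; the compiler is free to pick whatever instruction sequence it likes for the variable index.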
ARM's docs probably don't guarantee that [] works, and there are probably some ARM compilers where it won't work.
It's stored in memory (or not, if it is just in a register or optimized away) the same as any other type. The object representation has the lowest element at the lowest address. uint8x16_t objects are like int objects in most ways, in terms of the compiler being able to decide where to keep them, optimize them away, etc.
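One way to check the layout claim is to copy a vector's bytes into a plain array and compare them with the elements. The sketch below again assumes a GNU C vector type (the hypothetical u8x16) as a stand-in for uint8x16_t, and the helper names vec_to_bytes and layout_matches_elements are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint8x16_t, using the GNU C vector extension. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Copy the object representation of a vector into a byte array. */
void vec_to_bytes(u8x16 v, uint8_t out[16]) {
    assert(out != NULL);
    memcpy(out, &v, sizeof v);
}

/* Return 1 if byte i of the object representation equals element i
   for every i, i.e. the lowest element sits at the lowest address. */
int layout_matches_elements(u8x16 v) {
    uint8_t bytes[16];
    vec_to_bytes(v, bytes);
    for (int i = 0; i < 16; i++) {
        if (bytes[i] != v[i])
            return 0;
    }
    return 1;
}
```

Taking the address of the vector here does not force it to live in memory in real code; it only asks the compiler to produce the bytes it would have if it were stored.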