I'm discovering programming with vectorized data types for SIMD instructions (with this tutorial). From what I understand, a vector has a fixed size of 16 bytes. This schematic details it well and seems to answer my question:
A set of instructions is provided, covering the basic operations as well as some more specialized ones.
Nevertheless, just out of curiosity, I would like to know if there is a way to vectorize "custom data", by which I mostly mean structures. I suppose that if the structure fits within 16 bytes it is possible, because in the end the types are just bytes. However, the instruction set does not seem to allow operating directly on structures, for example to get a field.
So my question is the following: are we limited to the simple standard C types when vectorizing with SIMD operations? If not, how do we proceed? If so, are there parallelization methods (other than multithreading) to operate simultaneously on vectors/arrays of structures?
_mm_loadu_si128 / _mm_storeu_si128 are strict-aliasing safe, so you can use them on anything. The equivalents for ARM NEON are similar.
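A minimal sketch, assuming x86 with SSE2 and a hypothetical 16-byte struct rec of four int32_t fields (no padding, checked with _Static_assert). The struct name and both helper functions are made up for illustration. The first helper loads the whole struct with _mm_loadu_si128 and updates all four fields with one _mm_add_epi32. The second helper shows the asker's "get a field" concern: SSE2 has no single "extract field 1" instruction, so it shuffles that lane down to position 0 first. SSE4.1 adds _mm_extract_epi32 for this.

```c
#include <emmintrin.h>  /* SSE2: __m128i, _mm_loadu_si128, _mm_add_epi32, _mm_shuffle_epi32 */
#include <stdint.h>

/* Hypothetical 16-byte struct, used only for illustration. */
struct rec { int32_t id, count, flags, score; };
_Static_assert(sizeof(struct rec) == 16, "struct rec must be exactly 16 bytes");

/* Add delta to all four fields with one vertical SIMD add.
   The unaligned load/store intrinsics can be pointed at any 16 bytes. */
static void rec_add_all(struct rec *r, int32_t delta)
{
    __m128i v = _mm_loadu_si128((const __m128i *)r);  /* [id, count, flags, score] */
    v = _mm_add_epi32(v, _mm_set1_epi32(delta));
    _mm_storeu_si128((__m128i *)r, v);
}

/* Reading one field back out of the vector needs a shuffle first:
   broadcast lane 1 (count), then move the low lane to a scalar. */
static int32_t rec_count_from_vector(const struct rec *r)
{
    __m128i v = _mm_loadu_si128((const __m128i *)r);
    __m128i lane1 = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 1, 1, 1));
    return _mm_cvtsi128_si32(lane1);
}
```

Operations that treat every field the same way are cheap. Per-field work is where the extra instructions start to show up.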
If you know the struct layout (which is fixed for a given ABI), you most certainly can load/store data in large chunks from structs or arrays of structs. For example, the answer to "Fast interleave 2 double arrays into an array of structs with 2 float and 1 int (loop invariant) member, with SIMD double->float conversion?" does a packed conversion, then a shuffle and a blend. Another example: "Sorting 64-bit structs using AVX?"
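This is a simplified sketch of the packed-conversion-then-shuffle idea, not the linked answer's code. It assumes a hypothetical struct pair { float x, y; } with no int member, so no blend step is needed. Each iteration converts two doubles from each input array to floats with _mm_cvtpd_ps. It then interleaves them with _mm_unpacklo_ps and writes two whole structs with one 16-byte store.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_pd, _mm_cvtpd_ps, _mm_unpacklo_ps, _mm_storeu_ps */
#include <stddef.h>

/* Hypothetical 8-byte struct: two floats, no padding on mainstream ABIs. */
struct pair { float x, y; };
_Static_assert(sizeof(struct pair) == 8, "struct pair must be two packed floats");

/* out[i] = { (float)a[i], (float)b[i] }, two structs per SIMD iteration. */
static void interleave(struct pair *out, const double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128 fa = _mm_cvtpd_ps(_mm_loadu_pd(a + i)); /* [a0, a1, 0, 0] */
        __m128 fb = _mm_cvtpd_ps(_mm_loadu_pd(b + i)); /* [b0, b1, 0, 0] */
        __m128 ab = _mm_unpacklo_ps(fa, fb);           /* [a0, b0, a1, b1] */
        _mm_storeu_ps((float *)(out + i), ab);         /* fills out[i] and out[i+1] */
    }
    for (; i < n; i++) {  /* scalar tail for an odd element count */
        out[i].x = (float)a[i];
        out[i].y = (float)b[i];
    }
}
```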
Most of what you can do with asm is possible in C with intrinsics.
If you want to do different things to each struct member, though, then you usually have a problem. For example, a struct xy { float x,y; }; geometry vector is a poor fit for SIMD. Adding is fine (it's purely vertical), but a dot product or rotation requires combining the x and y components of a single geometry vector horizontally within a SIMD vector, and shuffling costs extra instructions.
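A minimal SSE sketch of that contrast, assuming struct xy is 8 bytes, so one __m128 holds two structs. The helper names are hypothetical. add_xy is purely vertical because x lines up with x and y with y. dot_xy2 computes the dot products a[0]·b[0] and a[1]·b[1] and needs two shuffles before the final add, because each x*x product sits next to the y*y product it must be summed with.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_mul_ps, _mm_shuffle_ps */
#include <stddef.h>

struct xy { float x, y; };
_Static_assert(sizeof(struct xy) == 8, "struct xy must be two packed floats");

/* Vertical: dst[i] = a[i] + b[i], two structs per SIMD vector. */
static void add_xy(struct xy *dst, const struct xy *a, const struct xy *b, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128 va = _mm_loadu_ps((const float *)(a + i)); /* [ax_i, ay_i, ax_i+1, ay_i+1] */
        __m128 vb = _mm_loadu_ps((const float *)(b + i));
        _mm_storeu_ps((float *)(dst + i), _mm_add_ps(va, vb));
    }
    for (; i < n; i++) {  /* scalar tail */
        dst[i].x = a[i].x + b[i].x;
        dst[i].y = a[i].y + b[i].y;
    }
}

/* Horizontal: out[k] = a[k].x*b[k].x + a[k].y*b[k].y for k = 0, 1.
   Reads exactly two structs from each of a and b. */
static void dot_xy2(float out[2], const struct xy *a, const struct xy *b)
{
    __m128 p = _mm_mul_ps(_mm_loadu_ps((const float *)a), _mm_loadu_ps((const float *)b));
    /* p = [ax0*bx0, ay0*by0, ax1*bx1, ay1*by1] */
    __m128 xs = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 0, 2, 0)); /* [p0, p2, p0, p2] */
    __m128 ys = _mm_shuffle_ps(p, p, _MM_SHUFFLE(3, 1, 3, 1)); /* [p1, p3, p1, p3] */
    float tmp[4];
    _mm_storeu_ps(tmp, _mm_add_ps(xs, ys));                   /* [dot0, dot1, dot0, dot1] */
    out[0] = tmp[0];
    out[1] = tmp[1];
}
```

Half of the final vector is wasted duplicate work, which is part of why this layout is a poor fit.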
This is the Array of Structs problem, and it is usually best solved by storing your data as one Struct of Arrays. You'd have float x[] and float y[], so you can do a whole SIMD vector of four dot products at once between x[i + 0..3], y[i + 0..3] and x[j + 0..3], y[j + 0..3].
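A minimal struct-of-arrays sketch using SSE. The struct points and its fixed capacity NPTS are hypothetical, chosen only to keep the example self-contained. With x and y stored in separate arrays, the four dot products between points i..i+3 and j..j+3 use only vertical multiplies and adds, with no shuffles.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps */
#include <stddef.h>

#define NPTS 8  /* hypothetical fixed capacity, for illustration only */

/* One struct of arrays instead of an array of struct xy. */
struct points {
    float x[NPTS];
    float y[NPTS];
};

/* out[k] = x[i+k]*x[j+k] + y[i+k]*y[j+k] for k = 0..3.
   Caller must ensure i + 4 <= NPTS and j + 4 <= NPTS. */
static void dot4_soa(float out[4], const struct points *p, size_t i, size_t j)
{
    __m128 xi = _mm_loadu_ps(p->x + i);
    __m128 yi = _mm_loadu_ps(p->y + i);
    __m128 xj = _mm_loadu_ps(p->x + j);
    __m128 yj = _mm_loadu_ps(p->y + j);
    /* All lanes are independent: lane k holds the k-th dot product. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_mul_ps(xi, xj), _mm_mul_ps(yi, yj)));
}
```

Every lane of the result is useful output, compared with the duplicated lanes in the array-of-structs version.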
See https://stackoverflow.com/tags/sse/info for some links, specifically Slides: SIMD at Insomniac Games (GDC 2015) which also transcribes the text of the talk along with each slide. It has a more gradual introduction to these concepts, with some diagrams.