I need to write a C library that applies geometric transforms to points in 2D space. These points will be aggregated into shapes, and I want to be able to (auto)vectorize the processing of whole shapes through OpenMP.
What I'm stuck on is the best way to declare the points:
typedef __attribute__((aligned(8))) float point_t[2];
or
typedef struct point_t
{
    float x, y;
} point_t;
knowing that I will later use a box type:
typedef __attribute__((aligned(64))) point_t box_t[4];
From a programming perspective, it is more legible to access box[1].y than box[1][1] (the y coordinate of the 2nd point of the box rectangle). Now, will compilers understand that the struct is only a nicer handle on an array and vectorize accordingly?
It will depend on the compiler. The only way to be sure is to check the result.
The Compiler Explorer at godbolt.org is a convenient way to check what a compiler spits out. I wrote a trivial translate function:
#ifdef USE_XY
#define X(p) ((p).x)
#define Y(p) ((p).y)
typedef struct point_t
{
    float x, y;
} point_t;
#else
#define X(p) ((p)[0])
#define Y(p) ((p)[1])
typedef __attribute__((aligned(8))) float point_t[2];
#endif
typedef __attribute__((aligned(64))) point_t box_t[4];
void translate(box_t* box, float dx, float dy) {
    #pragma omp simd
    for (int i=0; i<4; ++i) {
        X((*box)[i]) += dx;
        Y((*box)[i]) += dy;
    }
}
Compiling with ARM64 gcc 8.2 (results at https://gcc.godbolt.org/z/EEP17Yd7G), we get this for both -O2 -fopenmp and -O2 -fopenmp -DUSE_XY:
translate:
        dup     v0.4s, v0.s[0]
        ldr     q3, [x0]
        ldr     q2, [x0, 16]
        ins     v0.s[1], v1.s[0]
        ins     v0.s[3], v1.s[0]
        fadd    v3.4s, v3.4s, v0.4s
        fadd    v0.4s, v2.4s, v0.4s
        str     q3, [x0]
        str     q0, [x0, 16]
        ret
...and this for both -O2 and -O2 -DUSE_XY:
translate:
        add     x1, x0, 32
.L2:
        ldp     s3, s2, [x0]
        fadd    s3, s3, s0
        fadd    s2, s2, s1
        stp     s3, s2, [x0]
        add     x0, x0, 8
        cmp     x0, x1
        bne     .L2
        ret
The former uses SIMD instructions (two 4-lane fadd operations on vector registers); the latter is a scalar loop that handles one point per iteration. Whether or not -DUSE_XY is there makes no difference. So we know that, for this exact code and these exact compiler flags, this exact compiler version vectorizes both declarations equally well. That, of course, does not guarantee that it will succeed for all your code.
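Independently of the generated assembly, you can at least have the compiler confirm that the two declarations describe the same bytes in memory. The following is only a sketch: the names point_xy_t, point_arr_t, box_xy_t, box_arr_t, translate_xy, translate_arr and same_result are made up for illustration, the alignment attributes and the OpenMP pragma are left out to keep it portable, and it checks layout and results only, not whether the loop gets vectorized.

```c
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcmp */

/* Hypothetical names: the two candidate declarations side by side. */
typedef struct { float x, y; } point_xy_t;
typedef float point_arr_t[2];
typedef point_xy_t  box_xy_t[4];
typedef point_arr_t box_arr_t[4];

/* C11 compile-time checks: the struct must have the same size and
   member offsets as float[2], i.e. no padding anywhere. */
_Static_assert(sizeof(point_xy_t) == sizeof(point_arr_t), "struct has padding");
_Static_assert(offsetof(point_xy_t, x) == 0, "x is not the first float");
_Static_assert(offsetof(point_xy_t, y) == sizeof(float), "y does not follow x");
_Static_assert(sizeof(box_xy_t) == sizeof(box_arr_t), "box sizes differ");

/* Same loop body as translate(), written once per declaration. */
static void translate_xy(box_xy_t *box, float dx, float dy) {
    for (int i = 0; i < 4; ++i) {
        (*box)[i].x += dx;
        (*box)[i].y += dy;
    }
}

static void translate_arr(box_arr_t *box, float dx, float dy) {
    for (int i = 0; i < 4; ++i) {
        (*box)[i][0] += dx;
        (*box)[i][1] += dy;
    }
}

/* Translates the same unit square through both versions and returns 1
   if the resulting 32 bytes are identical, 0 otherwise. */
int same_result(float dx, float dy) {
    box_xy_t  a = {{0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f}};
    box_arr_t b = {{0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f}};
    translate_xy(&a, dx, dy);
    translate_arr(&b, dx, dy);
    return memcmp(a, b, sizeof a) == 0;
}
```

If the static assertions compile, the struct is just a named view of the same two floats, which is consistent with gcc producing identical code for both variants above. You would still need to inspect the assembly for each compiler and flag set you care about.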