C struct or array to design a geometric library


I need to write a C library that applies geometric transforms to points in 2D space. The points will be grouped into shapes, and I want the processing of whole shapes to be (auto)vectorized through OpenMP.

What I'm stuck on is the best way to declare the points:

typedef __attribute__((aligned(8))) float point_t[2];

or

typedef struct point_t
{
  float x, y;
} point_t;

knowing that I will later use a box type:

typedef __attribute__((aligned(64))) point_t box_t[4];

From a programming perspective, box[1].y is more readable than box[1][1] (the y coordinate of the second point of the box rectangle). Will compilers recognize that the struct is just a convenient wrapper around an array and vectorize accordingly?


Solution

  • It depends on the compiler. The only way to be sure is to check the generated code.

    The Compiler Explorer at godbolt.org is a convenient way to check what a compiler spits out. I wrote a trivial translate function:

    #ifdef USE_XY
    #define X(p) ((p).x)
    #define Y(p) ((p).y)
    typedef struct point_t
    {
      float x, y;
    } point_t;
    #else
    #define X(p) ((p)[0])
    #define Y(p) ((p)[1])
    typedef __attribute__((aligned(8))) float point_t[2];
    #endif
    
    typedef __attribute__((aligned(64))) point_t box_t[4];
    
    void translate(box_t* box, float dx, float dy) {
        #pragma omp simd
        for (int i=0; i<4; ++i) {
            X((*box)[i]) += dx;
            Y((*box)[i]) += dy;
        }
    }
    

    Compiling with ARM64 gcc 8.2 (results at https://gcc.godbolt.org/z/EEP17Yd7G), we get this for both -O2 -fopenmp and -O2 -fopenmp -DUSE_XY:

    translate:
            dup     v0.4s, v0.s[0]
            ldr     q3, [x0]
            ldr     q2, [x0, 16]
            ins     v0.s[1], v1.s[0]
            ins     v0.s[3], v1.s[0]
            fadd    v3.4s, v3.4s, v0.4s
            fadd    v0.4s, v2.4s, v0.4s
            str     q3, [x0]
            str     q0, [x0, 16]
            ret
    

    ...and this for both -O2 and -O2 -DUSE_XY:

    translate:
            add     x1, x0, 32
    .L2:
            ldp     s3, s2, [x0]
            fadd    s3, s3, s0
            fadd    s2, s2, s1
            stp     s3, s2, [x0]
            add     x0, x0, 8
            cmp     x0, x1
            bne     .L2
            ret
    

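    The former uses SIMD instructions (fadd on the v registers, four floats at a time); the latter is a scalar loop that handles one point per iteration. Without -fopenmp the #pragma omp simd is ignored, and as far as I know this gcc version does not auto-vectorize at -O2 on its own, which is why the second listing stays scalar. Whether or not -DUSE_XY is there makes no difference in either case. So we know that for this exact code and these exact compiler flags, this exact compiler version can vectorize the struct version just as well as the array version. That, of course, does not guarantee that it will succeed for all your code.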
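Part of the question can be checked without reading any assembly: whether the struct really has the same memory layout as float[2]. The following is a minimal sketch, assuming a GCC-compatible C11 compiler and a typical ABI where a struct of two floats has no padding. The names pair_t, translate_xy, translate_arr and layouts_agree are made up for this illustration and are not part of the code above. The #pragma omp simd is left out on purpose, since it only affects code generation, not the result.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <string.h>   /* memcmp */

typedef struct point_t
{
  float x, y;
} point_t;

typedef __attribute__((aligned(64))) point_t box_t[4];

/* Compile-time checks: the struct is two packed floats, x first, y second.
   These are assumptions about the target ABI; the build fails if they do not hold. */
static_assert(sizeof(point_t) == 2 * sizeof(float), "point_t has padding");
static_assert(offsetof(point_t, x) == 0, "x is not the first float");
static_assert(offsetof(point_t, y) == sizeof(float), "y is not the second float");

/* Hypothetical array view of a point, used only for comparison. */
typedef float pair_t[2];

/* Translate through the struct view. */
static void translate_xy(box_t *box, float dx, float dy)
{
  for (int i = 0; i < 4; ++i) {
    (*box)[i].x += dx;
    (*box)[i].y += dy;
  }
}

/* The same translate through the array view. */
static void translate_arr(pair_t *pts, float dx, float dy)
{
  for (int i = 0; i < 4; ++i) {
    pts[i][0] += dx;
    pts[i][1] += dy;
  }
}

/* Returns 1 when both views end up with byte-identical contents. */
int layouts_agree(void)
{
  box_t a = { {1, 2}, {3, 4}, {5, 6}, {7, 8} };
  float b[4][2] = { {1, 2}, {3, 4}, {5, 6}, {7, 8} };

  translate_xy(&a, 0.5f, -1.0f);
  translate_arr(b, 0.5f, -1.0f);
  return memcmp(a, b, sizeof b) == 0;
}
```

Calling layouts_agree() from a main or a unit test confirms the two declarations are interchangeable as data. It says nothing about which one a given compiler vectorizes, so the generated code still has to be inspected as described above.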