Tags: c++, casting, shared-memory, simd, avx

Casting structs to add definition to a shared-memory block in a SIMD application


I am building an application that needs a large block of shared memory holding values of type double. The block must be 32-byte aligned so that it can be loaded into SIMD registers correctly. For example:

double *ptr_x = (double *)_mm_malloc(sizeof(double) * 40, 32);

Internally, there are several calculations that use the allocated memory (this is where the SIMD processing comes in). It is more convenient to use variable names with appropriate class functions to make the code legible. Rather than performing the calculations and then moving the values to this memory block, I want to use the local variables to make the calculations but have those variables pointing back to the memory.

One way I have tried is to form data structures like:

struct Position{
double xCoord;
double yCoord;
double zCoord;
double zeroPad;
};

struct Velocity{
double xCoord;
double yCoord;
double zCoord;
double zeroPad;
};

then define pointers to these structs and reinterpret_cast addresses inside the block of memory to them, as follows:

Position *posCar;
Velocity *velCar;

posCar = reinterpret_cast<Position*>(ptr_x + 16);
velCar = reinterpret_cast<Velocity*>(ptr_x + 20);

Is there a preferred way of performing this mapping? Is this compiler safe? In this case, the struct members are always of type double and come in groups of 4 to match the __m256d vector definition.

I would appreciate any insight into a better approach, or experience with issues that may crop up.


Solution

  • Is there a preferred way of performing this mapping?

    It’s subjective. C++ books generally recommend reinterpret_cast, as you are doing. Personally, I find a C-style cast such as (Position*)( ptr_x + 16 ) more readable.

    Also, if these things sit at sequential addresses, consider defining a larger structure that holds both position and velocity.
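A combined structure along those lines might look like the sketch below. The names CarState and mapCarState are hypothetical, and the static_asserts only document the layout this approach relies on (four doubles per struct, no padding); they do not make the cast itself any more standard-conforming.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct Position { double xCoord, yCoord, zCoord, zeroPad; };
struct Velocity { double xCoord, yCoord, zCoord, zeroPad; };

// Hypothetical combined record: position immediately followed by velocity.
struct CarState {
    Position pos;
    Velocity vel;
};

// Fail at compile time if the layout ever stops matching the block of doubles.
static_assert(std::is_standard_layout<CarState>::value, "CarState must be standard layout");
static_assert(sizeof(Position) == 4 * sizeof(double), "Position must be exactly 4 doubles");
static_assert(offsetof(CarState, vel) == 4 * sizeof(double), "vel must start right after pos");
static_assert(sizeof(CarState) == 8 * sizeof(double), "CarState must be exactly 8 doubles");

// Maps the record that starts `offset` doubles into the block.
// Assumes the resulting address is 32-byte aligned, as in the question.
CarState* mapCarState(double* block, std::size_t offset) {
    assert(reinterpret_cast<std::uintptr_t>(block + offset) % 32 == 0);
    return reinterpret_cast<CarState*>(block + offset);
}
```

With the layout from the question, mapCarState(ptr_x, 16) would give a pointer whose pos member starts at ptr_x + 16 and whose vel member starts at ptr_x + 20, so one cast replaces two.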

  • Is this compiler safe?

    I think the language standard classifies this as “undefined behavior” (the object-lifetime and strict-aliasing rules). In practice, on AMD64 processors it works fine with all 4 major compilers.
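If the undefined-behavior question is a concern, one option is to construct the structs inside the block with placement new, which starts their lifetime at that address. This is only a sketch under assumptions: emplaceAt is a hypothetical helper, and it value-initializes the struct, so whatever was stored in those four doubles is overwritten with zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Position { double xCoord, yCoord, zCoord, zeroPad; };
struct Velocity { double xCoord, yCoord, zCoord, zeroPad; };

// Hypothetical helper: constructs a zero-initialized T inside the block,
// `offset` doubles from its start, and returns a pointer to the new object.
template <typename T>
T* emplaceAt(double* block, std::size_t offset) {
    static_assert(sizeof(T) % sizeof(double) == 0, "T must cover whole doubles");
    static_assert(alignof(T) <= alignof(double), "block alignment must satisfy T");
    assert(block != nullptr);
    return ::new (static_cast<void*>(block + offset)) T{};
}
```

Usage would be Position *posCar = emplaceAt<Position>(ptr_x, 16); in place of the cast. The returned pointer refers to a real Position object, so member access through it is well defined.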

    And one more thing.

  • I want to use the local variables to make the calculations but have those variables pointing back to the memory.

    You can, but if these calculations are complicated and involve several steps, consider the performance implications. Memory access is generally slow compared with registers: even a cache hit costs several cycles, and a miss that goes to main memory can cost hundreds.

    For optimal performance, load the values into registers once, do the work there, and store the results back at the end, something like this:

    __m256d pos = _mm256_loadu_pd( &posCar->xCoord );
    __m256d vel = _mm256_loadu_pd( &velCar->xCoord );
    // ...update these vectors in registers
    _mm256_storeu_pd( &posCar->xCoord, pos );
    _mm256_storeu_pd( &velCar->xCoord, vel );
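To make the load, update, store pattern concrete, here is a minimal sketch of one possible update step. The function advance and the rule pos += vel * dt are illustrative assumptions, not something taken from the question. The AVX_TARGET macro is also an assumption: it only lets GCC and Clang build the function without -mavx, and the code still needs a CPU that supports AVX at run time.

```cpp
#include <cassert>
#include <immintrin.h>

// Assumption: enable AVX for this one function on GCC/Clang so the file
// compiles without -mavx. Other compilers get an empty macro.
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

struct Position { double xCoord, yCoord, zCoord, zeroPad; };
struct Velocity { double xCoord, yCoord, zCoord, zeroPad; };

// Hypothetical update step: pos += vel * dt, with one load per struct,
// all arithmetic in registers, and one store at the end.
AVX_TARGET void advance(Position* posCar, const Velocity* velCar, double dt) {
    assert(posCar != nullptr && velCar != nullptr);
    __m256d pos = _mm256_loadu_pd(&posCar->xCoord);        // 4 doubles: x, y, z, pad
    const __m256d vel = _mm256_loadu_pd(&velCar->xCoord);
    const __m256d step = _mm256_set1_pd(dt);               // broadcast dt to all lanes
    pos = _mm256_add_pd(pos, _mm256_mul_pd(vel, step));
    _mm256_storeu_pd(&posCar->xCoord, pos);                // velocity is unchanged, no store
}
```

Because the block in the question is 32-byte aligned and each struct starts at a multiple of four doubles, the aligned variants _mm256_load_pd and _mm256_store_pd could be used instead. The unaligned forms shown here work either way.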