DirectXMathConvert.inl assertion failure (DirectXMathConvert.inl line 704)

Could someone translate this piece of code into something human-readable?

|704| assert(((uintptr_t)pSource & 0xF) == 0);

This assertion fails in my program, but not 100% of the time (and without my recompiling anything), which is rather strange.

The complete XMLoadFloat4A function (DirectXMathConvert.inl, starting at line 697):

|697| _Use_decl_annotations_
|698| inline XMVECTOR XM_CALLCONV XMLoadFloat4A
|699| (
|700|     const XMFLOAT4A* pSource
|701| )
|702| {
|703|     assert(pSource);
|704|     assert(((uintptr_t)pSource & 0xF) == 0);
|705| #if defined(_XM_NO_INTRINSICS_)
|706|     XMVECTOR V;
|707|     V.vector4_f32[0] = pSource->x;
|708|     V.vector4_f32[1] = pSource->y;
|709|     V.vector4_f32[2] = pSource->z;
|710|     V.vector4_f32[3] = pSource->w;
|711|     return V;
|712| #elif defined(_XM_ARM_NEON_INTRINSICS_)
|713|     return vld1q_f32_ex( reinterpret_cast<const float*>(pSource), 128 );
|714| #elif defined(_XM_SSE_INTRINSICS_)
|715|     return _mm_load_ps( &pSource->x );
|716| #endif
|717| }

Use cases:

// Convert an XMFLOAT4A to XMVECTOR
XMVECTOR getXMVECTORfromXMFLOAT4A(const XMFLOAT4A& v) {
    return XMLoadFloat4A(&v);
}
XMVECTOR foo = getXMVECTORfromXMFLOAT4A(XMFLOAT4A(1.0, 2.0, 3.0, 1.0));

// Transform XMFLOAT4A with XMMATRIX
XMFLOAT4A XMFloat4Transform(const XMFLOAT4A& v, const XMMATRIX& m) {
    XMVECTOR vec = XMLoadFloat4A(&v);
    XMVECTOR rot = XMVector4Transform(vec, m);
    XMFLOAT4A result;
    XMStoreFloat4A(&result, rot);
    return result;
}
XMMATRIX m = XMMatrixLookAtLH(...);
XMFLOAT4A foo (1.0, 2.0, 3.0, 1.0);
XMFLOAT4A bar = XMFloat4Transform(foo, m);

Why does this assertion fail? And why not 100% of the time?

Solution

As MSDN says, XMFLOAT4A "Describes an XMFLOAT4 structure aligned on a 16-byte boundary."

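That is what the assert is checking: the low four bits of the pointer must be zero, meaning the address is a multiple of 16. It is not sufficient for XMLoadFloat4A to receive an XMFLOAT4, which only needs to be aligned for its float members (4 bytes); it needs an XMFLOAT4A, which is aligned on a 16-byte boundary. The intrinsics require this: the SSE path calls _mm_load_ps, an aligned load that faults on an address that is not 16-byte aligned. An object that is only 4- or 8-byte aligned can still land on a multiple of 16 by chance, which is why the assertion does not fail on every run.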
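To make the bit test concrete, here is a minimal sketch in standard C++. Float4A, isAligned16 and alignmentDemo are hypothetical names, not part of DirectXMath, and alignas(16) is used as a portable stand-in for the alignment that the real XMFLOAT4A declaration requests.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for XMFLOAT4A: four floats, forced to 16-byte alignment.
struct alignas(16) Float4A {
    float x, y, z, w;
};

// Same test as the assert on line 704: the low four bits of the address
// must all be zero, i.e. the address is a multiple of 16.
inline bool isAligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}

inline void alignmentDemo() {
    Float4A v{1.0f, 2.0f, 3.0f, 1.0f};
    assert(isAligned16(&v));     // start of an alignas(16) object
    assert(!isAligned16(&v.y));  // 4 bytes further on: fine for a float, not a multiple of 16
}
```

The second assert shows the failing case: an address that is perfectly valid for a float but would trip the check in XMLoadFloat4A.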

Normally XMFLOAT4A is marked with __declspec(align(16)), so the compiler knows that it must align this struct to 16 bytes. In your case, check the declaration of XMFLOAT4A. I suggest the compiler switch /EP, which prints the preprocessed source to standard output (use /P to write it to a file instead), so you can see the declaration exactly as the compiler sees it. That might help you detect whether some macro interferes with your XMFLOAT4A declaration.
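Besides reading the preprocessed output, the declaration can also be checked at compile time. The sketch below uses two hypothetical stand-in structs and a hypothetical helper, isDeclared16ByteAligned, so that it compiles without DirectXMath; in a real project you would apply the same static_assert to DirectX::XMFLOAT4A after including the header.

```cpp
// Hypothetical stand-ins: Float4A mimics XMFLOAT4A, Float4 mimics XMFLOAT4.
struct alignas(16) Float4A { float x, y, z, w; };
struct Float4 { float x, y, z, w; };

// True when the type's declared alignment is at least 16 bytes.
template <typename T>
constexpr bool isDeclared16ByteAligned() {
    return alignof(T) >= 16;
}

// The build breaks here if a macro or packing pragma has removed the alignment.
static_assert(isDeclared16ByteAligned<Float4A>(),
              "aligned vector type lost its 16-byte alignment");
// The unaligned variant only needs the alignment of its float members.
static_assert(!isDeclared16ByteAligned<Float4>(),
              "plain vector type is not expected to be 16-byte aligned");
```

A check like this only proves that the declared alignment is intact. It says nothing about where a particular object was actually placed in memory.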

You should also check which exact call fails.
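One way to narrow that down, apart from reading the call stack in the debugger, is to check the pointer at each call site before it reaches XMLoadFloat4A. This is only a sketch: reportIfMisaligned16 and CHECK_ALIGNED16 are hypothetical helpers, not part of DirectXMath.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: prints the call site when the pointer is not a
// multiple of 16 and returns whether the pointer passed the check.
inline bool reportIfMisaligned16(const void* p, const char* file, int line) {
    const bool ok = (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
    if (!ok) {
        std::fprintf(stderr, "%s(%d): pointer %p is not 16-byte aligned\n",
                     file, line, p);
    }
    return ok;
}

// Place this right before a suspect call so that the failure names the caller
// rather than line 704 of DirectXMathConvert.inl.
#define CHECK_ALIGNED16(p) assert(reportIfMisaligned16((p), __FILE__, __LINE__))
```

In the functions from the question this would be used as CHECK_ALIGNED16(&v); on the line before XMLoadFloat4A(&v).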

Also: MSDN has an article on __declspec(align(#)). It says that if you pass a type declared this way, such as XMFLOAT4A, by value to a function, you lose the alignment. In your code I only see pass by reference, but it is still worth keeping in mind.