c++ performance intrinsics armadillo

Memory alignment of Armadillo vectors vec/fvec


I want to load a __m256 directly from the data of an Armadillo vector, obtained with .memptr(). Does Armadillo guarantee that this memory is 256-bit (32-byte) aligned? If so, I would simply cast the float/double pointer returned by .memptr() to a __m256 pointer and skip _mm256_load_ps(), provided that makes sense in terms of performance.


Solution

  • The Armadillo documentation does not seem to address this point, so the alignment is left unspecified. You should therefore not assume that vector data are 32-byte aligned.

    However, the data do not need to be aligned to be loaded into AVX registers: you can use the unaligned load intrinsic _mm256_loadu_ps. AFAIK, _mm256_load_ps and _mm256_loadu_ps perform about the same on relatively recent x86 processors.
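
A minimal sketch of the unaligned-load approach, assuming a compiler that defines `__AVX__` when AVX is enabled (for example GCC or Clang with `-mavx`); without it the code falls back to a plain scalar loop. The helper names `is_aligned` and `sum_floats` are hypothetical and not part of Armadillo. They only take a raw `const float*`, such as the one returned by `.memptr()`.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true when p is a multiple of `alignment` bytes
// (use 32 to check whether an aligned __m256 load would be valid).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: sums n floats starting at data.
// No alignment is assumed, so data may come from any float buffer.
inline float sum_floats(const float* data, std::size_t n) {
    float total = 0.0f;
    std::size_t i = 0;
#if defined(__AVX__)
    __m256 acc = _mm256_setzero_ps();
    // Process 8 floats per iteration with the unaligned load intrinsic.
    for (; i + 8 <= n; i += 8) {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    }
    // Spill the 8 partial sums and reduce them in scalar code.
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    for (float lane : lanes) {
        total += lane;
    }
#endif
    // Scalar tail (or the whole array when AVX is not enabled).
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

For an `arma::fvec v`, this would be called as `sum_floats(v.memptr(), v.n_elem)`. If you still want to try aligned loads, `is_aligned(v.memptr(), 32)` lets you check at run time rather than rely on behaviour the documentation does not promise.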