I am vectorizing code with AVX intrinsics and want to fill an __m256 vector with a constant float such as 1.0, so that one register holds the vector {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0}.
Does anyone know how to do this?
It is similar to the question "constant float with SIMD", but I am using AVX, not SSE.
See here for the AVX intrinsics load and store operations. You simply need to declare a float array and an AVX vector of type __m256, and then use the appropriate operation to load the float array into the vector.
In this case, the intrinsic _mm256_load_ps is what you want.
Update: As mentioned in the comments, the data must be 32-byte aligned (not 32-bit), because _mm256_load_ps requires its address to be a multiple of 32 bytes. See Intel data alignment documentation for a detailed explanation. I've made the solution code cleaner, as per Peter's comments. With optimisation enabled (-O3), this produces the same code as Paul's answer (also with optimisation enabled). Without optimisation, the number of instructions is the same, but all 8 floating-point values are stored in memory, rather than a single floating-point value as in Paul's answer.
Here is the modified example:
#include <immintrin.h> // For AVX intrinsics

// ALIGN(x) declares x with the 32-byte alignment that _mm256_load_ps requires
#ifdef __GNUC__
#define ALIGN(x) x __attribute__((aligned(32)))
#elif defined(_MSC_VER)
#define ALIGN(x) __declspec(align(32)) x
#endif

static constexpr ALIGN(float a[8]) = {1.0f,1.0f,1.0f,1.0f,1.0f,1.0f,1.0f,1.0f};

int main() {
    // Load the float array into an AVX vector (aligned load)
    __m256 vect = _mm256_load_ps(a);
    (void)vect; // silence the unused-variable warning
}
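As a possible alternative to the compiler-specific ALIGN macro, here is a minimal sketch that uses the standard C++11 alignas specifier instead. The names ones, is_aligned32, copy_ones and the AVX_FN macro are made up for illustration. AVX_FN is only a convenience so that GCC or Clang can build the function without -mavx; if you already compile with -mavx (or /arch:AVX on MSVC) it expands to nothing.

```cpp
#include <immintrin.h>
#include <cstdint>

// Illustrative convenience only: enable AVX for one function when the
// translation unit is not already compiled with -mavx (GCC/Clang).
#if defined(__GNUC__) && !defined(__AVX__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Standard alignment specifier: the array starts on a 32-byte boundary.
alignas(32) static const float ones[8] = {1.0f, 1.0f, 1.0f, 1.0f,
                                          1.0f, 1.0f, 1.0f, 1.0f};

// Hypothetical helper: true when p is a multiple of 32 bytes.
static bool is_aligned32(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 32 == 0;
}

// Hypothetical helper: copies the eight constants to out[0..7]
// through one 256-bit register.
AVX_FN static void copy_ones(float* out) {
    __m256 v = _mm256_load_ps(ones); // aligned load, needs 32-byte alignment
    _mm256_storeu_ps(out, v);        // unaligned store, no requirement on out
}
```

If the source array cannot be guaranteed to be aligned, _mm256_loadu_ps accepts any address and can be used in place of _mm256_load_ps.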
You can easily check the assembly output with a few compilers by using the Godbolt interactive C++ compiler.
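For comparison when inspecting the assembly, a single repeated constant can also be produced with the broadcast intrinsic _mm256_set1_ps, which needs no array and no alignment handling. The following is only a sketch: fill8 and the AVX_FN macro are hypothetical names, and AVX_FN merely lets GCC or Clang build the function when -mavx is not passed.

```cpp
#include <immintrin.h>

// Illustrative convenience only: enable AVX for one function when the
// translation unit is not already compiled with -mavx (GCC/Clang).
#if defined(__GNUC__) && !defined(__AVX__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Hypothetical helper: writes eight copies of `value` to out[0..7].
AVX_FN void fill8(float value, float* out) {
    __m256 v = _mm256_set1_ps(value); // broadcast value to all eight lanes
    _mm256_storeu_ps(out, v);         // unaligned store, out needs no special alignment
}
```

Calling fill8(1.0f, buffer) with a buffer of at least eight floats leaves every element equal to 1.0f.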