c++ vector avx avx2 avx512

How to understand this AVX addition of two __m256i variables?


Here is the code:

#include <immintrin.h>
#include <stdio.h>
#include <memory>
#include <iostream>

__m256i foo();
__m256i foo2();

int main() {

    __m256i vec1 = foo();
    __m256i vec2 = foo2();

    __m256i result = _mm256_add_epi32(vec1, vec2);


    /* Display the elements of the result vector */
    int32_t* res = (int32_t*)&vec1;
    std::cout << res[0] << std::endl;
    std::cout << res[1] << std::endl;
    std::cout << res[2] << std::endl;
    std::cout << res[3] << std::endl;
    std::cout << res[4] << std::endl;
    std::cout << res[5] << std::endl;
    std::cout << res[6] << std::endl;
    std::cout << res[7] << std::endl;
    system("pause");
    return 0;
}

__m256i foo() {
    __m256i v = { 1, 2, 3, 4, 5, 6, 7,8 };
    return v;
}

__m256i foo2() {
    __m256i w = { 1, 2, 3, 4, 5, 6, 7,8 };
    return w;
}

I have dug deep on the internet to try to learn how this AVX stuff works, but I still don't fully get it. Shouldn't the above code display this:

2, 4, 6, 8, 10, 12, 14, 16, since I am adding the two vectors of eight 32-bit ints together?

The output I get in the console when running it is:

67305985 134678021 0 0 0 0 0 0

How could this be? I am an experienced C# programmer and am now trying to learn C++. Thanks in advance for any response/explanation!


Solution

  • To initialize a __m256i register, use the intrinsics provided for that purpose.
    For 32-bit values, you can use _mm256_set_epi32.
    Note that it takes its arguments from the highest element down to the lowest, which is the opposite of the order you might expect (alternatively, use _mm256_setr_epi32, which takes them in memory order, lowest element first).

    Therefore your foo and foo2 should be:

    __m256i foo() {
        __m256i v = _mm256_set_epi32(8, 7, 6, 5, 4, 3, 2, 1);
        return v;
    }
    
    __m256i foo2() {
        __m256i w = _mm256_set_epi32(8, 7, 6, 5, 4, 3, 2, 1);
        return w;
    }
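To make the argument order concrete, here is a minimal sketch that builds the same vector with both intrinsics and copies the lanes out, lowest lane first. The names lanes_from_set, lanes_from_setr and the TARGET_AVX macro are made up for this illustration; the macro only lets GCC/Clang compile the AVX intrinsics without passing -mavx.

```cpp
#include <immintrin.h>
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: on GCC/Clang, enable AVX per function; MSVC needs no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

// Hypothetical helper: build 1..8 with _mm256_set_epi32 (highest lane first).
TARGET_AVX std::array<int32_t, 8> lanes_from_set() {
    __m256i v = _mm256_set_epi32(8, 7, 6, 5, 4, 3, 2, 1);
    std::array<int32_t, 8> out;
    // Unaligned store: lane 0 lands in out[0], lane 7 in out[7].
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.data()), v);
    return out;
}

// Hypothetical helper: build 1..8 with _mm256_setr_epi32 (lowest lane first).
TARGET_AVX std::array<int32_t, 8> lanes_from_setr() {
    __m256i v = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8);
    std::array<int32_t, 8> out;
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.data()), v);
    return out;
}
```

Both helpers should return the same array, with 1 in element 0 and 8 in element 7, which is the layout the original brace initializer was presumably meant to produce.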
    

    Another issue in your code is probably a typo: you initialized int32_t* res from vec1 instead of from result.
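Because res points at vec1, the numbers in the question are the raw contents of vec1, and they can be reproduced without any AVX at all. The sketch below assumes MSVC's definition of __m256i (a union whose first member is an array of 32 8-bit integers, so a brace initializer fills bytes) and a little-endian CPU; the function name is made up for illustration. GCC and Clang define __m256i differently, as a vector of four 64-bit integers, so the brace initializer is not portable either way.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: mimic what { 1, 2, ..., 8 } stores under the MSVC
// assumption above, then read the 32 bytes back as eight 32-bit lanes.
std::array<int32_t, 8> reinterpret_bytes_as_lanes() {
    // The initializer fills the first eight BYTES; the other 24 are zero.
    unsigned char bytes[32] = {1, 2, 3, 4, 5, 6, 7, 8};

    std::array<int32_t, 8> lanes;
    static_assert(sizeof(lanes) == sizeof(bytes), "both must be 32 bytes");
    std::memcpy(lanes.data(), bytes, sizeof(bytes));
    return lanes;
}
```

On a little-endian machine, lane 0 is 0x04030201 = 67305985 and lane 1 is 0x08070605 = 134678021, followed by six zeros, which matches the output in the question.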

    Live demo.

    Note:
    Accessing the __m256i through a pointer to int32_t, as you do for printing, violates the strict aliasing rule.
    As you can see in the live demo above, MSVC tolerates it (it behaves like GCC or Clang with -fno-strict-aliasing), but a proper access requires copying the data into a buffer of int32_t values.
    You can find more information in this post: print a __m128i variable.
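One way to do that copy is to store the register into a plain array before printing. This is only a sketch under a few assumptions: the helper names add_and_extract and print_lanes and the TARGET_AVX2 macro are invented here, and the macro exists only so that GCC/Clang accept the AVX2 intrinsic _mm256_add_epi32 without -mavx2.

```cpp
#include <immintrin.h>
#include <array>
#include <cassert>
#include <cstdint>
#include <iostream>

// Assumption: on GCC/Clang, enable AVX2 per function; MSVC needs no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define TARGET_AVX2 __attribute__((target("avx2")))
#else
#define TARGET_AVX2
#endif

// Hypothetical helper: add two vectors holding 1..8 and return the lanes.
TARGET_AVX2 std::array<int32_t, 8> add_and_extract() {
    __m256i a = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8);
    __m256i b = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8);
    __m256i sum = _mm256_add_epi32(a, b);  // lane-wise 32-bit addition (AVX2)

    // Copy the register into a buffer instead of casting a pointer to it.
    std::array<int32_t, 8> out;
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out.data()), sum);
    return out;
}

// Hypothetical helper: print the lanes, lowest lane first.
void print_lanes(const std::array<int32_t, 8>& lanes) {
    for (int32_t lane : lanes) {
        std::cout << lane << ' ';
    }
    std::cout << '\n';
}
```

Calling print_lanes(add_and_extract()) from main should print 2 4 6 8 10 12 14 16 on a CPU that supports AVX2.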