c++ visual-studio-2012 stdvector sse2 auto-vectorization

Auto-Vectorization in Visual Studio 2012 Express on std::vector is not happening


I have a simple program with three std::vector objects that I use in for loops. After enabling the vectorizer report compilation flag, I checked whether these loops are vectorized, but Visual Studio reports that the loops are not vectorized due to reason 1200. My sample code is below.

#include <iostream>
#include <vector>
#include <time.h>
int main(int argc, char *argv[])
{
    clock_t t=clock();
    int tempSize=100;
    std::vector<double> tempVec(tempSize);
    std::vector<double> tempVec1(tempSize);
    std::vector<double> tempVec2(tempSize);

    for(int i=0;i<tempSize;i++)
    {
        tempVec1[i] = 20;
        tempVec2[i] = 30;
    }

    for(int i=0,imax=tempSize;i<imax;i++)
        tempVec[i] = tempVec1[i] + tempVec2[i];

    t = clock() - t;       // stop the clock
    std::cout << "Time in seconds = " << t / double(CLOCKS_PER_SEC) << std::endl;
    return 0;
}

Below is the compiler output for this code with the option "/Qvec-report:2" enabled.

2> --- Analyzing function: main
2> d:\test\ssetestonvectors\main.cpp(12) : info C5002: loop not vectorized due to reason '1200'
2> d:\test\ssetestonvectors\main.cpp(18) : info C5002: loop not vectorized due to reason '1200'

When I looked up reason code 1200 on the MSDN page https://msdn.microsoft.com/en-us/library/jj658585.aspx, it says that reason code 1200 means "Loop contains loop carried data dependence".
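For reference, here is a minimal sketch of what a loop-carried data dependence usually looks like, next to a loop without one. The function names add_elementwise and prefix_sum are made up for illustration and do not appear in the program above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on a[i] and b[i],
// so several iterations could in principle be computed at once.
void add_elementwise(const std::vector<double>& a,
                     const std::vector<double>& b,
                     std::vector<double>& out)
{
    assert(a.size() == out.size() && b.size() == out.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] + b[i];
}

// Loop-carried dependence: iteration i reads v[i - 1], which was
// written by iteration i - 1, so the iterations must run in order.
void prefix_sum(std::vector<double>& v)
{
    for (std::size_t i = 1; i < v.size(); ++i)
        v[i] = v[i] + v[i - 1];
}
```

The loops in the program above have the first shape, which is why the reported reason is surprising.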

I don't understand how this loop contains such a dependence. I have code containing vector operations that I need to optimize so that it can use Visual Studio's auto-vectorization feature and target SSE2. At the moment I can't, because each time Visual Studio reports a reason code like this one.


Solution

  • I think your problem is that:

        for(int i=0,imax=tempSize;i<imax;i++)
            tempVec[i] = tempVec1[i] + tempVec2[i];
    

    Is actually

        for(int i=0,imax=tempSize;i<imax;i++)
            tempVec.operator[](i) = tempVec1.operator[](i) + tempVec2.operator[](i);
    

    ... and the vectorizer is failing to look inside the function calls. The first fix for that is:

        const double* t1 = &tempVec1.front();
        const double* t2 = &tempVec2.front();
        double *t = &tempVec.front();
        for(int i=0,imax=tempSize;i<imax;i++)
            t[i] = t1[i] + t2[i];
    

    The problem with that is that the vectorizer can't see that t, t1, and t2 don't overlap. You have to promise the compiler that they don't:

        const double* __restrict t1 = &tempVec1.front();
        const double* __restrict t2 = &tempVec2.front();
        double * __restrict t = &tempVec.front();
        for(int i=0,imax=tempSize;i<imax;i++)
            t[i] = t1[i] + t2[i];
    

    Obviously (I hope) use of the __restrict keyword (which is not part of standard C++) means this code is not guaranteed to be portable to other C++ compilers.

    Edit: The OP has clarified that replacing the calls to operator[] with calls to at produces a different failure message (although that might be because at is more complex).

    If the problem is not the function calls, my next hypothesis is that operator[] boils down to something like return this->__begin[i]; and the vectorizer doesn't know that different std::vectors have non-overlapping memory. If so, the final code block is still the solution.
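To make the overlap concern concrete, here is a minimal sketch that wraps the restrict-qualified loop in a helper and contrasts it with a call where the pointers really do overlap. The names add_arrays, add_arrays_may_alias, and SKETCH_RESTRICT are hypothetical and not part of the answer above. The macro assumes that GCC and Clang also accept the __restrict spelling as an extension, and it falls back to nothing elsewhere. Whether a given compiler then vectorizes the loop is not something this sketch can promise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical portability macro (an assumption, not from the answer):
// use the extension keyword where it is expected to exist, else nothing.
#if defined(_MSC_VER) || defined(__GNUC__) || defined(__clang__)
#  define SKETCH_RESTRICT __restrict
#else
#  define SKETCH_RESTRICT
#endif

// Writes a[i] + b[i] into out[i]. The caller promises that out does not
// overlap a or b; if that promise is broken the result may be wrong.
void add_arrays(double* SKETCH_RESTRICT out,
                const double* SKETCH_RESTRICT a,
                const double* SKETCH_RESTRICT b,
                std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Same loop without the promise. The compiler has to allow for calls
// such as out == a + 1, where each iteration reads the value written
// by the previous one (a loop-carried dependence).
void add_arrays_may_alias(double* out, const double* a,
                          const double* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

With three separate vectors the call would look like add_arrays(tempVec.data(), tempVec1.data(), tempVec2.data(), tempVec.size()), since distinct std::vector objects own distinct buffers. Calling add_arrays_may_alias(buf + 1, buf, ones, n) on a single buffer shows the case the compiler must otherwise guard against.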