I have a simple program with three std::vector objects that I use in for loops. After turning the compilation flag on, I am testing whether these loops are vectorized. However, Visual Studio reports that the loops are not vectorized, giving reason 1200. My sample code is below.
#include <iostream>
#include <vector>
#include <time.h>
int main(int argc, char *argv[])
{
    clock_t t = clock();
    int tempSize = 100;
    std::vector<double> tempVec(tempSize);
    std::vector<double> tempVec1(tempSize);
    std::vector<double> tempVec2(tempSize);
    for (int i = 0; i < tempSize; i++)
    {
        tempVec1[i] = 20;
        tempVec2[i] = 30;
    }
    for (int i = 0, imax = tempSize; i < imax; i++)
        tempVec[i] = tempVec1[i] + tempVec2[i];
    t = clock() - t; // stop the clock
    std::cout << "Time in millisecs = " << 1000.0 * t / CLOCKS_PER_SEC << std::endl;
    return 0;
}
Below is the compiler output with the "/Qvec-report:2" option enabled:
2> --- Analyzing function: main
2> d:\test\ssetestonvectors\main.cpp(12) : info C5002: loop not vectorized due to reason '1200'
2> d:\test\ssetestonvectors\main.cpp(18) : info C5002: loop not vectorized due to reason '1200'
When I looked up reason code 1200 on the MSDN page https://msdn.microsoft.com/en-us/library/jj658585.aspx, it says that reason 1200 means "Loop contains loop carried data dependence".
I don't understand how this loop contains such a dependence. I have some code that I need to optimize so that Visual Studio's auto-vectorizer can generate SSE2 instructions for it. The code consists of vector operations, but I can't get it vectorized because each time Visual Studio reports a reason code like this one.
I think your problem is that:
for(int i=0,imax=tempSize;i<imax;i++)
    tempVec[i] = tempVec1[i] + tempVec2[i];
is actually
for(int i=0,imax=tempSize;i<imax;i++)
    tempVec.operator[](i) = tempVec1.operator[](i) + tempVec2.operator[](i);
... and the vectorizer is failing to look inside the function calls. The first fix for that is:
const double* t1 = &tempVec1.front();
const double* t2 = &tempVec2.front();
double *t = &tempVec.front();
for(int i=0,imax=tempSize;i<imax;i++)
    t[i] = t1[i] + t2[i];
The problem with that is that the vectorizer can't see that t, t1, and t2 don't overlap. You have to promise the compiler that they don't:
const double* __restrict t1 = &tempVec1.front();
const double* __restrict t2 = &tempVec2.front();
double * __restrict t = &tempVec.front();
for(int i=0,imax=tempSize;i<imax;i++)
    t[i] = t1[i] + t2[i];
Obviously (I hope), using the __restrict keyword (which is not part of standard C++) means this code is not guaranteed to be portable to other C++ compilers.
Edit: The OP has clarified that replacing the calls to operator[] with calls to at() produces a different failure message (although that might be because at() is more complex: it performs bounds checking).
If the problem is not the function calls, my next hypothesis is that operator[] boils down to something like return this->__begin[i]; and the vectorizer doesn't know that different std::vectors have non-overlapping memory. If so, the __restrict version above is still the solution.