I am trying out the Auto-Vectorizer mode of Visual Studio 2013 on x86_64, and I am surprised by the following. Consider this naive code:
#include <stddef.h>

static void rescale( double * __restrict out, unsigned short * __restrict in, size_t n, const double intercept, const double slope )
{
    for( size_t i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}
Visual Studio reports that it fails on such a naive example:
--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(18) : info C5012: loop not parallelized due to reason '1007'
The compilation line is as follows (I am only interested in SSE2 for now):
cl vec.c /O2 /Qpar /Qpar-report:2
Looking up reason code 1007 in the documentation gives:
The loop induction variable or the loop bounds are not signed 32-bit numbers (int or long). Resolve this by changing the type of the induction variable.
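The documentation's advice can be applied to an arbitrary size_t length by processing the array in chunks of at most INT_MAX elements, so that the inner loop's induction variable and bound are signed 32-bit ints. This is a minimal sketch, assuming int is 32 bits as it is on Windows. The name rescale_chunked is hypothetical, and whether Visual Studio 2013 then vectorizes the inner loop has not been verified here.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: splits [0, n) into chunks of at most INT_MAX elements
   so the inner loop uses a signed int induction variable and bound. */
static void rescale_chunked( double * __restrict out, const unsigned short * __restrict in,
                             size_t n, const double intercept, const double slope )
{
    while( n > 0 )
    {
        /* Largest chunk whose length still fits in an int. */
        const int len = n > (size_t)INT_MAX ? INT_MAX : (int)n;
        for( int i = 0; i < len; ++i )
            out[i] = slope * in[i] + intercept;
        /* Advance both pointers past the processed chunk. */
        out += len;
        in += len;
        n -= (size_t)len;
    }
}
```

The chunk loop adds one outer iteration per INT_MAX elements, so for realistic array sizes it runs once and the work stays in the simple inner loop.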
Is there a way to rewrite my original loop so that the Auto-Vectorizer is triggered properly?
I tried a simple rewrite, without success:
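#include <limits.h>
#include <stddef.h>

static void rescale( double * __restrict out, unsigned short * __restrict in, size_t n, const double intercept, const double slope )
{
    /* Split [0, n) into two ranges indexed by a signed long (covers n <= 2 * LONG_MAX). */
    const long first = n > LONG_MAX ? LONG_MAX : (long)n;
    const long secnd = n > LONG_MAX ? (long)(n - LONG_MAX) : 0;
    for( long i = 0; i < first; ++i )
        out[i] = slope * in[i] + intercept;
    /* Offset in size_t arithmetic: LONG_MAX + i would overflow a long. */
    for( long i = 0; i < secnd; ++i )
        out[(size_t)LONG_MAX + i] = slope * in[(size_t)LONG_MAX + i] + intercept;
}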
With this version, Visual Studio now reports:
--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(21) : info C5012: loop not parallelized due to reason '1000'
c:\users\malat\autovec\vec.c(23) : info C5012: loop not parallelized due to reason '1000'
Reason code 1000 means:
The compiler detected a data dependency in the loop body.
I cannot see any data dependency in this second version.
How should I rewrite my initial code so that the Auto-Vectorizer of Visual Studio 2013 accepts it?
Neither SSE2 nor its predecessor, SSE, has an instruction that converts uint16_t values directly into doubles.
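To make the missing step concrete, here is a hand-written SSE2 sketch using the standard emmintrin.h intrinsics. Because there is no direct uint16-to-double conversion, each group of four values is first zero-extended to 32-bit integers (_mm_unpacklo_epi16 against a zero register) and only then converted with _mm_cvtepi32_pd, which converts two lanes at a time. The function name rescale_sse2 and the macro RESCALE_HAVE_SSE2 are hypothetical; this illustrates the instruction sequence, not what the compiler actually emits. When SSE2 is not detected at compile time, only the scalar loop is compiled.

```c
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define RESCALE_HAVE_SSE2 1
#include <emmintrin.h>
#endif

/* Hypothetical helper: out[i] = slope * in[i] + intercept, four values per step. */
static void rescale_sse2( double * out, const unsigned short * in, size_t n,
                          double intercept, double slope )
{
    size_t i = 0;
#ifdef RESCALE_HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    const __m128d vslope = _mm_set1_pd( slope );
    const __m128d vicpt = _mm_set1_pd( intercept );
    for( ; i + 4 <= n; i += 4 )
    {
        /* Load four uint16 values (64 bits) into the low half of a register. */
        const __m128i u16 = _mm_loadl_epi64( (const __m128i *)(in + i) );
        /* Zero-extend them to four 32-bit integers (interleave with zeros). */
        const __m128i u32 = _mm_unpacklo_epi16( u16, zero );
        /* Convert the low two, then the high two, int32 lanes to doubles. */
        const __m128d lo = _mm_cvtepi32_pd( u32 );
        const __m128d hi = _mm_cvtepi32_pd( _mm_srli_si128( u32, 8 ) );
        _mm_storeu_pd( out + i,     _mm_add_pd( _mm_mul_pd( lo, vslope ), vicpt ) );
        _mm_storeu_pd( out + i + 2, _mm_add_pd( _mm_mul_pd( hi, vslope ), vicpt ) );
    }
#endif
    /* Scalar tail, or the whole range when SSE2 is not detected. */
    for( ; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}
```

The unpack step is the extra work an auto-vectorizer would have to synthesize for this loop; with doubles as input, the loop body maps directly onto _mm_mul_pd and _mm_add_pd.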
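Convert in into a double * (that is, convert the input data to double beforehand), so that the loop body only multiplies and adds doubles.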
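A minimal sketch of that restructuring, assuming n fits in a signed int: the rescaling loop reads doubles, and a separate pass widens the unsigned short data. The names rescale_double and widen_to_double are hypothetical. Note that the widening pass still performs the uint16-to-double conversion, so this moves the conversion out of the arithmetic loop rather than eliminating it; whether Visual Studio 2013 vectorizes either loop has not been checked here.

```c
#include <stddef.h>

/* Hypothetical helper: the loop body is a pure double multiply-add. */
static void rescale_double( double * __restrict out, const double * __restrict in,
                            int n, const double intercept, const double slope )
{
    /* Signed int induction variable and bound, per reason 1007. */
    for( int i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}

/* Hypothetical helper: separate widening pass from unsigned short to double. */
static void widen_to_double( double * __restrict out, const unsigned short * __restrict in, int n )
{
    for( int i = 0; i < n; ++i )
        out[i] = in[i];
}
```

Usage: call widen_to_double( tmp, raw, n ) and then rescale_double( out, tmp, n, intercept, slope ). Because of __restrict, tmp and out must be distinct buffers.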