c visual-studio-2013 vectorization c11 sse2

info C5012: loop not parallelized due to reason ‘1007’

I am trying out the Auto-Vectorizer mode of Visual Studio 2013 on x86_64, and I am a bit surprised by the following. Consider this naive code:

static void rescale( double * __restrict out, unsigned short * __restrict in, size_t n, const double intercept, const double slope )
{
    for( size_t i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}

Visual Studio reports that it fails on this naive example with:

--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(18) : info C5012: loop not parallelized due to reason '1007'

The compilation command line is (I am only interested in SSE2 for now):

cl vec.c /O2 /Qpar /Qpar-report:2

Looking at the documentation:

https://msdn.microsoft.com/en-us/library/hh872235(v=vs.120).aspx

Leads to:

https://msdn.microsoft.com/en-us/library/jj658585(v=vs.120).aspx#BKMK_ReasonCode100x

Which reads as:

The loop induction variable or the loop bounds are not signed 32-bit numbers (int or long). Resolve this by changing the type of the induction variable.

Is there a way to rewrite this loop so that the Auto-Vectorizer mode is triggered properly?

I failed to rewrite the code using a simple approach:

static void rescale( double * __restrict out, unsigned short * __restrict in, size_t n, const double intercept, const double slope )
{
  const long first = (long)n;
  const long secnd = n > LONG_MAX ? n - LONG_MAX : 0;
  for( long i = 0; i < first; ++i )
    out[i] = slope * in[i] + intercept;
  for( long i = 0; i < secnd; ++i )
    out[LONG_MAX+i] = slope * in[LONG_MAX+i] + intercept;
}

In the above case Visual Studio now reports:

--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(21) : info C5012: loop not parallelized due to reason '1000'
c:\users\malat\autovec\vec.c(23) : info C5012: loop not parallelized due to reason '1000'

Which means:

The compiler detected a data dependency in the loop body.

I cannot see where there would be a data dependency in my second case.

How should I rewrite my initial code to please the Auto-Vectorizer mode of Visual Studio 2013?

Solution

Neither SSE2 nor its predecessor, SSE, has an instruction that converts uint16_t values directly into doubles.

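Convert in into a double* instead: widen the unsigned short values to double before the loop, so that the loop body operates on double data only.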
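A minimal sketch of that suggestion, with assumptions that are not in the original post: the helper names widen_u16 and rescale_f64 and the block size CHUNK are hypothetical, and the work is split into blocks so that every loop uses an int induction variable, as the documentation quoted for reason 1007 asks. Whether Visual Studio 2013 actually vectorizes these loops has not been verified here.

```c
#include <stddef.h>

/* Hypothetical block size, chosen arbitrarily for this sketch. */
enum { CHUNK = 1024 };

/* Stage 1: widen the 16-bit samples to double in a separate loop. */
static void widen_u16(double * __restrict out,
                      const unsigned short * __restrict in, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = (double)in[i];
}

/* Stage 2: same arithmetic as the original rescale, but the input is
   already double and the induction variable is a signed 32-bit int. */
static void rescale_f64(double * __restrict out,
                        const double * __restrict in, int n,
                        double intercept, double slope)
{
    for (int i = 0; i < n; ++i)
        out[i] = slope * in[i] + intercept;
}

/* Keeps the original size_t interface by walking the data in blocks
   of at most CHUNK elements, so each inner loop bound fits in an int. */
static void rescale(double * __restrict out,
                    const unsigned short * __restrict in, size_t n,
                    double intercept, double slope)
{
    double tmp[CHUNK];
    while (n > 0) {
        const int len = n < (size_t)CHUNK ? (int)n : CHUNK;
        widen_u16(tmp, in, len);
        rescale_f64(out, tmp, len, intercept, slope);
        in  += len;
        out += len;
        n   -= (size_t)len;
    }
}
```

The /Qpar-report:2 output shown in the question comes from the auto-parallelizer; adding /Qvec-report:2 to the command line should show what the auto-vectorizer itself reports for each loop.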