c, visual-studio-2013, vectorization, c11, sse2

info C5012: loop not parallelized due to reason ‘1007’


I am trying out the auto-vectorizer in Visual Studio 2013 on x86_64, and I am a bit surprised by the following. Consider this naive code:

static void rescale( double * __restrict out, unsigned short * __restrict in, size_t n, const double intercept, const double slope )
{
    for( size_t i = 0; i < n; ++i )
        out[i] = slope * in[i] + intercept;
}

Visual Studio reports that it fails on this simple example with:

--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(18) : info C5012: loop not parallelized due to reason '1007'

The compilation line is (I am only interested in SSE2 for now):

cl vec.c /O2 /Qpar /Qpar-report:2

The documentation describes reason code 1007 as:

The loop induction variable or the loop bounds are not signed 32-bit numbers (int or long). Resolve this by changing the type of the induction variable.

Is there a way to rewrite this loop so that the auto-vectorizer is triggered properly?

I failed to rewrite the code using a simple approach:

static void rescale( double * __restrict out, unsigned short * __restrict in, size_t n, const double intercept, const double slope )
{
  const long first = (long)n;
  const long secnd = n > LONG_MAX ? n - LONG_MAX : 0;
  for( long i = 0; i < first; ++i )
    out[i] = slope * in[i] + intercept;
  for( long i = 0; i < secnd; ++i )
    out[LONG_MAX+i] = slope * in[LONG_MAX+i] + intercept;
}

In the above case Visual Studio now reports:

--- Analyzing function: rescale
c:\users\malat\autovec\vec.c(21) : info C5012: loop not parallelized due to reason '1000'
c:\users\malat\autovec\vec.c(23) : info C5012: loop not parallelized due to reason '1000'

Which means:

The compiler detected a data dependency in the loop body.

I cannot see where there would be a data dependency in my second case.

How should I rewrite my initial code to satisfy the auto-vectorizer in Visual Studio 2013?


Solution

  • Neither SSE2 nor its predecessor, SSE, has an instruction that converts uint16_t values directly into doubles.

    Change the type of in to double*, so that the loop body performs no integer-to-double conversion.
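A minimal sketch of that suggestion, combined with the signed 32-bit induction variable that reason 1007 asks for. The names rescale_d and widen_u16 are hypothetical and not part of the original code. Taking the element count as an int is an assumption: a caller holding a size_t count would have to process the data in chunks of at most INT_MAX elements. This has not been verified against the Visual Studio 2013 reports, so treat it as a starting point rather than a confirmed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: widen the unsigned short samples to double in a
   separate pass, so the arithmetic loop below only ever sees doubles. */
static void widen_u16(double * __restrict out,
                      const unsigned short * __restrict in, int n)
{
    int i;
    assert(n >= 0);               /* the count is signed, reject negatives */
    for (i = 0; i < n; ++i)
        out[i] = (double)in[i];
}

/* Hypothetical variant of rescale: the input is double* and both the
   induction variable and the loop bound are signed 32-bit ints. */
static void rescale_d(double * __restrict out,
                      const double * __restrict in, int n,
                      const double intercept, const double slope)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; ++i)
        out[i] = slope * in[i] + intercept;
}
```

A caller would run widen_u16 into a temporary buffer and then pass that buffer to rescale_d. Note also that /Qpar and /Qpar-report:2 control and report on the auto-parallelizer (hence the "not parallelized" wording of C5012); the switch that reports vectorizer decisions is /Qvec-report:2, so it may be worth compiling with it to see what the vectorizer itself says about each loop.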