Intel compiler for-loop speed in C

I'm struggling to understand why this code runs blisteringly fast when built with Intel compiler 12 but slows down dramatically when built with Intel compiler 16:

#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc */
#include <time.h>    /* clock, clock_t, CLOCKS_PER_SEC */
int main(int argc, char *argv[])
{
    int i,t;
    int n=10000000;
    int T=1000;
    clock_t t1,t2;   /* clock() returns clock_t, not time_t */

    // double A[n],B[n],C[n];
    double *A = (double*) malloc (sizeof(double)*n);
    double *B = (double*) malloc (sizeof(double)*n);
    double *C = (double*) malloc (sizeof(double)*n);



    for (i=0;i<n;i++)
    {
       A[i]=1.0;
       B[i]=2.0;
    }
    t1=clock();

    for (t=0;t<T;t++)
       for (i=0;i<n;i++)
          C[i]=A[i]*B[i];

    t2=clock();
    double sum=0.0;
    for (i=0;i<n;i++) sum += C[i];
    printf("sum %f\n",sum);
    printf("time %f\n",(double)(t2-t1)/CLOCKS_PER_SEC);
}

Intel compiler 12: takes 0.1 seconds to run on Sandy Bridge.
Intel compiler 16: takes 25 seconds to run on Sandy Bridge.

makefile: icc -O2 -o array array.c

Solution

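Most likely, one of the compilers optimizes away the expensive nested loop entirely. Every pass of the outer loop writes exactly the same values into C, so the T repetitions are redundant, and the multiplication can be folded into the summation loop. The optimized code probably ends up equivalent to: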

t1=clock();
t2=clock();
double sum=0.0;
for (i=0;i<n;i++) sum += A[i]*B[i];

The compiler is perfectly entitled to do this, because the observable behavior of the program does not change. You can block the optimization by declaring the loop counters volatile.
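
As a rough sketch of the volatile suggestion, the timed part could be pulled into a helper like the one below. The function name timed_multiply and its parameters are made up for illustration. Only the outer counter is declared volatile here, which is my own choice so that the inner loop can still be vectorized. Note that the C standard only guarantees that the accesses to t really happen; in practice compilers then keep the repetitions as well, but that is not a hard guarantee.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not from the question): repeats C[i] = A[i] * B[i]
   `reps` times and returns the elapsed CPU time in seconds. */
double timed_multiply(const double *A, const double *B, double *C,
                      size_t n, int reps)
{
    /* volatile: every read and write of t must actually be performed,
       which in practice stops the compiler from collapsing the repeats. */
    volatile int t;
    size_t i;
    clock_t start = clock();

    for (t = 0; t < reps; t++)
        for (i = 0; i < n; i++)
            C[i] = A[i] * B[i];

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

In the program from the question this would be called as timed_multiply(A, B, C, n, T) in place of the nested loop between the two clock() calls. If both compilers then report similar times, the difference was the loop being eliminated, not slower generated code.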

Also make sure that the same optimization level is enabled for both compilers.