I'm struggling to understand why this code runs with blistering speed when compiled with Intel compiler 12, but really slows down when compiled with Intel compiler 16.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
    int i, t;
    int n = 10000000;
    int T = 1000;
    time_t t1, t2;
    // double A[n], B[n], C[n];
    double *A = (double*) malloc(sizeof(double)*n);
    double *B = (double*) malloc(sizeof(double)*n);
    double *C = (double*) malloc(sizeof(double)*n);
    for (i = 0; i < n; i++)
    {
        A[i] = 1.0;
        B[i] = 2.0;
    }
    t1 = clock();
    for (t = 0; t < T; t++)
        for (i = 0; i < n; i++)
            C[i] = A[i]*B[i];
    t2 = clock();
    double sum = 0.0;
    for (i = 0; i < n; i++) sum += C[i];
    printf("sum %f\n", sum);
    printf("time %f\n", (double)(t2 - t1)/CLOCKS_PER_SEC);
    return 0;
}
Build command: icc -O2 -o array array.c
Most likely, one of the compilers notices that every pass of the outer loop over t writes exactly the same values into C, so only the last pass matters, and it eliminates the redundant iterations entirely. Your timed code then effectively becomes:
t1 = clock();
t2 = clock();
double sum = 0.0;
for (i = 0; i < n; i++) sum += A[i]*B[i];
It is perfectly legal for the compiler to perform such optimizations, since the observable behavior of the program does not change. You can block them by declaring the loop iterators volatile.
Also ensure that you have the same optimization level (e.g. -O2) enabled for both compilers.