Today i came across a fascinating slide here. It compares two for loops given below.
First
for (int i=0; i<n; i++) {
a[i] * = 3;
}
Second
for (int i=0; i<n; i+=16) {
a[i] * = 3;
}
Here if the first loop will take 8ms the second should be taking only 1ms, at least that's what I expected. But the slide concludes differently. Can anyone please explain why my code may behave like this?
This blog post explains the phenomenon in details. Essentially:
The reason is that the first loop has better cache locality, resulting in less cache misses. There are many questions on the subject on SO such as: