Why loop tiling?

according to Wikipedia (http://en.wikipedia.org/wiki/Loop_tiling) and many other sources, loop tiling is a loop optimization technique which helps to take advantage of cache (locality of reference). The iteration space is divided into number of blocks and thus cache is better exploited.

From the link above, can somebody explain what difference it makes for 1D case (Overview section)? To my mind, the same number of cache misses will happen in both cases.

Solution

The "1D case" is incomplete, it just serves to describe the idea of blocking. There is no "body" so nothing you can analyze.

Blocking is useful in situations where the same data is used several times in an algorithm, and blocking ensures that the data is till in the cache when you reuse it.

So an example like

for(i=0; i<N; ++i){
  // Processing 1
  ...
}
for(i=0; i<N; ++i){
  // Processing 2
  ...
}

for(j=0; j<N; j+=B){
  for(i=j; i<min(N, j+B); ++i){
    // Processing 1
    ....
  for(i=j; i<min(N, j+B); ++i){
    // Processing 2
    ....
  }
}

would be more convincing.