Speed of C program execution

I got one problem at my exam for subject Principal of Programming Language. I thought for long time but i still did not understand the problem

Problem: Below is a program C, that is executed in MSVC++ 6.0 environment on a PC with configuration ~ CPU Intel 1.8GHz, Ram 512MB

#define M 10000
#define N 5000
int a[M][N];

void main() {
    int i, j;
    time_t start, stop;

    // Part A
    start = time(0);
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            a[i][j] = 0;
    stop = time(0);
    printf("%d\n", stop - start);

    // Part B
    start = time(0);
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            a[i][j] = 0;
    stop = time(0);
    printf("%d\n", stop - start);
}

Explain why does part A only execute in 1s, but it took part B 8s to finish?

Solution

Row-major order versus column-major order.

Recall first that all multi-dimensional arrays are represented in memory as a continguous block of memory. Thus the multidimensional array A(m,n) might be represented in memory as

a00 a01 a02 ... a0n a10 a11 a12 ... a1n a20 ... amn

In the first loop, you run through this block of memory sequentially. Thus, you run through the array traversing the elements in the following order

a00  a01  a02  ...  a0n  a10  a11  a12  ...  a1n  a20 ... amn

1    2    3         n    n+1  n+2  n+3 ...   2n   2n+1    mn

In the second loop, you skip around in memory and run through the array traversing the elements in the following order

a00  a10  a20  ...  am0  a01  a11  a21  ...  am1  a02  ...  amn

or, perhaps more clearly,

a00  a01  a02  ...  a10  a11  a12  ...  a20 ... amn
1    m+1  2m+1      2    m+2  2m+2      3       mn

All that skipping around really hurts you because you don't gain advantages from caching. When you run through the array sequentially, neighboring elements are loaded into the cache. When you skip around through the array, you don't get these benefits and instead keep getting cache misses harming performance.