First of all, I know this example is not a noticeable improvement over single-threaded execution. I am learning the basics of OpenMP and I don't know why it doesn't work as I expected.
So I am parallelizing the traditional matrix multiplication algorithm, trying to figure out which loops can be parallelized and which cannot. The algorithm is:
```c
for (i = 0; i < SIZE; i++)
    for (j = 0; j < SIZE; j++)
        for (k = 0; k < SIZE; k = k + 1)
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
```
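For reference, here is the serial version as a self-contained function (the fixed SIZE of 4 and the int element type are assumptions for illustration; the post does not show the declarations):

```c
#include <assert.h>

#define SIZE 4

/* Plain triple-loop matrix multiplication: C = A * B.
 * C is zeroed first, since the inner loop accumulates into it. */
void matmul(const int A[SIZE][SIZE], const int B[SIZE][SIZE],
            int C[SIZE][SIZE])
{
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++) {
            C[i][j] = 0;
            for (int k = 0; k < SIZE; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}
```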
I know that putting a parallel for directive on the k loop can cause unexpected results, so I want to use a critical section to fix it, like this:
```c
for (i = 0; i < SIZE; i++)
    for (j = 0; j < SIZE; j++)
    {
        #pragma omp parallel for shared(C)
        for (k = 0; k < SIZE; k = k + 1)
        {
            int tmp = A[i][k] * B[k][j];
            #pragma omp critical
            {
                C[i][j] += tmp;
            }
        }
    }
```
But when I compare the results with the code below, correct comes out as 0. D is the result matrix computed without OpenMP. I know there are better ways to do this with reduction, but I am learning about critical sections and want to know why this doesn't work.
```c
int correct = 1;
for (i = 0; i < SIZE && correct; i++)
    for (j = 0; j < SIZE && correct; j++)
        if (C[i][j] != D[i][j])
            correct = 0;
```
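For comparison, the reduction approach mentioned above can be sketched like this: each (i, j) entry is accumulated into a private sum that OpenMP combines across threads, so no critical section is needed on the hot path (SIZE of 4 and int elements are again assumptions for illustration):

```c
#include <assert.h>

#define SIZE 4

/* C = A * B with the k loop parallelized via reduction(+:sum).
 * Each thread sums its share of A[i][k] * B[k][j] products privately;
 * OpenMP adds the partial sums together at the end of the loop. */
void matmul_reduction(const int A[SIZE][SIZE], const int B[SIZE][SIZE],
                      int C[SIZE][SIZE])
{
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++) {
            int sum = 0;
            #pragma omp parallel for reduction(+:sum)
            for (int k = 0; k < SIZE; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}
```

Note that spawning a parallel region per (i, j) pair is still far from optimal; it only serves to show the reduction clause in isolation.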
There are two fairly basic bugs in my code.