First of all, I know this example is not a noticeable improvement over single-threaded execution. I am learning the basics of OpenMP and I don't know why it doesn't work as I expected.
So I am parallelizing the traditional matrix multiplication algorithm, trying to figure out which loops can be parallelized and which cannot. The algorithm is:
```c
for (i = 0; i < SIZE; i++)
    for (j = 0; j < SIZE; j++)
        for (k = 0; k < SIZE; k = k + 1)
            C[i][j] = C[i][j] + A[i][k] * B[k][j];
```
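For reference, here is the serial version as a self-contained function (the fixed SIZE of 4 and the int element type are assumptions for illustration; the post does not show the declarations):

```c
#include <assert.h>

#define SIZE 4

/* Plain triple-loop matrix multiplication: C = A * B.
 * C is zeroed first, since the inner loop accumulates into it. */
void matmul(const int A[SIZE][SIZE], const int B[SIZE][SIZE],
            int C[SIZE][SIZE])
{
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++) {
            C[i][j] = 0;
            for (int k = 0; k < SIZE; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}
```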
I know that putting a parallel for directive on the k loop can cause unexpected results, so I want to use a critical section to fix it, like this:
```c
for (i = 0; i < SIZE; i++)
    for (j = 0; j < SIZE; j++)
    {
        #pragma omp parallel for shared(C)
        for (k = 0; k < SIZE; k = k + 1)
        {
            int tmp = A[i][k] * B[k][j];
            #pragma omp critical
            {
                C[i][j] += tmp;
            }
        }
    }
```
But when I compare the results with the code below, correct comes out as 0. D is the result matrix computed without OpenMP. I know there are better ways to do this with reduction, but I am learning about critical sections and want to know why this doesn't work.
```c
int correct = 1;
for (i = 0; i < SIZE && correct; i++)
    for (j = 0; j < SIZE && correct; j++)
        if (C[i][j] != D[i][j])
            correct = 0;
```
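For comparison, the reduction approach mentioned above can be sketched like this: each (i, j) entry is accumulated into a private sum that OpenMP combines across threads, so no critical section is needed on the hot path (SIZE of 4 and int elements are again assumptions for illustration):

```c
#include <assert.h>

#define SIZE 4

/* C = A * B with the k loop parallelized via reduction(+:sum).
 * Each thread sums its share of A[i][k] * B[k][j] products privately;
 * OpenMP adds the partial sums together at the end of the loop. */
void matmul_reduction(const int A[SIZE][SIZE], const int B[SIZE][SIZE],
                      int C[SIZE][SIZE])
{
    for (int i = 0; i < SIZE; i++)
        for (int j = 0; j < SIZE; j++) {
            int sum = 0;
            #pragma omp parallel for reduction(+:sum)
            for (int k = 0; k < SIZE; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}
```

Note that spawning a parallel region per (i, j) pair is still far from optimal; it only serves to show the reduction clause in isolation.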
There are two fairly basic bugs in my code.