
OpenMP and #pragma omp for in C: how it works and how to check whether it is doing its job


I am writing a magic square program with OpenMP in C to try to make it faster. However, my times are higher than those of the sequential implementation.

I don't know whether I am using omp for correctly, and I can't tell whether that for loop is actually being spread across threads the way it is supposed to be in order to get faster, or whether I should use something else. Can someone help me?

My sample code:

#pragma omp parallel private(i,j)
  {
    // sum of the main diagonal
    #pragma omp for
    for (i = 0; i < size; i++)
        sum += matrix[i][i];

    // sum of the secondary diagonal
    #pragma omp for
    for (i = 0; i < size; i++)
        sumAux += matrix[i][size-i-1];
    //printf("\nMain diagonal sum %i and secondary %i\n", sum, sumAux);

    //------------------------------------------ROWS AND COLUMNS-----------------------------------------------------------
    #pragma omp for
    for (int i = 0; i < size; i++) {
        sumRow = 0;
        sumCol = 0;
        for (int j = 0; j < size; j++) {
            sumRow += matrix[i][j];
            sumCol += matrix[j][i];
        }
        //printf("Row sum %i\n", sumRow);
        //printf("Column sum %i\n", sumCol);
    }
  }

    //------------------------------------------PRINTS-----------------------------------------------------------
    if (sum == sumCol && sum == sumRow && sum == sumAux) {
        printf("Magic square with sum = %d \n", sum);
    } else {
        printf("Not a magic square \n");
    }

    return 0;
}

Solution

  • The code has several race conditions, namely on the updates of the variables sum, sumAux, sumRow, and sumCol. Moreover, this:

    for(int i = 0; i < size; i++) {
        sumRow = 0;
        sumCol = 0;
    for(int j = 0; j < size; j++) {
        sumRow += matrix[i][j];
        sumCol += matrix[j][i];
        }
    }
    

    is just wrong, since:

    A "magic square" is an arrangement of numbers (usually integers) in a square grid, where the numbers in each row, and in each column, and the numbers in the forward and backward main diagonals, all add up to the same number.

    Therefore, you should check that the sum of the values of each row and the sum of the values of each column produce the same result as the sum of the diagonals (from the previous step). Moreover, you can optimize your sequential code by exiting early when the constraints are not met:

    int sum = 0, sum2 = 0; 
    
    for (int i = 0; i < size; i++){
        sum = sum + mat[i][i];
        sum2 = sum2 + mat[i][size-1-i]; 
    }   
    
    if(sum!=sum2) 
        return 0;
    
    for (int i = 0; i < size; i++) {     
        int rowSum = 0;
        int colSum = 0;     
        for (int j = 0; j < size; j++){
            rowSum += mat[i][j];
            colSum += mat[j][i];
        }
          
        if (rowSum != sum || sum != colSum)
            return 0;
    }
    return 1;
    
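
    As a sanity check, the sequential version above can be wrapped in a function and tried on the classic 3x3 Lo Shu square, where every row, column, and diagonal sums to 15. The function name, the fixed matrix size, and the driver are just assumptions for this sketch:

    ```c
    #include <stdio.h>

    #define N 3  /* fixed size just for this sketch */

    /* Hypothetical wrapper around the sequential check above:
       returns 1 if mat is a magic square, 0 otherwise. */
    int isMagicSquare(int mat[N][N], int size) {
        int sum = 0, sum2 = 0;
        for (int i = 0; i < size; i++) {
            sum  += mat[i][i];            /* main diagonal */
            sum2 += mat[i][size - 1 - i]; /* secondary diagonal */
        }
        if (sum != sum2)
            return 0;
        for (int i = 0; i < size; i++) {
            int rowSum = 0, colSum = 0;
            for (int j = 0; j < size; j++) {
                rowSum += mat[i][j];
                colSum += mat[j][i];
            }
            if (rowSum != sum || colSum != sum)
                return 0; /* early exit: this row/column does not match */
        }
        return 1;
    }

    int main(void) {
        int magic[N][N]    = {{2, 7, 6}, {9, 5, 1}, {4, 3, 8}}; /* Lo Shu square */
        int notMagic[N][N] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        printf("%d\n", isMagicSquare(magic, N));    /* prints 1 */
        printf("%d\n", isMagicSquare(notMagic, N)); /* prints 0 */
        return 0;
    }
    ```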

    To solve the aforementioned race conditions, you should use the OpenMP reduction clause:

    int sum = 0, sum2 = 0; 
    
    #pragma omp parallel for reduction(+:sum, sum2)
    for (int i = 0; i < size; i++){
        sum = sum + mat[i][i];
        sum2 = sum2 + mat[i][size-1-i]; 
    }   
    
    if(sum!=sum2) 
        return 0;
      
    for (int i = 0; i < size; i++) {     
        int rowSum = 0;
        int colSum = 0;     
        #pragma omp parallel for reduction(+:rowSum, colSum)
        for (int j = 0; j < size; j++){
            rowSum += mat[i][j];
            colSum += mat[j][i];
        }
          
        if (rowSum != sum || sum != colSum)
            return 0;
    }
    return 1;
    
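
    Note that the version above opens one parallel region per row, so for an n×n square there are n fork/join cycles. A sketch of an alternative, using OpenMP's && reduction operator, parallelizes the outer loop instead: it trades the early exit for a flag (all rows are always scanned), but creates only a single parallel region. The function name and VLA-style parameter are assumptions for this sketch:

    ```c
    #include <omp.h>

    /* Sketch: check all rows and columns in a single parallel region.
       'sum' is the diagonal sum computed earlier with the reduction loop. */
    int checkRowsCols(int size, int mat[size][size], int sum) {
        int ok = 1;
        #pragma omp parallel for reduction(&&:ok)
        for (int i = 0; i < size; i++) {
            int rowSum = 0, colSum = 0;
            for (int j = 0; j < size; j++) {
                rowSum += mat[i][j];
                colSum += mat[j][i];
            }
            if (rowSum != sum || colSum != sum)
                ok = 0; /* no early exit, but no race: ok is reduced with && */
        }
        return ok;
    }
    ```

    Which variant wins depends on the input: the reduction over the inner loop keeps the early exit, while this one amortizes the parallel overhead over the whole check.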

    But my times are higher than the sequential implementation..

    Introducing OpenMP into a code, or parallelism for that matter, will not magically make your code faster. TL;DR: the work done in parallel has to be big enough to overcome the overhead of the parallelism (e.g., thread creation and synchronization). To achieve that, you first need to increase the size of the parallel tasks, namely by increasing the input size to values that justify the aforementioned overhead.
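
    To see whether the parallel version ever pays off, time it with omp_get_wtime() while varying the input size. A minimal sketch, assuming a flat row-major matrix filled with dummy data (the size and the diagSum helper are illustrative, not part of the original code):

    ```c
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* Sum of the main diagonal with an OpenMP reduction; the flat
       row-major layout (mat[i*size + j]) is an assumption of this sketch. */
    long long diagSum(int size, const int *mat) {
        long long sum = 0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < size; i++)
            sum += mat[(long long)i * size + i];
        return sum;
    }

    int main(void) {
        int size = 10000; /* illustrative: small inputs rarely amortize the overhead */
        int *mat = malloc((size_t)size * size * sizeof *mat);
        for (long long k = 0; k < (long long)size * size; k++)
            mat[k] = 1; /* dummy data: every diagonal element is 1 */

        double t0 = omp_get_wtime();
        long long sum = diagSum(size, mat);
        double t1 = omp_get_wtime();
        printf("sum = %lld, elapsed = %f s\n", sum, t1 - t0);

        free(mat);
        return 0;
    }
    ```

    Comparing this elapsed time against the plain sequential loop for growing sizes shows where, if anywhere, the parallel overhead is amortized on a given machine.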