How to apply openMP to a C++ function to validate all rows of a sudoku puzzle solution?

I am designing a program that will test to see whether a valid sudoku puzzle solution is given to the program or not. I first designed it in C++ but now I want to try to make it parallel. The program compiles fine without errors.

First I had to figure out a way to deal with using a return statement inside of a structured block. I just decided to make an array of bool's that are initialized to true. However the output from this function is false and I know for a fact the solution I am submitting is true. I am new to openMP and was wondering if anyone could help me out?

I have a feeling the issue is with my variable a getting set back to 0 and maybe also with my other variable nextSudokuNum getting set back to 1.

bool test_rows(int sudoku[9][9])
{
   int i, j, a;
   int nextSudokuNum = 1;
   bool rowReturn[9];

#pragma omp parallel for private(i)
   for(i = 0; i < 9; i++)
   {
      rowReturn[i] = true;
   }

#pragma omp parallel for private(i,j) \
   reduction(+: a, nextSudokuNum)
   for(i = 0; i < 9; i++)
   {
      for(j = 0; j < 9; j++)
      {
         a = 0;
         while(sudoku[i][a] != nextSudokuNum) {
            a++;
            if(a > 9) {
               rowReturn[i] = false;
            }
         }

         nextSudokuNum++;
      }
      nextSudokuNum = 1;
   }

   for(i = 0; i < 9; i++)
   {
      if(rowReturn[i] == false) {
         cout << "Invalid Sudoku Solution(Next Valid Sudoku Number Not Found)" << endl;
         cout << "Check row " << (i+1) << endl;
         return false;
      }
   }

   cout << "Valid sudoku rows(Returning true)" << endl;
   return true;
}

Solution

Disclaimer:

First off, do not parallelize very small loops or loops which execute nearly instantaneously. The overhead of creating the threads will dominate the benefit you would get by executing the inner statements of the loop in parallel. So unless each iteration you are parallelizing performs thousands-millions of FLOPs, the serial version of the code will run faster than the parallel version of the code.

Therefore, a better plan for parallelizing your (probable) tasks is to parallelize at a higher level. That is, presumably you are calling test_rows(sudoku), test_columns(sudoku), and test_box(sudoku) from one function somewhere else. What you can do is call these three serial functions in parallel using OpenMP sections where calling each of these three functions is a separate OpenMP section. This will only benefit from using 3 cores of your CPU, but presumably you are doing this on your laptop anyway so you probably only have 2 or 4 anyway.

Now to your actual problems:

You are not parallelizing over j, but merely over i. Therefore, you can see that your variable nextSudokuNum is not being reduced; for every i iteration, nextSudokuNum is self-contained. Thus it should be initialized inside the loop and made private in the #pragma omp parallel clause.

Likewise, you are not performing a reduction over a either. For every iteration of i, a is set, compared to, and incremented internally. Again it should be a private variable.

Therefore, your new code should look like:

   #pragma omp parallel for private(i,j,a,nextSudokuNum)
   for(i = 0; i < 9; i++)
   {
      // all private variables must be set internal to parallel region before being used
      nextSudokuNum = 1;

      for(j = 0; j < 9; j++)
      {
         a = 0;
         while(sudoku[i][a] != nextSudokuNum) {
            a++;
            if(a > 9) {
               rowReturn[i] = false;
            }
         }

         nextSudokuNum++;
      }
   }