Search code examples
c++openmp

OMP parallel for is not dividing iterations


I am trying to do distributed search using omp.h. I am creating 4 threads. Thread with id 0 does not perform the search instead it overseas which thread has found the number in array. Below is my code:

int arr[15]; //This array is randomly populated
int process=0,i=0,size=15; bool found=false;



 #pragma omp parallel num_threads(4)
 {

  int thread_id = omp_get_thread_num();

  #pragma omp cancellation point parallel


  if(thread_id==0){

       while(found==false){ continue; }

      if(found==true){

             cout<<"Number found by thread: "<<process<<endl;

             #pragma omp cancel parallel

            }


        }

   else{
         #pragma omp parallel for schedule(static,5)


          for(i=0;i<size;i++){



            if(arr[i]==number){  //number is a int variable and its value is taken      from user

                    found = true;

                    process = thread_id;




                  }

               cout<<i<<endl;

               }

                }



         }

The problem i am having is that each thread is executing for loop from i=0 till i=14. According to my understanding omp divides the iteration of the loops but this is not happening here. Can anyone tell me why and its possible solution?


Solution

  • Your problem is that you have a parallel inside a parallel. That means that each thread from the first parallel region makes a new team. That is called nested parallelism and it is allowed, but by default it's turned off. So each thread creates a team of 1 thread, which then executes its part of the for loop, which is the whole loop.

    So your omp parallel for should be omp for.

    But now there is another problem: your loop is going to be distributed over all threads, except that thread zero never gets to the loop. So you get deadlock.

    .... and the actual solution to your problem is a lot more complicated. It involves creating two tasks, one that spins on the shared variable, and one that does the parallel search.

    #pragma omp parallel
      {
    #   pragma omp single
        {
          int p = omp_get_num_threads();
          int found = 0;
    #     pragma omp taskgroup
          {
            /*
             * Task 1 listens to the shared variable
             */
    #       pragma omp task shared(found)
            {
              while (!found) {
                if (omp_get_thread_num()<0) printf("spin\n");
                continue; }
              printf("found!\n");
    #         pragma omp cancel taskgroup
            } // end 1st task
            /*
             * Task 2 does something in parallel,
             * sets `found' to true if found
             */
    #     pragma omp task shared(found)
            {
    #         pragma omp parallel num_threads(p-1)
    #         pragma omp for
              for (int i=0; i<p; i++)
                // silly test
                if (omp_get_thread_num()==2) {
                  printf("two!\n");
                  found = 1;
                }
            } // end 2nd task
          } // end taskgroup
        }
      }
    

    (Do you note the printf that is never executed? I needed that to prevent the compiler from "optimizing away" the empty while loop.)

    Bonus solution:

    #pragma omp parallel num_threads(4)
        {
         if(omp_get_thread_num()==0){  spin_on_found; }
         if(omp_get_thread_num()!=0){
          #pragma omp for nowait schedule(dynamic)  
              for ( loop ) stuff
    

    The combination of dynamic and nowait can somehow deal with the missing thread.