Search code examples
parallel-processingfortranopenmp

Fortran OMP : how to do a parallel and a single task?


I am a newbie in parallel programming. This is my serial code that I would like do parallelize

program main
  implicit none
  integer :: pr_number, i, pr_sum
  real :: pr_av
  
  pr_sum = 0
  do i=1,1000
! The following instruction is an example to simplify the problem.
! In the real case, it takes a long time that is more or less the same for all threads
! and it returns a large array
   pr_number = int(rand()*10) 
   pr_sum = pr_sum+pr_number
   pr_av = (1.d0*pr_sum) / i
   print *,i,pr_av ! In real case, writing a huge amount of data on one file
 enddo

 end program main

I woud like to parallelize pr_number = int(rand()*10) and to have only one print each num_threads. I tried many things but it does not work. For example,

program main
  implicit none
  integer :: pr_number, i, pr_sum
  real :: pr_av
  
  pr_sum = 0
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(pr_number) SHARED(pr_sum,pr_av)
!$OMP DO REDUCTION(+:pr_sum)
  do i=1,1000
   pr_number = int(rand()*10)
   pr_sum = pr_sum+pr_number
!$OMP SINGLE
   pr_av = (1.d0*pr_sum) / i
   print *,i,pr_av
!$OMP END SINGLE
 enddo
!$OMP END DO
!$OMP END PARALLEL

end program main

I have an error message at compilation time : work-sharing region may not be closely nested inside of work-sharing, critical or explicit task region.

How can I have an output like that (if I have 4 threads for example) ?

       4   3.00000000    
       8   3.12500000    
      12   4.00000000    
      16   3.81250000    
      20   3.50000000  
      ...

I repeat, I am a beginner on parallel programming. I read many things on stackoverflow but, I think, I have not yet the skill to understand. I work on it, but ...

Edit 1

To explain as suggested in comments. A do loop performs N times a lengthy calculation (N markov chain montecarlo) and the average of all calculations is written to a file at each iteration. The previous average is deleted, only the last one is kept, so process can be followed. I would like to parallelise this calculation over 4 threads. This is what I imagine to do but perhaps, it is not the best idea.

enter image description here

Thanks for help.


Solution

  • The value of the reduction variable inside the construct where the reduction happens is not really well defined. The reduction clause with a sum is typically implemented by each thread having a private copy of the reduction variable that they use for summing just the numbers for that very thread. At the and of the loop, the private copies are summed into the final sum. There is little point printing the intermediate value before the reduction is actually made.

    You can do the reduction in a nested loop and print the intermediate result every n iterations

    program main
      implicit none
      integer :: pr_number, i, j, pr_sum
      real :: pr_av
      
      pr_sum = 0
      !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(pr_number) SHARED(pr_sum,pr_av)
      do j = 1, 10
        !$OMP DO REDUCTION(+:pr_sum)
          do i=1,100
          pr_number = int(rand()*10)
          pr_sum = pr_sum+pr_number
        enddo
        !$OMP END DO
    
        !$omp single
        pr_av = (1.d0*pr_sum) / 100
        print *,j*100,pr_av
        !$omp end single
      end do
      !$OMP END PARALLEL
    
    end program main
    

    I kept the same rand() that may or may not work correctly in parallel depending on the compiler. Even if it gives the right results, it may actually be executed sequentially using some locks or barriers. However, the main point carries over to other libraries.

    Result

    > gfortran -fopenmp reduction-intermediate.f90 
    > ./a.out 
         100   4.69000006    
         200   9.03999996    
         300   13.7600002    
         400   18.2299995    
         500   22.3199997    
         600   26.5900002    
         700   31.0599995    
         800   35.4300003    
         900   40.1599998