Fortran OMP : how to do a parallel and a single task?

I am a newbie in parallel programming. This is my serial code that I would like do parallelize

program main
  implicit none
  integer :: pr_number, i, pr_sum
  real :: pr_av
  
  pr_sum = 0
  do i=1,1000
! The following instruction is an example to simplify the problem.
! In the real case, it takes a long time that is more or less the same for all threads
! and it returns a large array
   pr_number = int(rand()*10) 
   pr_sum = pr_sum+pr_number
   pr_av = (1.d0*pr_sum) / i
   print *,i,pr_av ! In real case, writing a huge amount of data on one file
 enddo

 end program main

I woud like to parallelize pr_number = int(rand()*10) and to have only one print each num_threads. I tried many things but it does not work. For example,

program main
  implicit none
  integer :: pr_number, i, pr_sum
  real :: pr_av
  
  pr_sum = 0
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(pr_number) SHARED(pr_sum,pr_av)
!$OMP DO REDUCTION(+:pr_sum)
  do i=1,1000
   pr_number = int(rand()*10)
   pr_sum = pr_sum+pr_number
!$OMP SINGLE
   pr_av = (1.d0*pr_sum) / i
   print *,i,pr_av
!$OMP END SINGLE
 enddo
!$OMP END DO
!$OMP END PARALLEL

end program main

I have an error message at compilation time : work-sharing region may not be closely nested inside of work-sharing, critical or explicit task region.

How can I have an output like that (if I have 4 threads for example) ?

       4   3.00000000    
       8   3.12500000    
      12   4.00000000    
      16   3.81250000    
      20   3.50000000  
      ...

I repeat, I am a beginner on parallel programming. I read many things on stackoverflow but, I think, I have not yet the skill to understand. I work on it, but ...

Edit 1

To explain as suggested in comments. A do loop performs N times a lengthy calculation (N markov chain montecarlo) and the average of all calculations is written to a file at each iteration. The previous average is deleted, only the last one is kept, so process can be followed. I would like to parallelise this calculation over 4 threads. This is what I imagine to do but perhaps, it is not the best idea.

Thanks for help.

Solution

The value of the reduction variable inside the construct where the reduction happens is not really well defined. The reduction clause with a sum is typically implemented by each thread having a private copy of the reduction variable that they use for summing just the numbers for that very thread. At the and of the loop, the private copies are summed into the final sum. There is little point printing the intermediate value before the reduction is actually made.

You can do the reduction in a nested loop and print the intermediate result every n iterations

program main
  implicit none
  integer :: pr_number, i, j, pr_sum
  real :: pr_av
  
  pr_sum = 0
  !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(pr_number) SHARED(pr_sum,pr_av)
  do j = 1, 10
    !$OMP DO REDUCTION(+:pr_sum)
      do i=1,100
      pr_number = int(rand()*10)
      pr_sum = pr_sum+pr_number
    enddo
    !$OMP END DO

    !$omp single
    pr_av = (1.d0*pr_sum) / 100
    print *,j*100,pr_av
    !$omp end single
  end do
  !$OMP END PARALLEL

end program main

I kept the same rand() that may or may not work correctly in parallel depending on the compiler. Even if it gives the right results, it may actually be executed sequentially using some locks or barriers. However, the main point carries over to other libraries.

Result

> gfortran -fopenmp reduction-intermediate.f90 
> ./a.out 
     100   4.69000006    
     200   9.03999996    
     300   13.7600002    
     400   18.2299995    
     500   22.3199997    
     600   26.5900002    
     700   31.0599995    
     800   35.4300003    
     900   40.1599998