I am new to parallel programming. This is the serial code that I would like to parallelize:
program main
  implicit none
  integer :: pr_number, i, pr_sum
  real :: pr_av
  pr_sum = 0
  do i=1,1000
    ! The following instruction is an example to simplify the problem.
    ! In the real case, it takes a long time that is more or less the same for all threads
    ! and it returns a large array
    pr_number = int(rand()*10)
    pr_sum = pr_sum+pr_number
    pr_av = (1.d0*pr_sum) / i
    print *,i,pr_av ! In real case, writing a huge amount of data on one file
  enddo
end program main
I would like to parallelize pr_number = int(rand()*10) and to print only once every num_threads iterations.
I tried many things, but nothing works. For example:
program main
  implicit none
  integer :: pr_number, i, pr_sum
  real :: pr_av
  pr_sum = 0
  !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(pr_number) SHARED(pr_sum,pr_av)
  !$OMP DO REDUCTION(+:pr_sum)
  do i=1,1000
    pr_number = int(rand()*10)
    pr_sum = pr_sum+pr_number
    !$OMP SINGLE
    pr_av = (1.d0*pr_sum) / i
    print *,i,pr_av
    !$OMP END SINGLE
  enddo
  !$OMP END DO
  !$OMP END PARALLEL
end program main
I get an error message at compile time: work-sharing region may not be closely nested inside of work-sharing, critical or explicit task region.
How can I get output like this (with 4 threads, for example)?
4 3.00000000
8 3.12500000
12 4.00000000
16 3.81250000
20 3.50000000
...
Again, I am a beginner in parallel programming. I have read many posts on Stack Overflow, but I do not think I have the skills to understand them yet. I am working on it, but ...
To explain, as suggested in the comments: a do loop performs a lengthy calculation N times (N Markov chain Monte Carlo runs), and the average of all calculations is written to a file at each iteration. The previous average is deleted and only the last one is kept, so that the process can be followed. I would like to parallelise this calculation over 4 threads.
This is what I have in mind, but perhaps it is not the best idea.
Thanks for your help.
The compile error arises because SINGLE is itself a work-sharing construct, and it cannot be nested directly inside the DO work-sharing construct. Beyond that, the value of the reduction variable inside the construct where the reduction happens is not well defined. A reduction clause with a sum is typically implemented by giving each thread a private copy of the reduction variable, which it uses to sum only the numbers handled by that thread. At the end of the loop, the private copies are summed into the final result. There is little point in printing the intermediate value before the reduction has actually been performed.
You can do the reduction in a nested loop and print the intermediate result every n iterations:
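As a rough illustration of what a sum reduction typically does behind the scenes, the sketch below spells it out by hand. The names partial and total are made up for this example, and mod(i, 10) is a deterministic stand-in for the real calculation; an actual compiler may implement the reduction clause differently.

```fortran
program manual_reduction
  implicit none
  integer :: i, total, partial   ! hypothetical names, for illustration only

  total = 0
  !$omp parallel default(shared) private(partial)
  ! Each thread starts with its own private partial sum.
  partial = 0
  !$omp do
  do i = 1, 1000
    ! Stand-in for the expensive calculation: a deterministic digit 0..9.
    partial = partial + mod(i, 10)
  end do
  !$omp end do nowait
  ! Only here, after its share of the loop, does a thread touch the shared total.
  !$omp atomic
  total = total + partial
  !$omp end parallel

  ! total is complete only after the parallel region has ended.
  print *, 'total =', total      ! 4500, whatever the number of threads
end program manual_reduction
```

While the loop is running, no single variable holds the global sum, which is why an intermediate print inside the loop has nothing meaningful to show.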
program main
  implicit none
  integer :: pr_number, i, j, pr_sum
  real :: pr_av
  pr_sum = 0
  !$OMP PARALLEL DEFAULT(SHARED) PRIVATE(pr_number) SHARED(pr_sum,pr_av)
  do j = 1, 10
    !$OMP DO REDUCTION(+:pr_sum)
    do i=1,100
      pr_number = int(rand()*10)
      pr_sum = pr_sum+pr_number
    enddo
    !$OMP END DO
    !$omp single
    pr_av = (1.d0*pr_sum) / 100
    print *,j*100,pr_av
    !$omp end single
  end do
  !$OMP END PARALLEL
end program main
I kept the same rand(), which may or may not work correctly in parallel depending on the compiler. Even if it gives the right results, it may actually be executed sequentially using locks or barriers. However, the main point carries over to other random number libraries.
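One way to sidestep the question of whether rand() is thread-safe is to give every thread its own generator state. The following is only a toy sketch under that assumption: the linear congruential generator, its constants, and the seeding by thread number are illustrative choices, not a recommendation for real Monte Carlo work, where a proper parallel random number library should be used.

```fortran
program private_generator
  use, intrinsic :: iso_fortran_env, only: int64
  use omp_lib, only: omp_get_thread_num
  implicit none
  integer :: i, pr_number, pr_sum
  integer(int64) :: state        ! hypothetical per-thread generator state

  pr_sum = 0
  !$omp parallel default(shared) private(pr_number, state)
  ! Seed each thread differently (illustrative seeding, not statistically sound).
  state = 12345_int64 + 1000_int64 * omp_get_thread_num()
  !$omp do reduction(+:pr_sum)
  do i = 1, 1000
    ! Toy linear congruential step; state stays below 2**31, so no overflow.
    state = mod(state * 1103515245_int64 + 12345_int64, 2147483648_int64)
    ! Use the higher bits to produce a digit 0..9.
    pr_number = int(mod(state / 65536_int64, 10_int64))
    pr_sum = pr_sum + pr_number
  end do
  !$omp end do
  !$omp end parallel

  print *, 'sum =', pr_sum, ' average =', real(pr_sum) / 1000
end program private_generator
```

Because state is private, no locking is needed inside the loop. This version uses omp_lib, so it has to be compiled with OpenMP enabled (for example gfortran -fopenmp).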
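Result (the code above divides the cumulative pr_sum by a fixed 100, so the printed value grows with j; divide by 100*j instead to get the running average):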
> gfortran -fopenmp reduction-intermediate.f90
> ./a.out
100 4.69000006
200 9.03999996
300 13.7600002
400 18.2299995
500 22.3199997
600 26.5900002
700 31.0599995
800 35.4300003
900 40.1599998
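To get the running average the question asks for, printed once every num_threads iterations, the same pattern can divide by the number of samples accumulated so far. This is a sketch under a few assumptions: chunk and n_done are names introduced here, the block size is taken from omp_get_max_threads(), and rand() is kept from the original with the same thread-safety caveat.

```fortran
program running_average
  use omp_lib, only: omp_get_max_threads
  implicit none
  integer, parameter :: n_total = 1000   ! total iterations, as in the question
  integer :: pr_number, i, j, pr_sum, chunk, n_done
  real :: pr_av

  chunk = omp_get_max_threads()          ! one print per block of this many iterations
  pr_sum = 0
  !$omp parallel default(shared) private(pr_number)
  do j = 1, n_total, chunk
    ! Work-shared inner loop: roughly one iteration per thread in each block.
    !$omp do reduction(+:pr_sum)
    do i = j, min(j + chunk - 1, n_total)
      pr_number = int(rand()*10)         ! stand-in for the long calculation
      pr_sum = pr_sum + pr_number
    end do
    !$omp end do
    ! The implicit barrier at END DO guarantees pr_sum is fully reduced here.
    !$omp single
    n_done = min(j + chunk - 1, n_total)
    pr_av = real(pr_sum) / n_done        ! average over all samples so far
    print *, n_done, pr_av
    !$omp end single
  end do
  !$omp end parallel
end program running_average
```

With 4 threads this prints at 4, 8, 12, and so on. The barriers at the end of the DO and SINGLE constructs keep the threads in step, which is acceptable when each iteration takes about the same time, as described in the question.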