I am very new to Fortran and am working through an exercise that involves summing numbers in parallel with OpenMP.
I have been told that the following code correctly calculates the sum in parallel:
!$omp parallel do private (I)
!$omp+ reduction(+:totals)
do I=1,100
totals = totals + localsum(I)
enddo
!$omp end parallel do
When I adapt the above code so that I can run it in my own Fortran program, I end up with:
Program test
implicit none
real totals
double precision, dimension (1 : 100) :: localsum
integer I
!$omp parallel do private (I)
!$omp+ reduction(+:totals)
do I=1,100
localsum(I)=I
totals = totals + localsum(I)
enddo
!$omp end parallel do
print *, 'The calculated sum total is', totals
end
This program returns
The calculated sum total is 5050.00000
However, I'm not sure why I needed to add the extra line
localsum(I)=I
when the original code didn't have this line. I notice that if I remove
!$omp+ reduction(+:totals)
Then
Program test
implicit none
real totals
double precision, dimension (1 : 100) :: localsum
integer I
!$omp parallel do private (I)
do I=1,100
localsum(I)=I
totals = totals + localsum(I)
enddo
!$omp end parallel do
print *, 'The calculated sum total is', totals
end
returns
The calculated sum total is 5050.00000
even though I expected the calculated total to be wrong. I thought that including the reduction clause, !$omp+ reduction(+:totals), was necessary to compute the correct total.
Is there an alternative way of adjusting the do loop to match the original code provided? I'm not sure why I had to change
do I=1,100
totals = totals + localsum(I)
enddo
to
do I=1,100
localsum(I)=I
totals = totals + localsum(I)
enddo
in order to calculate the sum.
The executed code is different with and without the !$omp+ reduction(+:totals) directive.
Without it, every thread directly updates the shared variable totals. This may happen to work (and in your example it did), but it is far from guaranteed: the problem is that it can lead to data races.
Assume that thread a and thread b both want to update this variable. Each of them needs:
1. to fetch the variable from memory
2. to update it in the processor
3. to write it back to memory
What will be the relative order of these operations in threads a and b? It is unspecified.
If the order is 1a2a3a1b2b3b, there is no problem.
If it is 1a1b2a2b3a3b, there is a problem: 1a1b (threads a and b fetch the same value), 2a2b (they update it more or less simultaneously), 3a3b (thread a writes its result back and it is immediately overwritten by thread b's value).
To avoid that, you can use atomic operations, which guarantee that the read-modify-write cycle cannot be interrupted. However, they are expensive and may significantly slow down the execution.
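For example, without the reduction you could make the shared update safe with an atomic directive. This is only a sketch of that approach, written as a drop-in replacement for the loop in your program; note that every iteration now pays the synchronization cost:
!$omp parallel do private (I)
do I=1,100
localsum(I)=I
! protect only the update of the shared variable:
!$omp atomic update
totals = totals + localsum(I)
enddo
!$omp end parallel do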
A much better way to avoid the race is to use a reduction. The clause !$omp+ reduction(+:totals) tells OpenMP to perform the sum in a safe and efficient way: each thread accumulates into its own private copy of totals, and at the end of the loop the private copies are combined into the shared variable with synchronized (atomic) updates, so races between threads are avoided. There are still atomic updates, but their number is reduced to one per thread, and the accumulation is mostly performed by fast local operations.
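To make this concrete, here is a hand-written sketch of roughly what the compiler does for reduction(+:totals); the helper variable partial is purely illustrative, OpenMP does not actually expose it:
program reduction_sketch
implicit none
double precision, dimension (1 : 100) :: localsum
double precision totals, partial
integer I
do I=1,100
localsum(I)=I
enddo
totals = 0.d0
!$omp parallel private (partial)
partial = 0.d0                    ! each thread starts its own private sum
!$omp do private (I)
do I=1,100
partial = partial + localsum(I)   ! fast, purely thread-local accumulation
enddo
!$omp end do
!$omp atomic update
totals = totals + partial         ! one synchronized update per thread
!$omp end parallel
print *, 'The calculated sum total is', totals
end
The reduction clause does essentially this for you, possibly in an even more optimized way, so you normally never write it by hand.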
Concerning the usefulness of the line localsum(I)=I: it is required if the vector localsum has not been initialized beforehand. But if the goal is just to add up the first 100 integers, you can simply use
do I=1,100
totals = totals + I
enddo
The performance will be better and the result identical, and both loops are parallelized in the same way.
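For instance, a complete version of the simplified program could look like the sketch below; the directive is written on a single line so no continuation is needed, and totals is explicitly initialized instead of relying on the compiler:
program test
implicit none
real totals
integer I
totals = 0.0   ! the reduction adds the per-thread sums to this initial value
!$omp parallel do private (I) reduction(+:totals)
do I=1,100
totals = totals + I   ! no intermediate array needed
enddo
!$omp end parallel do
print *, 'The calculated sum total is', totals
end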