EDIT: I added some output and a link the code on github.
I try to write a finite difference function in Fortran using OpenMP, but the behavior seems different between ifort (15.0.1 20141023) and gfortran (4.8.5 20150623). The code works well in sequential.
Indeed, the result of the function seems shared with gfortran and I don't really understand how it works with ifort.
The code below works well with gfortran (-g -O3 -fopenmp
): all threads use the same location of fd
, the result of my function.
But with ifort (-g -O3 -traceback -heap-arrays
), it seems that it's not the case.
I don't know very well OpenMP specifications, I didn't find something related to the attribute of the function's result.
Finally, valgrind run well with gfortran but return a lot of errors with ifort
Did I missed something?
program test
!$ use OMP_lib
use iso_fortran_env
implicit none
integer :: x,y
integer,parameter :: Nbx = 1000, Nby = 1000
real(real64), dimension(Nbx,Nby) :: trucAD=0, trucD=0
real(real64) :: dx=1
!$OMP PARALLEL
!$OMP DO
do y=1,Nby
do x=1,Nbx
trucAD(x,y) = x*dx
enddo
enddo
!$OMP END DO
trucD = dfonc2dxd(trucAD,Nbx,Nby,dx)
!$OMP SINGLE
write(*,*) 'Reference result =====> ', Nbx*Nby*dx
write(*,*) 'Result per threads:'
!$OMP END SINGLE
!$OMP CRITICAL
!$ write(*,*) '#t: ',omp_get_thread_num(), 'test: ', sum(trucD)
!$OMP END CRITICAL
!$OMP END PARALLEL
contains
function dfonc2dxd(foncad,DimX, DimY,da) result(fd)
implicit none
integer, intent(in) :: DimX, DimY
real(real64), dimension(DimX,DimY) :: fd
real(real64), dimension(DimX,DimY), intent(in) :: foncad
real(real64),intent(in) :: da
integer :: qq , ii , xx
real(real64), dimension(11,5), parameter :: coef_diff5dc = reshape((/&
-2.391602219538d0, 5.832490322294d0, -7.650218001181d0, 7.907810563576d0, -5.922599052629d0, 3.071037015445d0,&
-1.014956769726d0, 0.170022256519d0, 0.002819958377d0,-0.004791009708d0, -0.000013063429d0,&
-0.180022054228d0, -1.237550583044d0, 2.484731692990d0,-1.810320814061d0, 1.112990048440d0, -0.481086916514d0,&
0.126598690230d0, -0.015510730165d0, 0.000021609059d0, 0.000156447570d0, -0.000007390277d0,&
0.057982271137d0, -0.536135360383d0, -0.264089548965d0, 0.917445877604d0, -0.169688364841d0, -0.029716326170d0,&
0.029681617641d0, -0.005222483773d0, -0.000118806260d0,-0.000118806260d0, -0.000020069730d0,&
-0.013277273810d0, 0.115976072920d0, -0.617479187931d0,-0.274113948204d0, 1.086208764653d0, -0.402951626982d0,&
0.131066986242d0, -0.028154858354d0, 0.002596328316d0, 0.000128743150d0, 0.000000000000d0,&
0.016756572303d0, -0.117478455239d0, 0.411034935097d0,-1.130286765151d0, 0.341435872099d0, 0.556396830543d0,&
-0.082525734207d0, 0.003565834658d0, 0.001173034777d0,-0.000071772607d0, -0.000000352273d0 /) &
,shape(coef_diff5dc),order=(/1,2/))
real(real64),dimension(5),parameter::coef_diff=(/0.872756993962667d0,-0.286511173973333d0,0.09032000128000002d0,&
-0.020779405824000d0,0.0024845946880000d0/)
real(real64) :: unsda
unsda = 1.0d0/da
!$OMP SINGLE
!$ write(*,*) 'Location of fd for every threads'
!$OMP END SINGLE
!$ write(*,*) omp_get_thread_num(), loc(fd)
!$OMP BARRIER
!$OMP DO SCHEDULE(STATIC)
do qq = 1, DimY
!calcul des points centrés
do ii=6, DimX-5
fd(ii,qq) = 0
do xx=1, 5
fd(ii,qq) = fd(ii,qq) + coef_diff(xx)*(foncad(ii+xx,qq) - foncad(ii-xx,qq))
enddo
fd(ii,qq) = unsda * fd(ii,qq)
enddo
! calcul des points décentrés
do ii=1,5 !points avant
fd(ii,qq) = 0
do xx=1, 11
fd(ii,qq) = fd(ii,qq) + coef_diff5dc(xx,ii)*foncad(xx,qq)
enddo
fd(ii,qq) = unsda*fd(ii,qq)
enddo
do ii = DimX, DimX-4,-1 !points arriere
fd(ii,qq) = 0
do xx=1, 11
fd(ii,qq) = fd(ii,qq) - coef_diff5dc(xx,DimX-ii+1)*foncad(DimX-xx+1,qq)
enddo
fd(ii,qq) = unsda * fd(ii,qq)
enddo
enddo
!$OMP END DO
end function dfonc2dxd
end program test
When I use gfortran, I obtain:
Location of result fd, for every threads
4 6304000
3 6304000
0 6304000
2 6304000
1 6304000
Reference=====> 1000000.0000000000
Result per threads:
#t: 4 test: 1000000.0000000000
#t: 2 test: 1000000.0000000000
#t: 3 test: 1000000.0000000000
#t: 1 test: 1000000.0000000000
#t: 0 test: 1000000.0000000000
All threads works on the same result in fortran function and the result seems correct. No valgrind warning.
When I try with ifort:
Location of result fd, for every threads
0 47402661998608
3 47402629984272
2 47402645991440
1 47402637987856
4 47402653995024
Reference=====> 1000000.00000000
Result per threads:
And it's locked. Result don't have the same location for every threads. With valgrind, I get a huge log of errors. I putted it on github: https://github.com/Hilid/test_OMP/blob/master/logval.txt
I don't think loc(fd)
is useful in any way. It can be a temporary variable anywhere in the memory (probably stack) or with some return-value optimization it can be the actual variable it is being assigned to. But only if it is being assigned to a variable. I don't think OpenMP changes much in this.
This code gives me a suspicion of a race condition, aren't multiple threads writing to the same shared variable?
!$OMP PARALLEL
...
trucD = dfonc2dxd(trucAD,Nbx,Nby,dx)