I know I once posted a similar question, but this one is different: this time, adding a print statement does not change whether or not I get a segfault.
call omp_set_num_threads(omp_get_max_threads())
!$omp parallel do &
!$omp default(firstprivate) &
!$omp private(present_surface) &
!$omp lastprivate(fermi)
do m = 1, fourth
   do n = 1, third
      do j = 1, second
         do i = 1, first
            !current angle is phi[i,j,ii,jj]
            !we have to find the current fermi surface
            present_surface = 0.
            do a = 1, fourth
               if (angle(i,j,n,m) == angle_array(a)) then
                  present_surface = surface(a)
               end if
            end do
            if (radius(i,j,n,m) >= present_surface) then
               fermi(i,j,n,m) = 0.
            else
               fermi(i,j,n,m) = 1.
            end if
         end do
      end do
   end do
end do
!$omp end parallel do
I'm not sure about the lastprivate(fermi) clause, but it doesn't matter at the moment: shared gives the same behaviour.
So, I run this script with first, second, third, and fourth being increased. Typical output:
[10] Time elapsed is 0.001 [s].
[15] Time elapsed is 0.002 [s].
[20] Time elapsed is 0.005 [s].
./compile: line 2: 4310 Segmentation fault python fortran_test.py
Not sure how to proceed, I ran the script under gdb (gdb python, then run fortran_test.py) and found:
(gdb) run fortran_test.py
Starting program: /usr/bin/python fortran_test.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffed9b0700 (LWP 4251)]
[New Thread 0x7fffe8ba5700 (LWP 4252)]
[New Thread 0x7fffe83a4700 (LWP 4253)]
[New Thread 0x7fffe7ba3700 (LWP 4254)]
[10] Time elapsed is 0.008 [s].
[15] Time elapsed is 0.004 [s].
[20] Time elapsed is 0.005 [s].
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe7ba3700 (LWP 4254)]
0x00007fffe8dc0bb7 in __populate_MOD_fermi_integrand._omp_fn.0 () at populate.f90:31
31 do n = 1, third
If I change things in the inner loop - for instance, removing the j and i loops and setting those indices to a constant - then I just get the segfault at a different line.
I think this has something to do with memory, because the segfault only triggers as N increases. However, the things I've tried (export GOMP_STACKSIZE, OMP_STACKSIZE, ulimit) haven't fixed it, and I see no difference when using them. (For the time being, I've removed them.)
Finally, the compile command (it uses f2py):
f2py --fcompiler=gfortran --f90flags="-fopenmp -g" -lgomp -c -m populate populate.f90
As you can see, I'm quite stuck. I hope some of you know how to solve the problem.
My aim is to have N=100 run quickly (hence the OpenMP Fortran function), but this shouldn't matter for the Fortran code, should it?
If you're wondering: 4 GB of RAM, and my swap is 3.1 GB (Linux Swap / Solaris).
Thank you!
default(firstprivate) lastprivate(fermi) means that each thread receives a private copy of angle, radius, and fermi, each of size first x second x third x fourth. In my experience, private arrays are always allocated on the stack, even when the option to automatically turn big arrays into heap arrays is given. Your code most likely crashes because of insufficient stack space.
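For a sense of scale (my own back-of-the-envelope numbers, assuming default 4-byte reals and first = second = third = fourth = N; the small angle_array and surface copies are negligible), here is a tiny program that prints roughly how much stack each thread would need just for its private copies of angle, radius, and fermi:

! Rough estimate only: 3 private arrays (angle, radius, fermi) of
! N**4 default (4-byte) reals, copied into every thread.
program private_copy_size
   implicit none
   integer :: n
   real    :: mbytes
   do n = 10, 100, 10
      mbytes = 3.0 * real(n)**4 * 4.0 / 1024.0**2
      print '(a, i4, a, f10.1, a)', 'N = ', n, ': ~', mbytes, ' MB per thread'
   end do
end program private_copy_size

Already at N = 20 that is almost 2 MB per thread, and at N = 100 it is more than 1 GB per thread, which no thread stack will hold. That matches the crash appearing only once N gets large enough.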
Instead of having everything private, you should really take a look at the data dependencies and pick the correct data-sharing classes. angle, angle_array, surface, and radius are never written to, therefore they should all be shared. present_surface is being modified and should be private. fermi is written to, but never at the same location by more than one thread, therefore it should also be shared. Also, note that lastprivate won't have the expected result, as it makes available the value that the corresponding variable has in the thread that executes the logically last iteration of the m-loop. first, second, third, and fourth are simply treated as constants and should be shared. The loop variables are, of course, private, but that is automated by the compiler.
!$omp parallel do private(present_surface)
do m = 1, fourth
   do n = 1, third
      do j = 1, second
         do i = 1, first
            !current angle is phi[i,j,ii,jj]
            !we have to find the current fermi surface
            present_surface = 0.
            do a = 1, fourth
               if (angle(i,j,n,m) == angle_array(a)) then
                  present_surface = surface(a)
               end if
            end do
            if (radius(i,j,n,m) >= present_surface) then
               fermi(i,j,n,m) = 0.
            else
               fermi(i,j,n,m) = 1.
            end if
         end do
      end do
   end do
end do
!$omp end parallel do
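A variant I would also suggest (my addition, not part of the answer above): use default(none), so the compiler forces you to give every referenced variable an explicit data-sharing attribute; anything you forget becomes a compile-time error instead of a runtime surprise. A sketch of the same loop nest with explicit clauses:

!$omp parallel do default(none) &
!$omp shared(first, second, third, fourth) &
!$omp shared(angle, angle_array, surface, radius, fermi) &
!$omp private(present_surface, a, i, j, n)
do m = 1, fourth   ! m is the parallel loop index and is made private automatically
   do n = 1, third
      do j = 1, second
         do i = 1, first
            present_surface = 0.
            do a = 1, fourth
               if (angle(i,j,n,m) == angle_array(a)) then
                  present_surface = surface(a)
               end if
            end do
            if (radius(i,j,n,m) >= present_surface) then
               fermi(i,j,n,m) = 0.
            else
               fermi(i,j,n,m) = 1.
            end if
         end do
      end do
   end do
end do
!$omp end parallel do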