Tags: fortran, openmp, f2py

Fortran: segmentation fault


I know I once posted a similar question, but that one was different. This time, adding a print statement does not change whether or not I get a segfault.

    call omp_set_num_threads(omp_get_max_threads())
    !$omp parallel do  &
    !$omp default(firstprivate) &
    !$omp private(present_surface) &
    !$omp lastprivate(fermi)
    do m = 1, fourth 
        do n = 1, third
            do j = 1, second
                do i = 1, first
                    !current angle is phi[i,j,ii,jj]
                    !we have to find the current fermi surface
                    present_surface = 0.
                    do a = 1, fourth    
                        if (angle(i,j,n,m) == angle_array(a)) then
                            present_surface = surface(a)
                        end if
                    end do 
                    if (radius(i,j,n,m) >= present_surface) then 
                        fermi(i,j,n,m) = 0.
                    else 
                        fermi(i,j,n,m) = 1.
                    end if          
                end do      
            end do      
        end do
    end do  
    !$omp end parallel do

I'm not sure about the lastprivate(fermi) clause, but it doesn't matter at the moment; shared gives the same behaviour.

So I run this script with first, second, third, and fourth being increased. Typical output:

[10] Time elapsed is 0.001 [s].
[15] Time elapsed is 0.002 [s].
[20] Time elapsed is 0.005 [s].
./compile: line 2:  4310 Segmentation fault      python fortran_test.py

Well, how to proceed? I ran it under gdb (gdb python, then run fortran_test.py) and found:

(gdb) run fortran_test.py 
Starting program: /usr/bin/python fortran_test.py
[Thread debugging using libthread_db enabled]
[New Thread 0x7fffed9b0700 (LWP 4251)]
[New Thread 0x7fffe8ba5700 (LWP 4252)]
[New Thread 0x7fffe83a4700 (LWP 4253)]
[New Thread 0x7fffe7ba3700 (LWP 4254)]
[10] Time elapsed is 0.008 [s].
[15] Time elapsed is 0.004 [s].
[20] Time elapsed is 0.005 [s].

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe7ba3700 (LWP 4254)]
0x00007fffe8dc0bb7 in __populate_MOD_fermi_integrand._omp_fn.0 () at populate.f90:31
31                              do n = 1, third

If I change things in the inner loop (for instance, removing the j and i loops and replacing them with constants), I just get the segfault at a different line.

I think this has something to do with memory, because it only triggers as N increases. However, the things I've tried (exporting GOMP_STACKSIZE and OMP_STACKSIZE, raising ulimit) haven't fixed it; I see no difference with them, so for the time being I've removed them again.

Finally, the compile command (it uses f2py):

    f2py --fcompiler=gfortran --f90flags="-fopenmp -g" -lgomp -c -m populate populate.f90

As you can see, I'm quite stuck. I hope some of you know how to solve the problem.

My aim is to have N=100 run quickly (hence the OpenMP Fortran function), but this shouldn't matter for the Fortran code, should it?

If you're wondering: I have 4 GB of RAM, and my swap is 3.1 GB (Linux Swap / Solaris).

Thank you!


Solution

  • default(firstprivate) lastprivate(fermi) means that each thread receives a private copy of angle, radius, and fermi, with each being of size first x second x third x fourth. In my experience, private arrays are always allocated on the stack, even when the option to automatically turn big arrays into heap arrays is given. Your code most likely crashes because of insufficient stack space.

    Instead of having everything private, you should really look at the data dependencies and pick the correct data-sharing class for each variable. angle, angle_array, surface, and radius are only read, never written to, so they should all be shared. present_surface is modified and should be private. fermi is written to, but never at the same location by more than one thread, so it should also be shared. Also note that lastprivate would not have the expected result here: it only makes available, after the construct, the value the variable holds in the thread that executes the logically last iteration of the m-loop. first, second, third, and fourth are simply treated as constants and should be shared. The loop variables are, of course, private, but that is handled automatically by the compiler. (Both the explicit data sharing and the stack issue from the previous paragraph are illustrated with short sketches after the corrected loop below.)

    !$omp parallel do private(present_surface)
    do m = 1, fourth 
        do n = 1, third
            do j = 1, second
                do i = 1, first
                    !current angle is phi[i,j,ii,jj]
                    !we have to find the current fermi surface
                    present_surface = 0.
                    do a = 1, fourth    
                        if (angle(i,j,n,m) == angle_array(a)) then
                            present_surface = surface(a)
                        end if
                    end do 
                    if (radius(i,j,n,m) >= present_surface) then 
                        fermi(i,j,n,m) = 0.
                    else 
                        fermi(i,j,n,m) = 1.
                    end if          
                end do      
            end do      
        end do
    end do  
    !$omp end parallel do
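
    If you want the compiler to catch any variable whose sharing class was overlooked, the same construct can also be written with default(none), which makes an explicit clause mandatory for every variable referenced in the region. The sketch below only changes the directive and assumes the same declarations and loop body as above; listing the loop variables is optional, since they are privatised automatically.

    !$omp parallel do default(none)                    &
    !$omp shared(first, second, third, fourth)         &
    !$omp shared(angle, angle_array, surface, radius)  &
    !$omp shared(fermi)                                &
    !$omp private(present_surface, m, n, j, i, a)
    do m = 1, fourth
        ! ... same body as in the corrected loop above ...
    end do
    !$omp end parallel do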
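
    To see the stack problem from the first paragraph in isolation, here is a minimal self-contained sketch (the array size is made up for illustration and is not taken from the question). A large array made firstprivate is copied onto the stack of every thread, so with a typical stack limit of a few megabytes the program dies with SIGSEGV, just like the wrapped routine does:

    ! stack_demo.f90 - build with: gfortran -fopenmp stack_demo.f90
    program stack_demo
        use omp_lib
        implicit none
        real :: big(400, 400, 100)   ! roughly 64 MB of default reals
        big = 1.
        ! default(firstprivate) gives every thread its own copy of "big",
        ! placed on that thread's stack; with the usual stack limits this
        ! overflows and the run is killed with SIGSEGV.
        !$omp parallel default(firstprivate)
        print *, 'thread', omp_get_thread_num(), 'sum =', sum(big)
        !$omp end parallel
    end program stack_demo

    Raising OMP_STACKSIZE (for the worker threads) and the stack ulimit (for the initial thread) makes such a program survive, but with per-thread copies of arrays sized first x second x third x fourth that quickly becomes impractical, which is why fixing the data-sharing clauses is the proper solution.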