Search code examples
concurrencyfortrangfortran

Is there a straightforward way to do concurrent calculations with a random number generator in Fortran?


I have some finite element code I have programmed in Fortran 95 that I have optimised so that I can now get well over 16Mil. elements working under 2GB of memory footprint.

The source function for my code is not smooth so I am using a (stratified) Monte-Carlo method to integrate, which requires a random number generator to select sample locations

I have tried compiling with gfortran-9 using -fopenmp -Ofast -ftree-parallelize-loops=4 but the loop with the random number generator won't go parallel. I tried do concurrent but obviously that didn't work because random_number isn't 'pure'. https://stackoverflow.com/a/32637737/2372254 I also tried blocking my loop but that didn't work either.

Here is the code I am talking about

        do  k=1,n_els ! total elements is n_els**2. This is block
            do i=1+ (k-1)*n_els ,k*n_els
                supp_vec = 0 
                integ_vec = 0.0_wp

                ! in this subroutine I call random_number
                call do_element(ind, n_els, i, num_points_per_strat, &
                                strat_rows, strat_cols, supp_vec, integ_vec) 
                do j=1, 4
                    sc_vec(supp_vec(j) ) = integ_vec(j)
                end do
                ! give some info about progress

                if (mod( i , (n_els**2)/10) == 0) print*, i*10/((n_els**2)/10), "% done"
            end do
        end do

It seems I could write blocks to a file and call n different instances of the routine. I figure there must be a cleaner way to do that. Any tips on how to get that going faster?

I was considering writing a block-worth of points (depending on memory limits) to an array first and supplying that in the subroutine call. Before I try that I thought I would see if anybody had any advice about a better way. It would be good to keep the memory footprint down where possible.


Solution

  • As of version 7 and newer GFortran has a parallel random number generator. When implementing it, here's the OpenMP code I used to verify that the performance indeed scales with increasing numbers of threads (from https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02110.html ):

    
    ! Benchmark generating random numbers
    ! Janne Blomqvist 2015
    program randbench
    #ifdef _OPENMP
      use omp_lib
    #endif
      implicit none
      integer, parameter :: dp=kind(0.d0)          ! double precision
      integer, parameter :: i64 = selected_int_kind(18) ! At least 64-bit integer
    
    #ifdef _OPENMP
      print *, "Using up to ", omp_get_max_threads(), " threads."
    #endif
      call genr4
      call genr8
    contains
      subroutine genr4
        integer, parameter :: n = int(1e7)
        real, save :: r(n)
        integer :: i
        integer(i64) :: t1, t2, td
    #ifdef _OPENMP
        integer :: blocks, blocksize, l, h
    #endif
        Print *, "Generate default real random variables"
        call system_clock (t1)
        !$omp parallel do private(i)
        do i = 1, n
           call random_number(r(i))
        end do
        !$omp end parallel do
        call system_clock (t2)
        td = t2 - t1
        print *, "Generating ", n, " default reals individually took ", td, " ticks."
        call system_clock (t1)
    #ifdef _OPENMP
        blocks = omp_get_max_threads()
        blocksize = n / blocks
        !$omp parallel do private(l,h,i)
        do i = 0, blocks - 1
           l = i * blocksize + 1
           h = l + blocksize - 1
           !print *, "Low: ", l, " High: ", h
           call random_number(r(l:h))
        end do
    #else
        call random_number(r)
    #endif
        Call system_clock (t2)
        print *, "Generating ", n, " default reals as an array took  ", t2-t1, &
             " ticks. => ind/arr = ", real(td, dp) / (t2-t1)
      end subroutine genr4
    
      subroutine genr8
        integer, parameter :: n = int(1e7)
        real(dp), save :: r(n)
        integer :: i
        integer(i64) :: t1, t2, td
    #ifdef _OPENMP
        integer :: blocks, blocksize, l, h
    #endif
        print *, "Generate double real random variables"
        call system_clock (t1)
        !$omp parallel do
        do i = 1, n
           call random_number(r(i))
        end do
        call system_clock (t2)
        td = t2 - t1
        print *, "Generating ", n, " double reals individually took  ", td, " ticks."
        call system_clock (t1)
    #ifdef _OPENMP
        blocks = omp_get_max_threads()
        blocksize = n / blocks
        !$omp parallel do private(l,h,i)
        do i = 0, blocks - 1
           l = i * blocksize + 1
           h = l + blocksize - 1
           !print *, "Low: ", l, " High: ", h
           call random_number(r(l:h))
        end do
    #else
        call random_number(r)
    #endif
        call system_clock (t2)
        print *, "Generating ", n, " double reals as an array took   ", t2-t1, &
             " ticks. => ind/arr = ", real(td, dp) / (t2 -t1)
      end subroutine genr8
    
    end program