Search code examples
fortranmpimessage-passing

Why MPI_REDUCE shows different number at some array locations?


program main_mpi_test

    use mpi
    implicit none
    integer(kind=8) :: n                  
    integer(kind=8) :: max_optical_depth  
    integer(kind=8) :: bin               
    integer(kind=8) :: tmax               
    integer(kind=8) :: sum_scatt          
    integer(kind=8) :: i,j,k
    real(kind=8)    :: rad                

    real(kind=8), dimension(:), allocatable :: send_results, recv_results
    integer :: nrank, nproc,ierr,root

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    root = 0

    max_optical_depth = 10
    bin = 10
    tmax = max_optical_depth*bin

    allocate(send_results(tmax))
    allocate(recv_results(tmax))
    
    do i = nrank+1, tmax, nproc
        rad = real(i)/real(bin)
        send_results(i) = rad
        print*,'send', rad, send_results(i)
    end do

    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    call MPI_REDUCE(send_results, recv_results, tmax, MPI_DOUBLE_PRECISION, &
                    MPI_SUM, root, MPI_COMM_WORLD, ierr)

    if (nrank ==0) then
        do i = 1, tmax
            rad = real(i)/real(bin)
            print*,'recv',rad, recv_results(i)
        end do
    end if

    call MPI_FINALIZE(ierr)
    deallocate(send_results)
    deallocate(recv_results)

end program main_mpi_test

This is my code. I use MPI_REDUCE. And I use mpiifort compiler.

And the problem is when I print recv_results, the value of rad and recv_results must be equal, but it shows Nan or a large number about 1e186 at a some locations in the array. In other locations, rad and recv_results have the same value.

How can I solve this problem?

print rad, recv_results


Solution

  • Your program has quite a lot of problems.

    1. As noted in the comments use of real(8) and integer(8) is bad practice. They are not portable, may not do what you expect, and may not even be supported by a compiler. Learn about Fortran kinds from your favourite Fortran book or tutorial, and the links Vladimir provides (Fortran: integer*4 vs integer(4) vs integer(kind=4) and Fortran 90 kind parameter) are useful supporting material. My preferred way would be to use the iso_fortran_env standard module, see my program below
    2. Whatever integer(8) does (if it is supported) it is unlikely to be compatible with the MPI library. MPI is defined (for most routines) in terms of default Fortran integers, so I strongly suggest you use default Fortran integers
    3. You don't initialize all of the array send_results. Thus any results are possible as you observe. Uninitialized things can take any value, only by luck will they be zero.
    4. The Real function with only one argument converts that argument to a value of default real kind, which is not what you want here. This is closely related to 1) above, but to get the kind you want you need to supply that kind. See my program below
    5. While not a bug use of MPI_Barrier in the code below is completely unnecessary and just wastes times - MPI_Reduce is already blocking. As a rule of thumb you never need an mpi_barrier except for measuring timing, debugging, and possibly when using shared memory

    ( For the purists I should also probably call MPI_Type_create_f90_real but I'll be lazy and assume MPI_DOUBLE_PRECISION maps to real64 )

    Anyway, here's my program, and it working with gfortran, ifort and ifx

    ijb@ijb-Latitude-5410:~/work/stack$ cat red.f90
    Program main_mpi_test
    
      Use iso_fortran_env, Only : wp => real64
    
      Use mpi
      
      Implicit None
    
      Real( wp ), Dimension(:), Allocatable :: send_results, recv_results
    
      Real( wp )    :: rad                
    
      Integer :: n                  
      Integer :: max_optical_depth  
      Integer :: bin               
      Integer :: tmax               
      Integer :: sum_scatt          
      Integer :: i,j,k
      Integer :: nrank, nproc, ierr, root
    
      Logical :: worked
      
      Call MPI_INIT(ierr)
      Call MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)
      Call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
      Call MPI_BARRIER(MPI_COMM_WORLD, ierr)
      root = 0
      
      max_optical_depth = 10
      bin = 10
      tmax = max_optical_depth*bin
      
      Allocate(send_results(tmax))
      Allocate(recv_results(tmax))
    
      send_results = 0.0_wp
      Do i = nrank+1, tmax, nproc
         rad = Real( i, Kind( rad ) ) / Real( bin, Kind( rad ) )
         send_results(i) = rad
    !     Print*,'send', rad, send_results(i)
      End Do
      
      Call MPI_REDUCE(send_results, recv_results, tmax, MPI_DOUBLE_PRECISION, &
           MPI_SUM, root, MPI_COMM_WORLD, ierr)
      
      If (nrank ==0 ) Then
         worked = .true.
         Do i = 1, tmax
            rad = Real( i, Kind( rad ) ) / Real( bin, Kind( rad ) )
    !        Write( *, * ) 'recv', rad, recv_results(i)
            worked = worked .And. ( Abs( rad - recv_results(i) ) < 1e-15_wp  )
         End Do
         Write( *, * ) "Worked? ", worked
      End If
      
      Deallocate(send_results)
      Deallocate(recv_results)
      
      Call MPI_FINALIZE(ierr)
    
    End Program main_mpi_test
      
    ijb@ijb-Latitude-5410:~/work/stack$ mpif90 -Wall -Wextra -fcheck=all -O -g red.f90
    red.f90:18:16:
    
       18 |   Integer :: i,j,k
          |                1
    Warning: Unused variable ‘j’ declared at (1) [-Wunused-variable]
    red.f90:18:18:
    
       18 |   Integer :: i,j,k
          |                  1
    Warning: Unused variable ‘k’ declared at (1) [-Wunused-variable]
    red.f90:13:14:
    
       13 |   Integer :: n
          |              1
    Warning: Unused variable ‘n’ declared at (1) [-Wunused-variable]
    red.f90:17:22:
    
       17 |   Integer :: sum_scatt
          |                      1
    Warning: Unused variable ‘sum_scatt’ declared at (1) [-Wunused-variable]
    ijb@ijb-Latitude-5410:~/work/stack$ mpirun -np 4 ./a.out
     Worked?  T
    ijb@ijb-Latitude-5410:~/work/stack$ mpiifort red.f90
    ijb@ijb-Latitude-5410:~/work/stack$ mpirun -np 4 ./a.out
     Worked?  T
    ijb@ijb-Latitude-5410:~/work/stack$ mpiifort -fc=ifx red.f90
    ijb@ijb-Latitude-5410:~/work/stack$ mpirun -np 4 ./a.out
     Worked?  T
    ijb@ijb-Latitude-5410:~/work/stack$