Search code examples
fortranmpiintel-mpi

Fatal error in PMPI_Send: Invalid tag, error stack: MPI_Send(


I am using MPI to parallelise my Fortran code. The code below is the parellelisation part. I use 2 nodes to run the program.

1     DO i = 1, km(1), 1
2        DO j = 1, km(2), 1
3           DO k = 1, km(3), 1
4              IF (MOD((i-1)*km(2)*km(3)+(j-1)*km(3)+k-1, world_size) /= world_rank) CYCLE
5              CALL TRANSPORT(i,j,k,dk,ra,lz,iy,ch,nm,te,nb,po,tv,lr,ei,ks,ol,vm,t2,t3)
6              IF (world_rank == 0) THEN
7                 c2 = c2 + t2
8                 c4 = c4 + t3
9                 DO l = 1, world_size-1, 1
10                   IF ((i-1)*km(2)*km(3)+(j-1)*km(3)+k-1+l == km(1)*km(2)*km(2)) EXIT
11                   m = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + l + 100000
12                   CALL MPI_RECV(c3,nm,MPI_DOUBLE_COMPLEX,l,m,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
13                   c2 = c2 + c3
14                   n = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + l + 1000000
15                   CALL MPI_RECV(c5,nm,MPI_DOUBLE_COMPLEX,l,n,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
16                   c4 = c4 + c5
17                END DO
18             ELSE
19                m = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + 100000
20                CALL MPI_SEND(t2,nm,MPI_DOUBLE_COMPLEX,0,m,MPI_COMM_WORLD,ierr)
21                n = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + 1000000
22                CALL MPI_SEND(t3,nm,MPI_DOUBLE_COMPLEX,0,n,MPI_COMM_WORLD,ierr)
23             END IF
24          END DO
25       END DO
26    END DO
27    print*,'before final'
28    CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
29    print*,'after final'

What I found is that if the variables are 'km(1)=1, km(2)=1 and km(3)=2', the calculation could be done successfully; while, if the variables are 'km(1)=2, km(2)=2 and km(3)=1', the program can only outputs 'before final' on the 27th line and cannot output 'after final' on the 29th line.

It prints

 before final
 before final
Abort(604623620) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Send: Invalid tag, error stack:
PMPI_Send(157): MPI_Send(buf=0x7ffd802631c0, count=100, MPI_DOUBLE, dest=0, tag=1000002, MPI_COMM_WORLD) failed
PMPI_Send(95).: Invalid tag, value is 1000002

In other words, my code is stuck at the 'CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)' command on the 28th line.

The reason why I added 28th line is because I want to output the 'c2' and 'c4' arrays after all process finishes the program. Would anyone please tell me why the code is stuck on the 28th line and how to resolve it? Thank you very much.

This is a simple reproducible code, which has the same structure as my original Fortran code. Would you please have a look at it and give me some suggestions? Thank you.

SUBROUTINE SUBROUT(i,j,k,t2,t3)
IMPLICIT NONE
INTEGER, PARAMETER :: dp=SELECTED_REAL_KIND(15,14)
INTEGER            :: i, j, k
REAL (KIND=dp)     :: t2(100), t3(100)
INTEGER            :: l, m, n

m = i*10+j*12+k-3
n = i*11+j-3+k*15

DO l = 1, 100, 1
   t2(l) = DBLE(l)+DBLE(m)
   t3(l) = DBLE(l)+DBLE(n)
END DO

RETURN
END SUBROUTINE SUBROUT

PROGRAM TEST
USE MPI
IMPLICIT NONE
INTEGER, PARAMETER :: dp=SELECTED_REAL_KIND(15,14)
INTEGER            :: i, j, k, l, m, n, km(3)
REAL (KIND=dp)     :: t2(100), t3(100), c2(100), c3(100), c4(100), c5(100)
INTEGER            :: world_size, world_rank, ierr

CALL MPI_INIT(ierr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,world_size,ierr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,world_rank,ierr)

km(1) = 1
km(2) = 1
km(3) = 2

DO i = 1, km(1), 1
   DO j = 1, km(2), 1
      DO k = 1, km(3), 1
         IF (MOD((i-1)*km(2)*km(3)+(j-1)*km(3)+k-1, world_size) /= world_rank) CYCLE
         CALL SUBROUT(i,j,k,t2,t3)
         IF (world_rank == 0) THEN
            c2 = c2 + t2
            c4 = c4 + t3
            DO l = 1, world_size-1, 1
               IF ((i-1)*km(2)*km(3)+(j-1)*km(3)+k-1+l == km(1)*km(2)*km(2)) EXIT
               m = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + l + 100000
               CALL MPI_RECV(c3,100,MPI_DOUBLE,l,m,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
               c2 = c2 + c3
               n = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + l + 1000000
               CALL MPI_RECV(c5,100,MPI_DOUBLE,l,n,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
               c4 = c4 + c5
            END DO
         ELSE
            m = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + 100000
            CALL MPI_SEND(t2,100,MPI_DOUBLE,0,m,MPI_COMM_WORLD,ierr)
            n = (i-1)*km(2)*km(3)+(j-1)*km(3)+k + 1000000
            CALL MPI_SEND(t3,100,MPI_DOUBLE,0,n,MPI_COMM_WORLD,ierr)
         END IF
      END DO
   END DO
END DO
print*,'before final'
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
print*,'after final'

IF (world_rank == 0) THEN
    WRITE (UNIT=*, FMT=*) c2
    WRITE (UNIT=*, FMT=*) c4
END IF

CALL MPI_FINALIZE(ierr)

STOP
END PROGRAM TEST

This is my script file for job submission.

#!/bin/sh
#SBATCH --partition=3080ti
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12

module load compiler/2022.0.2
module load mkl/2022.0.2
module load mpi/2021.5.1
mpirun ./test

This is my compiling file.

#!/bin/sh
#SBATCH --partition=cpu
#SBATCH --job-name=qt
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load compiler/2022.0.2
module load mkl/2022.0.2
module load mpi/2021.5.1

mpiifort -qmkl -coarray -fp-model strict -no-wrap-margin -g -traceback -check test.f90 -o test

Solution

  • It's alluded to by Vladimir in

    Some could make the mistake of including unnecessary distinguishing information into the tag. E.g., the ranks of the sender and the receiver do not have to be included, they are a part of the envelope.

    but not mentioned explicitly: if you can find just one tag value, the simplest being 0, that's not also used in an asynchronous fashion, it's okay to just skip the calculation of m and n and simply use 0 in their place. This assumes rank 0 "knows" what data will be computed by each rank but in most programs that's indeed the case.

    Simplifying your program to

        SUBROUTINE SUBROUT(i,j,k,t2,t3)
          IMPLICIT NONE
          INTEGER, PARAMETER :: dp=SELECTED_REAL_KIND(15,14)
          INTEGER            :: i, j, k
          REAL (KIND=dp)     :: t2(100), t3(100)
          INTEGER            :: l, m, n
    
          m = i*10+j*12+k-3
          n = i*11+j-3+k*15
    
          DO l = 1, 100, 1
             t2(l) = DBLE(l)+DBLE(m)
             t3(l) = DBLE(l)+DBLE(n)
          END DO
    
          RETURN
        END SUBROUTINE SUBROUT
    
        PROGRAM TEST
          USE MPI
          IMPLICIT NONE
          INTEGER, PARAMETER :: dp=SELECTED_REAL_KIND(15,14)
          INTEGER            :: i, j, k, l, km(3)
          REAL (KIND=dp)     :: t2(100), t3(100), c2(100), c3(100), c4(100), c5(100)
          INTEGER            :: world_size, world_rank, ierr
    
          CALL MPI_INIT(ierr)
          CALL MPI_COMM_SIZE(MPI_COMM_WORLD,world_size,ierr)
          CALL MPI_COMM_RANK(MPI_COMM_WORLD,world_rank,ierr)
    
          km(1) = 1
          km(2) = 1
          km(3) = 2
    
          DO i = 1, km(1), 1
             DO j = 1, km(2), 1
                DO k = 1, km(3), 1
                   IF (MOD((i-1)*km(2)*km(3)+(j-1)*km(3)+k-1, world_size) /= world_rank) CYCLE
                   CALL SUBROUT(i,j,k,t2,t3)
                   IF (world_rank == 0) THEN
                      c2 = c2 + t2
                      c4 = c4 + t3
                      DO l = 1, world_size-1, 1
                         IF ((i-1)*km(2)*km(3)+(j-1)*km(3)+k-1+l == km(1)*km(2)*km(2)) EXIT
                         CALL      MPI_RECV(c3,100,MPI_DOUBLE,l,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
                         c2 = c2 + c3
                         CALL      MPI_RECV(c5,100,MPI_DOUBLE,l,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE,ierr)
                         c4 = c4 + c5
                      END DO
                   ELSE
                      CALL MPI_SEND(t2,100,MPI_DOUBLE,0,0,MPI_COMM_WORLD,ierr)
                      CALL MPI_SEND(t3,100,MPI_DOUBLE,0,0,MPI_COMM_WORLD,ierr)
                   END IF
                END DO
             END DO
          END DO
          print*,'before final'
          CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
          print*,'after final'
    
    
          IF (world_rank == 0) THEN
              WRITE (UNIT=*, FMT=*) c2
              WRITE (UNIT=*, FMT=*) c4
          END IF
     
          CALL MPI_FINALIZE(ierr)
    
          STOP
        END PROGRAM TEST
    

    should result in something that works on many MPI implementations, at least with respect to the message passing and assuming that the effective MPI_Reduce does not overwhelm the network. MPI_Reduce is probably also the next routine you should read up on.