Tags: c, mpi, distributed-computing

MPI_Recv not receiving all MPI_Send requests


I have a bug in my code. I have multiple processes all processing data from a binary tree. At the end, they should send the results to the master node (node 0) where the results will be processed. However, for some reason, some of the MPI_Sends are not being received.

int *output=(int*) malloc(sizeof(int)*(varNum+2)); //contains all variable values and maxSAT and assignNum

if(proc_id!=0 && proc_id<nodeNums){
    output[0]=maxSAT;
    output[1]=assignNum;
    for(i=2;i<varNum+2;i++){
        output[i]=varValues[i-2];
    }
    MPI_Send(output,varNum+2,MPI_INT,0,TAG,MPI_COMM_WORLD);
    printf("proc %d sent data\n",proc_id);
}
else if(proc_id==0){
    for(i=1;i<nodeNums;i++){
        printf("receiving data from %d\n",i);
        MPI_Recv(output,varNum+2,MPI_INT,i,TAG,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
        if(output[0]>maxSAT){
            maxSAT=output[0];
            assignNum=output[1];
            for(i=0;i<varNum;i++){
                varValues[i]=output[i+2];
            }   
        }
        else if(output[0]==maxSAT){
            assignNum+=output[1];
        }
    }
}

When I run it with 8 processes (nodeNums=8), this is the output.

proc 2 sent data
receiving data from 1
proc 5 sent data
proc 6 sent data
proc 3 sent data
proc 7 sent data
proc 1 sent data
proc 4 sent data

All processes are sending data, but the master only receives from process 1. However, if I run it with 4 processes, everything is sent and received. Does anyone have any idea why this happens?


Solution

  • The problem has nothing to do with MPI. Your mistake is the use of the same variable in two different but nested loops:

    else if(proc_id==0){
        for(i=1;i<nodeNums;i++){           // (1) outer loop uses i
            ...
                for(i=0;i<varNum;i++){     // (2) inner loop reuses i
                    varValues[i]=output[i+2];
                }
            ...
        }
    }
    

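    After the inner loop completes, i is equal to varNum. The outer loop's i++ then runs, so if varNum+1 is greater than or equal to nodeNums, the outer loop terminates too and the remaining messages are never received. The inner loop only runs when output[0]>maxSAT, so the problem does not show up on every run, which may be why the run with 4 processes appeared to work. Use a different variable for the inner loop.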
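To make the effect concrete, here is a minimal sketch in plain C that counts how many times the outer loop body runs. It is an illustration, not the asker's program: the function names are hypothetical, the MPI_Recv call is replaced by a counter so it runs without MPI, and the inner loop is assumed to run on every iteration (in the original it runs only when output[0]>maxSAT).

```c
/* Buggy pattern: the inner loop reuses the outer loop's counter i.
   Returns how many times the outer body (the stand-in for MPI_Recv) runs. */
int receives_with_shared_counter(int nodeNums, int varNum)
{
    int i;
    int receives = 0;

    for (i = 1; i < nodeNums; i++) {
        receives++; /* stands in for MPI_Recv from rank i */

        /* Safety cap for this demo only: when varNum + 1 < nodeNums the
           inner loop keeps resetting i and the outer loop never ends. */
        if (receives >= nodeNums) {
            break;
        }

        for (i = 0; i < varNum; i++) {
            /* stands in for varValues[i] = output[i + 2]; */
        }
        /* i was overwritten by the inner loop; the outer i++ runs next */
    }
    return receives;
}

/* Fixed pattern: each loop has its own counter, so the outer loop
   visits every source rank from 1 to nodeNums - 1 exactly once. */
int receives_with_separate_counters(int nodeNums, int varNum)
{
    int receives = 0;

    for (int src = 1; src < nodeNums; src++) {
        receives++; /* stands in for MPI_Recv from rank src */

        for (int j = 0; j < varNum; j++) {
            /* stands in for varValues[j] = output[j + 2]; */
        }
    }
    return receives;
}
```

With nodeNums=8 and an assumed varNum=10, the first function returns 1 (only rank 1 is "received", matching the output in the question) and the second returns 7. In the real code the same fix applies: give the inner copy loop its own variable, for example j, and leave the loop that calls MPI_Recv on i.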