Tags: c, performance, parallel-processing, mpi, openmpi

Program stuck while sending data to slaves (MPI)


I'm developing an application in C in which the user wants to find a certain pattern of two-digit numbers in a 2-dimensional array.

For example, there is a 10x10 array with random single-digit numbers and the user wants to find 1,0. The program will search for 1 and, when it is found, search for 0 in all directions (top, bottom, sides, diagonals and anti-diagonals) to depth 1. Simply put, it searches for a zero around the 1 within a 3x3 sub-matrix. The function search_number() performs the search for the second digit.

I've implemented sequential code for it and I'm trying to convert it into MPI.

I'm a complete beginner with MPI and this is my first time practicing it.

Here is my attempt with MPI.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N   255
#define BS  N/2

MPI_Status status;
int search_number(int arr[N][N],int row,int col,int digit_2){
    int count=0;
    for (int i=row-1;i<=row+1;i++){     //from -row to +row = 3 indexes for rows
        for(int j=col-1;j<=col+1;j++){  //from -col to +col = 3 indexes for cols
            // skip the centre cell [row,col] and any index that falls outside the matrix
            if(i<0 || j<0 || i>=N || j>=N || i==row && j==col) continue;
            if(arr[i][j] == digit_2){ //if second number is found, increase the counter
                count++;
            }
        }
    }
    return count;
}

int main(int argc, char **argv)
{
    int nproc,taskId,source,i,j,k,positionX,positionY;
    int sum=0;
    MPI_Datatype type;
    int a[N][N];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskId);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    MPI_Type_vector(N, BS, N, MPI_INT, &type);
    MPI_Type_commit(&type);

    //root
    if (taskId == 0) {
        srand( time(NULL) );
        //Generate two NxN matrix
        for (i=0; i<N; i++) {
            for (j=0; j<N; j++) {
                a[i][j]= rand()%10;
            }
        }

        printf("Passing 1st chunk:\n");
        // first chunk
        MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
        MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
        printf("Passing 2nd Chunk:\n");
        //second chunk
        MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
        MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
    }

    //workers
    source = 0;
    MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);

    for(int i=0;i<N;i++){
        for(int j=0;j<N;j++){
            if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
                sum+= search_number(a,i,j,0);  // function will return the count of 0's shared with 1
            }
        }
    }
    //Send result to root
    MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);

    //root receives results
    if(taskId == 0)
    {
        printf("Count: %d\n",sum);
        // printMatrix(resultFinal);
    }

    MPI_Finalize();
}

The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line if I set N > 255 at the top, but it works for N from 0 to 255. Can you point out my mistake?


Solution

  • The issue I'm facing is that my program gets stuck at the "Passing 1st chunk" line if I set N > 255 at the top, but it works for N from 0 to 255.

    As @Gilles Gouaillardet already pointed out in the comments, and as explained in more detail in this answer:

    MPI_Send() is allowed to block until a matching receive is posted (and that generally happens when the message is "large") ... and the required matching receive never gets posted.

    A typical fix would be to issue an MPI_Irecv(..., src = 0, ...) on rank 0 before the MPI_Send() (and an MPI_Wait() after), or to handle the 0 -> 0 communication with MPI_Sendrecv().
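
    For illustration, here is a minimal, self-contained sketch of the MPI_Sendrecv variant (the matrix size and contents are placeholders, not your actual data). Because MPI_Sendrecv posts the matching receive together with the send, rank 0 can hand itself a chunk without any risk of deadlock, regardless of the message size:

    #include <mpi.h>
    #include <stdio.h>

    #define N  16
    #define BS (N/2)

    int main(int argc, char **argv)
    {
        int rank;
        static int a[N][N], chunk[BS][N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    a[i][j] = (i + j) % 10;

            /* send-to-self: the receive is posted together with the send,
             * so the call cannot block waiting for a matching receive */
            MPI_Sendrecv(&a[0][0],     BS * N, MPI_INT, 0, 0,
                         &chunk[0][0], BS * N, MPI_INT, 0, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            printf("rank 0 copied its own chunk, chunk[0][0] = %d\n", chunk[0][0]);
        }

        MPI_Finalize();
        return 0;
    }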

    Besides that, your parallelization seems wrong, namely:

    MPI_Send(&a[0][0], BS*N, MPI_INT,0,0, MPI_COMM_WORLD);
    MPI_Send(&a[0][0], BS*N, MPI_INT,1,1, MPI_COMM_WORLD);
    

    you have sent the same workload to processes 0 and 1, and:

    MPI_Send(&a[BS][0], BS*N, MPI_INT,2,2, MPI_COMM_WORLD);
    MPI_Send(&a[BS][0], BS*N, MPI_INT,3,3, MPI_COMM_WORLD);
    

    processes 2 and 3 have the same issue.

    You should try to use a stencil-like approach in which each process shares only its border rows with its neighbours. For instance, a possible distribution for a 10x10 matrix and 4 processes could be (a minimal sketch of such a distribution follows the list below):

    • process 0 works with rows 0, 1 and 2;
    • process 1 works with rows 2, 3 and 4;
    • process 2 works with rows 4, 5 and 6;
    • process 3 works with rows 7, 8 and 9;
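
    Below is a minimal, self-contained sketch of such a distribution (not a drop-in replacement for your program; it assumes N is divisible by the number of processes, and variable names such as rows, first, last, lo and hi are illustrative). Rank 0 sends every worker only the rows it owns plus one halo row on each side, and each rank then scans only the rows it owns:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 16                 /* assume N is divisible by the number of processes */

    int main(int argc, char **argv)
    {
        int nproc, rank;
        static int a[N][N];      /* only rank 0 fills the whole matrix */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        int rows  = N / nproc;                            /* rows owned per rank */
        int first = rank * rows;                          /* first owned row     */
        int last  = first + rows - 1;                     /* last owned row      */
        int lo    = (first == 0)    ? first : first - 1;  /* plus one halo above */
        int hi    = (last == N - 1) ? last  : last + 1;   /* plus one halo below */

        if (rank == 0) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    a[i][j] = rand() % 10;

            /* send each worker its owned rows together with the halo rows */
            for (int dest = 1; dest < nproc; dest++) {
                int dfirst = dest * rows;
                int dlast  = dfirst + rows - 1;
                int dlo    = dfirst - 1;                  /* dest > 0, so a halo row above exists */
                int dhi    = (dlast == N - 1) ? dlast : dlast + 1;
                MPI_Send(&a[dlo][0], (dhi - dlo + 1) * N, MPI_INT, dest, 0, MPI_COMM_WORLD);
            }
            /* rank 0 already holds its own rows, nothing to send to itself */
        } else {
            /* the receive count matches exactly what was sent */
            MPI_Recv(&a[lo][0], (hi - lo + 1) * N, MPI_INT, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        /* each rank scans only the rows it owns; the halo rows exist solely so a
         * 3x3 neighbourhood around an owned cell is available (this is where a
         * call like search_number() would go; here we just count the 1s) */
        int found = 0;
        for (int i = first; i <= last; i++)
            for (int j = 0; j < N; j++)
                if (a[i][j] == 1)
                    found++;

        printf("rank %d owns rows %d..%d (holding %d..%d) and saw %d ones\n",
               rank, first, last, lo, hi, found);

        MPI_Finalize();
        return 0;
    }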

    Currently, you send BS*N elements to each process; however, in:

    MPI_Recv(&a, N*N, MPI_INT, source, taskId, MPI_COMM_WORLD, &status);
    

    you specify that you are expecting to receive N*N elements.

    Moreover in:

    for(int i=0;i<N;i++){
        for(int j=0;j<N;j++){
            if (a[i][j]==1) { // if found 1, pass its index i,j to search_number() function
                sum+= search_number(a,i,j,0);  // function will return the count of 0's shared with 1
            }
        }
    }
    

    processes are working with positions of the matrix a that they never received; naturally, that should not be the case. Each process should only iterate over the rows it actually received, as in the sketch after the distribution list above.

    Finally, instead of

    //Send result to root
    MPI_Send(&sum, BS, MPI_INT, 0, 4, MPI_COMM_WORLD);
    

    (note also that sum is a single int, so the send count should be 1, not BS) you should actually use MPI_Reduce, i.e.,

    Reduces values on all processes to a single value
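
    A minimal, self-contained sketch of that final step (the local sum here is just a placeholder value): MPI_Reduce combines the per-rank counts into a single total on rank 0 in one call, so the hand-written send/receive of partial results is no longer needed.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, sum, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        sum = rank + 1;   /* stand-in for the local count computed by each rank */

        /* sum the per-rank counts into `total` on rank 0 */
        MPI_Reduce(&sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Count: %d\n", total);

        MPI_Finalize();
        return 0;
    }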