
MPI - Asynchronous ring communication


I am trying to implement a simple MPI program in which n processes call a function foo() and pass n arrays around until every array has visited every process (n steps). The communication forms a ring: process #0 sends to process #1 and receives from process #n-1, and so on. Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

void foo(double* X, int n,int d,int k)
{
    int id, world_size;
    double *array = X;
    double *array_buff = malloc(n*d*sizeof(double));

    MPI_Comm_rank(MPI_COMM_WORLD,  &id);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Status status;

    array[0] = id;

    for(int step = 0 ; step < world_size ; step++)
    {
        int temp = id-step;
        if(temp<0)
        {
            temp = world_size+temp;
        }

        if(array[0]!= temp)
        {
            printf("[%d][%d] %f (start)\n", id, step, array[0]);
        }

        //even processes send first while odd process receive
        //first in order to avoid deadlock - achieve synchronization
        MPI_Request reqsend, reqrecv;
        int dst = (id+1)%world_size;
        if(id%2 == 0)
        {
            int src = id-1;
            if(id == 0)
            {
                src = world_size-1;
            }
            MPI_Isend(array, n*d, MPI_DOUBLE, dst, 0, MPI_COMM_WORLD, &reqsend);
            MPI_Irecv(array_buff, n*d, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &reqrecv);

        }
        else
        {

            MPI_Irecv(array_buff, n*d, MPI_DOUBLE, id-1, 0, MPI_COMM_WORLD, &reqrecv);
            MPI_Isend(array, n*d, MPI_DOUBLE, dst, 0, MPI_COMM_WORLD, &reqsend);

        }   

        //...some array[]-related work is done here...

        if(array[0]!=temp)
        {
            printf("[%d][%d] %f (end)\n", id, step, array[0]);
        }

        //update array asynchronously
        MPI_Wait(&reqrecv, &status);
        array = array_buff;
    }


    free(array);

}

In order to avoid deadlock even numbered processes send first and receive afterwards while odd numbered processes receive first and send afterwards.

The initialization array[0] = id; is there to check whether the communication is done correctly. That's why I use two prints, one at the beginning and one at the end of each step, to observe whether the contents of array[] change before the array = array_buff; assignment (where array[] takes the buffered value) takes place.

One of the outputs I get is:

[0][2] 1.000000 (start)
[0][2] 1.000000 (end)
[0][3] 0.000000 (start)
[0][3] 0.000000 (end)
[1][3] 1.000000 (start)
[1][3] 1.000000 (end)
[3][1] 1.000000 (end)
[3][2] 0.000000 (end)
[3][3] 3.000000 (end)

Why are the contents of array[] altered before the array_buff[] assignment?

Note: I have read that in some MPI versions it's not OK to use array[] (the Isend() buffer) before the send has completed. But I don't think that's the case here, because even if I don't use array[] at all after the Isend()/Irecv() segment, the program still behaves the same way.

I have also tried giving each send its own tag, using the step value as the identifier (tag = step;), but then I get a deadlock and I don't understand why. I would appreciate an explanation of that.


Solution

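  • The problem is the assignment array = array_buff, which copies the pointer, not the data. array[] gets the correct values on the first step, but from then on array and array_buff point to the same memory. Every later receive into array_buff (on the 2nd, 3rd, ... step) therefore changes the contents of array[] too, and MPI_Isend and MPI_Irecv end up operating on the same buffer.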

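    The problem is solved by copying the data instead of the pointer:

    // copy the received values; array and array_buff stay separate buffers
    // (array is also the Isend buffer, so the send must have completed first)
    for(int i=0; i<n*d; i++)
    {
        array[i] = array_buff[i];
    }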
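
The aliasing described above can be reproduced without MPI. What follows is a minimal single-process sketch, not the poster's code: fake_receive() is a hypothetical stand-in for a completed MPI_Irecv, and observed_with_assignment() and observed_with_swap() are illustrative names. Each function returns the last array[0] value read before the step's hand-over, which is what the (start)/(end) prints in the question show. The second function uses a pointer swap between two distinct buffers, a common alternative to the element-by-element copy; it is swapped in here for illustration and is not part of the original answer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a completed MPI_Irecv: fills the receive
   buffer with the value the neighbour would have sent. */
static void fake_receive(double *recv_buf, size_t count, double value)
{
    for (size_t i = 0; i < count; i++)
        recv_buf[i] = value;
}

/* Pattern from the question: array = array_buff copies only the pointer.
   Returns the last array[0] read before a step's hand-over. */
double observed_with_assignment(size_t count, int steps)
{
    assert(count > 0 && steps > 0);
    double *first = calloc(count, sizeof *first);       /* plays the role of X */
    double *array_buff = calloc(count, sizeof *array_buff);
    assert(first && array_buff);
    double *array = first;
    double seen = 0.0;

    for (int step = 0; step < steps; step++) {
        fake_receive(array_buff, count, step + 1.0);    /* "incoming" data */
        seen = array[0];                                /* what the printf would show */
        array = array_buff;                             /* both names now alias */
    }
    free(first);
    free(array_buff);
    return seen;
}

/* Alternative: swap the two pointers so they never refer to the same block. */
double observed_with_swap(size_t count, int steps)
{
    assert(count > 0 && steps > 0);
    double *array = calloc(count, sizeof *array);
    double *array_buff = calloc(count, sizeof *array_buff);
    assert(array && array_buff);
    double seen = 0.0;

    for (int step = 0; step < steps; step++) {
        fake_receive(array_buff, count, step + 1.0);    /* "incoming" data */
        seen = array[0];                                /* still the previous step's data */
        double *tmp = array;                            /* exchange roles of the buffers */
        array = array_buff;
        array_buff = tmp;
    }
    free(array);
    free(array_buff);
    return seen;
}
```

With count = 4 and steps = 3, observed_with_assignment() returns 3.0, the value still arriving in the current step, while observed_with_swap() returns 2.0, the value accepted in the previous step. In the real MPI code, the swap (or the copy) should only happen after both requests have completed, for example via MPI_Waitall on reqsend and reqrecv, because the send buffer must not be modified while the MPI_Isend is still pending.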