Tags: c, performance, parallel-processing, mpi, hpc

How to add all the values inside a 2D array using MPI


I am trying to build a program with a multi-dimensional array in C using MPI (it's an assignment).

The program below runs but prints wrong values in two output lines. a is a multi-dimensional array that does not contain any 0 values, yet the second output line is partial process: values are 0 and 0. Why is it printing 0 when there is no 0 value in my a array?

This is my basic program

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
// size of array
#define n 6

int a[6][2] = { {2,3},{51,55},{88,199},{335,34534},{678,683},{98,99} };

// Temporary array for slave process
int a2[1000][2];

int main(int argc, char* argv[])
{

    int pid, np,
        elements_per_process,
        n_elements_recieved;
    // np -> no. of processes
    // pid -> process id

    MPI_Status status;

    // Creation of parallel processes
    MPI_Init(&argc, &argv);

    // find out process ID,
    // and how many processes were started
    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    // master process
    if (pid == 0) {

        int index, i;
        elements_per_process = n / np;

        // check if more than 1 processes are run
        if (np > 1) {
            // distributes the portion of array
            // to child processes to calculate
            // their partial sums
            for (i = 1; i < np - 1; i++) {
                index = i * elements_per_process;

                MPI_Send(&elements_per_process,
                    1, MPI_INT, i, 0,
                    MPI_COMM_WORLD);
                MPI_Send(&a[index],
                    elements_per_process,
                    MPI_INT, i, 0,
                    MPI_COMM_WORLD);
            }
            // last process adds remaining elements
            index = i * elements_per_process;
            int elements_left = n - index;

            MPI_Send(&elements_left,
                1, MPI_INT,
                i, 0,
                MPI_COMM_WORLD);
            MPI_Send(&a[index],
                elements_left,
                MPI_INT, i, 0,
                MPI_COMM_WORLD);
        }

        // master process add its own sub array
        for (i = 0; i < elements_per_process; i++)
            printf("master process: values are %d and %d\n", a[i][0], a[i][1]);

        // collects partial sums from other processes
        int tmp;
        for (i = 1; i < np; i++) {
            MPI_Recv(&tmp, 1, MPI_INT,
                MPI_ANY_SOURCE, 0,
                MPI_COMM_WORLD,
                &status);
            int sender = status.MPI_SOURCE;
        }

    }
    // slave processes
    else {
        MPI_Recv(&n_elements_recieved,
            1, MPI_INT, 0, 0,
            MPI_COMM_WORLD,
            &status);

        // stores the received array segment
        // in local array a2
        MPI_Recv(&a2, n_elements_recieved,
            MPI_INT, 0, 0,
            MPI_COMM_WORLD,
            &status);

        // calculates its partial sum
        int useless_fornow = -1;
        for (int i = 0; i < n_elements_recieved; i++) {
            printf("partial process: values are %d and %d \n", a2[i][0], a2[i][1]);
        }
        // sends the partial sum to the root process
        MPI_Send(&useless_fornow, 1, MPI_INT,
            0, 0, MPI_COMM_WORLD);
    }

    // cleans up all MPI state before exit of process
    MPI_Finalize();

    return 0;
}

and this is the output:

partial process: values are 678 and 683

partial process: values are 0 and 0

master process: values are 2 and 3

master process: values are 51 and 55

partial process: values are 88 and 199

partial process: values are 0 and 0

I am running it with 3 processes using this command: mpiexec.exe -n 3 Project1.exe


Solution

  • The master sends &a[index] to the other processes, namely:

    • For process 1 the index is 2, so the master sends {88,199};
    • For process 2 the index is 4, so the master sends {678,683}.

    Each of those MPI_Send calls transfers elements_per_process (= 2) MPI_INTs, i.e. a single row starting at &a[index]; to send different or additional elements you need to fix the index and count calculation, as sketched below.
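
    As an illustration, here is a minimal sketch (under the assumption that each worker should receive whole rows of the 2-column array as one flat block of ints) of what the master's sends could look like:

    // hypothetical adjustment: send the number of ints (whole rows of the
    // 2-column array) instead of the number of rows
    int count = elements_per_process * 2;   // 2 ints per row
    MPI_Send(&count, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    MPI_Send(&a[index], count, MPI_INT, i, 0, MPI_COMM_WORLD);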

    In the second MPI_Recv

    MPI_Recv(&a2, n_elements_recieved, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    

    you specified that the received elements should be copied to &a2, i.e. the beginning of the 2D array a2, and that you expect to receive n_elements_recieved MPI_INTs. So the master sends an array to each process, and each process expects to receive an array; so far so good. The problem comes from your logic for printing the data you have received, namely:

    for (int i = 0; i < n_elements_recieved; i++) {
        printf("partial process: values are %d and %d \n", a2[i][0], a2[i][1]);
    }
    

    You are printing two values per iteration (a2[i][0] and a2[i][1]), i.e. 2 * n_elements_recieved ints, but you only received n_elements_recieved ints as a flat 1D block; the remaining slots of the zero-initialized global a2 are what show up as 0 and 0.

    IMO you can simplify your approach to the following:

    Each process first receives the total number of elements that it will get in the next MPI_Recv call:

    MPI_Recv(&n_elements_recieved, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    

    then it allocates an array of that size:

    int *tmp = malloc(sizeof(int) * n_elements_recieved);
    

    then it receives the data:

    MPI_Recv(tmp, n_elements_recieved, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    

    and finally prints all the elements in the array:

    for(int i = 0; i < n_elements_recieved; i++)
       printf("partial process: values are %d \n", tmp[i]);
    

    If you want the master process to send the entire 2D array to all the other processes, you can use MPI_Bcast:

    Broadcasts a message from the process with rank "root" to all other processes of the communicator

    You can take advantage of the fact that your 2D array is contiguously allocated in memory and perform a single MPI_Bcast to broadcast it, which simplifies the code greatly, as you can see:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char* argv[])
    {
        int pid, np;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &pid);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
    
        int rows = (pid == 0) ? 6 : 0;
        int cols = (pid == 0) ? 2 : 0;    
        MPI_Bcast(&rows, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast(&cols, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("%d, %d\n",rows, cols);
     
        int a[6][2];
        if(pid == 0){
        // just simulating some data.
            int tmp[6][2] = { {2,3},{51,55},{88,199},{335,34534},{678,683},{98,99} };
            for(int i = 0; i < 6; i++)
               for(int j = 0; j < 2; j++)
                  a[i][j] = tmp[i][j];
        }
        MPI_Bcast(&a, rows * cols, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Finalize();
    
        return 0;
    }
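
    If you run it with mpiexec -n 3, each of the three ranks prints 6, 2 (the order of the lines is not deterministic), confirming that the broadcast dimensions reached every process.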
    

    Instead of 3 MPI_Send/MPI_Recv pairs per process, you just need 3 MPI_Bcast calls for all the processes.
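
    Since the original goal is to add up all the values, one possible continuation (a sketch that is not part of the code above; it would go right before MPI_Finalize and reuses the rows, cols, np, pid and a variables from that example) is to let each rank sum a block of rows of the broadcast array and combine the partial sums with MPI_Reduce:

    // hypothetical continuation: each rank sums its block of rows, then
    // MPI_Reduce adds the partial sums into 'total' on rank 0
    int rows_per_rank = rows / np;
    int start = pid * rows_per_rank;
    int end   = (pid == np - 1) ? rows : start + rows_per_rank;   // last rank takes any leftover rows

    int partial = 0;
    for (int i = start; i < end; i++)
        for (int j = 0; j < cols; j++)
            partial += a[i][j];

    int total = 0;
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (pid == 0)
        printf("total sum = %d\n", total);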