I am struggling to write code that behaves like the following example, similar to the Up Phase part of a prefix scan, and I do not want to use the function MPI_Scan:
WholeArray[16] = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Processor 0 got [0 , 1 , 2 , 3] , Processor 1 got [4 , 5 , 6 , 7]
Processor 2 got [8 , 9 , 10 , 11] , Processor 3 got [12 , 13 , 14 , 15]
The last element of each local array is to be sent and summed over 2 strides:
(stride 1)
Processor 0 sends Array[3]; Processor 1 receives it from Processor 0 and adds it to its Array[3]
Processor 2 sends Array[3]; Processor 3 receives it from Processor 2 and adds it to its Array[3]
(stride 2)
Processor 1 sends Array[3]; Processor 3 receives it from Processor 1 and adds it to its Array[3]
At the end I want to use MPI_Gather so that the result is:
WholeArray = [0 , 1 , 2 , 3 , 4 , 5 , 6 , 10 , 8 , 9 , 10 , 11 , 12 , 13 , 14 , 36]
(index 7 becomes 3 + 7 = 10 in stride 1; index 15 becomes 11 + 15 = 26 in stride 1 and then 26 + 10 = 36 in stride 2)
I find it hard to write code that makes the program follow this 4-node pattern:
(1st stride) - Processor 0 sends to Processor 1, and Processor 1 receives from Processor 0
(1st stride) - Processor 2 sends to Processor 3, and Processor 3 receives from Processor 2
(2nd stride) - Processor 1 sends to Processor 3, and Processor 3 receives from Processor 1
Here is the code that I have written so far:
int Send_Receive(int* my_input, int size_per_process, int rank, int size)
{
    int key = 1;
    int temp = my_input[size_per_process-1];

    while(key <= size/2)
    {
        if((rank+1) % key == 0)
        {
            if(rank/key % 2 == 0)
            {
                MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
            }
            else
            {
                MPI_Recv(&temp, 1, MPI_INT, rank-key,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
                my_input[size_per_process]+= temp;
            }
            key = 2 * key;
            MPI_Barrier(MPI_COMM_WORLD);
        }
    }
    return (*my_input);
}
There are some issues in your code, namely 1) it always sends the same temp variable across processes:
MPI_Send(&temp, 1, MPI_INT, rank+key,0,MPI_COMM_WORLD);
The temp variable is initialized before the loop:
int temp = my_input[size_per_process-1];
while(key <= size/2)
{ ...}
but it is never updated inside the loop. This leads to wrong results, since after the first stride the last element of the my_input array has changed on some processes (e.g., rank 1 receives in stride 1 and must then send its updated last element in stride 2). Instead you should do:
temp = my_input[size_per_process-1];
MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
Moreover, 2) the following statement
my_input[size_per_process]+= temp;
does not add temp to the last position of the array my_input; it writes one element past its end. Instead, it should be:
my_input[size_per_process-1]+= temp;
Finally, 3) there are deadlock and infinite loop issues. For starters, having a call to a collective communication routine such as MPI_Barrier inside a conditional that not every rank executes is typically a big red flag. Instead of:
while(key <= size/2)
{
    if((rank+1) % key == 0){
        ...
        MPI_Barrier(MPI_COMM_WORLD);
    }
}
you should have:
while(key <= size/2)
{
    if((rank+1) % key == 0){
        ...
    }
    MPI_Barrier(MPI_COMM_WORLD);
}
to ensure that every process calls the MPI_Barrier.
The infinite loop happens because the while condition depends on key being updated, but key is only updated when (rank+1) % key == 0 evaluates to true. Therefore, whenever (rank+1) % key == 0 evaluates to false, the process never updates key and consequently gets stuck in an infinite loop. The key update therefore also has to move outside the conditional, as sketched below.
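Putting the three fixes together, your Send_Receive function would look something like the following sketch (keeping your original signature and return statement; the complete program below is the version that produces the output shown):
int Send_Receive(int* my_input, int size_per_process, int rank, int size)
{
    int key = 1;
    int temp = 0;

    while(key <= size/2)
    {
        if((rank+1) % key == 0)
        {
            if(rank/key % 2 == 0)
            {
                temp = my_input[size_per_process-1];  /* fix 1: re-read the last element before every send */
                MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
            }
            else
            {
                MPI_Recv(&temp, 1, MPI_INT, rank-key, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                my_input[size_per_process-1] += temp; /* fix 2: last element, not one past the end */
            }
        }
        key = 2 * key;               /* fix 3: every rank advances the stride ... */
        MPI_Barrier(MPI_COMM_WORLD); /* ... and every rank reaches the barrier */
    }
    return (*my_input);
}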
A running example with all the problems fixed:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv){
    int rank, mpisize, total_size = 16;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &mpisize);

    /* Root initializes the whole array [0 .. total_size-1]. */
    int *data = NULL;
    if(rank == 0){
        data = malloc(total_size * sizeof(int));
        for(int i = 0; i < total_size; i++)
            data[i] = i;
    }

    /* Distribute equal-sized chunks to all processes. */
    int size_per_process = total_size / mpisize;
    int *localdata = malloc(size_per_process * sizeof(int));
    MPI_Scatter(data, size_per_process, MPI_INT, localdata, size_per_process, MPI_INT, 0, MPI_COMM_WORLD);

    /* Up phase: at each stride the sender's last element is added to the receiver's last element. */
    int key = 1;
    int temp = 0;
    while(key <= mpisize/2){
        if((rank+1) % key == 0){
            if(rank/key % 2 == 0){
                temp = localdata[size_per_process-1];
                MPI_Send(&temp, 1, MPI_INT, rank+key, 0, MPI_COMM_WORLD);
            }
            else {
                MPI_Recv(&temp, 1, MPI_INT, rank-key, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                localdata[size_per_process-1] += temp;
            }
        }
        key = 2 * key;               /* updated by every rank */
        MPI_Barrier(MPI_COMM_WORLD); /* reached by every rank */
    }

    /* Collect the partial results back on the root and print them. */
    MPI_Gather(localdata, size_per_process, MPI_INT, data, size_per_process, MPI_INT, 0, MPI_COMM_WORLD);
    if(rank == 0){
        for(int i = 0; i < total_size; i++)
            printf("%d ", data[i]);
        printf("\n");
    }

    free(data);
    free(localdata);
    MPI_Finalize();
    return 0;
}
Input:
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
Output:
[0,1,2,3,4,5,6,10,8,9,10,11,12,13,14,36]
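To try it, assuming a standard MPI installation (e.g., Open MPI or MPICH) that provides the mpicc and mpirun wrappers, and using a hypothetical file name:
mpicc prefix_up_phase.c -o prefix_up_phase
mpirun -np 4 ./prefix_up_phase
Running with 4 processes reproduces the output above; note that the program assumes the number of processes evenly divides total_size.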