
MPI_Bcast hangs on slave


I am trying to input a number on one computer and then broadcast it to all other computers using MPI.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char** argv)
{
    int myid, numprocs, processor_name_length;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init (0, 0);
    MPI_Comm_rank (MPI_COMM_WORLD, &myid);
    MPI_Comm_size (MPI_COMM_WORLD, &numprocs);
    MPI_Get_processor_name (processor_name, &processor_name_length);

    int number = 0;
    if (myid == 0) {
        printf ("Input number: ");
        scanf ("%d", &number);
    }

    MPI_Bcast(&number, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf ("Hello from process %d of %d. Number: %d [%s]\n", myid, numprocs, number, processor_name);

    MPI_Finalize ();
    return 0;
}

When I compile and run it:

mpicc -o bcast bcast.c
mpiexec -hosts umaster,uslavea -n 2 ./bcast

It prompts me for input on the master machine. After I enter the number, it prints the printf message from process 0 and then hangs.

Output:

Input number: 10
Hello from process 0 of 2. Number: 10 [umaster]

There should also be this message from the second process:

Hello from process 1 of 2. Number: 10 [uslavea]

EDIT:

If I run with this command:

mpiexec -hosts master -n 4 ./bcast

Everything works. I also have another example that uses MPI_Send(...), and there I get a "connection refused" error; if I run that example on a single computer, everything works fine. I assume my configuration/cluster is not set up correctly. The next example works fine:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main (int argc, char** argv)
{
    int myid, numprocs;

    MPI_Init (0, 0);
    MPI_Comm_rank (MPI_COMM_WORLD, &myid);
    MPI_Comm_size (MPI_COMM_WORLD, &numprocs);
    printf ("Hello from process %d of %d.\n", myid, numprocs);
    MPI_Finalize ();
    return 0;
}

I run it with:

mpiexec -hosts master,uslavea,uslaveb,uslavec -n 4 ./hello

I have created 4 virtual machines and generated DSA keys, and I can log in from each machine to every other one over ssh without being asked for a password (for example, from master: ssh mpiuser@uslavea). The user is mpiuser on all machines, with the same password on each.

What could be the problem? To repeat: running it only on master with -n X works fine.


Solution

  • I figured it out. The problem is that I was using host names for the servers instead of IP addresses. Running it with

    mpiexec -hosts 192.168.100.100,192.168.100.101,192.168.100.102,192.168.100.103 -n 4 ./bcast
    

    solves the problem.