I have built an MPICH2 cluster, and the machinefile is:
pc3@ub3:4 # this will spawn 4 processes on ub3
pc1@ub1 # this will spawn 1 process on ub1
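For illustration, here is a small Python sketch of how this machinefile can be read. It assumes Hydra's default placement: each host receives its ":count" processes in order (1 if no count is given), and placement wraps back to the first host when -n exceeds the total number of slots. Under that assumption, -n 8 with only 5 slots would put ranks 0-3 and 5-7 on ub3 and rank 4 on ub1. The function names are hypothetical, and the "user@" prefix is simply kept as part of the host label.

```python
def parse_machinefile(lines):
    """Parse 'host[:count]' lines into (host, count) pairs.

    Text after '#' is a comment; a missing count defaults to 1.
    A 'user@' prefix is left in the host label untouched.
    """
    slots = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        host, _, count = entry.partition(":")
        slots.append((host, int(count) if count else 1))
    return slots


def place_ranks(slots, nprocs):
    """Assign ranks 0..nprocs-1 to hosts, filling each host's count in order
    and wrapping to the first host when nprocs exceeds the total slots
    (an assumed model of Hydra's default placement, not its actual code)."""
    if sum(count for _, count in slots) <= 0:
        raise ValueError("machinefile has no slots")
    placement = []
    while len(placement) < nprocs:
        for host, count in slots:
            for _ in range(count):
                if len(placement) == nprocs:
                    return placement
                placement.append(host)
    return placement


lines = [
    "pc3@ub3:4 # this will spawn 4 processes on ub3",
    "pc1@ub1 # this will spawn 1 process on ub1",
]
for rank, host in enumerate(place_ranks(parse_machinefile(lines), 8)):
    print(f"rank {rank} -> {host}")
```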
When I run the test program, it should print:
Hello from processor 0 of 8
Hello from processor 1 of 8
Hello from processor 2 of 8
Hello from processor 3 of 8
Hello from processor 4 of 8
Hello from processor 5 of 8
Hello from processor 6 of 8
Hello from processor 7 of 8
But instead I got:
pc1@ub1:~$ mpiexec -n 8 -f machinefile ./mpi_hello
[proxy:0:0@ub3] launch_procs (./pm/pmiserv/pmip_cb.c:648): unable to change wdir to /home/pc1 (No such file or directory)
[proxy:0:0@ub3] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
[proxy:0:0@ub3] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@ub3] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@ub1] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec@ub1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@ub1] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@ub1] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
I have successfully enabled passwordless SSH, so pc1 can connect to pc3 without a password. Even so, I still suspect something is wrong with SSH or access permissions. My OS is Ubuntu 14.04 LTS (32-bit).
Thanks for any help.
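Make sure the user names are the same on all nodes. The error "unable to change wdir to /home/pc1" appears because mpiexec starts the remote processes in the directory it was launched from (here /home/pc1), and ub3 has no /home/pc1 because the account there is pc3. Once the accounts match, change the machinefile to: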
ub3:4 # this will spawn 4 processes on ub3
ub1 # this will spawn 1 process on ub1
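A quick way to catch this before launching is to scan the machinefile for "user@" prefixes that differ from the account you run mpiexec as. Below is a minimal Python sketch (the function name is hypothetical); it only inspects the file's text and does not log in to the nodes, so it cannot confirm that the directory actually exists remotely.

```python
def mismatched_users(lines, local_user):
    """Return (user, host) pairs for machinefile entries whose explicit
    'user@' prefix differs from local_user. Entries without a prefix are
    treated as using the same account name as the launching user."""
    mismatches = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if "@" not in entry:
            continue  # blank line, pure comment, or plain host entry
        user, _, rest = entry.partition("@")
        host = rest.split(":", 1)[0]  # drop the optional ':count'
        if user != local_user:
            mismatches.append((user, host))
    return mismatches


print(mismatched_users(["pc3@ub3:4", "pc1@ub1"], "pc1"))  # [('pc3', 'ub3')]
print(mismatched_users(["ub3:4", "ub1"], "pc1"))          # []
```

In practice you would pass getpass.getuser() as local_user instead of a hard-coded name.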
Then copy the compiled executable to the same directory on every node.
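Because each remote process looks for the executable at the same path it has on the launching node, one option is to generate an scp command per remote host. This is a minimal Python sketch, assuming scp works over the passwordless SSH you already set up and that the target directory exists on each node; the function name is hypothetical.

```python
import os


def copy_commands(lines, exe, local_host):
    """Build one scp command per remote host in the machinefile, copying
    exe to the same absolute directory it occupies locally."""
    src = os.path.abspath(exe)
    dest_dir = os.path.dirname(src)
    hosts = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        # Drop an optional 'user@' prefix and ':count' suffix.
        host = entry.split(":", 1)[0].rsplit("@", 1)[-1]
        if host != local_host and host not in hosts:
            hosts.append(host)
    return [["scp", src, f"{host}:{dest_dir}/"] for host in hosts]


for cmd in copy_commands(["ub3:4", "ub1"], "/home/pc1/mpi_hello", "ub1"):
    print(" ".join(cmd))  # scp /home/pc1/mpi_hello ub3:/home/pc1/
```

Each returned list can be passed to subprocess.run(cmd, check=True) to perform the copy.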
Make sure every node's hostname is listed in the /etc/hosts file on all nodes.
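On Ubuntu, /etc/hostname holds only the machine's own name; the file that maps every node name to an address is /etc/hosts. Here is a minimal Python sketch that reports machinefile hosts with no entry in a node's /etc/hosts text. The IP addresses in the sample are made up, the function names are hypothetical, and DNS is ignored, so a host that resolves through DNS may still be reported.

```python
def hosts_file_names(text):
    """Return every name (canonical name or alias) listed in /etc/hosts-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:  # fields[0] is the IP address
            names.update(fields[1:])
    return names


def missing_hosts(machinefile_lines, hosts_text):
    """Return machinefile hosts (in file order, no duplicates) absent from hosts_text."""
    known = hosts_file_names(hosts_text)
    missing = []
    for line in machinefile_lines:
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        host = entry.split(":", 1)[0].rsplit("@", 1)[-1]
        if host not in known and host not in missing:
            missing.append(host)
    return missing


# Hypothetical /etc/hosts contents for illustration only.
example_hosts = """\
127.0.0.1      localhost
192.168.1.11   ub1
# 192.168.1.13 ub3   (commented out, so ub3 is not listed)
"""
print(missing_hosts(["ub3:4", "ub1"], example_hosts))  # ['ub3']
```

On a real node you would read the file with open("/etc/hosts").read() and run the check on every machine in the cluster.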