ssh mpi mpich

MPICH cluster test error: unable to change wdir


I have set up an MPICH2 cluster, and the machinefile is:

pc3@ub3:4   # this will spawn 4 process on ub3
pc1@ub1     # this will spawn 1 process on ub1

When I run the test program, it should print:

Hello from processor 0 of 8
Hello from processor 1 of 8
Hello from processor 2 of 8
Hello from processor 3 of 8
Hello from processor 4 of 8
Hello from processor 5 of 8
Hello from processor 6 of 8
Hello from processor 7 of 8

But instead it returns:

pc1@ub1:~$ mpiexec -n 8 -f machinefile ./mpi_hello
[proxy:0:0@ub3] launch_procs (./pm/pmiserv/pmip_cb.c:648): unable to change wdir to /home/pc1 (No such file or directory)
[proxy:0:0@ub3] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:893): launch_procs returned error
[proxy:0:0@ub3] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@ub3] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@ub1] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec@ub1] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@ub1] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@ub1] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion

I have successfully enabled passwordless SSH, so pc1 can connect to pc3 without a password. Even so, I still suspect that something is wrong with SSH or with access permissions. My OS is Ubuntu 14.04 LTS, 32-bit.

Thanks for your help.


Solution

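  • Make sure the user name is the same on every node. The log shows the proxy on ub3 failing to change into /home/pc1, the directory mpiexec was launched from, which does not exist there because the account on ub3 is pc3. With matching user names, change the machinefile to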

    ub3:4   # this will spawn 4 process on ub3
    ub1     # this will spawn 1 process on ub1
    

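    Then copy the compiled executable to the same directory on every node, and make sure every node's hostname can be resolved from all the other nodes, for example by listing all of them in each node's /etc/hosts file (/etc/hostname only holds the node's own name).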
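
As a rough pre-flight check for the machinefile change above, here is a minimal Python sketch that parses the file, reports whether different user names appear in it, and prints a version with the `user@` prefixes removed. The helper names `parse_machinefile` and `strip_users` are made up for this illustration and are not part of MPICH. The parsing only assumes the simple `host:slots  # comment` layout shown in the question.

```python
def parse_machinefile(text):
    """Parse machinefile text into (user, host, slots) tuples.

    `user` is None when a line has no `user@` prefix; `slots` defaults to 1.
    Hypothetical helper for illustration, not an MPICH API.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # skip blank or comment-only lines
        spec, _, slots = line.partition(":")
        user, at, host = spec.rpartition("@")
        entries.append((user if at else None, host, int(slots) if slots else 1))
    return entries


def strip_users(text):
    """Return machinefile text with any `user@` prefixes removed."""
    return "\n".join(
        host if slots == 1 else f"{host}:{slots}"
        for _, host, slots in parse_machinefile(text)
    )


if __name__ == "__main__":
    original = "pc3@ub3:4   # 4 processes on ub3\npc1@ub1     # 1 process on ub1\n"
    users = {user for user, _, _ in parse_machinefile(original) if user}
    if len(users) > 1:
        # Different accounts usually mean different home directories,
        # which is what the "unable to change wdir" error points at.
        print("different user names in machinefile:", sorted(users))
    print(strip_users(original))
```

This only inspects the file. It does not check that the executable or the working directory actually exists on each node, so that still has to be verified by hand (for instance over SSH).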