I am trying to set up an MPI cluster, but I have the problem that the number of CPUs listed in the mpd.hosts file is not used correctly. I have three Ubuntu servers:
opteron with 48 cores
calc1 with 8 cores
calc2 with 8 cores
My mpd.hosts looks like:
opteron:46
calc1:6
calc2:6
After booting (mpdboot -n 3 -f mpd.hosts), the system is running, and mpdtrace lists all three nodes.
But running a program like "mpiexec -n 58 raxmlHPC-MPI ..." causes calc1 and calc2 to get too many processes while opteron gets too few, even though 46 + 6 + 6 = 58, so the run should exactly fill the listed slots. What am I doing wrong?
I found a workaround: I added the parameter "-machinefile /path/to/mpd.hosts" to the mpiexec command, and now all nodes are used correctly. One remaining problem was the following error message:
... MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or directory ...
To fix it, I had to set the environment variable MPICH_NO_LOCAL=1.
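Putting both pieces together, here is a minimal sketch of the working launch (assuming a bash-style shell; the paths and the raxmlHPC-MPI arguments are just the placeholders from above, and how exactly the variable has to be set so that it reaches all processes may depend on your MPICH2 setup):
# force network communication instead of shared memory, avoiding the "open failed" error
export MPICH_NO_LOCAL=1
# start one mpd daemon on each of the three hosts
mpdboot -n 3 -f mpd.hosts
# honour the hostname:ncpus entries when placing the 58 processes
mpiexec -machinefile /path/to/mpd.hosts -n 58 raxmlHPC-MPI ...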