
mpiexec using wrong number of CPUs


I am trying to set up an MPI cluster, but I have the problem that the number of CPUs listed in the mpd.hosts file is not used correctly. I have three Ubuntu servers: opteron with 48 cores, calc1 with 8 cores, and calc2 with 8 cores.

My mpd.hosts looks like:
opteron:46
calc1:6
calc2:6
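As a sanity check, the per-host slot counts in this file should sum to the process count later passed to mpiexec (46 + 6 + 6 = 58 here). A small shell sketch, assuming a local copy of the mpd.hosts above, that derives the -n value:

```shell
# Recreate the mpd.hosts from the question (one "hostname:slots" per line).
cat > mpd.hosts <<'EOF'
opteron:46
calc1:6
calc2:6
EOF

# Sum the slot counts after the colon to get the value for "mpiexec -n ...".
total=$(awk -F: '{ sum += $2 } END { print sum }' mpd.hosts)
echo "total slots: $total"
```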

After booting (mpdboot -n 3 -f mpd.hosts) the system is running, and mpdtrace lists all three nodes.

But running a program like "mpiexec -n 58 raxmlHPC-MPI ..." causes calc1 and calc2 to get too many jobs while opteron gets too few. What am I doing wrong?

Regards

Bjoern


Solution

  • I found a workaround: I added the parameter "-machinefile /path/to/mpd.hosts" to the mpiexec command, and now all nodes run correctly. One remaining problem was the following error message:

    ... MPIU_SHMW_Seg_create_attach_templ(671): open failed - No such file or directory ...

    To fix it, I had to set the environment variable MPICH_NO_LOCAL=1.
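    Putting the workaround together, the launch looks roughly like this. This is a sketch under the answer's assumptions: the mpdboot/mpiexec lines are shown as comments because they require the actual cluster, and "/path/to/mpd.hosts" is the placeholder path from the answer, not a real location.

    ```shell
    # Disable MPICH's shared-memory channel between co-located processes,
    # which avoids the MPIU_SHMW_Seg_create_attach_templ "open failed" error.
    export MPICH_NO_LOCAL=1

    # Boot the mpd ring on all three hosts (run on the cluster):
    #   mpdboot -n 3 -f mpd.hosts

    # Launch with an explicit machinefile so the per-host slot counts are honored:
    #   mpiexec -machinefile /path/to/mpd.hosts -n 58 raxmlHPC-MPI ...

    echo "MPICH_NO_LOCAL=$MPICH_NO_LOCAL"
    ```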