Tags: parallel-processing, mpi, cluster-computing, openmpi

OpenMPI: Simple 2-Node Setup


I'm having trouble running an OpenMPI program across just two nodes (one node is the same machine that executes the mpiexec command; the other is a separate machine).

I'll call the machine that runs mpiexec "master" and the other node "slave".

On both master and slave, I've installed OpenMPI in my home directory under ~/mpi.

I have a file called ~/machines.txt on master.

Ideally, ~/machines.txt should contain:

master
slave

However, when I run the following on master:

mpiexec -n 2 --hostfile ~/machines.txt hostname

I get the following error instead of the expected output:

bash: orted: command not found

But if ~/machines.txt contains only the name of the node the command is running on, it works. ~/machines.txt:

master

Command:

mpiexec -n 2 --hostfile ~/machines.txt hostname

OUTPUT:

master
master

I've also tried running the same command on slave, with machines.txt changed to contain only slave, and it worked there too. I've made sure that my .bashrc file contains the proper paths for OpenMPI.

What am I doing wrong? In short, there is only a problem when I try to execute a program on a remote machine, but I can run mpiexec perfectly fine on the machine that is executing the command. This makes me believe that it's not a path issue. Am I missing a step in connecting both machines? I have passwordless ssh login capability from master to slave.


Solution

  • This error message means that either Open MPI is not installed on the remote machine, or your PATH is not set properly on the remote machine for non-interactive logins (so the remote shell can't find the Open MPI installation). "orted" is one of the helper executables that Open MPI uses to launch processes on remote nodes -- if "orted" was not found, the launcher never even got to the point of trying to run "hostname" on the remote node.
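
    A quick way to confirm this from master is to check what a non-interactive shell on the remote node can see (a diagnostic sketch, using the slave hostname and ~/mpi install location from the question):

    ssh slave 'which orted; echo $PATH'

    If which prints nothing and $PATH does not include ~/mpi/bin, then non-interactive logins on slave are not picking up your Open MPI installation.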

    Note that there might be a difference between interactive and non-interactive logins in your shell startup files (e.g., in your .bashrc).
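
    For example, many distributions ship a default .bashrc that returns early for non-interactive shells, so any PATH exports placed after that point are never seen by the shell that ssh spawns to run orted. A minimal sketch of a working layout on slave, assuming the ~/mpi install location from the question:

    # ~/.bashrc -- the exports must come BEFORE any non-interactive early return
    export PATH="$HOME/mpi/bin:$PATH"
    export LD_LIBRARY_PATH="$HOME/mpi/lib:$LD_LIBRARY_PATH"

    # Typical distribution boilerplate: stop here for non-interactive shells;
    # anything below this point is only seen by interactive logins.
    case $- in
        *i*) ;;
        *) return ;;
    esac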

    Also note that it is considerably simpler to have Open MPI installed in the same path location on all nodes -- that way, mpiexec's --prefix option will automatically add the right PATH and LD_LIBRARY_PATH when executing on the remote nodes, and you don't have to muck with your shell startup files.
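
    For example, with Open MPI installed under ~/mpi on both machines (the layout described in the question), the launch command becomes:

    mpiexec --prefix ~/mpi -n 2 --hostfile ~/machines.txt hostname

    Invoking mpiexec by its absolute path (e.g., ~/mpi/bin/mpiexec ...) is equivalent to passing --prefix with that installation directory.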

    Note that there are a bunch of FAQ items about these kinds of topics on the main Open MPI web site.