parallel-processing, cluster-computing, openmpi

Open MPI runtime error: Hello World run on hosts


I'm trying to set up a cluster. So far I'm testing it with only 1 master and 1 slave. When I run the script from the master, it starts printing HelloWorld, but then I get the following error:

Primary job  terminated normally, but 1 process returned a non-zero exit code.. Per user-direction, the job has been aborted.

It keeps printing HelloWorld, and after a while I get:

mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was: 
Process name: [[62648,1],2]
Exit code:    2

Then the program stops. By chance I tried running the script from the slave, and there it works. I can't figure out why. I've set up passwordless SSH, and the file I'm running is located in an NFS-mounted folder. Can you help me?

Thanks


Solution

  • SOLVED: I went through all the configuration files I had modified and finally found a mistake in /etc/hosts. That explains why the program worked when launched from the node towards the master but not vice versa. As for the program stopping, it was related to the node not being able to find the file to run. I fixed this by setting up NFS again. Thanks for your help, I hope this can be useful to other users.
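
The post does not say what the /etc/hosts mistake actually was, so the following is only a rough sketch of how one might check a hosts file for a few common slip-ups: a cluster hostname mapped to a loopback address, one hostname mapped to two different addresses, or a hostname with no entry at all. The function names, the sample file contents, and the hostnames master, node1 and node2 are all made up for illustration.

```python
import ipaddress


def parse_hosts(text):
    """Return a list of (address, [names]) pairs from /etc/hosts-style text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def find_problems(text, cluster_names):
    """List suspicious entries for the given (hypothetical) cluster hostnames."""
    problems = []
    seen = {}  # hostname -> first address it was mapped to
    for addr, names in parse_hosts(text):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            problems.append(f"unparseable address: {addr}")
            continue
        for name in names:
            if name not in cluster_names:
                continue
            if ip.is_loopback:
                problems.append(f"{name} maps to loopback address {addr}")
            if name in seen and seen[name] != addr:
                problems.append(f"{name} maps to both {seen[name]} and {addr}")
            seen.setdefault(name, addr)
    for name in cluster_names:
        if name not in seen:
            problems.append(f"{name} has no entry")
    return problems


# Made-up example contents, not taken from the original post.
SAMPLE = """\
127.0.0.1    localhost
127.0.1.1    master
192.168.1.10 master
192.168.1.11 node1
"""

if __name__ == "__main__":
    for problem in find_problems(SAMPLE, ["master", "node1", "node2"]):
        print(problem)
```

To check a real machine, one could pass the contents of /etc/hosts (for example `open("/etc/hosts").read()`) instead of SAMPLE, and run the same check on every node, since each node has its own copy of the file. This only covers name resolution; it does not verify that the executable is visible at the same path on every node, which was the second problem mentioned above.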