
Program does not finish when two nodes are used


When I run my program on one computer with 4 processes, it terminates almost instantly. When I try to run it on a cluster of two computers (the cluster was checked and is OK), it never finishes!

I did a run on the cluster a long time ago, and I remember that it was slower than the run on the single PC, but it did terminate!

Here is my run.sh:

#!/bin/bash

start=100
end=100
for ((i = $start; i <= $end; ++i )) ; 
do
        mpiexec -f machinefile -n 4 ./test ../../l_matrices/Lmat_755.mtx 1 755 755 $i $i 2 2 0 0
done

I did check that two processes are spawned on each node.

Here is my machinefile:

hostname1.gr:2
hostname2.gr:2

What is happening?


Solution

  • (Assuming the script is correct and the start and end values are intentional, the loop does not do anything useful: start and end are both 100, so mpiexec is invoked exactly once, launching the same program with the same arguments.)

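    Check the paths. You have used relative paths, which can cause problems: the processes on a remote node may start in the default directory after login (typically your home directory), where ./test and ../../l_matrices/Lmat_755.mtx do not resolve to the intended files.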
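
One way to rule out the path problem is to build the mpiexec command from absolute paths. The following is a minimal sketch, not a confirmed fix: the helper abs_path and the variables base_dir, exe, matrix, and cmd are hypothetical names introduced here, and it assumes the same directory layout exists on both nodes (for example, through a shared filesystem).

```shell
#!/bin/bash
# Sketch only: build the mpiexec command from absolute paths.
# abs_path, base_dir, exe, matrix, machinefile and cmd are illustrative names.

# Turn a path into an absolute one, anchored at a given base directory.
abs_path() {
    local base="$1" path="$2"
    case "$path" in
        /*) printf '%s\n' "$path" ;;        # already absolute, keep as is
        *)  printf '%s\n' "$base/$path" ;;  # anchor the relative path at base
    esac
}

# Assumption: run.sh is started from the directory that contains ./test.
base_dir="$(pwd)"

exe="$(abs_path "$base_dir" ./test)"
matrix="$(abs_path "$base_dir" ../../l_matrices/Lmat_755.mtx)"
machinefile="$(abs_path "$base_dir" machinefile)"

i=100
cmd=(mpiexec -f "$machinefile" -n 4 "$exe" "$matrix" 1 755 755 "$i" "$i" 2 2 0 0)

# Print the command for inspection. To actually launch it, use: "${cmd[@]}"
printf '%s ' "${cmd[@]}"
echo
```

If the printed paths look right but the files are missing on the second machine, that would point to the layout assumption rather than to MPI itself. Running a trivial command such as pwd through the same mpiexec line, in place of ./test, may also show which directory each process starts in.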