mpirun is not executing on the worker node despite hostfile and SSH access


I am running a simple demo script, helloworld.py, on my main node, with only one worker (a VM) listed in the machinefile. I have installed mpirun on the worker as well and placed the script there too (I am not sure exactly where it should go; I put it in /home/user/mpirun-master/demo).

MPI does check for SSH access to the worker node before executing, but the job runs only on my main node and no output comes from the worker.

This is the content of my machinefile:

[email protected] # main node
[email protected] # worker

And this is the output I am getting:

mpirun -np 2 --machinefile machinefile python3 helloworld.py
Invalid MIT-MAGIC-COOKIE-1 keyHello, World! I am process 1 of 2 on dell-MS-7A70.
Hello, World! I am process 0 of 2 on dell-MS-7A70

Both processes are running on dell-MS-7A70 (the hostname of the main machine). How can I make a process run on the worker node? Is this problem caused by the worker machine being a virtual one?
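A quick way to confirm where the ranks landed is to count hostnames in the captured output. The snippet below is only a sketch: the helper name `hosts_used` and the regular expression are illustrative assumptions that match the "Hello, World!" line format shown above, not part of MPI itself.

```python
import re
from collections import Counter

# Matches lines like "... I am process 1 of 2 on dell-MS-7A70."
# (hypothetical pattern, tailored to the output format in this question)
HELLO_RE = re.compile(r"I am process (\d+) of (\d+) on ([\w-]+(?:\.[\w-]+)*)")

def hosts_used(output: str) -> Counter:
    """Count how many ranks reported each hostname."""
    return Counter(m.group(3) for m in HELLO_RE.finditer(output))

sample = (
    "Invalid MIT-MAGIC-COOKIE-1 keyHello, World! I am process 1 of 2 on dell-MS-7A70.\n"
    "Hello, World! I am process 0 of 2 on dell-MS-7A70\n"
)

counts = hosts_used(sample)
for host, n in counts.items():
    print(f"{host}: {n} process(es)")
if len(counts) == 1:
    print("All ranks ran on a single host; the worker was not used.")
```

If more than one hostname shows up in the counts, at least one rank was launched on another machine.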


Solution

  • The issue was resolved when I created an account with the same username on my worker node and set the slot counts in the machinefile for the master and the worker, because my script was being placed on the master every time.

    Now my machinefile looks like:

    172.16.197.129 max_slots=3 # worker 
    172.16.197.1 max_slots=1 # master
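To sanity-check a machinefile like the one above before launching, the entries can be read with a few lines of Python. This is a minimal sketch under stated assumptions: `parse_machinefile` is a hypothetical helper, it only understands `host key=value # comment` lines, and it does not reproduce how mpirun itself reads the file or maps ranks to hosts.

```python
def parse_machinefile(text):
    """Return a list of (host, max_slots) pairs; max_slots is None if absent."""
    entries = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, *options = line.split()
        opts = dict(opt.split("=", 1) for opt in options if "=" in opt)
        max_slots = int(opts["max_slots"]) if "max_slots" in opts else None
        entries.append((host, max_slots))
    return entries

machinefile_text = """\
172.16.197.129 max_slots=3 # worker
172.16.197.1 max_slots=1 # master
"""

entries = parse_machinefile(machinefile_text)
for host, max_slots in entries:
    print(f"{host}: max_slots={max_slots}")

# Total number of slots declared across all hosts that set max_slots.
total_slots = sum(s for _, s in entries if s is not None)
print("total declared slots:", total_slots)
```

With the two-line machinefile above, the worker declares three slots and the master one, so a two-process job has room to run on the worker.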