Tags: python, google-compute-engine, distributed-computing, google-kubernetes-engine, mpi4py

Distributed Programming on Google Cloud Engine using Python (mpi4py)


I want to do distributed programming with Python using the mpi4py package. For testing, I set up a 5-node cluster via Google Container Engine and adapted my code accordingly. But what are my next steps? How do I get my code running on all 5 VMs?

I tried connecting to one VM of the cluster over SSH and running the code there, but the processes were clearly not being distributed; they all stayed on the same machine :( [see example below]


Code:

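from mpi4py import MPI

# COMM_WORLD is the default communicator spanning every launched process.
size = MPI.COMM_WORLD.Get_size()   # total number of processes
rank = MPI.COMM_WORLD.Get_rank()   # index of this process, from 0 to size - 1
name = MPI.Get_processor_name()    # name of the host running this process

print("Hello, World! I am process/rank {} of {} on {}.\n".format(rank, size, name))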


Output:

mpiexec -n 5 python 5_test.py

Hello, World! I am process/rank 0 of 5 on gke-cluster-1-000000cd-node-mgff.

Hello, World! I am process/rank 1 of 5 on gke-cluster-1-000000cd-node-mgff.

Hello, World! I am process/rank 2 of 5 on gke-cluster-1-000000cd-node-mgff.

Hello, World! I am process/rank 3 of 5 on gke-cluster-1-000000cd-node-mgff.

Hello, World! I am process/rank 4 of 5 on gke-cluster-1-000000cd-node-mgff.


Solution

  • So, I figured out what I got wrong, and I'm posting the answer for anyone who might have a similar question.

    Turns out, I should have read the documentation of mpi4py better :D

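    The command mpirun -np 5 python 5_test.py runs the program on a single, multi-core host as separate processes. Without a list of hosts, all five processes are started on the local machine, which is exactly what the output above shows.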

    However, I wanted to distribute the task across several hosts. For that I needed the command mpirun --hostfile <hostfile> python 5_test.py, where <hostfile> is a file that looks like this:

    -- hostfile --
    host1   slots=4
    host2   slots=4
    host3   slots=4
    --------------
    

    Useful Link: https://github.com/jbornschein/mpi4py-examples
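In case it helps, here is a minimal sketch of how the hostfile and the matching command line could be generated from a list of host names instead of being typed by hand. The helper names render_hostfile and build_mpirun_command are hypothetical (they are not part of mpi4py or MPI), host1..host3 are the placeholder names from the hostfile above, and the "slots=N" syntax is assumed to be the Open MPI hostfile format, where slots is the number of processes mpirun may place on that host. It also assumes every host can be reached by the launcher and has the same script available.

```python
import shlex


def render_hostfile(hosts, slots=4):
    """Return hostfile text with one 'name slots=N' line per host.

    Hypothetical helper; assumes the Open MPI 'slots=' hostfile syntax.
    """
    return "".join(f"{host} slots={slots}\n" for host in hosts)


def build_mpirun_command(hostfile_path, script, total_processes=None):
    """Build the mpirun argument list for a multi-host run.

    Hypothetical helper; '-np' is only added when a process count is given.
    """
    cmd = ["mpirun", "--hostfile", hostfile_path]
    if total_processes is not None:
        cmd += ["-np", str(total_processes)]
    cmd += ["python", script]
    return cmd


# Placeholder host names, as in the hostfile above.
hosts = ["host1", "host2", "host3"]

hostfile_text = render_hostfile(hosts, slots=4)
command = build_mpirun_command("hostfile", "5_test.py")

print(hostfile_text, end="")   # contents to save as the file "hostfile"
print(shlex.join(command))     # command line to run once that file exists
```

Save the printed host lines to a file called hostfile and run the printed command from one of the nodes. If the run is distributed, the "on <name>" part of each greeting should show more than one host name.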