Tags: raspberry-pi, mpich, odroid, mpic++

Raspberry Pi and another board have issues communicating over MPICH


I installed MPICH on an Odroid N2+ and a Raspberry Pi. The Odroid N2+ is the more powerful board, and I want it to run the majority of my program; the Raspberry Pi acts as a separate server that communicates with the program. I installed MPICH on both boards with the command apt install mpich. The Odroid board runs Ubuntu 20.04.4 LTS, while the Raspberry Pi runs 2020-08-20-raspios-buster-armhf-lite. On both boards I ran the code from the MPI send/receive example.

If I run the code on the Odroid board by itself, it works fine, but if I run it on the Raspberry Pi I get exit code 11. I therefore reinstalled MPICH on both boards manually from https://www.mpich.org/static/downloads/3.3/mpich-3.3.tar.gz

After doing that, the compiled code works on each board individually. I can also run the command mpirun -f machinefile -n 2 hostname, and it prints the hostname of each board. This tells me that SSH is set up correctly and that MPICH can log in to both boards. However, when I run the send/receive code across both boards, it hangs when MPI_Ssend and MPI_Wait(&request, MPI_STATUS_IGNORE); are called.

I compiled it with mpic++ test_sendRecv.cpp -Wall -Werror -o test_sendRecv, then ran it with mpirun -f machinefile -n 2 ./test_sendRecv. Using mpiexec instead of mpirun gives the same result.

Am I correct that this means the boards are unable to pass messages between each other?

Is there a way to remedy this?


Solution

  • I had this same issue with other boards. I never experimented with it enough to figure out what was causing the problem or to come up with a solution. However, I did create a repository that uses sockets as a kind of pseudo-MPI. It only passes string messages, but it should be fairly straightforward to add other types, and even file transfer. I put the repository here