I am trying to run a simple MPI example on two computing nodes node1
and node2
, which are virtual machines I just created on Oracle Cloud. (It is the first time I used Oracle Cloud...) The system is Ubuntu 20.04. What I've done include:
node1
and node2
have the correct MPI environment (OpenMPI-4.1.0) under the same path. $PATH
and $LD_LIBRARY_PATH
have also been set. I can successfully run the MPI example on a single node.node1
and node2
has been setup. I can use ssh node1
and ssh node2
to connect one node to another.hosts2
) on the two nodes under the same path ($HOSTFILE_PATH/hosts2
) containingnode1 slots=1
node2 slots=1
test
) are under the same path ($EXE_PATH/test
).Then I ran $(which mpirun) -n 2 -hostfile $HOSTFILE_PATH/hosts2 $EXEC_PATH/test
, and I did not get any return. So I can only terminate the execution with ctrl+c. After I few minutes, I got the output:
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
Local host: instance-1-632783
Remote host: instance-1
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------
Is the problem related to the firewall? I tried sudo ufw status
and got Status: inactive
. I also tried sudo iptables -L
, and got:
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT icmp -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp spt:ntp
ACCEPT tcp -- anywhere anywhere state NEW tcp dpt:ssh
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
Chain FORWARD (policy ACCEPT)
target prot opt source destination
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
InstanceServices all -- anywhere link-local/16
Chain InstanceServices (1 references)
target prot opt source destination
ACCEPT tcp -- anywhere 169.254.0.2 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.2.0/24 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.4.0/24 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.5.0/24 owner UID match root tcp dpt:iscsi-target /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.0.2 tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT udp -- anywhere 169.254.169.254 udp dpt:domain /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.169.254 tcp dpt:domain /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.0.3 owner UID match root tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.0.4 tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT tcp -- anywhere 169.254.169.254 tcp dpt:http /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT udp -- anywhere 169.254.169.254 udp dpt:bootps /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT udp -- anywhere 169.254.169.254 udp dpt:tftp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
ACCEPT udp -- anywhere 169.254.169.254 udp dpt:ntp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */
REJECT tcp -- anywhere link-local/16 tcp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */ reject-with tcp-reset
REJECT udp -- anywhere link-local/16 udp /* See the Oracle-Provided Images section in the Oracle Cloud Infrastructure documentation for security impact of modifying or removing this rule */ reject-with icmp-port-unreachable
Then I tried sudo iptables -F
, after that, sudo iptables -L
showed:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain InstanceServices (0 references)
target prot opt source destination
But it seems that sudo iptables -F
temporarily deletes the policies. When I reboot the system, sudo iptables -L
shows the former output. So how can I solve the firewall problem? Should I permanently delete the policies? And how?
Even if the VMs are in the same subnet you still have to allow traffic between them.
So open the required ports in the security list of the subnet you are using (https://docs.oracle.com/en-us/iaas/Content/Network/Concepts/securitylists.htm#Security_Lists)
If you don't know the needed ports you can open all ports (This is not a good practice for production environments).