Tags: apache-spark, hadoop, docker, pyspark, network-interface

Spark is using the wrong network interface


I am running a Hadoop cluster inside Docker containers (using an overlay network).

I have 2 containers on the same host (master and slave2) and another on a different host (slave1).

The containers can access a local network, 10.0.0.0, that only they use.

The master and slave2 containers can also access another network, 172.18.0.0, which is shared with their host.

Slave1 can access a different 172.18.0.0 network, shared with its own host.

The 172.18.0.0 networks on the two hosts are independent of each other.

So to summarize, every container has two IP addresses:

  • master: 10.0.0.2 and 172.18.0.2
  • slave2: 10.0.0.3 and 172.18.0.3
  • slave1: 10.0.0.4 and 172.18.0.2

The three containers must communicate through the 10.0.0.0 network, but something strange is happening.

When I run this script in pyspark:

def getIP(iter):
  # report the local IP address of the executor running this partition
  import socket
  s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  s.connect(("8.8.8.8", 80))
  return [s.getsockname()[0]]


rdd = sc.parallelize(range(3),3)
hosts = rdd.mapPartitions(getIP).collect()
for h in hosts:
  print(h)

the output of this program is

172.18.0.2
172.18.0.3
172.18.0.3

which is wrong, because the containers can only communicate through the 10.0.0.0 network.

This is my yarn-site.xml file:

<configuration>
    <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>
            yarn.nodemanager.aux-services.mapreduce_shuffle.class
        </name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-master</value>
    </property>
</configuration>

hadoop-master is the hostname for the address 10.0.0.2.
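To double-check that mapping from inside a container, a quick sketch (assuming hadoop-master is resolvable there via /etc/hosts or Docker's DNS):

import socket

# should print 10.0.0.2 if hadoop-master maps to the overlay-network address
print(socket.gethostbyname("hadoop-master"))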

How can I tell Spark to use the 10.0.0.2 interface?

Thank you.


Solution

  • I don't think Spark is using the wrong network; otherwise you could not even run this program.

    Your getIP method returns the IP address the host used to connect to 8.8.8.8. As you described, every host has two networks, and 10.0.0.* is shared only by these 3 nodes, so the hosts use 172.18.0.* to reach 8.8.8.8.

    To see the difference, compare s.connect(("8.8.8.8", 80)), s.connect(("10.0.0.2", 80)) and s.connect(("localhost", 80)).
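    As a rough sketch of that comparison (reusing the 8.8.8.8 trick and the 10.0.0.2 address from your question; the exact outputs depend on your routing tables), the local address reported by getsockname() changes with the destination, because the kernel picks the source address from the route it would use:

    import socket

    def source_ip_for(dest, port=80):
        # connect() on a UDP socket sends nothing; it only makes the kernel
        # pick the route/interface for `dest`, so getsockname() reports the
        # local address that would be used to reach it
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.connect((dest, port))
            return s.getsockname()[0]
        finally:
            s.close()

    print(source_ip_for("8.8.8.8"))    # public internet: via 172.18.0.*, e.g. 172.18.0.2
    print(source_ip_for("10.0.0.2"))   # overlay network: e.g. 10.0.0.3 on slave2
    print(source_ip_for("localhost"))  # loopback: 127.0.0.1

    In other words, the workers are still reachable over 10.0.0.0; the script just reports whichever interface would be used to reach the address you connect to.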