Tags: apache-spark, hdfs, hadoop-yarn, hadoop2

Spark Remote execution to Cluster fails - HDFS connection Refused at 8020


I am having issues submitting a remote spark-submit job from a machine outside the Spark cluster running on YARN.

Exception in thread "main" java.net.ConnectException: Call from remote.dev.local/192.168.10.65 to target.dev.local:8020 failed on connection exception: java.net.ConnectException: Connection refused

In my core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://target.dev.local:8020</value>
</property>

Also, in the hdfs-site.xml on the cluster I have disabled permission checking for HDFS:

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
</property>

Also, when I telnet from the machine outside the cluster:

telnet target.dev.local 8020

I am getting

telnet: connect to address 192.168.10.186: Connection refused

But, when I

telnet target.dev.local 9000

it says Connected.

Also when I ping target.dev.local it works.
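
For reference, a quick way to see which address and port the NameNode RPC endpoint is actually bound to is to check from a shell on the cluster host itself. This is just a sketch; it assumes you have shell access to target.dev.local and that the Hadoop client and ss are on the PATH:

# On target.dev.local: which filesystem URI is configured, and what is actually listening?
hdfs getconf -confKey fs.defaultFS
sudo ss -tlnp | grep -E ':8020|:9000'

If 8020 shows up bound to 127.0.0.1 or to an interface the remote machine cannot reach, the connection refused above is expected even though ping succeeds.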

My spark-submit script from the remote machine is:

export HADOOP_CONF_DIR=/<path_to_conf_dir_copied_from_cluster>/

spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 5g \
--executor-memory 50g \
--executor-cores 5 \
--queue default \
<path to jar>.jar \
10
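
Before involving Spark at all, it can help to test plain HDFS access from the remote machine with the same configuration, to separate an HDFS connectivity problem from a spark-submit problem. A sketch, assuming the Hadoop client is installed on the remote machine and HADOOP_CONF_DIR points at the config copied from the cluster:

export HADOOP_CONF_DIR=/<path_to_conf_dir_copied_from_cluster>/

# List the HDFS root using the same fs.defaultFS the Spark job will use
hdfs dfs -ls /

# Or target the NameNode explicitly
hdfs dfs -ls hdfs://target.dev.local:8020/

If this already fails with the same ConnectException, the problem is in the HDFS/NameNode setup rather than in the spark-submit invocation.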

What am I missing here?


Solution

  • Turns out I had to change

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://target.dev.local:8020</value>
    </property>
    

    to

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://0.0.0.0:8020</value>
    </property>
    

    to allow connections from the outside, since target.dev.local sits on a private network switch.
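
    After changing fs.defaultFS the NameNode has to be restarted for the new bind address to take effect. A verification sketch follows; the restart command depends on your Hadoop version and distribution (hadoop-daemon.sh is the Hadoop 2 convention, and $HADOOP_HOME is assumed to point at the install):

    # On target.dev.local: restart the NameNode so it rebinds
    $HADOOP_HOME/sbin/hadoop-daemon.sh stop namenode
    $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode

    # 8020 should now be listening on all interfaces (0.0.0.0)
    sudo ss -tlnp | grep ':8020'

    # And from the remote machine the port should now accept connections
    telnet target.dev.local 8020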