Tags: cloudera, apache-spark, hadoop-yarn

Running a Spark app on Cloudera 5 in YARN mode


I have been trying to spark-submit to my Cloudera cluster for a few weeks. I really hope someone out there knows how this works.

I created a script that calls spark-submit with all the required arguments. It prints the following lines:

Using properties file: null
Using properties file: null
Parsed arguments:
  master                  yarn
  deployMode              cluster
  executorMemory          null
  executorCores           null
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            null
  driverCores             null
  driverExtraClassPath    /home/bruce/workspace1/spark-cloudera/yarn/stable/target/spark-yarn_2.10-1.0.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-client/2.3.0-cdh5.1.0/hadoop-yarn-client-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-common/2.3.0-cdh5.1.0/hadoop-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-api/2.3.0-cdh5.1.0/hadoop-yarn-api-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-yarn-common/2.3.0-cdh5.1.0/hadoop-yarn-common-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/org/apache/hadoop/hadoop-auth/2.3.0-cdh5.1.0/hadoop-auth-2.3.0-cdh5.1.0.jar:/home/bruce/.m2/repository/com/google/protobuf/protobuf-java/2.5.0/protobuf-java-2.5.0.jar
  driverExtraLibraryPath  null
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         file:/home/bruce/workspace1/spark-cloudera/examples/target/scala-2.10/spark-examples-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar
  name                    org.apache.spark.examples.SparkPi
  childArgs               [10]
  jars                    null
  verbose                 true


log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The call hangs for a very long time and then quits with "Connection refused".

What I don't understand is that the arguments specify using the YARN client, but nowhere do they indicate how to contact the YARN ResourceManager: neither the IP nor the port. The submission is made from my laptop, and the cluster is on a neighboring subnet. How does spark-submit figure out how to contact the YARN service?


Solution

  • From the Spark Documentation

    Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. These configs are used to write to the dfs and connect to the YARN ResourceManager.
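
  • In practice this means the machine you run spark-submit from (the laptop) needs a copy of the cluster's client-side configuration files (core-site.xml, yarn-site.xml, etc.), with HADOOP_CONF_DIR pointing at them. A minimal sketch, assuming the client configs have been downloaded from Cloudera Manager into /etc/hadoop/conf (that path is an assumption; adjust it to wherever your client configs actually live):

        # Point spark-submit at the cluster's client-side Hadoop/YARN configs (assumed path)
        export HADOOP_CONF_DIR=/etc/hadoop/conf

        spark-submit \
          --master yarn \
          --deploy-mode cluster \
          --class org.apache.spark.examples.SparkPi \
          /home/bruce/workspace1/spark-cloudera/examples/target/scala-2.10/spark-examples-1.0.0-cdh5.1.0-hadoop2.3.0-cdh5.1.0.jar \
          10

    The ResourceManager address is read from yarn.resourcemanager.address (or yarn.resourcemanager.hostname) in the yarn-site.xml inside that directory. Without it, the YARN client falls back to the Hadoop default of 0.0.0.0:8032, which would explain the long retry hang followed by "Connection refused".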