Tags: python, apache-spark, docker, containers, hadoop-yarn

Sending spark-submit inside a docker container to a YARN cluster


I have Spark 1.6.1 installed in a Docker container. I can run my Spark Python application locally, but when I try to submit it to a YARN cluster outside my host (spark-submit --master yarn myapp.py), it stays in the ACCEPTED state. The stderr logs from my application show the following:

16/10/26 11:07:25 INFO ApplicationMaster: Waiting for Spark driver to be reachable.
16/10/26 11:08:28 ERROR ApplicationMaster: Failed to connect to driver at 172.18.0.4:50229, retrying ...
16/10/26 11:09:31 ERROR ApplicationMaster: Failed to connect to driver at 172.18.0.4:50229, retrying ...
16/10/26 11:09:32 ERROR ApplicationMaster: Uncaught exception: 
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:501)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:362)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:204)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:672)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:670)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:697)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)

The driver at 172.18.0.4:50229 is my container. Since my container runs on a host machine with IP 10.xx.xx.xx, it makes sense that the cluster cannot reach it. How can I tell Spark to connect to the host machine instead of the container? Or does anyone have a solution for this?

PS: I have checked the following link: Making spark use /etc/hosts file for binding in YARN cluster mode, which is really similar to my problem. But according to the linked Spark issue, it won't be fixed.
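For context, the Spark settings involved here are spark.driver.host and spark.driver.port, which control the address the ApplicationMaster tries to connect back to. A minimal sketch of overriding them to point at the host machine (the IP and port below are placeholders taken from the question, and this alone only helps if those ports on the host actually forward to the container):

```shell
# Sketch: force the driver address advertised to YARN to be the host's IP
# instead of the container's internal IP (10.xx.xx.xx is a placeholder).
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf spark.driver.host=10.xx.xx.xx \
  --conf spark.driver.port=50229 \
  myapp.py
```

On the default bridge network this is not sufficient on its own: the driver port (and the block manager port) must also be published on the host, otherwise the cluster still cannot complete the connection.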


Solution

  • To answer my own question: I had to run my containers on the host network. If you are behind a proxy, be careful to use the right virtual interface (eth1) for SPARK_LOCAL_IP (env variable) and spark.driver.host (conf option).

    The YARN cluster was having trouble contacting the driver because the container's IP was assigned from the container network, which is not routable from the cluster.

    Since the containers are on the host network, any service deployed by a container is automatically exposed; there is no need to publish or bind ports.

    PS: I was deploying my application in client mode.
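As a sketch of the fix described above (the image name is a placeholder, and 10.xx.xx.xx stands for the host's address on the interface reachable from the cluster, e.g. eth1):

```shell
# Run the container on the host network so the driver binds directly to a
# host address that the YARN cluster can route to. No port publishing needed.
docker run --network host \
  -e SPARK_LOCAL_IP=10.xx.xx.xx \
  my-spark-image \
  spark-submit --master yarn --deploy-mode client \
    --conf spark.driver.host=10.xx.xx.xx \
    myapp.py
```

With --network host, the container shares the host's network namespace, so the driver's advertised IP is the host's own IP rather than one from Docker's internal bridge network.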