Tags: scala, apache-spark, hadoop-yarn

Support multiple Spark distributions on Yarn cluster


I run multiple Spark jobs on a cluster via $SPARK_HOME/bin/spark-submit --master yarn --deploy-mode cluster.
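
For reference, a typical submission looks like the sketch below; the main class and application jar are placeholders:

    $SPARK_HOME/bin/spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyJob \
      /path/to/my-assembly.jar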

When a new version of Spark is released, I'd like to roll out the new distribution across the cluster alongside the old one and then gradually migrate my jobs one by one.

Unfortunately, Spark relies on the global $SPARK_HOME environment variable, so I can't figure out how to achieve this. It would be especially useful once Spark for Scala 2.12 is out.


Solution

  • It is possible to run any number of Spark distributions on a YARN cluster. I've done it many times on my MapR cluster, mixing 1-3 different versions as well as setting up the official Apache Spark there.

    All you need to do is tweak conf/spark-env.sh (renamed from spark-env.sh.template) and add a line:

    export SPARK_HOME=/your/location/of/spark/spark-2.1.0
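
    For example, a second distribution can sit next to the old one and be driven through its own spark-submit. The paths and version numbers below are illustrative, and HADOOP_CONF_DIR must point at the cluster's client-side Hadoop configuration so YARN can be reached:

    # Hypothetical side-by-side layout; adjust paths to your environment
    #   /your/location/of/spark/spark-2.1.0   (existing jobs)
    #   /your/location/of/spark/spark-2.3.0   (new distribution)

    # conf/spark-env.sh of the new distribution
    export SPARK_HOME=/your/location/of/spark/spark-2.3.0
    export HADOOP_CONF_DIR=/etc/hadoop/conf   # client-side YARN/HDFS config

    # Submit a migrated job with the new distribution's own spark-submit;
    # jobs still on the old version keep using the old installation
    /your/location/of/spark/spark-2.3.0/bin/spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.MyJob \
      /path/to/my-assembly.jar

    Invoking each distribution through the full path to its own bin/spark-submit makes it explicit which installation a given job uses, so the old and new versions can coexist while you migrate.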