Tags: hadoop, apache-spark, cluster-computing, hortonworks-data-platform, ambari

Installation of Spark 2.1.0 with Ambari 2.4.2.0


I am relatively new to cluster installations of Spark and Ambari. Recently, I was given the task of installing Spark 2.1.0 on a cluster that comes pre-installed with Ambari, Spark 1.6.2, and HDFS & YARN 2.7.3.

My task is to have Spark 2.1.0 installed, since it is the newest version and has better compatibility with SparkR, among other improvements. I searched the internet for a couple of days and only found installation guides for either AWS or standalone Spark 2.1.0,

such as the following: http://data-flair.training/blogs/install-deploy-run-spark-2-x-multi-node-cluster-step-by-step-guide/ and http://spark.apache.org/docs/latest/building-spark.html.

But none of them mentions how different versions of Spark might interfere with one another. Since I need to keep this cluster running, I would like to know the potential risks to the cluster.

Is there a proper way to do this installation? Thanks a lot!


Solution

  • If you want your SPARK2 installation to be managed by Ambari, then SPARK2 must be provisioned by Ambari.

    HDP 2.5.3 does NOT support Spark 2.1.0; it does, however, come with a technical preview of Spark 2.0.0.

    Your options are:

    • Install Spark 2.1.0 manually and leave it unmanaged by Ambari (see the sketch after this list for how the two versions can coexist).
    • Use Spark 2.0.0, which is provided by HDP 2.5.3, instead of Spark 2.1.0.
    • Use a different stack, e.g. IBM Open Platform (IOP) 4.3, slated for release in 2017, which will ship with Spark 2.1.0 support. You can get started with it today via the technical preview release.
    • Upgrade to HDP 2.6, which supports Spark 2.1.
    • Extend the HDP 2.5 stack to support Spark 2.1.0. The Ambari wiki shows how to customize and extend stacks. This would let you use Spark 2.1.0 and have it managed by Ambari. However, it would be a lot of work to implement, and since you are new to Ambari, it would be rather difficult.
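
If you take the manual route, note that Spark on YARN is largely a client-side library: each job runs with whichever distribution its spark-submit came from, so a hand-installed Spark 2.1.0 can sit next to the Ambari-managed 1.6.2 without the two interfering. Below is a minimal sketch for verifying which version a job actually runs under; the app name and the install paths mentioned in the comments are illustrative assumptions, not part of the original answer.

```scala
// Minimal sketch: print the Spark version a YARN job actually runs on.
// SparkContext and sc.version exist in both 1.6.x and 2.1.x; just compile
// against each distribution's Scala version (Spark 1.6.x ships for Scala
// 2.10 by default, Spark 2.1.x for Scala 2.11).
import org.apache.spark.{SparkConf, SparkContext}

object VersionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("version-check"))
    println(s"Running on Spark ${sc.version}") // e.g. "1.6.2" or "2.1.0"
    sc.stop()
  }
}
```

Submitting this through each distribution's own spark-submit (for example, the Ambari-managed client under /usr/hdp/current/spark-client versus a manually unpacked /opt/spark-2.1.0-bin-hadoop2.7, both hypothetical paths) should print the corresponding version, confirming that the two installations stay independent.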