Tags: apache-spark, sbt, sbt-assembly

How to install Spark 2.1.0 on Windows 7 64-bit?


I'm on Windows 7 64-bit and am following this blog to install Spark 2.1.0.

So I tried to build Spark from the sources that I'd cloned from https://github.com/apache/spark to C:\spark-2.1.0.

When I run sbt assembly or sbt -J-Xms2048m -J-Xmx2048m assembly, I get:

[info] Loading project definition from C:\spark-2.1.0\project
[info] Compiling 3 Scala sources to C:\spark-2.1.0\project\target\scala-2.10\sbt-0.13\classes...
java.lang.StackOverflowError
at java.security.AccessController.doPrivileged(Native Method)
at java.io.PrintWriter.<init>(Unknown Source)
at java.io.PrintWriter.<init>(Unknown Source)
at scala.reflect.api.Printers$class.render(Printers.scala:168)
at scala.reflect.api.Universe.render(Universe.scala:59)
at scala.reflect.api.Printers$class.show(Printers.scala:190)
at scala.reflect.api.Universe.show(Universe.scala:59)
at scala.reflect.api.Printers$class.treeToString(Printers.scala:182)
...

I adapted the memory settings of sbt as suggested, but they are ignored anyway. Any ideas?


Solution

  • The linked blog post was "Posted on April 29, 2015", which makes it two years old now; it should only be read to learn how things have changed since then (I'm not even going to link to the blog post, so as not to keep directing people to the site).

    The 2017 way of installing Spark on Windows is as follows:

    1. Download Spark from http://spark.apache.org/downloads.html.
    2. Read the official documentation starting from Downloading.

    That's it.
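
    If you want something more concrete, here is a minimal sketch of that download-and-extract route at the Windows command prompt. It assumes the pre-built spark-2.1.0-bin-hadoop2.7 package from the downloads page, 7-Zip for unpacking the .tgz archive, and c:\ as the target directory; all of those are example choices, not requirements.

        REM Unpack the downloaded archive; 7-Zip needs two passes for .tgz
        REM (gunzip first, then the inner tar).
        7z x spark-2.1.0-bin-hadoop2.7.tgz
        7z x spark-2.1.0-bin-hadoop2.7.tar -oc:\

        REM Optionally record the install location for later use.
        setx SPARK_HOME c:\spark-2.1.0-bin-hadoop2.7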

    Installing Spark on Windows

    Windows is known to cause problems due to Hadoop's requirements (and Spark uses the Hadoop API under the covers).

    You'll have to install the winutils binary, which you can find in the https://github.com/steveloughran/winutils repository.

    TIP: You should select the version of Hadoop the Spark distribution was compiled with, e.g. use hadoop-2.7.1 for Spark 2.1.0.

    Save the winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin, and set the HADOOP_HOME environment variable to c:\hadoop.
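
    As a sketch, the whole winutils step can look like this at the command prompt (assuming winutils.exe has already been downloaded to the current directory and c:\hadoop\bin is the chosen location):

        REM Create the directory layout Hadoop expects and put winutils.exe into bin\.
        mkdir c:\hadoop\bin
        copy winutils.exe c:\hadoop\bin\

        REM HADOOP_HOME must point at c:\hadoop (the parent of bin\), not at bin\ itself.
        setx HADOOP_HOME c:\hadoop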

    See Running Spark Applications on Windows for further steps.
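
    A quick sanity check, using the example paths from above: open a new command prompt (so the setx changes are picked up) and start a local spark-shell.

        REM Start a local Spark shell from the extracted distribution.
        cd /d c:\spark-2.1.0-bin-hadoop2.7
        bin\spark-shell

        REM Inside the shell, something like `spark.range(10).count` returning 10
        REM is usually enough to confirm that the installation works.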