I'm on Windows 7 64-bit and am following this blog to install Spark 2.1.0.
So I tried to build Spark from the sources that I'd cloned from https://github.com/apache/spark to C:\spark-2.1.0.
When I run sbt assembly or sbt -J-Xms2048m -J-Xmx2048m assembly, I get:
[info] Loading project definition from C:\spark-2.1.0\project
[info] Compiling 3 Scala sources to C:\spark-2.1.0\project\target\scala-2.10\sbt-0.13\classes...
java.lang.StackOverflowError
at java.security.AccessController.doPrivileged(Native Method)
at java.io.PrintWriter.<init>(Unknown Source)
at java.io.PrintWriter.<init>(Unknown Source)
at scala.reflect.api.Printers$class.render(Printers.scala:168)
at scala.reflect.api.Universe.render(Universe.scala:59)
at scala.reflect.api.Printers$class.show(Printers.scala:190)
at scala.reflect.api.Universe.show(Universe.scala:59)
at scala.reflect.api.Printers$class.treeToString(Printers.scala:182)
...
I adjusted sbt's memory settings as suggested, but they seem to be ignored anyway. Any ideas?
The linked blog post was "Posted on April 29, 2015". That's 2 years old now, so it should only be read to learn how things have changed since (I'm not even going to link the blog post, to avoid directing people to the site).
The 2017 way of installing Spark on Windows is as follows:
That's it.
Windows is known to give you problems due to Hadoop's requirements (and Spark does use the Hadoop API under the covers).
You'll have to install the winutils binary, which you can find in the https://github.com/steveloughran/winutils repository.
TIP: You should select the version of Hadoop the Spark distribution was compiled with, e.g. use hadoop-2.7.1 for Spark 2.1.0.
Save the winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin, and set HADOOP_HOME to point to c:\hadoop.
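To double-check the setup before starting Spark, something like the following sketch could help. It only assumes what is described above: that HADOOP_HOME is defined and that winutils.exe sits in its bin subdirectory. The function name find_winutils is made up for illustration and is not part of Spark or Hadoop.

```python
import os
from pathlib import Path


def find_winutils(env=None):
    """Return the path to winutils.exe under HADOOP_HOME/bin, or None if missing.

    find_winutils is a hypothetical helper, not a Spark or Hadoop API.
    """
    env = os.environ if env is None else env
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return None  # HADOOP_HOME is not defined (or is empty)
    candidate = Path(hadoop_home) / "bin" / "winutils.exe"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_winutils()
    if found:
        print(f"winutils.exe found at {found}")
    else:
        print("winutils.exe not found; check HADOOP_HOME and its bin directory")
```

If the script reports that the binary is missing, the usual suspects are a HADOOP_HOME that points at the bin directory itself rather than its parent, or a console that was opened before the variable was set.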
See Running Spark Applications on Windows for further steps.