Long story short: I have an app that uses Spark DataFrames and machine learning, and ScalaFX for the front-end. I'd like to create a massive 'fat' jar so that it runs on any machine with a JVM.
I am familiar with the sbt-assembly plugin, having spent hours researching ways of assembling a jar. Below is my build.sbt:
lazy val root = (project in file(".")).
  settings(
    scalaVersion := "2.11.8",
    mainClass in assembly := Some("me.projects.MyProject.Main"),
    assemblyJarName in assembly := "MyProject_2.0.jar",
    test in assembly := {}
  )
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" withSources() withJavadoc()
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" withSources() withJavadoc()
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.0.2" withSources() withJavadoc()
libraryDependencies += "joda-time" % "joda-time" % "2.9.4" withJavadoc()
libraryDependencies += "org.scalactic" %% "scalactic" % "3.0.1" % "provided"
libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.1" % "test"
libraryDependencies += "org.scalafx" %% "scalafx" % "8.0.92-R10" withSources() withJavadoc()
libraryDependencies += "net.liftweb" %% "lift-json" % "2.6+" withSources() withJavadoc()
EclipseKeys.withSource := true
EclipseKeys.withJavadoc := true
// META-INF discarding
assemblyMergeStrategy in assembly := {
case PathList("org","aopalliance", xs @ _*) => MergeStrategy.last
case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
case PathList("org", "apache", xs @ _*) => MergeStrategy.last
case PathList("com", "google", xs @ _*) => MergeStrategy.last
case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
case "about.html" => MergeStrategy.rename
case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
case "META-INF/mailcap" => MergeStrategy.last
case "META-INF/mimetypes.default" => MergeStrategy.last
case "plugin.properties" => MergeStrategy.last
case "log4j.properties" => MergeStrategy.last
case x =>
  val oldStrategy = (assemblyMergeStrategy in assembly).value
  oldStrategy(x)
}
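For reference, the plugins themselves are assumed to be enabled in project/plugins.sbt along these lines (the exact versions here are examples, not necessarily the ones I used):

// project/plugins.sbt (versions are examples)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")                 // provides the assembly task
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")  // provides EclipseKeys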
This runs fine on my Linux machine, which has Spark installed and configured. In the past I have taken assembled ScalaFX jars and opened them on a Windows machine with no issues. However, this application, which also uses Spark, gives the following:
ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase the heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
Things I have tried:
Setting different values for the executor/driver memory options when creating the SparkConf (in the Scala code), like this:
.set("spark.executor.memory", "12g") .set("spark.executor.driver", "5g") .set("spark.driver.memory","5g")
The application otherwise works fine (when run from Scala IDE, when run with spark-submit, and when opening the assembled jar on Linux).
Please let me know if this is possible. This is a small project that uses a GUI (ScalaFX) to run a couple of machine learning operations on some data (Spark). Hence the dependencies above.
Again, I am not looking to set up a cluster or anything like that. I'd like to access the Spark functionality just by running the jar on any computer with a JRE. This is a small project to be showcased.
It turns out it was a rather generic JVM issue. Instead of passing runtime parameters on every launch, I solved it by adding a new environment variable to the Windows system:
name: _JAVA_OPTIONS
value: -Xms512M -Xmx1024M
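An equivalent per-run alternative (instead of the machine-wide environment variable) is to pass the same standard JVM flags when launching the jar; the jar name here is the one set by assemblyJarName above:

java -Xms512M -Xmx1024M -jar MyProject_2.0.jar

As far as I can tell, Spark's check compares the required minimum (471859200 bytes, i.e. 450 MB) against the JVM's maximum heap, so raising the default heap on the Windows machine is what makes the error go away.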