I am trying to build a fat jar with sbt assembly to hand off to spark-submit, but I cannot seem to get the build process right.
My current build.sbt is as follows:
name := "MyAppName"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.6.0" % "provided",
  "org.scalanlp" %% "breeze" % "0.12",
  "org.scalanlp" %% "breeze-natives" % "0.12"
)

resolvers ++= Seq(
  "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/"
)
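sbt-assembly itself is wired in through project/plugins.sbt in the usual way; the exact plugin version shouldn't matter here, but it's something like:

// project/plugins.sbt -- provides the `assembly` task used to build the fat jar
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")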
Running sbt assembly produces a jar. However, after submitting the jar to spark-submit with

spark-submit MyAppName-assembly-1.0.jar

(a main class is already specified in the jar's manifest, so I'm assuming it's OK that I don't pass one with --class; see the note after the stack trace), the following exception gets thrown:
java.lang.NoSuchMethodError: breeze.linalg.DenseVector.noOffsetOrStride()Z
    at breeze.linalg.DenseVector$canDotD$.apply(DenseVector.scala:629)
    at breeze.linalg.DenseVector$canDotD$.apply(DenseVector.scala:626)
    at breeze.linalg.ImmutableNumericOps$class.dot(NumericOps.scala:98)
    at breeze.linalg.DenseVector.dot(DenseVector.scala:50)
    at RunMe$.cosSimilarity(RunMe.scala:103)
    at RunMe$$anonfun$4.apply(RunMe.scala:35)
    at RunMe$$anonfun$4.apply(RunMe.scala:33)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:30)
    at org.spark-project.guava.collect.Ordering.leastOf(Ordering.java:658)
    at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
    at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$29.apply(RDD.scala:1377)
    at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$29.apply(RDD.scala:1374)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
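For what it's worth, the trace shows RunMe$ actually executing, so the manifest's entry point is being picked up; passing it explicitly, e.g.

spark-submit --class RunMe MyAppName-assembly-1.0.jar

(RunMe being the object named in the stack trace), presumably wouldn't change anything.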
I'm relatively new to the world of Scala and sbt, so any help would be greatly appreciated!
So it turns out the issue is that Breeze is already included in Spark, and Spark's copy wins on the runtime classpath. Spark 1.6 bundles an older Breeze (0.11.2) that lacks methods present in the 0.12 I compiled against, which is exactly what the NoSuchMethodError is complaining about.
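The fix that follows from this is to compile against the same Breeze that Spark provides. A minimal sketch of the adjusted build.sbt, assuming you stay on Spark 1.6 (whose spark-mllib pulls in Breeze 0.11.2; check your own distribution if unsure):

name := "MyAppName"

version := "1.0"

scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.6.0" % "provided",
  // Pin Breeze to the version Spark 1.6 bundles, so the classes compiled
  // against are the same ones Spark loads at runtime.
  "org.scalanlp" %% "breeze" % "0.11.2",
  "org.scalanlp" %% "breeze-natives" % "0.11.2"
)

resolvers ++= Seq(
  "Sonatype Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/"
)

Alternatively, moving to a Spark release that itself depends on Breeze 0.12 (Spark 2.0+) would let the original versions stand.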
My reference: Apache Spark - java.lang.NoSuchMethodError: breeze.linalg.DenseVector