Tags: scala, apache-spark, jodatime, joda-convert

JodaTime issues with scala and spark when invoking spark-submit


I am having trouble using JodaTime in a Spark Scala program. I tried the solutions posted previously on Stack Overflow, but they don't seem to fix the issue for me.

When I try to spark-submit, it fails with an error like the following:

   15/09/04 17:51:57 INFO Remoting: Remoting started; listening on addresses :        
                     [akka.tcp://[email protected]:56672]
   Exception in thread "main" java.lang.NoClassDefFoundError: org/joda/time/DateTimeZone
      at com.ttams.xrkqz.GenerateCsv$.main(GenerateCsv.scala:50)
      at com.ttams.xrkqz.GenerateCsv.main(GenerateCsv.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

After sbt package, which seems to work fine, I invoke spark-submit like this:

   ~/spark/bin/spark-submit --class "com.ttams.xrkqz.GenerateCsv" --master local target/scala-2.10/scala-xrkqz_2.10-1.0.jar

In my build.sbt file, I have

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1"

libraryDependencies ++= Seq ("joda-time" % "joda-time" % "2.8.2",
                         "org.joda" % "joda-convert" % "1.7"
                       )

I have tried multiple versions of joda-time and joda-convert but am not able to get spark-submit to work from the command line. However, it works when I run within the IDE (Scala IDE).

Let me know if you have any suggestions or ideas.


Solution

  • It seems that the Joda-Time classes are missing from your runtime classpath. You could do a few things. One option is to manually add the Joda-Time jars to spark-submit with the --jars argument. Another is to use the sbt-assembly plugin to build an assembly (fat) jar that bundles all of your dependencies; in that case you will likely want to mark spark-core as "provided" so it doesn't end up in your assembly.
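
    For the first option, a sketch of the spark-submit invocation (the jar paths are assumptions; point them at wherever sbt or Ivy cached the artifacts on your machine):

       ~/spark/bin/spark-submit --class "com.ttams.xrkqz.GenerateCsv" \
         --master local \
         --jars /path/to/joda-time-2.8.2.jar,/path/to/joda-convert-1.7.jar \
         target/scala-2.10/scala-xrkqz_2.10-1.0.jar

    For the assembly route, a minimal sketch (the sbt-assembly version shown is an assumption for an sbt 0.13.x project; check the plugin's documentation for the version matching your sbt):

       // project/plugins.sbt
       addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

       // build.sbt -- "provided" keeps spark-core out of the fat jar,
       // since the Spark runtime supplies it at submit time
       libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"

    Then run sbt assembly and pass the generated assembly jar (instead of the plain package jar) to spark-submit; joda-time and joda-convert will be bundled inside it.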