Tags: scala, apache-spark, sbt, spark-streaming

Spark + Scala: How to add external dependencies in build.sbt


I'm new to Spark (using v2.4.5) and am still trying to figure out the correct way to add external dependencies. When I tried to add Kafka streaming to my project, my build.sbt looked like this:

name := "Stream Handler"

version := "1.0"

scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "2.4.5" % "provided",  
    "org.apache.spark" % "spark-streaming_2.11" % "2.4.5" % "provided",
    "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.4.5"
)

This builds successfully, but when I run it with spark-submit, I get a java.lang.NoClassDefFoundError for KafkaUtils.

I was able to get my code working by passing in the dependency through the --packages option like this:

$ spark-submit [other_args] --packages "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5"

Ideally I would like to set up all the dependencies in the build.sbt, but I'm not sure what I'm doing wrong. Any advice would be appreciated!


Solution

  • Your "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.4.5" line is wrong.

    Change it to the %% form shown on mvnrepository (https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10):

    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.5"

    With %%, sbt appends the Scala binary version suffix (_2.11 here, from your scalaVersion) to the artifact name for you, so you don't hard-code it yourself.