Tags: scala, apache-spark, cassandra, sbt, spark-cassandra-connector

spark streaming with sbt + cassandra connector dependency issue


Folks,

I am trying to integrate Cassandra with Spark Streaming. Below is my sbt build file:

    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-sql" % "1.6.1",
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.2",
      "com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0",
      ("org.apache.spark" %% "spark-streaming-kafka" % "1.6.0")
        .exclude("org.spark-project.spark", "unused")
    )
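For context on why the versions in this file matter: sbt's `%%` operator appends the Scala binary version to the artifact name, and all Spark artifacts pulled in this way must belong to a single Spark major.minor series to stay binary compatible (here, spark-core 2.0.0 is mixed with spark-sql 1.6.1). A small illustrative sketch of what `%%` does (the helper below is not sbt's real API):

```scala
// Illustrative helper (not sbt's actual API): %% appends the Scala binary
// version to the artifact name, so "org.apache.spark" %% "spark-core" with
// scalaVersion 2.11.8 resolves to the artifact spark-core_2.11.
object CrossVersion {
  def crossArtifact(name: String, scalaBinaryVersion: String): String =
    s"${name}_$scalaBinaryVersion"

  def main(args: Array[String]): Unit = {
    println(crossArtifact("spark-core", "2.11"))              // spark-core_2.11
    println(crossArtifact("spark-cassandra-connector", "2.11")) // spark-cassandra-connector_2.11
  }
}
```

Because each `%%` dependency resolves independently, nothing stops sbt from fetching spark-core_2.11 2.0.0 alongside spark-sql_2.11 1.6.1; the clash only surfaces later as compile or runtime errors.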

I added the line below (the one flagged in the error) for Cassandra integration:

    val lines = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Getting errors once I add the line below
    lines.saveToCassandra("test", "test", SomeColumns("key", "value"))

    lines.print()

Once I add the above line, I see the following error in the IDE:

[screenshot of the IDE error]

I see a similar error if I try to package the project from the command prompt:

[screenshot of the sbt package error]

For your reference, I am using these versions:

  • Scala - 2.11

  • Kafka - kafka_2.11-0.8.2.1

  • Java - 8

  • Cassandra - datastax-community-64bit_2.2.8

Please help to resolve the issue.


Solution

  • As expected, it was a dependency issue, which was resolved by updating the sbt file as below:

    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-sql" % "2.0.0",
      "com.datastax.spark" %% "spark-cassandra-connector" % "2.0.0-RC1",
      "com.datastax.cassandra" % "cassandra-driver-core" % "3.0.0",
      ("org.apache.spark" %% "spark-streaming-kafka" % "1.6.0")
        .exclude("org.spark-project.spark", "unused")
    )
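The fix works because the spark-cassandra-connector's documented compatibility table pairs each connector series with a matching Spark series: connector 1.6.x targets Spark 1.6.x, and connector 2.0.x targets Spark 2.0.x. The original build mixed connector 1.6.2 with spark-core 2.0.0, which is exactly that mismatch. A sketch of the rule (the helper name and mapping below are illustrative, covering only the versions discussed here):

```scala
// Illustrative sketch of the connector/Spark compatibility rule:
// the connector's major.minor series should match Spark's major.minor.
object ConnectorCompat {
  def compatibleSparkSeries(connectorVersion: String): Option[String] =
    connectorVersion.split('.').take(2).mkString(".") match {
      case "1.6" => Some("1.6.x")
      case "2.0" => Some("2.0.x")
      case _     => None
    }

  def main(args: Array[String]): Unit = {
    // Original build: connector 1.6.2 against Spark 2.0.0 — mismatch.
    println(compatibleSparkSeries("1.6.2"))     // Some(1.6.x)
    // Corrected build: connector 2.0.0-RC1 against Spark 2.0.0 — match.
    println(compatibleSparkSeries("2.0.0-RC1")) // Some(2.0.x)
  }
}
```

With the updated sbt file, spark-sql and the connector both sit on the Spark 2.0 series, so the binary-incompatibility errors disappear.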