scala, apache-spark, cassandra, spark-cassandra-connector

Spark Cassandra Join ClassCastException


I am trying to join two Cassandra tables with t1.join(t2, Seq("some column"), "left"), but I get the following error:

Exception in thread "main" java.lang.ClassCastException: scala.Tuple8 cannot be cast to scala.Tuple7
    at org.apache.spark.sql.cassandra.execution.CassandraDirectJoinStrategy.apply(CassandraDirectJoinStrategy.scala:27)

I am using Cassandra v3.11.13 and Spark 3.3.0. The project dependencies are:

  libraryDependencies ++= Seq(
    "org.scalatest" %% "scalatest" % "3.2.11" % Test,
    "com.github.mrpowers" %% "spark-fast-tests" % "1.0.0" % Test,
    "graphframes" % "graphframes" % "0.8.1-spark3.0-s_2.12" % Provided,
    "org.rogach" %% "scallop" % "4.1.0" % Provided,
    "org.apache.spark" %% "spark-sql" % "3.1.2" % Provided,
    "org.apache.spark" %% "spark-graphx" % "3.1.2" % Provided,
    "com.datastax.spark" %% "spark-cassandra-connector" % "3.2.0" % Provided
  )

Your help is greatly appreciated.


Solution

  • The Spark Cassandra connector does not support Apache Spark 3.3.0 yet. I suspect that is why the join fails, though I haven't verified it myself.

    Support for Spark 3.3.0 has been requested in SPARKC-686, but the amount of work required is significant, so stay tuned.

    The latest supported Spark version is 3.2, using spark-cassandra-connector 3.2. Cheers!
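
    As a rough sketch of what this suggests, the build could pin Spark and the connector to the same 3.2 line. The `sparkVersion` and `connectorVersion` names and the choice of 3.2.1 are assumptions for illustration (any 3.2.x release should fit the same pattern), and the other dependencies from the question are left out for brevity.

```scala
// build.sbt fragment (sketch, not a verified fix):
// keep Spark and the Cassandra connector on the same 3.2 line.
val sparkVersion = "3.2.1"      // assumption: any Spark 3.2.x release
val connectorVersion = "3.2.0"  // connector version already used in the question

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-graphx" % sparkVersion % Provided,
  "com.datastax.spark" %% "spark-cassandra-connector" % connectorVersion % Provided
)
```

    Because these dependencies are marked Provided, the Spark installation that actually runs the job would also need to be a 3.2.x build. Printing `spark.version` from the SparkSession at startup is a quick way to confirm which version is really in use.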