apache-spark, cassandra, spark-cassandra-connector, data-extraction

Spark error when reading data from Cassandra: org.apache.spark.unsafe.types.UTF8String is not a valid external type for schema of string


I have a Cassandra table that was created as follows (in cqlsh):

CREATE TABLE blog.session( id int PRIMARY KEY, visited text);

I write data to Cassandra, and it looks like this:

id  | visited
1   |  Url1-Url2-Url3

I then try to read it using the Spark Cassandra Connector (2.5.1):

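import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._  // provides cassandraFormat on the reader

val sparkSession = SparkSession.builder()
  .master("local")
  .appName("ReadFromCass")
  .config("spark.cassandra.connection.host", "localhost")
  .config("spark.cassandra.connection.port", "9042")
  .getOrCreate()

import sparkSession.implicits._

// Keep the DataFrame in the val; show() returns Unit, so call it separately
val readSessions = sparkSession.read
  .cassandraFormat("table1", "keyspace1")
  .load()
readSessions.show()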

However, Spark seems unable to read the visited column. My guess is that this is because it is a text value with dashes between the words. The error is:

org.apache.spark.unsafe.types.UTF8String is not a valid external type for schema of string

Any ideas on why Spark is unable to read this, and how to fix it?


Solution

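  • The error appears to be caused by the version of the spark-cassandra-connector rather than by the data. Instead of "2.5.1", use "3.0.0-beta". This is presumably a version mismatch: the 2.5.x connector line is built against Spark 2.4, while the 3.0.x line targets Spark 3.0.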
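
For an sbt project, the version change might look like the sketch below. Only the connector version "3.0.0-beta" comes from the answer; the Scala version and the spark-sql dependency are assumptions added to make the fragment complete, so adjust them to match your own setup.

// build.sbt (sketch; only the connector version is taken from the answer)

// Assumption: Scala 2.12 is used, which is the Scala line the 3.0.x connector is published for
scalaVersion := "2.12.10"

libraryDependencies ++= Seq(
  // Assumption: the job runs on Spark 3.0.x
  "org.apache.spark" %% "spark-sql" % "3.0.0",
  // Replaces the previous "2.5.1" connector dependency
  "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0-beta"
)

After changing the dependency, rebuild the project so that the old 2.5.1 connector jar is no longer on the classpath.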