apache-spark, cassandra, apache-spark-sql, spark-structured-streaming, spark-cassandra-connector

How to write a Dataset into Cassandra table using spark-cassandra-connector?


I am trying to save a Structured Streaming Dataset into a given Cassandra table.

I am using the DataStax Spark Cassandra Connector (spark-cassandra-connector_2.11).

When I try to save the Dataset like this:

dataSet
    .writeStream()
    .format("org.apache.spark.sql.cassandra")
    .option("table",table)
    .option("keyspace", keyspace)
    .outputMode("append")
    .start();

it throws the following error:

Data source org.apache.spark.sql.cassandra does not support streamed writing

What should be done to handle this?


Solution

  • There are several options, depending on the versions involved:

    1. With Spark Cassandra Connector (SCC) version 2.x, Spark < 2.4, and OSS Cassandra, the only choice is to implement a custom foreach sink (ForeachWriter), like it's done here (first sketch after this list);
    2. With SCC version 2.x, Spark >= 2.4, and OSS Cassandra, we can use foreachBatch with a normal batch write, like here (second sketch after this list);
    3. For DSE, we can just use data.writeStream().format("org.apache.spark.sql.cassandra"), as DSE Analytics ships a customized SCC;
    4. Starting with SCC 2.5, the DSE-specific functionality is available for OSS Cassandra as well, so we can use it the same way as for DSE, as shown in the docs (third sketch after this list).
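
For option 1, below is a minimal sketch of a foreach sink, assuming a hypothetical table my_keyspace.my_table (id text, value text) and a Cassandra contact point at 127.0.0.1. It opens a plain Java driver session per partition instead of reusing the connector's session management, so treat it as an illustration of the ForeachWriter contract rather than the exact approach from the linked example:

import org.apache.spark.sql.ForeachWriter;
import org.apache.spark.sql.Row;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Serializable writer; Spark creates and opens one instance per partition
public class CassandraForeachWriter extends ForeachWriter<Row> {
    private transient Cluster cluster;
    private transient Session session;

    @Override
    public boolean open(long partitionId, long version) {
        // Contact point is an assumption; adjust to your cluster
        cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        session = cluster.connect();
        return true;
    }

    @Override
    public void process(Row row) {
        // Hypothetical schema: my_keyspace.my_table (id text, value text)
        session.execute(
            "INSERT INTO my_keyspace.my_table (id, value) VALUES (?, ?)",
            row.getString(0), row.getString(1));
    }

    @Override
    public void close(Throwable errorOrNull) {
        if (session != null) session.close();
        if (cluster != null) cluster.close();
    }
}

// Usage with the streaming Dataset from the question:
dataSet
    .writeStream()
    .foreach(new CassandraForeachWriter())
    .outputMode("append")
    .start();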
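
For option 2, a sketch of the foreachBatch variant (Spark >= 2.4): every micro-batch is written with the connector's regular batch write path. The table and keyspace variables are the ones from the question, the checkpoint path is a hypothetical placeholder, and the explicit VoidFunction2 cast disambiguates the Java overload of foreachBatch:

import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

dataSet
    .writeStream()
    .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batchDf, batchId) -> {
        // Plain batch write of the current micro-batch into Cassandra
        batchDf.write()
            .format("org.apache.spark.sql.cassandra")
            .option("table", table)
            .option("keyspace", keyspace)
            .mode(SaveMode.Append)
            .save();
    })
    .option("checkpointLocation", "/tmp/checkpoint")  // hypothetical path
    .start();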
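
For option 4, the writeStream code from the question works as-is once SCC 2.5+ is on the classpath (for example via --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.2); the docs also recommend enabling the connector's extensions with spark.sql.extensions=com.datastax.spark.connector.CassandraSparkExtensions. A checkpoint location is required for this sink; the path below is a hypothetical placeholder:

dataSet
    .writeStream()
    .format("org.apache.spark.sql.cassandra")
    .option("table", table)
    .option("keyspace", keyspace)
    .option("checkpointLocation", "/tmp/checkpoint")  // hypothetical path
    .outputMode("append")
    .start();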