Tags: apache-spark, cassandra, pyspark, datastax, spark-structured-streaming

Writing Spark Structured Streaming data into Cassandra


I want to write Structured Streaming data into Cassandra using the PySpark API.

My data flow is as follows:

NiFi -> Kafka -> Spark Structured Streaming -> Cassandra

I tried the following:

query = df.writeStream\
  .format("org.apache.spark.sql.cassandra")\
  .option("keyspace", "demo")\
  .option("table", "test")\
  .start()

But I get the following error message: "org.apache.spark.sql.cassandra" does not support streaming write.

I also tried another approach, taken from the DSE 6.0 Administrator Guide:

query = df.writeStream\
   .cassandraFormat("test", "demo")\
   .start()

But this raised an exception: AttributeError: 'DataStreamWriter' object has no attribute 'cassandraFormat'

Can anyone suggest how I should proceed?

Thanks in advance.


Solution

  • After upgrading to DSE 6.0 (the latest version), I was able to write structured streaming data into Cassandra. [Spark 2.2 and Cassandra 3.11]

    Reference Code:

    query = fileStreamDf.writeStream\
      .option("checkpointLocation", "/tmp/check_point/")\
      .format("org.apache.spark.sql.cassandra")\
      .option("keyspace", "analytics")\
      .option("table", "test")\
      .start()
    

    DSE documentation URL: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/spark/structuredStreaming.html
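
The reference code above can be wrapped in a small helper so the keyspace, table, and checkpoint directory are passed in one place. This is only a minimal sketch: the function name start_cassandra_stream and its parameters are hypothetical, not part of PySpark or DSE. Since PySpark's DataStreamWriter has no cassandraFormat method (per the AttributeError in the question), the helper uses the generic format/option calls. It assumes an environment where the Cassandra source supports streaming writes, as was the case after the DSE 6.0 upgrade described above.

```python
def start_cassandra_stream(stream_df, keyspace, table, checkpoint_dir):
    """Start a structured-streaming write to Cassandra and return the query.

    Hypothetical helper that mirrors the reference code above. `stream_df`
    is expected to be a streaming DataFrame, i.e. an object that exposes
    `writeStream`.
    """
    # The reference code sets a checkpoint location, so this sketch
    # refuses to start without one.
    if not checkpoint_dir:
        raise ValueError("checkpoint_dir is required for a streaming write")

    return (
        stream_df.writeStream
        .option("checkpointLocation", checkpoint_dir)
        .format("org.apache.spark.sql.cassandra")
        .option("keyspace", keyspace)
        .option("table", table)
        .start()
    )
```

With the names from the reference code, the call would look like query = start_cassandra_stream(fileStreamDf, "analytics", "test", "/tmp/check_point/"), followed by query.awaitTermination() to keep the driver alive while the stream runs.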