I want to write Structured Streaming data into Cassandra using the PySpark API.
My data flow is as follows:
NiFi -> Kafka -> Spark Structured Streaming -> Cassandra
I tried the following:
query = df.writeStream\
.format("org.apache.spark.sql.cassandra")\
.option("keyspace", "demo")\
.option("table", "test")\
.start()
But I get the following error message: "org.apache.spark.sql.cassandra" does not support streaming write.
I also tried another approach [source: DSE 6.0 Administrator Guide]:
query = df.writeStream\
.cassandraFormat("test", "demo")\
.start()
But this raised an exception: AttributeError: 'DataStreamWriter' object has no attribute 'cassandraFormat'
Can anyone suggest how I can proceed?
Thanks in advance.
After upgrading to DSE 6.0 (latest version), I am able to write Structured Streaming data into Cassandra. [Spark 2.2 and Cassandra 3.11]
Reference code:
query = fileStreamDf.writeStream\
.option("checkpointLocation", '/tmp/check_point/')\
.format("org.apache.spark.sql.cassandra")\
.option("keyspace", "analytics")\
.option("table", "test")\
.start()
DSE documentation URL: https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/spark/structuredStreaming.html
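If upgrading to DSE 6.0 is not an option, one workaround often used on Spark 2.4 or later is foreachBatch, which hands each micro-batch to the connector's ordinary batch writer instead of a streaming sink. The following is only a minimal sketch, not the approach from the answer above. It assumes Spark 2.4+ (foreachBatch does not exist in Spark 2.2) and that the Spark Cassandra Connector is on the classpath. The keyspace and table names are taken from the question; the checkpoint path and both function names are hypothetical.

```python
# Assumed values: keyspace/table come from the question,
# the checkpoint directory is a hypothetical placeholder.
KEYSPACE = "demo"
TABLE = "test"
CHECKPOINT_DIR = "/tmp/check_point/"


def write_batch_to_cassandra(batch_df, batch_id):
    """Write one micro-batch using the batch (non-streaming) Cassandra writer."""
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace=KEYSPACE, table=TABLE)
        .mode("append")  # append rows to the existing table
        .save())


def start_cassandra_sink(stream_df):
    """Start a streaming query that passes every micro-batch to the writer above."""
    return (stream_df.writeStream
        .foreachBatch(write_batch_to_cassandra)
        .option("checkpointLocation", CHECKPOINT_DIR)
        .start())
```

Calling start_cassandra_sink(df) on the streaming DataFrame returns the StreamingQuery, just as .start() does in the snippets above.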