I'm working in Scala and trying to save my calendar data from Spark to Cassandra.
I started by creating the same schema in Cassandra:
session.execute("CREATE TABLE calendar (DateNum int, Date text, YearMonthNum int, ..., PRIMARY KEY (datenum,date))")
and then imported my data from spark to Cassandra:
calendarDF  // the DataFrame holding my calendar data
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "calendar", "keyspace" -> "ks"))
  .mode(SaveMode.Append)
  .save()
But when I read the data back from Cassandra, the rows come out in what looks like a random order, while I want to keep the same order my calendar has. (Cassandra distributes rows by the token of their partition key, so insertion order is not preserved.)
An example of a row I have:
20090111 | 1/11/2009 | 200901 |...
SELECT with ORDER BY doesn't fix the problem either: Cassandra only allows ORDER BY on clustering columns, and only within a single partition.
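For reference, this is the restriction you run into with the schema above (a sketch using the table and column names from the CREATE TABLE statement):

```
-- Works: date is a clustering column, but the ordering only applies
-- within the one partition selected by datenum.
SELECT * FROM calendar WHERE datenum = 20090111 ORDER BY date;

-- Rejected by Cassandra: ORDER BY across partitions is not supported.
SELECT * FROM calendar ORDER BY datenum;
```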
The answer to this was to add a new column with a constant value across the whole table (e.g. 1) in Spark, and make that column the partition key in the Cassandra table. That way the whole table lives in a single partition, and the clustering columns keep the rows ordered.
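A minimal sketch of that approach, assuming a Cassandra `session` and a `calendarDF` DataFrame are already in scope (both names, and the `bucket` column, are illustrative, not from the original code):

```scala
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions.lit

// Recreate the table with a constant "bucket" partition key; the
// clustering columns (datenum, date) define the sort order of rows
// inside that single partition.
session.execute(
  """CREATE TABLE ks.calendar (
    |  bucket int, datenum int, date text, yearmonthnum int,
    |  PRIMARY KEY ((bucket), datenum, date)
    |)""".stripMargin)

// Add the constant column in Spark, then append as before.
calendarDF
  .withColumn("bucket", lit(1)) // same value for every row -> one partition
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "calendar", "keyspace" -> "ks"))
  .mode(SaveMode.Append)
  .save()
```

Note the trade-off: a single partition means the entire table lives on one node (and its replicas), which is fine for a small dimension table like a calendar but does not scale to large datasets.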