Saving information to Cassandra keeps no order


I'm working with Scala and trying to save my calendar information from Spark to Cassandra.

I started by creating the same schema in Cassandra:

session.execute("CREATE TABLE calendar (DateNum int, Date text, YearMonthNum int, ..., PRIMARY KEY (datenum,date))")

and then wrote my data from Spark to Cassandra:

        .write
        .format("org.apache.spark.sql.cassandra")
        .options(Map("table" -> "calendar", "keyspace" -> "ks"))
        .mode(SaveMode.Append)
        .save()

But when I read the data back from Cassandra, the rows come back mixed up, whereas I want to keep the same order my calendar has.

An example of a row I have:

20090111 | 1/11/2009 | 200901 |...

SELECT with ORDER BY doesn't seem to fix the problem either.


Solution

  • One answer to this was to add a new column holding the same value in every row (e.g. "1") using Spark, and to make that column the partition key of the Cassandra table, with datenum kept as a clustering column. That way the whole table lives in a single partition, and because Cassandra stores the rows of a partition sorted by their clustering columns, your information stays ordered.
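
The approach above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the column name `bucket`, the object name `CalendarOrdering`, and the helper names are hypothetical, the column list only covers the three columns shown in the question (the elided ones are left out), and it assumes the table lives in keyspace `ks` as in the write options.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.lit

object CalendarOrdering {
  // Hypothetical name for the constant partition-key column.
  val BucketColumn = "bucket"

  // Schema sketch: `bucket` is the partition key, while datenum and date
  // become clustering columns, so rows in the partition are sorted by them.
  // Only the columns visible in the question are listed here.
  val createTableCql: String =
    s"""CREATE TABLE IF NOT EXISTS ks.calendar (
       |  $BucketColumn int,
       |  datenum int,
       |  date text,
       |  yearmonthnum int,
       |  PRIMARY KEY (($BucketColumn), datenum, date)
       |) WITH CLUSTERING ORDER BY (datenum ASC, date ASC)""".stripMargin

  // Add the same value to every row so that all rows share one partition.
  def withConstantPartition(df: DataFrame): DataFrame =
    df.withColumn(BucketColumn, lit(1))

  // Same write as in the question, applied to the extended DataFrame.
  def save(df: DataFrame): Unit =
    withConstantPartition(df)
      .write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "calendar", "keyspace" -> "ks"))
      .mode(SaveMode.Append)
      .save()
}
```

With a schema like this, a query such as `SELECT * FROM ks.calendar WHERE bucket = 1` should return the rows in datenum order. Keeping everything in one partition is only reasonable for a small table such as a calendar, since a single partition is not spread across the cluster.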