
Spark RDD write to Cassandra


I have a Cassandra table with the schema below.

ColumnA Primary Key
ColumnB Clustering Key
ColumnC
ColumnD

Now, I have a Spark RDD with columns ordered as RDD[ColumnC, ColumnA, ColumnB, ColumnD]

So, when writing to the Cassandra table, I need to make sure the column ordering is correct, which means I have to specify the ordering explicitly using SomeColumns:

rdd.saveToCassandra(keyspace, table, SomeColumns("ColumnA", "ColumnB", "ColumnC", "ColumnD"))

Is there any way I can pass all the column names as a list instead? I'm asking because my target table has around 140 columns and I can't spell them all out inside SomeColumns, so I'm looking for a cleaner approach.

PS: I cannot write from a DataFrame; I'm looking only for an RDD-based solution.


Solution

  • You can use the following syntax to explode a sequence into a list of arguments:

    SomeColumns(names_as_sequence: _*)
    

    Update:

    If you have a sequence of column names as plain strings, you need to convert each one to a column reference first, because the implicit String-to-column conversion isn't applied when a Seq[String] is expanded into varargs:

    SomeColumns(names_as_string_seq.map(x => x.as(x)): _*)
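
    Putting it together, a minimal sketch might look like this (the column names, keyspace, and table here are placeholders; in practice the sequence could hold all ~140 names, e.g. loaded from a config file or the table's metadata):

    ```scala
    import com.datastax.spark.connector._

    // Hypothetical list of column names as strings.
    val names: Seq[String] = Seq("ColumnA", "ColumnB", "ColumnC", "ColumnD")

    // x.as(x) turns each String into a column reference aliased to itself;
    // `: _*` then expands the resulting sequence into SomeColumns' varargs.
    rdd.saveToCassandra(keyspace, table, SomeColumns(names.map(x => x.as(x)): _*))
    ```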