CASSANDRA_TABLE has (some_other_column, itemid) as its primary key.
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.CassandraTableScanRDD

val cassandraRdd: CassandraTableScanRDD[CassandraRow] = sparkSession.sparkContext
  .cassandraTable(cassandraKeyspace, cassandraTable)
cassandraRdd.take(10).foreach(println)
This cassandraRdd contains every column from my Cassandra table.
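Each element prints with every column, roughly like this (non-itemid values elided here; column names as above):

CassandraRow{some_other_column: ..., itemid: 988230014, column2: ..., column3: ...}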
val temp1: CassandraTableScanRDD[(Tuple1[String], CassandraRow)] = cassandraRdd
  .select("itemid", "column2", "column3")
  .keyBy[Tuple1[String]]("itemid")

val temp2: CassandraTableScanRDD[(Tuple1[String], CassandraRow)] = cassandraRdd
  .keyBy[Tuple1[String]]("itemid")

temp1.take(10).foreach(println)
temp2.take(10).foreach(println)
Neither temp1 nor temp2 retains all the columns after the keyBy; both print rows like this:
((988230014),CassandraRow{itemid: 988230014})
How can I keyBy on a particular column and still have the CassandraRow value retain all columns?
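For comparison, keying with a plain map does keep every column, but the result is an ordinary RDD whose partitioner is None, so the Cassandra-aware partitioning is lost (just a sketch, using the same names as above):

// Keeps all columns, but map discards the partitioner
val keyedByMap = cassandraRdd.map(row => (row.getString("itemid"), row))
println(keyedByMap.partitioner)   // None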
To retain the partitioner and still select the columns I need, I have to read the rows with something like this:
val cassandraRdd: CassandraTableScanRDD[((String, String), (String, String, String))] =
  sparkSession.sparkContext
    .cassandraTable[(String, String, String)](cassandraKeyspace, cassandraTable)
    .select("some_other_column" as "_1", "itemid" as "_2", "column3" as "_3", "some_other_column", "itemid")
    .keyBy[(String, String)]("some_other_column", "itemid")
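If I've understood the connector right, this read keeps a Cassandra-aware partitioner because the key includes the table's partition key, which is easy to verify:

// With the partition key in the key, this should print Some(...) rather than None
println(cassandraRdd.partitioner)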