Tags: java, apache-spark, cassandra, spark-cassandra-connector

How to get a range of rows from Cassandra using Spark


I have a table in Cassandra with the following structure:

CREATE TABLE dmp.Table (
    pid text PRIMARY KEY,
    day_count map<text, int>,
    first_seen map<text, timestamp>,
    last_seen map<text, timestamp>,
    usage_count map<text, int>
);

Now I'm trying to query it using the Spark Cassandra connector. Is there any way I can get the data in chunks? That is, if I have 100 rows, I should be able to get rows 0-10, then 10-20, and so on.

 CassandraJavaRDD<CassandraRow> cassandraRDD = CassandraJavaUtil.javaFunctions(javaSparkContext).cassandraTable(keySpaceName, tableName);

I'm asking because there is no column in my table that I can query with an IN clause to get a range of rows.


Solution

  • You can add an auto-incrementing ID column -- see my DataFrame-ified Zip With Index solution. Then you can query by the newly created id column:

    SELECT ... WHERE id >= 0 and id < 10;
    

    Etc.
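
Here is a minimal sketch of that approach in Java, assuming Spark 2.x (SparkSession) and the connector's DataFrame data source; the connection host, keyspace, and table name below are placeholders for your own setup, not values from the question. It reads the table, zips each row with a 0-based index, appends that index as an id column, and then filters by range:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.SparkSession;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    public class RangeQueryExample {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("range-query")
                    .config("spark.cassandra.connection.host", "127.0.0.1") // assumed host
                    .getOrCreate();

            // Load the Cassandra table as a DataFrame through the connector.
            Dataset<Row> df = spark.read()
                    .format("org.apache.spark.sql.cassandra")
                    .option("keyspace", "dmp")
                    .option("table", "table") // placeholder for your actual table name
                    .load();

            // zipWithIndex assigns a 0-based index to every row.
            JavaPairRDD<Row, Long> zipped = df.javaRDD().zipWithIndex();

            // Append the index to each row as a new trailing field.
            JavaRDD<Row> withId = zipped.map(tuple -> {
                Row row = tuple._1();
                Object[] values = new Object[row.length() + 1];
                for (int i = 0; i < row.length(); i++) {
                    values[i] = row.get(i);
                }
                values[row.length()] = tuple._2();
                return RowFactory.create(values);
            });

            // Extend the original schema with the new id column.
            List<StructField> fields = new ArrayList<>(Arrays.asList(df.schema().fields()));
            fields.add(DataTypes.createStructField("id", DataTypes.LongType, false));
            StructType schema = DataTypes.createStructType(fields);

            Dataset<Row> indexed = spark.createDataFrame(withId, schema);

            // Fetch the first chunk of 10 rows; shift the bounds for the next chunks.
            indexed.filter("id >= 0 AND id < 10").show();

            spark.stop();
        }
    }

Note that zipWithIndex triggers a Spark job to compute partition sizes, and the generated ids are only stable within a single run; if the table changes between runs, the same id can point to a different row.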