Given a Cassandra database, is there a mechanism for fetching records in a FIFO manner, so that records are read in ascending order of their insertion time? I basically need to read the N oldest rows in batches, process them, and delete each batch once it has been processed.
As far as I understand, columns are sorted by their type (as specified by CompareWith), and rows are sorted by the partitioner.
Can I use OrderPreservingPartitioner to sort my rows in the ascending order of insertion time? I am running Cassandra on a single node so I am not really worried about the distribution of keys. If OrderPreservingPartitioner can be used, how do I configure the sort criteria for my keys so that the records are maintained in the ascending order of insertion?
Alternatively, does Hector provide a mechanism to always fetch rows such that the oldest rows are fetched first?
Edit:
After reading rs_atl's post, I have some more questions:
If I have understood this correctly, I will create a column family with TimeUUIDType as the comparator. I will then have to use timestamps for column names. The immediate question that comes to mind is: how do I define the sort order for the column names as ascending or descending? Can I do this at column family creation time, or do I have to do it through the client API?
If I decide to use 'hours' as my shard interval, i.e., if I append the hour to my keys, how do I retrieve the row for the oldest hour?
There are a number of things to consider when attempting such a solution with Cassandra:
Hector doesn't determine ordering at all; this happens on insert and is based on the comparator you've chosen. If you want a specific ordering you have to write the data that way (see point 3 above).
Regarding the additional information in your edit:
I wouldn't use TimeUUIDType as your comparator; just use a long value that is either a Unix epoch timestamp or a numeric representation of the time in a form such as YYYYMMDDxx, to whatever level of precision you need. You can decide at query time whether you want the values in normal (ascending) or reversed (descending) order.
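To make the idea concrete, here is a minimal in-memory sketch of a row whose column names are long timestamps. It is not Hector or Cassandra code: the class TimeOrderedRow and its methods are hypothetical names, and a TreeMap merely stands in for the comparator-sorted column layout. It shows reading the oldest N columns, deleting them after processing, and picking the reversed order at read time.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one row with long column names.
// A TreeMap keeps keys sorted, mimicking a long comparator on columns.
final class TimeOrderedRow {
    private final TreeMap<Long, String> columns = new TreeMap<>();

    // Writing the same timestamp twice overwrites the earlier value,
    // just as re-using a column name within a row would.
    void insert(long timestampMillis, String value) {
        columns.put(timestampMillis, value);
    }

    // Oldest-first slice of at most n columns (ascending order).
    List<Map.Entry<Long, String>> oldest(int n) {
        return slice(columns, n);
    }

    // Newest-first slice: the "reversed" choice, made at read time.
    List<Map.Entry<Long, String>> newest(int n) {
        return slice(columns.descendingMap(), n);
    }

    // Remove a batch once it has been processed.
    void delete(List<Map.Entry<Long, String>> batch) {
        for (Map.Entry<Long, String> e : batch) {
            columns.remove(e.getKey());
        }
    }

    int size() {
        return columns.size();
    }

    // Copy up to n entries from the given view, in the view's iteration order.
    private static List<Map.Entry<Long, String>> slice(Map<Long, String> view, int n) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : view.entrySet()) {
            if (out.size() == n) {
                break;
            }
            out.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

The FIFO loop is then: call oldest(N), process the returned entries, call delete on that batch, and repeat until the row is empty.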
You can ask for all keys and simply take the smallest one, which could work fine or be a terrible idea depending on how many keys you have and your latency requirements. Alternatively (and certainly more efficiently), you could record the oldest key somewhere (a file, another CF, in memory, whatever makes sense).
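As a rough sketch of the "keep the oldest key somewhere" option, the following tracks hour-sharded row keys in memory. Everything here is an assumption for illustration: the class HourShards, the "prefix:hour" key format, and the use of a TreeSet as the place where the oldest key is remembered. None of it is Hector or Cassandra API.

```java
import java.util.TreeSet;

// Hypothetical helper: derives hour-sharded row keys and remembers
// which hour buckets still hold data, so the oldest is cheap to find.
final class HourShards {
    private static final long MILLIS_PER_HOUR = 3_600_000L;

    private final String prefix;
    private final TreeSet<Long> liveHours = new TreeSet<>();

    HourShards(String prefix) {
        this.prefix = prefix;
    }

    // Hours since the Unix epoch for a millisecond timestamp.
    static long hourBucket(long timestampMillis) {
        return Math.floorDiv(timestampMillis, MILLIS_PER_HOUR);
    }

    // Row key to write to for this timestamp; also records the bucket as live.
    String rowKeyFor(long timestampMillis) {
        long hour = hourBucket(timestampMillis);
        liveHours.add(hour);
        return prefix + ":" + hour;
    }

    // Row key of the oldest bucket that still has data, or null if none.
    String oldestRowKey() {
        return liveHours.isEmpty() ? null : prefix + ":" + liveHours.first();
    }

    // Call once every column in that hour's row has been processed and deleted.
    void markDrained(long hour) {
        liveHours.remove(hour);
    }
}
```

An in-memory set is lost on restart, so in practice the same bookkeeping would need to live in a file or another column family, as suggested above.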