sorting cassandra hector

How to get sorted rows out of Cassandra when using RandomPartitioner and Hector as the client?


To improve my skills with Hector and Cassandra, I'm trying different methods of querying data out of Cassandra.

Currently I'm trying to build a simple message system. I would like to get the posted messages in reverse chronological order, with the most recently posted message first.

In plain SQL it is possible to use 'order by'. I know sorting is possible if you use the OrderPreservingPartitioner, but this partitioner is deprecated and less efficient than the RandomPartitioner. I thought of creating an index on a secondary column with a timestamp as its value, but I can't figure out how to obtain the data. I'm sure that I have to use at least two queries.

My column family looks like this:

create column family messages
with comparator = UTF8Type
and key_validation_class=LongType
and compression_options =
{sstable_compression:SnappyCompressor, chunk_length_kb:64}
and column_metadata = [
{column_name: message, validation_class: UTF8Type},
{column_name: index, validation_class: DateType, index_type: KEYS}
];

I'm not sure if I should use DateType or LongType for the index column, but I think that's not important for this question.

So how can I get the data sorted? If possible, I'd like to know how it's done both with the CQL syntax and without.

Thanks in advance.


Solution

  • I don't think there's a completely simple way to do this when using RandomPartitioner.

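    The columns within each row are stored in sorted order automatically, so you could store each message as a column whose name is its timestamp (with a comparator such as LongType or TimeUUIDType, so that the names sort chronologically).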

    Pretty soon, of course, your row would grow large. So you would need to divide up the messages into rows (by day, hour or minute, etc) and your client would need to work out which rows (time periods) to access.

    See also Cassandra time series data and http://rubyscale.com/2011/basic-time-series-with-cassandra/ and https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/ and http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
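
The bucketing idea from the answer can be sketched in plain Java. This is only an in-memory model under stated assumptions, not Hector code: the class name `TimeBucketedMessages`, the one-day bucket size, and the methods `bucketFor`, `post` and `latest` are all hypothetical. A `HashMap` stands in for the column family (rows have no useful order under RandomPartitioner), and each row is a `TreeMap` that mimics Cassandra keeping columns sorted by name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

// In-memory stand-in for a column family: row key -> (column name -> value).
class TimeBucketedMessages {
    // Assumption: one row per day. Pick hour or minute buckets for busier systems.
    private static final long BUCKET_MILLIS = TimeUnit.DAYS.toMillis(1);

    // The outer map is unordered on purpose; only columns inside a row are sorted.
    private final Map<Long, TreeMap<Long, String>> rows = new HashMap<>();

    // Row key for a timestamp: the number of whole buckets since the epoch.
    static long bucketFor(long timestampMillis) {
        return Math.floorDiv(timestampMillis, BUCKET_MILLIS);
    }

    // Store a message as a column named by its timestamp.
    // Two messages with the same millisecond would collide here;
    // a real schema would likely use TimeUUID column names instead.
    void post(long timestampMillis, String message) {
        rows.computeIfAbsent(bucketFor(timestampMillis), k -> new TreeMap<>())
            .put(timestampMillis, message);
    }

    // Return up to `limit` messages, newest first. The client works out which
    // rows to read: it starts at the bucket containing nowMillis and walks
    // back one bucket at a time, giving up after maxBuckets rows.
    List<String> latest(long nowMillis, int limit, int maxBuckets) {
        List<String> result = new ArrayList<>();
        long bucket = bucketFor(nowMillis);
        for (int i = 0; i < maxBuckets && result.size() < limit; i++, bucket--) {
            TreeMap<Long, String> row = rows.get(bucket);
            if (row == null) {
                continue; // no messages were posted in this time period
            }
            // Reading the row in reverse column order gives newest first.
            for (String message : row.descendingMap().values()) {
                if (result.size() == limit) {
                    break;
                }
                result.add(message);
            }
        }
        return result;
    }
}
```

With Hector, each iteration of the outer loop would presumably become one slice query against a single row key, using the reversed flag and count of `SliceQuery.setRange(start, finish, reversed, count)` in place of `descendingMap()`. The `maxBuckets` cap is a design choice to stop the client from scanning indefinitely when the message history is sparse.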