cassandra-2.0 cql3 datastax cqlsh

How to get the last "row" in a long Cassandra row


In Cassandra, a row can be very long and store time-series data. For example, one row could look like the following:

RowKey: "weather"
name=2013-01-02:temperature, value=90,
name=2013-01-02:humidity, value=23,
name=2013-01-02:rain, value=false,
name=2013-01-03:temperature, value=91,
name=2013-01-03:humidity, value=24,
name=2013-01-03:rain, value=false,
name=2013-01-04:temperature, value=90,
name=2013-01-04:humidity, value=23,
name=2013-01-04:rain, value=false

That is 9 columns holding 3 days of weather info. Time is part of the key in this row, so the columns are stored in time order.

My question is: is there any way to run a query like "what is the last/first day's humidity value in this row?" I know I could use an ORDER BY clause in CQL, but since this row is already sorted by time, there should be some way to get the first/last entry directly instead of doing another sort. Or does Cassandra already optimize ORDER BY under the hood?

Another approach I can think of is to store an extra column in this row called "last_time_stamp" that is updated whenever new data is inserted. But that would require one more update every time I insert new weather data.

Thanks for any suggestions! :)


Solution

  • Without seeing more of your actual table, I suggest using a timestamp (or a timeuuid if collisions are possible) as the second component of a compound primary key, i.e. as a clustering column. With that in place, you can get the last "row" of a partition by selecting with ORDER BY t DESC LIMIT 1, where t is the time column.

    You could also set the clustering order in your schema to descending, so that the data is stored newest-first and "last N" queries need no ORDER BY at all.

    Please see examples and linked resource in this answer.
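
    As a rough sketch of both suggestions, something like the following should work. The table name weather_by_station, the column names and the column types are all assumptions made for illustration, since the actual schema isn't shown. On the "under the hood" part of the question: ORDER BY in CQL only accepts the clustering order or its exact reverse, so Cassandra reads the partition forwards or backwards rather than performing a separate sort.

```cql
-- Hypothetical schema: table and column names are assumptions,
-- not taken from the asker's actual table.
CREATE TABLE weather_by_station (
    station     text,       -- partition key (the "RowKey" in the question)
    t           timestamp,  -- clustering column: orders entries within the partition
    temperature int,
    humidity    int,
    rain        boolean,
    PRIMARY KEY (station, t)
) WITH CLUSTERING ORDER BY (t DESC);  -- store newest entries first

-- Latest humidity: the newest entry is first in storage order,
-- so LIMIT 1 is enough and no ORDER BY is needed.
SELECT t, humidity
FROM weather_by_station
WHERE station = 'weather'
LIMIT 1;

-- Earliest humidity: read the same partition in reverse clustering order.
SELECT t, humidity
FROM weather_by_station
WHERE station = 'weather'
ORDER BY t ASC
LIMIT 1;
```

    If the table keeps the default ascending clustering order instead, the two queries swap roles: the first entry needs no ORDER BY, and the last one uses ORDER BY t DESC LIMIT 1. Either way, no extra "last_time_stamp" column has to be maintained.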