In Cassandra, a row can be very wide and can store time-series data. For example, one row could look like the following:
RowKey: "weather"
name=2013-01-02:temperature, value=90,
name=2013-01-02:humidity, value=23,
name=2013-01-02:rain, value=false,
name=2013-01-03:temperature, value=91,
name=2013-01-03:humidity, value=24,
name=2013-01-03:rain, value=false,
name=2013-01-04:temperature, value=90,
name=2013-01-04:humidity, value=23,
name=2013-01-04:rain, value=false
That is 9 columns holding 3 days of weather info. Time is part of the primary key, so the columns in this row are ordered by time.
My question is: is there any way to run a query like "what is the humidity value for the first/last day in this row?" I know I could use an ORDER BY clause in CQL, but since this row is already sorted by time, there should be some way to get the first/last entry directly instead of sorting again. Or does Cassandra already optimize ORDER BY under the hood?
Another option I can think of is to store an extra column in this row called "last_time_stamp" that is updated whenever new data is inserted. But that would require one more update every time I insert new weather data.
Thanks for any suggestions! :)
Without seeing more of your actual table, I suggest using a timestamp (or a timeuuid if collisions are possible) as the second component of a compound primary key, i.e. as a clustering column. With that in place, you can get the last "row" by selecting with ORDER BY t DESC LIMIT 1, where t is that timestamp column.
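As a rough sketch of what that could look like (the table and column names below are my own guesses, not taken from your schema): CQL only accepts ORDER BY on clustering columns within a single partition, so as far as I know the DESC query is served by reading the partition in reverse rather than by a separate sort.

```cql
-- Hypothetical schema; all names here are illustrative assumptions.
CREATE TABLE weather (
    station     text,       -- partition key (the "row key")
    day         timestamp,  -- clustering column; entries are stored sorted by it
    temperature int,
    humidity    int,
    rain        boolean,
    PRIMARY KEY (station, day)
);

-- Last day's humidity: reverse read of the partition, limited to one entry.
SELECT day, humidity
FROM weather
WHERE station = 'weather'
ORDER BY day DESC
LIMIT 1;

-- First day's humidity: the default clustering order is ascending,
-- so no ORDER BY is needed.
SELECT day, humidity
FROM weather
WHERE station = 'weather'
LIMIT 1;
```

With this layout there is no need for a separate "last_time_stamp" column, since the newest entry is always at one end of the partition.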
You could also set the clustering order in your schema to descending, so that the newest entries are stored first and "last N" queries come back in their natural order.
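A minimal sketch of the clustering-order variant, again with made-up table and column names. Note that the clustering order is declared when the table is created, so for an existing table this would mean creating a new table and copying the data over.

```cql
-- Hypothetical schema; names are illustrative assumptions.
CREATE TABLE weather_newest_first (
    station     text,
    day         timestamp,
    temperature int,
    humidity    int,
    rain        boolean,
    PRIMARY KEY (station, day)
) WITH CLUSTERING ORDER BY (day DESC);  -- newest entries are stored first

-- Last day's humidity: no ORDER BY needed, the first entry is the newest.
SELECT day, humidity
FROM weather_newest_first
WHERE station = 'weather'
LIMIT 1;

-- "Last N" query, here the 7 most recent days.
SELECT day, temperature, humidity, rain
FROM weather_newest_first
WHERE station = 'weather'
LIMIT 7;
```

The trade-off is that fetching the first (oldest) day now needs ORDER BY day ASC, so pick whichever order matches the more frequent query.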
Please see examples and linked resource in this answer.