I have a table for messages, like this:
messages (
user_id uuid,
id uuid,
message blob,
created_at timestamp,
updated_at timestamp,
PRIMARY KEY (user_id, id)
)
I can create a MV for sort and select messages by updated_at
. But when I need to select last updated messages from last time when client sync (e.g. select where updated_at > 1555602962006 and created_at < 1555602962006
) - only way is to select all messages by updated_at
and filter rows in code? Is this a normal practice in production?
Maybe in this case, it's possible to create some token to sort by concatenating created_at
and updated_at
or something?
So the tricky part about this, is that Cassandra will not allow a range query on multiple, different columns. Secondly, range queries only work within a partition, so you'll need to come up with a time "bucketing" strategy.
One way that I have modeled around this in the past, was to have one column for a timestamp, but different rows for the type of time. I'll put together an example here, using your table above. For a time bucket, I'll use month; essentially, I know I'll only ever query for messages created/updated in the last month (that may or may not work for you).
CREATE TABLE messages_by_month (
month_bucket int,
event_time timestamp,
user_id uuid,
id uuid,
message text,
event text,
PRIMARY KEY (month_bucket, event_time, id));
After INSERTing some data, I can now query for a range of created and updated event times, like this:
SELECT id,event_time,event,message
FROM messages_by_month
WHERE month_bucket=201905
AND event_time > 1558619000000
AND event_time < 1558624900000;
id | event_time | event | message
--------------------------------------+---------------------------------+---------+---------
a4d60c29-ad4e-4023-b869-edf1ea6207e2 | 2019-05-23 14:00:00.000000+0000 | CREATED | hi!
66e78a1e-dbcb-4f64-a0aa-6d5b0e64d0ed | 2019-05-23 14:20:00.000000+0000 | CREATED | hi!
f1c59bf4-1351-4527-a24b-80bb6e3a2a5c | 2019-05-23 15:00:00.000000+0000 | UPDATED | hi2!
a4d60c29-ad4e-4023-b869-edf1ea6207e2 | 2019-05-23 15:20:00.000000+0000 | UPDATED | hi3!
(4 rows)
While this exact example may not work for you, the idea is to think about organizing your data differently to support your query.
Note: With this example, you'll need to watch your partition sizes. You may end up having to add another partition key if this table gets more than 50k-100k messages in a month.