Design schema to query select beetwen two separate dates

I have a table for messages, like this:

messages (
    user_id uuid,
    id uuid,
    message blob,
    created_at timestamp,
    updated_at timestamp,
    PRIMARY KEY (user_id, id)
)

I can create a MV for sort and select messages by updated_at. But when I need to select last updated messages from last time when client sync (e.g. select where updated_at > 1555602962006 and created_at < 1555602962006) - only way is to select all messages by updated_at and filter rows in code? Is this a normal practice in production?

Maybe in this case, it's possible to create some token to sort by concatenating created_at and updated_at or something?

Solution

So the tricky part about this, is that Cassandra will not allow a range query on multiple, different columns. Secondly, range queries only work within a partition, so you'll need to come up with a time "bucketing" strategy.

One way that I have modeled around this in the past, was to have one column for a timestamp, but different rows for the type of time. I'll put together an example here, using your table above. For a time bucket, I'll use month; essentially, I know I'll only ever query for messages created/updated in the last month (that may or may not work for you).

CREATE TABLE messages_by_month (
  month_bucket int,
  event_time timestamp,
  user_id uuid,
  id uuid,
  message text,
  event text,
  PRIMARY KEY (month_bucket, event_time, id));

After INSERTing some data, I can now query for a range of created and updated event times, like this:

SELECT id,event_time,event,message
FROM messages_by_month 
WHERE month_bucket=201905
  AND event_time > 1558619000000
  AND event_time < 1558624900000;

 id                                   | event_time                      | event   | message
--------------------------------------+---------------------------------+---------+---------
 a4d60c29-ad4e-4023-b869-edf1ea6207e2 | 2019-05-23 14:00:00.000000+0000 | CREATED |     hi!
 66e78a1e-dbcb-4f64-a0aa-6d5b0e64d0ed | 2019-05-23 14:20:00.000000+0000 | CREATED |     hi!
 f1c59bf4-1351-4527-a24b-80bb6e3a2a5c | 2019-05-23 15:00:00.000000+0000 | UPDATED |    hi2!
 a4d60c29-ad4e-4023-b869-edf1ea6207e2 | 2019-05-23 15:20:00.000000+0000 | UPDATED |    hi3!

(4 rows)

While this exact example may not work for you, the idea is to think about organizing your data differently to support your query.

Note: With this example, you'll need to watch your partition sizes. You may end up having to add another partition key if this table gets more than 50k-100k messages in a month.