Assume that I have the following table CQL (well a fragment of the table):
CREATE TABLE order (
order_id UUID PRIMARY KEY,
placed timestamp,
status text,
)
Now if status could be one of PLACED, SHIPPED, or DELIVERED as an enum, I want to find all of the orders that are in PLACED status to process them. Given there are millions of orders and all orders are ultimately ending up in DELIVERED status a materialized view doesn't feel like the right solution to the problem. I am wondering what ideas there are to solve the problem of this low cardinality index without passing through the whole data set. Ideas?
I would recommend a table like
CREATE TABLE order_by_status (
order_id UUID,
placed timestamp,
status text,
PRIMARY KEY ((status), order_id)
)
Then you can iterate through the query to SELECT * FROM order_by_status WHERE status = 'PLACED';
. Millions shouldnt be too much of an issue but it would be good to prevent it from getting too large by partitioning by some date window.
CREATE TABLE order_by_status (
order_id UUID,
placed timestamp,
bucket text,
status text,
PRIMARY KEY ((status, bucket), order_id)
)
Where bucket is a string generated from timestamp like 2017-10
from the YYYY-MM. You might wanna stay away from MV's for a little bit yet, it has some bugs in current version. I would also recommend against secondary indexes for this model, using a 2nd table and issuing inserts to both is going to be your best solution.