I have one table customer_info
in a Cassandra DB & it contains one column as billing_due_date
, which is date field (dd-MMM-yy
ex. 17-AUG-21). I need to fetch the certain fields from customer_info
table based on billing_due_date
where billing_due_date
should be equal to system date +1.
Can anyone suggest a Cassandra DB query for this?
fetch the certain fields from
customer_info
table based onbilling_due_date
transaction_id
is primarykey , It is just generated through uuid()
Unfortunately, there really isn't going to be a good way to do this. Right now, the data in the customer_info
table is distributed across all nodes in the cluster based on a hash of the transaction_id
. Essentially, any query based on something other than transaction_id
is going to read from multiple nodes, which is a query anti-pattern in Cassandra.
In Cassandra, you need to design your tables based on the queries that they need to support. For example, choosing transaction_id
as the sole primary key may distribute well, but it doesn't offer much in the way of query flexibility.
Therefore, the best way to solve for this query, is to create a query table containing the data from customer_info
with a key definition of PRIMARY KEY (billing_date,transaction_id)
. Then, a query like this should work:
> SELECT * FROM customer_info_by_date
WHERE billing_due_date = toDate(now()) + 2d;
billing_due_date | transaction_id | name
------------------+--------------------------------------+---------
2021-08-20 | 2fe82360-e314-4d5b-aa33-5deee9f03811 | Rinzler
2021-08-20 | 92cb9ee5-dee6-47fe-b372-0829f2e384cd | Clu
(2 rows)
Note that for this example, I am using the system date plus 2 days out. So in your case, you'll want to adjust the "duration" aspect from 2d
down to 1d
. Cassandra 4.0 allows date arithmetic, so this should work just fine if you are on that version. If you are not, you'll have to do the "system date plus one" calculation on the app side.
Another way to go about this, would be to create a secondary index on billing_due_date
, but I don't recommend that path as it will query multiple nodes to build the result set.