
Cassandra CQL: How to insert only records that are not older than 3 years?


I have a table like this:

CREATE TABLE events (
   id int,
   eventdate timestamp,
   PRIMARY KEY (id)
);

What I'm trying to do is a conditional insert that verifies that eventdate is not older than 3 years and inserts the data only if that condition is met.

In SQL, something similar could be achieved with DATEADD.

How can I handle this in Cassandra?


Solution

  • Run select * from events and iterate (with paging) through the result set, issuing an insert for every row that is not older than 3 years. A quick Python script left to run for a day or two will get this done in less time than building anything more elaborate, particularly if this is a one-off task. If you need to do it regularly, I would recommend writing a Spark job instead. If you don't want to use Spark and want to run it locally, you can make the scan more efficient by splitting the select statement into token ranges that follow the ring boundaries.

    Cassandra won't support large bulk operations that require a read before write across the entire data set. That approach would not work on the clusters it is designed to support (think petabytes across many data centers).
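
The "quick Python script" approach could look roughly like the sketch below. It is only an illustration under several assumptions: it uses the DataStax cassandra-driver, it writes to a hypothetical target table named recent_events with the same columns as events, it treats a year as 365 days, and it splits the Murmur3 token space evenly instead of reading the real ring boundaries from the cluster. The function names (cutoff, is_recent, token_ranges, copy_recent_events) are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Token bounds of Murmur3Partitioner (assumed here; it is the default partitioner).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def cutoff(now, years=3):
    """Oldest eventdate still accepted, approximating a year as 365 days."""
    return now - timedelta(days=365 * years)


def is_recent(eventdate, now, years=3):
    """True when eventdate is not older than `years` years relative to `now`."""
    return eventdate >= cutoff(now, years)


def token_ranges(splits):
    """Split the token space into `splits` inclusive, non-overlapping [lo, hi] ranges.

    This is an even split, not the cluster's actual ring boundaries.
    """
    step = (MAX_TOKEN - MIN_TOKEN + 1) // splits
    ranges = []
    lo = MIN_TOKEN
    for i in range(splits):
        hi = MAX_TOKEN if i == splits - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def copy_recent_events(session, target_table="recent_events", splits=16, fetch_size=1000):
    """Scan `events` one token range at a time and insert only the recent rows.

    `session` is a connected cassandra-driver Session; `target_table` is a
    hypothetical table with the same columns as `events`.
    """
    # Imported here so the helpers above work without the driver installed.
    from cassandra.query import SimpleStatement

    # The driver returns timestamp columns as naive datetimes in UTC.
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    select = SimpleStatement(
        "SELECT id, eventdate FROM events WHERE token(id) >= %s AND token(id) <= %s",
        fetch_size=fetch_size,  # the driver fetches further pages transparently
    )
    insert = session.prepare(
        f"INSERT INTO {target_table} (id, eventdate) VALUES (?, ?)"
    )
    for lo, hi in token_ranges(splits):
        for row in session.execute(select, (lo, hi)):
            # The 3-year check happens client-side, since CQL has no DATEADD-style condition here.
            if row.eventdate is not None and is_recent(row.eventdate, now):
                session.execute(insert, (row.id, row.eventdate))
```

With a running cluster, this would be called as copy_recent_events(Cluster(["127.0.0.1"]).connect("my_keyspace")), where the contact point and keyspace name are placeholders. Each token range is independent, so the ranges could also be handed to separate workers.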