
Cassandra compaction strategy for data that gets updated


I am trying to come up with a compaction strategy for the following use case.

We have a table with a TTL of 3 years. Most of the data in our scenario will be updated within 1 month of its insertion.

So essentially all the updates to a record will happen within a month and on average within 2 weeks.

There might be some outliers that receive an update after a month, but these would be rare.

Now I am thinking of using TWCS with a window of 1 month (or maybe 2 weeks). I know our use case is not perfect time-series data, but after a month most of the data will never receive an update and will reside in one SSTable.

However, I am not sure whether using a window size of 1 month will have any side effects.

Also, if an update arrives outside the window (i.e. after a month), will this create any major problem?

Please let me know what the best strategy would be for the above scenario.


Solution

  • TWCS might be a good pick, but it depends on the data size. If you have a massive amount of data, you would get massive SSTables after 1 month. I think weekly or biweekly SSTables would be more reasonable.

    But this takes us to the next question: "What happens with out-of-order updates?" The problem is that an SSTable is not dropped, even when all of its data has expired, as long as another SSTable holds data that "shadows" it. So files linger on disk longer than you would expect. Also, since TWCS compacts a window's data once after the window closes, late updates leave your data spread over several SSTables, which can hurt read performance.

    You have 2 options here:

    1. Start with TWCS and see how it goes, keeping the potential drawbacks in mind.
    2. Start with STCS, and try TWCS on a single node, either by adding a node in write-survey mode or by changing the compaction strategy on one node via JMX.

    You can find an excellent article about TWCS, tombstones and shadowing here: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

    Always remember that you can change your compaction strategy later. It is not free or painless, but it can be done.
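
To make the window-size trade-off above more concrete, here is a rough sketch that counts how many time windows a 3-year TTL spans for weekly, two-week and 30-day windows, and builds the corresponding CQL statement. It assumes a year of 365 days, a TTL applied as the table default (the question does not say whether the TTL is set per write or per table), and a placeholder table name `my_keyspace.my_table`. The helper names are made up for illustration. The window count is only an upper bound on live windows, not a prediction of the real SSTable count, since late updates add more files.

```python
import math

SECONDS_PER_DAY = 86_400


def window_count(ttl_days: int, window_days: int) -> int:
    """Upper bound on the number of time windows that hold unexpired data."""
    return math.ceil(ttl_days / window_days)


def twcs_alter_statement(table: str, window_days: int, ttl_days: int) -> str:
    """Build a CQL ALTER TABLE statement enabling TWCS with a day-based window.

    Assumes the TTL is set as the table default; adjust if it is set per write.
    """
    ttl_seconds = ttl_days * SECONDS_PER_DAY
    return (
        f"ALTER TABLE {table} WITH compaction = {{"
        f"'class': 'TimeWindowCompactionStrategy', "
        f"'compaction_window_unit': 'DAYS', "
        f"'compaction_window_size': '{window_days}'}} "
        f"AND default_time_to_live = {ttl_seconds};"
    )


if __name__ == "__main__":
    ttl_days = 3 * 365  # the 3-year TTL from the question
    for window_days in (7, 14, 30):
        count = window_count(ttl_days, window_days)
        print(f"{window_days:>2}-day window: up to {count} windows")
    # my_keyspace.my_table is a placeholder, not a name from the question
    print(twcs_alter_statement("my_keyspace.my_table", 14, ttl_days))
```

A smaller window keeps each SSTable smaller but means more SSTables on disk over the TTL, so it is worth checking the resulting count before picking between 1 week, 2 weeks and 1 month.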