When I delete a row from Cassandra while the data still lives only in the memtable (no SSTable has been created yet), the deleted row never seems to get cleaned up in the memtable, since tombstone cleanup is done only by compaction and compaction applies only to SSTables. Is there any way to completely remove the deleted row from the memtable itself, before it is flushed to an SSTable? Updates happen in place, but it looks like deletes do not.
We are using Cassandra 2.0.8.
Thanks
If you have RF > 1, the tombstones still need to be persisted to disk to ensure that the deletion was safely transmitted to all replicas. For example, consider the following:
RF = 3, N = 3 (three replicas on a three-node cluster)
You have a table of employees to fire at the end of the month. You add John Smith to the list of employees to terminate. Two minutes later, John Smith does something awesome, and you want to remove him from the list. You delete his entry, but one of the 3 nodes is offline, so on that node John Smith is still in the list of employees to fire.
When the memtable flushes on one of the "up" nodes, it will persist the tombstone indicating that John Smith should not be fired, because when that offline server comes up, it needs to know that John Smith's job is safe.
Compaction will eventually remove the tombstone after gc_grace_seconds, but the underlying behavior is correct: if you write a cell, and then immediately delete it, you still have to save the tombstone to disk in order to make sure all replicas properly delete that cell.
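To make the John Smith example concrete, here is a rough Python sketch of what would go wrong if the tombstone were simply dropped from the memtable. It is a toy model, not Cassandra's actual storage code: each replica is a plain dict, `TOMBSTONE`, `write`, `delete`, `reconcile` and `scenario` are names I made up for illustration, and `reconcile` just stands in for the last-write-wins merge that a read or repair performs across replicas.

```python
# Toy model only: each replica is a dict of key -> (timestamp, value).
# TOMBSTONE marks a delete. None of this is Cassandra's real code.
TOMBSTONE = object()

def write(replicas, key, value, ts):
    """Apply a write to every replica that is currently reachable."""
    for replica in replicas:
        replica[key] = (ts, value)

def delete(replicas, key, ts, keep_tombstone=True):
    """Apply a delete to every reachable replica.

    keep_tombstone=False models the hypothetical "just purge it from
    the memtable" behaviour asked about in the question.
    """
    for replica in replicas:
        if keep_tombstone:
            replica[key] = (ts, TOMBSTONE)
        else:
            replica.pop(key, None)

def reconcile(replicas, key):
    """Last-write-wins merge across replicas (highest timestamp wins)."""
    cells = [replica[key] for replica in replicas if key in replica]
    if not cells:
        return None
    ts, value = max(cells, key=lambda cell: cell[0])
    return None if value is TOMBSTONE else value

def scenario(keep_tombstone):
    a, b, c = {}, {}, {}
    # All three nodes are up when John Smith is added to the list.
    write([a, b, c], "john_smith", "fire", ts=1)
    # Node c is offline when the delete arrives, so only a and b see it.
    delete([a, b], "john_smith", ts=2, keep_tombstone=keep_tombstone)
    # Node c comes back and the replicas are reconciled.
    return reconcile([a, b, c], "john_smith")

with_tombstone = scenario(keep_tombstone=True)
without_tombstone = scenario(keep_tombstone=False)
print(with_tombstone, without_tombstone)
```

With the tombstone kept, the newer delete marker beats the stale cell on the node that was offline and the row stays deleted. Without it, the only surviving copy is the stale one, so the deleted row comes back.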