cassandra, scylla

Disk space requirement for compaction on a token range in scylla/cassandra


I am using SizeTieredCompactionStrategy (STCS) in ScyllaDB. I deleted half of my data in a specific token range (let's say x to y). My gc_grace_seconds is set to 6 hours. I want to get rid of all the tombstones created in this token range. If I run nodetool compact --start-token x --end-token y keyspace table on all the nodes in the cluster after gc_grace_seconds has passed, what will happen? Will it delete the tombstones, and how much disk space will it consume? Will it be the same as a major compaction (plain nodetool compact), which needs 50% more space?


Solution

  • Unfortunately, Scylla's documentation of nodetool compact (see https://docs.scylladb.com/operating-scylla/nodetool-commands/compact/) doesn't even mention the token range option. But the Cassandra documentation (https://cassandra.apache.org/doc/latest/operating/compaction/index.html) explains what the so-called sub-range compaction does:

    It is possible to only compact a given sub range - this could be useful if you know a token that has been misbehaving - either gathering many updates or many deletes. (nodetool compact -st x -et y) will pick all sstables containing the range between x and y and issue a compaction for those sstables. For STCS this will most likely include all sstables but with LCS it can issue the compaction for a subset of the sstables.

    With STCS, the common case is that every sstable holds tokens from all over the token ring, so the token range option will likely not exempt any sstable from being compacted, and your nodetool compact call will usually turn into a full major compaction of all sstables. The temporary disk space overhead is therefore the same as usual with STCS: until the compaction finishes and the old sstables are removed, you hold both the old sstables and the new one. You assumed only half of the original data survives, so the new sstable will be around half the total size of the old sstables, which is probably the "50%" you asked about.
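
To make the reasoning above concrete, here is a rough Python sketch of the arithmetic. It is only a model of what the quoted documentation describes, not Scylla's or Cassandra's actual selection code. The SSTable class, the function names, the sizes and the surviving_fraction parameter are all made up for illustration. It assumes every sstable that overlaps the range is rewritten in full, and that the extra space needed at the peak equals the size of the compaction output.

```python
from dataclasses import dataclass

# Bounds of the Murmur3 token ring used by the default partitioner.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable: its size and the token span it covers."""
    name: str
    size_gb: float
    min_token: int
    max_token: int


def overlaps(sstable, start_token, end_token):
    """Return True if the sstable's token span intersects [start_token, end_token]."""
    return sstable.min_token <= end_token and sstable.max_token >= start_token


def estimate_subrange_compaction(sstables, start_token, end_token, surviving_fraction):
    """Estimate which sstables get compacted and how much extra space is needed.

    Assumption: each selected sstable is rewritten completely, and
    surviving_fraction of the selected data ends up in the output.
    """
    selected = [s for s in sstables if overlaps(s, start_token, end_token)]
    input_gb = sum(s.size_gb for s in selected)
    # Old and new sstables coexist until the compaction completes,
    # so the temporary overhead is the size of the output.
    temporary_extra_gb = input_gb * surviving_fraction
    return {
        "selected": [s.name for s in selected],
        "input_gb": input_gb,
        "temporary_extra_gb": temporary_extra_gb,
    }


# Typical STCS layout: every sstable spans the whole ring.
stcs_tables = [
    SSTable("a", 40.0, MIN_TOKEN, MAX_TOKEN),
    SSTable("b", 40.0, MIN_TOKEN, MAX_TOKEN),
    SSTable("c", 20.0, MIN_TOKEN, MAX_TOKEN),
]

# LCS-like layout: sstables cover disjoint token spans.
lcs_tables = [
    SSTable("l1", 10.0, 0, 99),
    SSTable("l2", 10.0, 100, 199),
    SSTable("l3", 10.0, 200, 299),
]

if __name__ == "__main__":
    print(estimate_subrange_compaction(stcs_tables, 120, 150, 0.5))
    print(estimate_subrange_compaction(lcs_tables, 120, 150, 0.5))
```

With these made-up numbers, the STCS layout selects all three sstables (100 GB of input) and needs about 50 GB of temporary space, exactly as a major compaction would. The LCS-like layout selects only the one sstable that overlaps the range, so the overhead is about 5 GB.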