What flavors of compaction (e.g., size-tier/level compactions) are supported? And what parameters control the behavior of compactions?
YugabyteDB Compactions Overview:
YugabyteDB's compactions are size-tiered. Size-tiered compactions have the advantage of lower disk write (IO) amplification compared to level compactions. A common concern is that size-tiered compactions have higher space amplification (requiring about 50% free-space headroom). This is not the case in YugabyteDB, because each table is broken into several shards and the number of concurrent compactions across shards is throttled to a certain maximum (~4; the exact number depends on the number of cores). So if a node has N shards, the extra space needed is only about 4 / (N + 4) of the total. The typical space amplification in YugabyteDB therefore tends to be in the 10-20% range.
By default, compactions are triggered automatically as new data arrives and memstores are flushed to create SSTable files. The default policy ensures that a compaction is worthwhile: for example, the algorithm tries to make sure that the files being compacted are of roughly similar size. It does not make sense to compact a 100GB file with a 1GB file to produce a 101GB file, since that would be a lot of IO for little gain. These knobs guide this selection:
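The headroom arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it reads the expression above as 4 / (N + 4), and it assumes equal-sized shards and that a running compaction may temporarily need as much extra space as the shard it rewrites. The function name is made up for illustration.

```python
def compaction_headroom_fraction(num_shards: int, max_concurrent: int = 4) -> float:
    """Fraction of total disk to keep free for in-flight compactions.

    Assumptions (not from YugabyteDB itself): shards are equal in size and
    each running compaction needs at most one extra shard's worth of space.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # A node cannot compact more shards at once than it actually hosts.
    busy = min(max_concurrent, num_shards)
    return busy / (num_shards + busy)


for n in (1, 16, 24, 36):
    print(f"{n} shards -> {compaction_headroom_fraction(n):.0%} headroom")
```

Under these assumptions, a single unsharded store lands at the familiar 50% figure, while a node with 16 to 36 shards lands in the 10-20% range quoted above.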
--rocksdb_universal_compaction_min_merge_width (default 4)
--rocksdb_universal_compaction_size_ratio (default 20)
By default, compactions run only if there are at least 4 eligible files and their running total (the sum of the sizes of the files considered so far) is within 20% of the size of the next file being considered for inclusion in the same compaction.
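The selection rule can be illustrated with a simplified sketch. It is loosely modeled on how universal (size-tiered) compaction picks a run of files and is not YugabyteDB's actual code; the function name, the newest-to-oldest ordering of the input, and the exact comparison used are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def pick_compaction(
    file_sizes: Sequence[float],
    min_merge_width: int = 4,   # mirrors --rocksdb_universal_compaction_min_merge_width
    size_ratio: int = 20,       # mirrors --rocksdb_universal_compaction_size_ratio
) -> Optional[Tuple[int, int]]:
    """Return a half-open index range [start, end) of files to compact, or None.

    file_sizes is assumed to be ordered from newest to oldest file.
    """
    n = len(file_sizes)
    for start in range(n):
        running_total = file_sizes[start]
        end = start + 1
        while end < n:
            # Stop growing the run once the next file is more than
            # size_ratio percent larger than everything accumulated so far.
            if file_sizes[end] > running_total * (100 + size_ratio) / 100:
                break
            running_total += file_sizes[end]
            end += 1
        # Only worthwhile if enough files ended up in the run.
        if end - start >= min_merge_width:
            return start, end
    return None


# Four 1 GB files are merged; the 100 GB file is left alone.
print(pick_compaction([1, 1, 1, 1, 100]))
# Too few similar-sized files: no compaction is picked.
print(pick_compaction([1, 1, 100]))
```

In the first call the run stops before the 100 GB file, which matches the intent described above: small files are merged with each other rather than rewriting a much larger file.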
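As their names suggest, these flags throttle compaction activity by capping the number of concurrent background compactions and the write rate of compactions and flushes:
--rocksdb_max_background_compactions (e.g., 4)
--rocksdb_compact_flush_rate_limit_bytes_per_sec (e.g., 268435456, which is 256 MiB per second)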
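To get a feel for the example rate limit, the following sketch converts it to MiB per second and estimates a lower bound on how long a compaction would take. It assumes the rate limit is the only bottleneck and that the compaction gets the whole budget to itself, which is optimistic if flushes draw on the same limit. The helper name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Example value from the flag above.
RATE_LIMIT_BYTES_PER_SEC = 268435456


def min_compaction_seconds(bytes_to_write: int,
                           rate_limit: int = RATE_LIMIT_BYTES_PER_SEC) -> float:
    """Lower bound on compaction time if only the rate limit constrains it."""
    if rate_limit <= 0:
        raise ValueError("rate_limit must be positive")
    return bytes_to_write / rate_limit


print(RATE_LIMIT_BYTES_PER_SEC / MIB)        # rate limit in MiB per second
print(min_compaction_seconds(100 * GIB))     # seconds to write 100 GiB of output
```

At 256 MiB per second, writing 100 GiB of compaction output takes at least 400 seconds, so a lower limit protects foreground traffic at the cost of longer-running compactions.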
Using the yb-admin tool, YugabyteDB also allows manual compactions to be triggered externally on a table. This is useful when new data is no longer arriving for a table, but the user wants to reclaim disk space consumed by overwrites or deletes that have already happened, or by TTL expiry.
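As a rough illustration, a manual compaction could be scripted along these lines. The sketch only builds and prints the command; it does not run it. It assumes the `compact_table` subcommand and the `-master_addresses` flag of yb-admin, and the exact argument syntax should be checked against the yb-admin documentation for your version. The master addresses, namespace, and table name are placeholders.

```python
import shlex
from typing import List, Sequence


def build_compact_table_cmd(master_addresses: Sequence[str],
                            namespace: str,
                            table: str) -> List[str]:
    """Build (but do not run) a yb-admin invocation for a manual compaction.

    Assumption: yb-admin accepts
    `-master_addresses <list> compact_table <namespace> <table>`.
    """
    if not master_addresses:
        raise ValueError("at least one master address is required")
    return [
        "yb-admin",
        "-master_addresses", ",".join(master_addresses),
        "compact_table", namespace, table,
    ]


# Placeholder values for illustration only.
cmd = build_compact_table_cmd(["127.0.0.1:7100"], "ysql.yugabyte", "my_table")
print(shlex.join(cmd))
# To execute it, pass `cmd` to subprocess.run(cmd, check=True).
```

Building the argument list separately from executing it makes it easy to review the command before triggering a compaction on a production table.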