
How compactions work in YugabyteDB


What flavors of compaction (e.g., size-tier/level compactions) are supported? And what parameters control the behavior of compactions?


Solution

  • YugabyteDB Compactions Overview:

    • YugabyteDB's compactions are size-tiered. Size-tiered compactions have lower disk write (IO) amplification than level compactions. A common concern is that size-tiered compactions have higher space amplification (that they need 50% space headroom). This does not apply to YugabyteDB, because each table is broken into several shards and the number of concurrent compactions across shards is throttled to a certain maximum (about 4; the exact number depends on the number of cores). So if a node has N shards, the amount of extra space needed is only about 4 / (N + 4) of the total. As a result, the typical space amplification in YugabyteDB tends to be in the 10-20% range.
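
To make the headroom arithmetic concrete, here is a minimal sketch. It reads the expression above as 4 / (N + 4), and it assumes equally sized shards where each in-flight compaction may temporarily duplicate one shard's data. The function name and parameters are illustrative only, not part of YugabyteDB.

```python
def extra_space_fraction(num_shards: int, max_concurrent: int = 4) -> float:
    """Fraction of disk that may be needed as temporary compaction headroom.

    Assumption: shards are equally sized and each running compaction can
    temporarily need as much extra space as one shard occupies.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # No more compactions can run at once than there are shards.
    concurrent = min(max_concurrent, num_shards)
    return concurrent / (num_shards + concurrent)


for n in (16, 36):
    print(f"{n} shards -> {extra_space_fraction(n):.0%} headroom")
```

Under these assumptions, 16 shards give 20% and 36 shards give 10%, which matches the 10-20% range quoted above, while a single unsharded data set falls back to the familiar 50%.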

    • By default, compactions are triggered automatically as new data arrives and memstores are flushed to create SSTable files. The default policy makes sure that a compaction is worthwhile: the algorithm tries to ensure that the files being compacted are of roughly similar size. For example, it does not make sense to compact a 100GB file with a 1GB file to produce a 101GB file, because that would be a lot of unnecessary IO for little gain. These knobs guide this selection:

    --rocksdb_universal_compaction_min_merge_width (default 4)
    --rocksdb_universal_compaction_size_ratio (default 20)
    

    By default, a compaction runs only if there are at least 4 eligible files, and a file is added to the compaction only if the running total (the sum of the sizes of the files considered so far) is within 20% of the size of the next file under consideration.
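
The selection rule can be sketched roughly as follows. This is a simplified illustration of one plausible reading of the two flags (a candidate file is accepted while it is no larger than the running total plus `size_ratio` percent), not the actual YugabyteDB or RocksDB implementation. The function name and the newest-first ordering of the input are assumptions.

```python
from typing import List


def pick_files(sizes_newest_first: List[int],
               min_merge_width: int = 4,
               size_ratio: int = 20) -> List[int]:
    """Return indexes of the files chosen for one compaction, or [] if none.

    Simplified rule: starting from each file in turn, keep adding the next
    file while its size is at most the running total plus size_ratio percent.
    """
    n = len(sizes_newest_first)
    for start in range(n):
        picked = [start]
        running_total = sizes_newest_first[start]
        for nxt in range(start + 1, n):
            next_size = sizes_newest_first[nxt]
            # Stop when the next file is too large relative to the total so far.
            if next_size * 100 > running_total * (100 + size_ratio):
                break
            picked.append(nxt)
            running_total += next_size
        if len(picked) >= min_merge_width:
            return picked
    return []


# Four 1GB files next to one 100GB file: only the small files are merged.
print(pick_files([1, 1, 1, 1, 100]))
```

With the defaults, the large file is left out of the compaction, and a lone pair such as a 1GB and a 100GB file is not compacted at all.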

    • YugabyteDB also provides a way to control how many system resources the compaction process is allowed to consume overall. It automatically picks some settings based on the number of CPUs, but users can also set them explicitly based on the available disk bandwidth. The flags governing this are:
    --rocksdb_max_background_compactions (e.g., 4)
    --rocksdb_compact_flush_rate_limit_bytes_per_sec (e.g., 268435456)
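
As a rough aid for choosing a rate limit, the sketch below converts the example value into MiB/s and estimates a lower bound on how long a compaction would take at that rate. The helper name is hypothetical, and the estimate assumes the compaction gets the entire budget; since the flag name suggests the limit is shared with flushes, real compactions may take longer.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def min_compaction_seconds(bytes_to_write: int,
                           rate_limit_bytes_per_sec: int = 268435456) -> float:
    """Lower bound on the duration of a compaction writing bytes_to_write.

    Assumption: the whole rate limit is available to this one compaction.
    """
    if rate_limit_bytes_per_sec <= 0:
        raise ValueError("rate limit must be positive")
    return bytes_to_write / rate_limit_bytes_per_sec


print(268435456 / MIB)                    # rate limit in MiB/s
print(min_compaction_seconds(100 * GIB))  # seconds to write 100 GiB
```

The example value 268435456 is exactly 256 MiB/s, so writing 100 GiB of compaction output needs at least 400 seconds under this assumption.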
    
    • In addition to throttling controls for compactions, YugabyteDB applies a variety of internal optimizations to minimize the impact of compactions on foreground latencies. One of these is a prioritized queue that gives small compactions priority over large ones, so that the number of SSTable files for any tablet stays as low as possible.
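
The idea of favoring small compactions can be illustrated with a min-heap keyed on input size. This is only a sketch of the general technique; the class, its method names, and the choice of input bytes as the priority key are assumptions, not YugabyteDB internals.

```python
import heapq
from typing import List, Tuple


class CompactionQueue:
    """Toy priority queue that hands out the smallest pending compaction first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = 0  # tie-breaker: equal sizes are served in submit order

    def submit(self, tablet_id: str, input_bytes: int) -> None:
        heapq.heappush(self._heap, (input_bytes, self._counter, tablet_id))
        self._counter += 1

    def next_job(self) -> Tuple[str, int]:
        input_bytes, _, tablet_id = heapq.heappop(self._heap)
        return tablet_id, input_bytes

    def __len__(self) -> int:
        return len(self._heap)


queue = CompactionQueue()
queue.submit("tablet-a", 100 * 1024 ** 3)  # large compaction
queue.submit("tablet-b", 5 * 1024 ** 2)    # small compaction
print(queue.next_job()[0])                 # the small job is served first
```

Serving small jobs first means a tablet with many small SSTable files is cleaned up quickly instead of waiting behind a long-running large compaction.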

    • Using the yb-admin tool, YugabyteDB also allows manual compactions to be triggered externally on a table. This can be useful when no new data is arriving for a table anymore, but the user wants to reclaim disk space after overwrites or deletes that have already happened, or after TTL expiry.
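
As an illustration, the helper below only assembles a yb-admin command line; it does not run anything. The `compact_table` subcommand, the `-master_addresses` flag, and the argument order are assumptions based on yb-admin's documented usage and may differ between versions, so check `yb-admin --help` for your release. The address, namespace, and table names are placeholders.

```python
from typing import List, Optional


def compact_table_command(master_addresses: str,
                          namespace: str,
                          table: str,
                          timeout_secs: Optional[int] = None) -> List[str]:
    """Build (but do not execute) a yb-admin manual compaction command.

    Assumption: the installed yb-admin accepts
    `compact_table <namespace> <table> [timeout_in_seconds]`.
    """
    cmd = ["yb-admin", "-master_addresses", master_addresses,
           "compact_table", namespace, table]
    if timeout_secs is not None:
        cmd.append(str(timeout_secs))
    return cmd


# Placeholder values; replace them with your own cluster details.
print(" ".join(compact_table_command("127.0.0.1:7100", "ysql.mydb", "mytable", 300)))
```

The resulting list can be passed to `subprocess.run` once the syntax has been confirmed against the installed yb-admin version.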