cassandra scylla

How should I choose parameters for size tiered compaction strategy?


I have these 2 particular use cases:

  1. Streaming jobs, writing 30 MB every 5 seconds
  2. Batch jobs, writing 500 GB every morning

The TTL of my tables is 1.5 years.

These writes can contain many updates, so, according to this table right here:

[image: table comparing compaction strategies, not reproduced here]

I should use the SizeTieredCompactionStrategy. However, how do I choose the correct parameters for it?

It has several parameters:

bucket_high

bucket_low

min_sstable_size

min_threshold

max_threshold


Solution

  • As a general proposition, it is very rare for operators to have to configure the size-tiered compaction sub-properties.

    Unless you're very experienced with Cassandra, there just isn't any reason to change the STCS defaults. That is why STCS is the default compaction strategy out of the box, and it suits the majority of workloads.

    The exceptions are TWCS (TimeWindowCompactionStrategy) for true time-series use cases and LCS (LeveledCompactionStrategy) for very read-heavy workloads with hardly any writes. Cheers!
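
For reference, here is a rough sketch of what the five sub-properties from the question look like when spelled out explicitly. The keyspace and table names (`my_keyspace.my_table`) are hypothetical, and the values are the defaults as I understand them for Apache Cassandra, so running this should not change behaviour. Treat it as an illustration, not a tuning recommendation, and check the documentation for your exact Cassandra or Scylla version before relying on any of the values.

```cql
-- Hypothetical keyspace/table names, for illustration only.
-- Values below are assumed to be the stock STCS defaults.
ALTER TABLE my_keyspace.my_table
WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'bucket_low': '0.5',             -- SSTable joins a bucket if its size >= 0.5 x bucket average
  'bucket_high': '1.5',            -- ... and <= 1.5 x bucket average
  'min_sstable_size': '52428800',  -- assumed to be in bytes (50 MB); smaller SSTables share one bucket
  'min_threshold': '4',            -- minimum SSTables in a bucket to trigger a compaction
  'max_threshold': '32'            -- maximum SSTables merged in a single compaction
};
```

Leaving the sub-properties out entirely (only `'class': 'SizeTieredCompactionStrategy'`) should give the same result, which is the point of the answer above.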