cassandra

How to understand bloom_filter_fp_chance and read_repair_chance in Cassandra


Bloom Filters

When data is requested, the Bloom filter checks if the row exists before doing disk I/O.

Read Repair

Read Repair performs a digest query on all replicas for that key.

My confusion is how to set these values between 0 and 1. What happens when the value varies?

Thanks in advance.


Solution

  • bloom_filter_fp_chance and read_repair_chance control two different things. Usually you would leave both at their default values, which should work well for most typical use cases.

    bloom_filter_fp_chance controls the precision of the bloom filters for SSTables stored on disk. The bloom filters are kept in memory, and when you do a read, Cassandra checks them to see which SSTables might have data for the key you are reading. A bloom filter can give false positives (though never false negatives): it reports that an SSTable might contain the key, but when Cassandra actually reads the SSTable, the key turns out not to be there and the read was a waste of time. The better the precision of the bloom filter, the fewer false positives it gives, but the more memory it needs.

    From the documentation:

    0 Enables the unmodified, effectively the largest possible, Bloom filter
    1.0 Disables the Bloom Filter
    The recommended setting is 0.1. A higher value yields diminishing returns.
    

    So a higher number gives a higher chance of a false positive (fp) when reading the bloom filter.

    read_repair_chance controls the probability that a read of a key will be checked against the other replicas for that key. This is useful if your nodes have frequent downtime that leaves data out of sync. If you do a lot of reads, read repair will gradually bring the data back into sync as you read, without you having to run a full repair on the nodes. Higher settings cause more background read repairs and consume more resources, but sync the data more quickly as you do reads.

    See documentation on these settings here.
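
To get a feel for how each value behaves between 0 and 1, here is a rough Python sketch. It is not Cassandra's code. The first function uses the textbook sizing formula for an optimally tuned Bloom filter, bits per key = -ln(p) / (ln 2)^2, and Cassandra's real memory use may differ. The second assumes every read independently triggers a read repair with probability read_repair_chance, so the average number of reads before a key gets repaired is 1 / chance. Both function names are made up for illustration.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Approximate bits of memory per key for a textbook Bloom filter.

    Assumption: an optimally tuned filter, where
    bits per key = -ln(p) / (ln 2)^2. Not Cassandra's exact sizing.
    """
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def expected_reads_until_repair(read_repair_chance: float) -> float:
    """Average number of reads of a key before one triggers a read repair.

    Assumption: each read triggers a repair independently with the given
    probability, so the count follows a geometric distribution (mean 1/p).
    """
    if not 0.0 < read_repair_chance <= 1.0:
        raise ValueError("read_repair_chance must be in (0, 1]")
    return 1.0 / read_repair_chance


if __name__ == "__main__":
    # Lower fp_chance means fewer wasted SSTable reads but more memory.
    for p in (0.001, 0.01, 0.1):
        print(f"fp_chance={p}: ~{bloom_bits_per_key(p):.1f} bits per key")

    # Higher read_repair_chance means out-of-sync keys are fixed sooner.
    for c in (0.1, 0.5, 1.0):
        print(f"read_repair_chance={c}: ~{expected_reads_until_repair(c):.0f} reads per repair")
```

Under these assumptions, going from an fp chance of 0.1 to 0.01 roughly doubles the memory per key (about 4.8 bits versus about 9.6 bits), which is the precision versus memory trade-off described above. Likewise, a read_repair_chance of 0.1 means a stale key is repaired after about 10 reads on average, while 1.0 checks the replicas on every read.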