Search code examples
data-structureshashtablecomputer-scienceprobabilitybloom-filter

How to measure the rate of false positives in a Bloom Filter


You have a bloom filter, and you want to measure the rate of false positives, practically (not theoretically).

How do you go about it?

Do you insert N elements into it and count the number of hash collisions and divide by N, and that's it?

Or do you insert N elements and then do membership tests for all other elements which were not inserted (which is infinite in general)?

Or something else?


Solution

  • Imagine that your calculations say you'll have a false positive rate of X when there are N items in the Bloom filter.

    Generate 2N unique random keys. Place half of them into the Bloom filter. Now test with the other half. You know the keys are unique, so any "positive" hit you get will be a false positive.

    Compare your experimental result with the computed result.