Search code examples
pythonapproximationbloom-filter

How can I get the size of bloom filter set while using union or intersection function?


I'm trying to get the size of bloom filter set while using bloom filter's union & intersection functions with python package(https://github.com/jaybaird/python-bloomfilter.git)

I though that after conducting the function 'union' or 'intersection', then I could get the result by adding len() function, but it just print out only '0' output.

from pybloom import BloomFilter
bf1 = BloomFilter(1000)
bf2 = BloomFilter(1000)

# After adding some elements to bf1 and bf2
print(len(bf1.union(bf2)))
# expected max(len(bf1), len(bf2)) but the result was 0

After I find the document page, I realized that the len() option become disabled after 'union' function and its actual result len() was 0.

Instead, I want to approximate the size of bloom filter set somehow. Do you have any idea in order to calculate the size of it?


Solution

  • The implementation only copies BloomFilter's bitarray, i.e. self.bitarray. The elements self.count in previous filters are not counted in.

    So it doesn't union the elements - but do a bitarray or.


    Update:

    In most cases you don't need to approximate the count. It provided a precise count of elements when you call add, and you can just call len(bf3). Unfortunately new created bf3 has not been called add so len(bf3) == 0.

    For the formula to approximate number of elements,

    - m / k * ln(1- n / m)
    

    You have

    import math.log as ln
    
    m = bf3.bitarray.length()
    n = bf3.bitarray.count()
    k = bf3.num_slices
    
    # given m=20, n=8, approximate n elements as 5.89