I'm trying to get the size of bloom filter set while using bloom filter's union & intersection functions with python package(https://github.com/jaybaird/python-bloomfilter.git)
I though that after conducting the function 'union' or 'intersection', then I could get the result by adding len() function, but it just print out only '0' output.
from pybloom import BloomFilter
bf1 = BloomFilter(1000)
bf2 = BloomFilter(1000)
# After adding some elements to bf1 and bf2
print(len(bf1.union(bf2)))
# expected max(len(bf1), len(bf2)) but the result was 0
After I find the document page, I realized that the len() option become disabled after 'union' function and its actual result len() was 0.
Instead, I want to approximate the size of bloom filter set somehow. Do you have any idea in order to calculate the size of it?
The implementation only copies BloomFilter's bitarray, i.e. self.bitarray
. The elements self.count
in previous filters are not counted in.
So it doesn't union the elements - but do a bitarray or
.
Update:
In most cases you don't need to approximate the count. It provided a precise count of elements when you call add
, and you can just call len(bf3)
. Unfortunately new created bf3
has not been called add
so len(bf3) == 0
.
For the formula to approximate number of elements,
- m / k * ln(1- n / m)
You have
import math.log as ln
m = bf3.bitarray.length()
n = bf3.bitarray.count()
k = bf3.num_slices
# given m=20, n=8, approximate n elements as 5.89