python jython bloom-filter

Modern, high performance bloom filter in Python?


I'm looking for a production quality bloom filter implementation in Python to handle fairly large numbers of items (say 100M to 1B items with 0.01% false positive rate).
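For a sense of scale, the standard Bloom filter sizing formulas can be applied to these numbers. This is a rough sketch only: `bloom_parameters` is a hypothetical helper, not part of any library mentioned here, and it assumes an ideal filter with independent, uniform hash functions.

```python
import math


def bloom_parameters(n_items, fp_rate):
    """Return (bits, hash_count) for a textbook Bloom filter.

    Uses m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
    Hypothetical helper for illustration only.
    """
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


# 0.01% false positive rate, expressed as a probability
for n in (100_000_000, 1_000_000_000):
    bits, hashes = bloom_parameters(n, 0.0001)
    print(n, "items:", bits, "bits,", hashes, "hashes,",
          round(bits / 8 / 2**30, 2), "GiB")
```

Under these assumptions a 0.01% false positive rate costs roughly 19 bits per item with 13 hash functions, so 1B items needs on the order of 2 GiB of bit storage.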

Pybloom is one option, but it seems to be showing its age: it regularly emits DeprecationWarning messages on Python 2.5. Joe Gregorio also has an implementation.

Requirements are fast lookup performance and stability. I'm also open to creating Python interfaces to particularly good C/C++ implementations, or even to using Jython if there's a good Java implementation.

Lacking that, any recommendations on a bit array / bit vector representation that can handle ~16E9 bits?
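As a baseline for the bit array question, here is a minimal pure-Python sketch backed by a `bytearray`. `BitVector` is a hypothetical class written for illustration, not taken from any of the libraries above. At ~16E9 bits the buffer would be about 2 GB, so this assumes a 64-bit Python with enough RAM; a file-backed buffer via the standard `mmap` module is one possible alternative, and a C extension would likely be much faster per operation.

```python
class BitVector:
    """Fixed-size bit array stored 8 bits per byte (illustrative sketch)."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        # Round up so the last partial byte is still allocated.
        self._bytes = bytearray((n_bits + 7) // 8)

    def _check(self, i):
        if not 0 <= i < self.n_bits:
            raise IndexError("bit index out of range")

    def set(self, i):
        self._check(i)
        # i >> 3 selects the byte, i & 7 selects the bit within it.
        self._bytes[i >> 3] |= 1 << (i & 7)

    def test(self, i):
        self._check(i)
        return bool(self._bytes[i >> 3] & (1 << (i & 7)))


bv = BitVector(1_000)
bv.set(42)
print(bv.test(42), bv.test(43))  # True False
```

A Bloom filter built on top of this would call `set` once per hash function on insert and `test` once per hash function on lookup.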


Solution

  • Eventually I found pybloomfiltermmap. I haven't used it, but it looks like it would fit the bill.