Search code examples
javaoperatorsguavabloom-filter

Logic of code Guava's mightContain


I was going through the code of Guava library, i was interested to understand the probabilistic match code of mightContain. could any one explain what they are doing in the code specially with the bit wise operator. here is the code....

public <T> boolean mightContain(T object, Funnel<? super T> funnel,
        int numHashFunctions, BitArray bits) {
      long hash64 = Hashing.murmur3_128().newHasher().putObject(object, funnel).hash().asLong();
      int hash1 = (int) hash64;
      int hash2 = (int) (hash64 >>> 32);
      for (int i = 1; i <= numHashFunctions; i++) {
        int nextHash = hash1 + i * hash2;
        if (nextHash < 0) {
          nextHash = ~nextHash;
        }
        // up to here, the code is identical with the previous method
        if (!bits.get(nextHash % bits.size())) {
          return false;
        }

Solution

  • Assuming this is code from the Bloomfilter class, the logic goes like this:

    Given the key, perform all of the chosen hashes on that key. Use each hash to pick a bit number and check if that bit is set. If any bits are not set in the filter at that position then this key cannot have been added.

    If all of the bits are found to be set then we can only say that the filter might have had the key added. This is because it is possible for a different key (or a combination of a number of different keys) to result in all of the checked bits being set.

    Note that the adding of a key to the filter does almost exactly the same function except that it **set**s all of the bits generated.

    A Bloom Filter object operates as follows.

    1. A number of hash functions are chosen, each will calculate the location of a bit in the filter. (see Optimal number of hash functions for discussion on how many).
    2. Hold an arbitrary length bit pattern - the length is unimportant but it should be big enough (see Probability of false positives for a discussion on what big enough means).
    3. Each time a key is added to the filter, all configured hash functions are performed on the key resulting in a number of bits being set in the pattern.
    4. To check if a key has already been added, perform all of the hash functions and check the bit found there. If any are found to be zero then this key certainly has not been added to the filter.
    5. If all bits are found to be set then then it may be that this key has been added. You will need to perform further checks to confirm.