Search code examples
pythonpython-3.xhashhashmaphashcode

What's the difference between using & vs MOD operators in calculating indexes (Hash Table implementation)?


For general hash table implementation:

  • Compute the hash of the key, hash(key)=hashcode

  • Map the hashcode to the table/array. hashcode % array_length = index

  • once we get the index, we add a node (the key, value, update next pointer) in the Linked List at that index.

So, the question is, what's the difference between:

def _get_index(self, key):

   # compute the hashcode
   hash_code = hash(key)
   array_index = hash_code & 15  # FIXME : why?
   return array_index

and

 array_index = hash_code % 15

For example: for INPUT:

hm =MyHashMap()
hm.put("1", "sachin")
hm.put("2", "sehwag")
hm.put("3", "ganguly")
print(hm.get("1"))
print(hm.get("2"))
print(hm.get("3"))

OUTPUT:

sachin
sehwag
ganguly

'&' operator instead of '%' that doesn't make sense to me? because it doesn't work always as % operator in computing the index, But, I have seenn developer using & in some implementations of Hashtable

Any suggestion?


Solution

  • array_index = hash_code & 15
    

    is equivalent to (on positive values):

    array_index = hash_code % 16
    

    it only works in the case when all significant bits of the number are ones (which is when the number is of the form 2**n - 1).

    Both remove the highest part of the number bits.

    Bit-masking is way faster than division. So it's used when possible to speed up computations. Every time you see:

    b = a % modulo
    

    with a > 0 and modulo is a power of 2 (modulo == 2**n), you can write:

    b = a & (modulo-1)
    

    instead. If the modulo is not a power of 2, then it cannot be done that way (and compiled languages optimizers often substitute power of 2 modulos or divisions/multiplications by the faster bit masking/shifting operation)

    Even if it's true that bit-masking is way faster than division/modulus in assembly language, python is interpreted and that speed optimization is not really noticeable. Anyway if the intent is to mask bits, & operator makes more sense.