Search code examples
hashtablehash-collisionload-factor

Hash Table: Should I increase the element count on collisions?


Right now my hash tables count the number of every element inserted into the hash table. I use this count, with the total hash table size, to calculate the load factor and when it reaches like 70%, I rehash it.

I was thinking that maybe I should only count the inserted elements with fills an empty slot instead of all of them. Cause the collision method I'm using is separate chaining. The factor load keeps increasing but if there can be a few collisions leaving lots of empty slots on the hash table.

You are probably thinking that if I have that many collisions, maybe I'm not using the best hashing method. But that's not the point, I'm using one of the know hashing algorithms out there, I tested 3 of them on my sample data and selected the one who produced less collisions.

My question still remains. Should I keep counting every element inserted, or just the ones that fill an empty slot in the Hash Table?


Solution

  • Rehashing is intended to reduce the likelihood of collisions, so systematically ignoring collisions to decide when to rehash seems self-defeating.

    Best might be if you kept with each entry the original full hash value (a collision of course is instead determined by the hash modulo your current size) and counted only the collisions that are due to the modulo operation -- implicitly acknowledging that if a collision is due to identical full hash values for different items, there's nothing rehashing can do to help (unless by "rehashing" you also imply switching to a different hash function, but it doesn't look like that's what you mean here;-).

    Keeping the full hash values also means cheaper rehashing since you don't need to run the hash function again (how relevant is that depends on how costly your hash function is to compute, of course).