Many books and tutorials say that the size of a hash table must be a prime to evenly distribute the keys in all the buckets. But Java's HashMap
always uses a size that is a power of two. Shouldn't it be using a prime? What's better, a "prime" or a "power of two" as the hash table size?
Using a power of two effectively masks out top bits of the hash code. Thus a poor-quality hash function might perform particularly badly in this scenario.
Java's HashMap
mitigates this by mistrusting the object's hashCode()
implementation and applying a second level of hashing to its result:
Applies a supplemental hash function to a given hashCode, which defends against poor quality hash functions. This is critical because HashMap uses power-of-two length hash tables, that otherwise encounter collisions for hashCodes that do not differ in lower bits.
If you have a good hash function, or do something similar to what HashMap
does, it does not matter whether you use prime numbers etc as the table size.
If, on the other hand, the hash function is of unknown or poor quality, then using a prime number would be a safer bet. It will, however, make dynamically-sized tables tricker to implement, since all of a sudden you need to be able to produce prime numbers instead of just multiplying the size by a constant factor.