data-structures hashmap hashtable chaining

When would you rehash a chaining hash table?

When inserting to in chaining hash table, when would you rehash? Would it be better to rehash when alpha = 1? (alpha = # of elements stored / size of hash table)

Solution

With chaining, you'll typically want to resize when any bucket contains more than a set number of items. For example, you might decide that it's okay for a bucket to contain up to 5 items. The first time you insert the sixth item into a bucket, you re-size the table.

Or, you might decide that the ideal number is 10, or 3. It's up to you how you want to balance retrieval performance with resize time. The smaller your bucket size, the faster your average lookup will be, but you'll have to resize the table more often. With a larger bucket size, you won't have to resize as often, but your average lookup time will be longer.

Worst case lookup time with a bucket size of 10 will be five times slower than with a bucket size of 2. But it will still be a whole lot faster than a sequential scan of a list, and you could get a 5x reduction in the number of times you have to re-hash. You should experiment with the optimum bucket size for your application.

One thing you can do to improve lookup time with larger bucket sizes is, on lookup, move the item that was just referenced to the head of the list in its bucket. The theory here is that some items are looked up much more often than others, so if you always move the most recently referenced thing to the head of the list, you're more likely to find it faster. This optimization doesn't do any good if items are referenced uniformly.

With chaining, you can often get good performance with load factors of 150% or 200%; sometimes even higher. Contrast that with re-hashing, which starts to degrade quickly at a load factor or 70% or 75%.