Search code examples
algorithmhashtable

Time complexity of resizing hash table implemented with linear probing


i have go through some articles but still not clear about answer of this.

suppose if i need to resize a hash table implemented with linear probing (i.e. h(x) = ((hash(x) mod hash table capacity) + 1) until empty slot found)

to resize the hash table (say if capacity reach certain % or new element cant be inserted after iterated the current hash table)

let say in the resizing operation we double the current hash table size, the steps needed i imagine would be

  1. get current data nodes in the hash table
  2. create new list/array with size 2m of empty element (m be the current capacity of the hash table before resize)
  3. for each data node we stored, do a insert operation (that would calculate the new hash by the new capacity and insert the old data node in new hash table with new hash key, worst case here for the insertion should be O(n)!? since linear probing in worst case seems need to go through the entire hash table entries)

the third step looks to me is an O(n^2) operation as it iterate data nodes (worst case with n nodes) and for each it do a insertion operation with O(n), is it a correct understanding? or the operation stated here is not optimized / differ from standard!?

most sites stated that insertion operation for hash table is O(n) but i think they are either using quadratic probing or universal hash function that assume after resize the insertion would be O(1). just want to confirm my understanding here.

thanks in advance !

(i also found this implementation online seems echo with my understanding, resize() and put() for reference) https://algs4.cs.princeton.edu/34hash/LinearProbingHashST.java


Solution

  • In step 3 of the resizing, it's possible to re-insert all the entries in O(n) time.

    First, sort the entries in the source table by target bucket. Use a linear time bucket sort :-) You can use the new array as the required temporary storage.

    Then re-insert all the entries in order of target bucket. By remembering the last insert position, you can start each search at max(target_slot, last_insert_slot+1) (be careful at the wrap-around). This bypasses all the overlapping search regions.