Search code examples
data-structureshashmaptime-complexitybig-ohashtable

Time Complexity of Hash Map Traversal


What is the best, average and worst case time complexity for traversing a hash map under the assumption that the hash map uses chaining with linked lists.

I've read multiple times that the time complexity is O(m+n) for traversal for all three cases (m=number of buckets, n=number of elements). However, this differs from my time complexity analysis: In the worst case all elements are linearly chained in the last bucket which leads to a time complexity of O(m+n). In the best case no hash collisions happen and therefore time complexity should be O(m). In the average case I assume that the elements are uniformly distributed, i.e. each bucket on average has n/m elements. This leads to a time complexity of O(m * n/m) = O(n). Is my analysis wrong?


Solution

  • In practice, a good implementation can always achieve O(n). GCC's C++ Standard Library implementation for the hash table containers unordered_map and unordered_set, for example, maintains a forward/singly linked list between the elements inserted into the hash table, wherein elements that currently hash to the same bucket are grouped together in the list. Hash table buckets contain iterators into the singly-linked list for the point where the element before that bucket's colliding elements start (so if erasing an element, the previous link can be rewired to skip over it).

    During traversal, only the singly-linked list need be consulted - the hash table buckets are not visited. This becomes especially important when the load factor is very low (many elements were inserted, then many were erased, but in C++ the table never reduces size, so you can end up with a very low load factor.

    IF instead you have a hash table implementation where each bucket literally maintains a head pointer for its own linked list, then the kind of analysis you attempted comes into play.

    You're right about worst case complexity.

    In the best case no hash collisions happen and therefore time complexity should be O(m).

    It depends. In C++ for example, values/elements are never stored in the hash table buckets (which would waste a huge amount of memory if the values were large in size and many buckets were empty). If instead the buckets contain the "head" pointer/iterator for the list of colliding elements, then even if there's no collision at a bucket, you still have to follow the pointer to a distinct memory area - that's just as bothersome as following a pointer between nodes on the same linked list, and is therefore normally included in the complexity calculation, so it's still O(m + n).

    In the average case I assume that the elements are uniformly distributed, i.e. each bucket on average has n/m elements.

    No... elements being uniformly distributed across buckets is the best case for a hash table: see above. An "average" or typical case is where there's more variation in the number of elements hashing to any given bucket. For example, if you have 1 million buckets and 1 million values and a cryptographic strength hash function, you can statistically expect 1/e (~36.8%) buckets to be empty, 1/1!e (simplifies to 1/1e) buckets to have 1 element, 1/2!e (~18.4%) buckets to have 2 colliding elements, 1/3!e (~6.1%) buckets to have 3 colliding elements and so on (the "!" is for factorial...).

    Anyway, the key point is that a naive bucket-visiting hash table traversal (as distinct from actually being able to traverse a list of elements without bucket-visiting), always has to visit all the buckets, then if you imagine each element being tacked onto a bucket somewhere, there's always one extra link to traverse to reach it. Hence O(m+n).