Search code examples
c++data-structurestime

How does `unordered_set` implement `find` in O(1) time?


The C++ documentation states that the find operation has an average-case time complexity of Θ(1), and a worst-case complexity of O(n), where n is the size of the set. The worst-case bound makes sense. I don't see why the average-case bound does. I'm not too familiar with average-case complexity, but naively, I would expect the argument of find to be uniformly distributed over the set, so that searching for it linearly would yield an expected time of n/2, for an average time of Θ(n/2) = Θ(n).

I tried looking this up on various documentation websites (GeeksForGeeks, etc.) to no avail. This question is similar, but only asks what the complexity of find operation is, not why it is what it is.

Apparently, unordered_set is implemented using buckets and hash maps. I'm not very familiar with these; perhaps they hold the key.


Solution

  • Very informal explanation:

    A hash table does not search through all the elements when you look something up.

    The table is an array of "buckets" (usually lists).

    Hashing (which is assumed to take constant time) produces a number, from that number you derive (in constant time) an array index, and array indexing takes constant time.

    Finding the correct element inside the bucket is linear in the size of the bucket, but with few enough collisions it is constant on average.
    Thus, the time complexity is constant on average (in the size of the table).

    The worst case occurs when all elements are in the same bucket, since that does involve a linear search through all the elements..