Search code examples
c++hashhashtableunordered-map

How does std::unordered_map determine the location of a specific key in a hash table?


The documentation mentions that std::unordered_map uses a hash table. How does it achieve O(1) lookup of a specific key in the hash table? The only way I can think of is to store each key at an address computed from the hash value of the data it holds. If this is the case, how does it keep all of the keys close together in memory such that it does not get used up quickly? Furthermore, what if more than one std::unordered_map is used? How does the implementation guarantee that no two maps will ever compute hashes that boil down to the same address?


Solution

  • Typically a hash map keeps an array of buckets inside. A bucket, on the other hand is a list of entries. And so something like this:

    template<class TKey, class TValue>
    class HashMap {
        vector<vector<pair<TKey, TValue>>> Buckets;
    };
    

    Then when you do a lookup, it simply takes the key, computes its hash, say hash, goes to Buckets[hash % Buckets.size], which is of type vector<pair<TKey,TValue>> and does a linear search on it. This makes the amortized (average) lookup complexity constant, while in the worst case (e.g. bad hashing function) it is linear. And indeed, this is the guarantee you get with the unordered_map.

    Note that the top level vector will eventually grow when you add elements (maybe even when you remove), in order to allow more entries and avoid collisions. Rehashing is likely to take place in such case. And so adding/removing elements is not trivial, and can be expensive from time to time.

    Also note that since vector allocates memory on the heap under the hood, then there is no issue with using more than one map. They share nothing (well, they may share something, but it doesn't affect the behaviour). But even if an implementation does not use vector (which is not mandatory, its just an example), some form of dynamic malloc has to be used to guarantee that.