Search code examples
c++hashhashsetunordered-set

When does unordered_set invoke operator==?


In my thought,

This was how unordered_set insertion was working.

  • When insert was called, hash function is called.

  • In my code below I hash first and second value in pair<string, string> each.

    -> Test(make_pair("good","james")) ( make Test object )

    -> return hasher(test.get_name().first) + hasher(test.get_name().second) ( hash function returns )

    -> 8838398861423961495 ( hash value )

  • and It tries to find same hash value in hash table.

  • If it fails to find same hash value in hash table, then it make its hash value in hash table.

So, I thought elements with different hash value should not be compared in unordered_set.

And this is my code

#include <vector>
#include <string>
#include <iostream>
#include <unordered_set>

using namespace std;

class Test
{
public:
    Test(pair<string, string> n):name(n){};
    
    pair<string,string> get_name() const { return name; };

    bool operator==(const Test &test) const
    {
        cout << "compare " << get_name().first << ',' << get_name().second << ':' << test.get_name().first << ',' << test.get_name().second << '\n';
        return (name == test.get_name());
    }

private:
    pair<string,string> name;
};

namespace std
{
    template<>
    struct hash<Test>
    {
        hash<string> hasher;
        size_t operator() (const Test& test) const noexcept
        {
            cout << test.get_name().first << ',' << test.get_name().second << ':' << (hasher(test.get_name().first) + hasher(test.get_name().second)) << '\n';
            return hasher(test.get_name().first) + hasher(test.get_name().second);
        }
    };
}


int main()
{
    unordered_set<Test> hash_set;
    hash_set.insert(Test(make_pair("song","james")));
    hash_set.insert(Test(make_pair("kim","james")));
    hash_set.insert(Test(make_pair("kim","james")));

    cout << hash_set.size() << '\n';
    return 0;
}

It tries to insert three Test object in hash_set.

First one and Second one should not have same hash value ( But It can happen in possibility )

And I made operator== cout whenever its called to look when it is invoked.

Also the hash function.

And this is the results.

song,james:1897374899324052084
kim,james:6658567258605789090
compare song,james:kim,james
kim,james:6658567258605789090
compare kim,james:kim,james
2

How can (song, james) and (kim, james) can be compared ( by calling == ) even though they are not in the same bucket ( not same hash value )?

And also, If I change "song" to "good", It changes it result.

( I use Apple clang version 14.0.3 (clang-1403.0.22.14.1) and std=c++14 )


Solution

  • Each bucket of the hash table contains many different of your hashes. Otherwise you would need to store 2^64 buckets in memory if your hash is 64bit wide, for which you obviously don't have enough memory.

    Or in other words, the std::unordered_set implementation will apply its own hash to a smaller range on top of yours (although that can be very trivial such as taking the lowest bits of your hash). The whole point of the hash is to reduce the number of possible input values to a small range, hopefully uniformly distributed.

    So even if the hashes calculated by your hash function are different, they may still be put in the same bucket and then it is up to the implementation whether or not it will immediately use == to compare the elements belonging to the same bucket, or whether it will try your hash function's output for comparison first.

    And even more generally, there is in principle nothing forbidding the implementation to do a == comparison unnecessarily, although of course that wouldn't be good for the quality of the implementation.