Search code examples
c++c++11hashunordered-set

Custom hash for std::tuple doesn't work with unordered_set


I hope you help me to understand what's wrong with my code.

Basically I need an unordered_set of tuples, but every time I call the insert function, I see that even when the hash is the same, the tuple is being inserted which means that I have tuples with the same hash in an unordered_set. To be more clear, let me share a piece of code here.

So, this is how my example looks like:

#include <iostream>
#include <tuple>
#include <unordered_set>

using namespace std;
using namespace testing;

struct MyHash {
    size_t operator()(const tuple<int, int, int>& t) const
    {
        auto [num1, num2, num3] = t;
        vector<int> v(3);
        v[0] = num1;
        v[1] = num2;
        v[2] = num3;
        sort(v.begin(), v.end());
        string key = to_string(v[0]) + "|" + to_string(v[1]) + "|" + to_string(v[2]);
        auto hashValue = hash<string>()(key);
        cout << "Hash value for " << key << "= " << hashValue << endl;
        return hashValue;
    }
};

int main(int argc, char** argv)
{
    unordered_set<tuple<int, int, int>, MyHash> s;
    s.insert(make_tuple(1, 2, 3));
    s.insert(make_tuple(1, 3, 2));
    s.insert(make_tuple(3, 2, 1));

    cout << "Amount of items = " << s.size() << endl;
}

and this is the output I got:

Hash value for 1|2|3= 12066275531359578498
Hash value for 1|2|3= 12066275531359578498
Hash value for 1|2|3= 12066275531359578498
Amount of items = 3

Why if the hash value for each entry is the same the amount of item inserted is 3? I was expecting having only one.

Regards


Solution

  • It is perfectly normal for two unequal elements to have the same hash (a situation known as "hash collision"). After all, there's an infinite number of, say, strings, but only a finite number of values representable in size_t. Observe that std::unordered_set takes two separate template parameters - one to compute the hash, and another to check whether two elements are equal.

    If you want to treat make_tuple(1, 2, 3) and make_tuple(1, 3, 2) as equivalent, you need to pass a suitable implementation of equality comparison as a third template parameter of std::unodered_set, in addition to the suitable hash implementation.

    In short: equal elements must hash to the same value, but the converse is not true - elements hashing to the same value are not necessarily equal.