Search code examples
c++hashmapunordered-mapuser-defined-typeshash-collision

C++ Hash Table - How is collision for unordered_map with custom data type as keys resolved?


I have defined a class called Point which is to be used as a key inside an unordered_map. So, I have provided an operator== function inside the class and I have also provided a template specialization for std::hash. Based on my research, these are the two things I found necessary. The relevant code is as shown:

class Point
{
    int x_cord = {0};
    int y_cord = {0};
public:
    Point()
    {

    }
    Point(int x, int y):x_cord{x}, y_cord{y}
    {

    }
    int x() const
    {
        return x_cord;
    }
    int y() const
    {
        return y_cord;
    }
    bool operator==(const Point& pt) const
    {
        return (x_cord == pt.x() && y_cord == pt.y());
    }
};

namespace std
{
    template<>
    class hash<Point>
    {
    public:
        size_t operator()(const Point& pt) const
        {
            return (std::hash<int>{}(pt.x()) ^ std::hash<int>{}(pt.y()));
        }
    };
}

// Inside some function
std::unordered_map<Point, bool> visited;

The program compiled and gave the correct results in the cases that I tested. However, I am not convinced if this is enough when using a user-defined class as key. How does the unordered_map know how to resolve collision in this case? Do I need to add anything to resolve collision?


Solution

  • That's a terrible hash function. But it is legal, so your implementation will work.

    The rule (and really the only rule) for Hash and Equals is:

    • if a == b, then std::hash<value_type>(a) == std::hash<value_type>(b).

    (It's also important that both Hash and Equals always produce the same value for the same arguments. I used to think that went without saying, but I've seen several SO questions where unordered_map produced unexpected results precisely because one or both of these functions depended on some external value.)

    That would be satisfied by a hash function which always returned 42, in which case the map would get pretty slow as it filled up. But other than the speed issue, the code would work.

    std::unordered_map uses a chained hash, not an open-addressed hash. All entries with the same hash value are placed in the same bucket, which is a linked list. So low-quality hashes do not distribute entries very well among the buckets.

    It's clear that your hash gives {x, y} and {y, x} the same hash value. More seriously, any collection of points in a small rectangle will share the same small number of different hash values, because the high-order bits of the hash values will all be the same.