Search code examples
c++c++11hashunordered-mapstdhash

std::hash variations of object with arbitrary number of attributes of fundamental type


Discussion:

Let's say I have a struct/class with an arbitrary number of attributes that I want to use as key to a std::unordered_map e.g.,:

struct Foo {
  int    i;
  double d;
  char   c;
  bool   b;
};

I know that I have to define a hasher-functor for it e.g.,:

struct FooHasher {
  std::size_t operator()(Foo const &foo) const;
};

And then define my std::unordered_map as:

std::unordered_map<Foo, MyValueType, FooHasher> myMap;

What bothers me though, is how to define the call operator for FooHasher. One way to do it, that I also tend to prefer, is with std::hash. However, there are numerous variations e.g.,:

std::size_t operator()(Foo const &foo) const {
  return std::hash<int>()(foo.i)    ^
         std::hash<double>()(foo.d) ^
         std::hash<char>()(foo.c)   ^
         std::hash<bool>()(foo.b);
}

I've also seen the following scheme:

std::size_t operator()(Foo const &foo) const {
  return std::hash<int>()(foo.i)           ^
         (std::hash<double>()(foo.d) << 1) ^
         (std::hash<char>()(foo.c)   >> 1) ^
         (std::hash<bool>()(foo.b)   << 1);
}

I've seen also some people adding the golden ratio:

std::size_t operator()(Foo const &foo) const {
  return (std::hash<int>()(foo.i)    + 0x9e3779b9) ^
         (std::hash<double>()(foo.d) + 0x9e3779b9) ^
         (std::hash<char>()(foo.c)   + 0x9e3779b9) ^
         (std::hash<bool>()(foo.b)   + 0x9e3779b9);
}

Questions:

  1. What are they trying to achieve by adding the golden ration or shifting bits in the result of std::hash.
  2. Is there an "official scheme" to std::hash an object with arbitrary number of attributes of fundamental type?

Solution

  • A simple xor is symmetric and behaves badly when fed the "same" value multiple times (hash(a) ^ hash(a) is zero). See here for more details.

    This is the question of combining hashes. boost has a hash_combine that is pretty decent. Write a hash combiner, and use it.

    There is no "official scheme" to solve this problem.

    Myself, I typically write a super-hasher that can take anything and hash it. It hash combines tuples and pairs and collections automatically, where it first hashes the count of elements in the collection, then the elements.

    It finds hash(t) via ADL first, and if that fails checks if it has a manually written hash in a helper namespace (used for std containers and types), and if that fails does a std::hash<T>{}(t).

    Then my hash for Foo support looks like:

    struct Foo {
      int    i;
      double d;
      char   c;
      bool   b;
      friend auto mytie(Foo const& f) {
        return std::tie(f.i, f.d, f.c, f.b);
      }
      friend std::size_t hash(Foo const& f) {
        return hasher::hash(mytie(f));
      }
    };
    

    where I use mytie to move Foo into a tuple, then use the std::tuple overload of hasher::hash to get the result.

    I like the idea of hashes of structurally similar types having the same hash. This lets me act as if my hash is transparent in some cases.

    Note that hashing unordered meows in this manner is a bad idea, as an asymmetric hash of an unordered meow may generate spurious misses.

    (Meow is the generic name for map and set. Do not ask me why: Ask the STL.)