While looking for some deterministic (multiple-runs, multiple-machines) hasher for integers, I stumbled upon boost::hash_combine(size_t & seed, T const& v)
. Unfortunately, in the documentation it is stated that
This hash function is not intended for general use, and isn't guaranteed to be equal during separate runs of a program - so please don't use it for any persistent storage or communication.
However, when looking through the implementation I did not see any suspicious code that may cause divergent behaviour in separate runs - just some multiplying and adding (with possible overflows), bit-shifting and xor operations, everything using constants. What is more, hasher behaved consistently when executed several times.
So where is the actual problem that forbids to guarantee determinism across the runs?
Pasting below the most interesting pieces:
template <class T>
inline void hash_combine(std::size_t& seed, T const& v)
{
boost::hash<T> hasher;
return boost::hash_detail::hash_combine_impl(seed, hasher(v));
}
template <typename T>
typename boost::hash_detail::basic_numbers<T>::type hash_value(T v)
{
return static_cast<std::size_t>(v);
}
inline void hash_combine_impl(boost::uint64_t& h,
boost::uint64_t k)
{
const boost::uint64_t m = UINT64_C(0xc6a4a7935bd1e995);
const int r = 47;
k *= m;
k ^= k >> r;
k *= m;
h ^= k;
h *= m;
// Completely arbitrary number, to prevent 0's
// from hashing to 0.
h += 0xe6546b64;
}
The reason is that hashes are frequently used in hash tables. A malicious user trying to attack a service (that uses C++ code involving hash tables) may massively degrade its performance by forcing hash collisions for items inserted into a hash table (with performance for common operations going from O(1) to O(N)). By having every run use a different hash function, this becomes a lot harder to do.
std::hash
is standardized like that too. To cite https://en.cppreference.com/w/cpp/utility/hash:
Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision DoS attacks. (since C++ 14)