c++ hash std floating

Fair assumptions about std::hash implementations


We use several forms of hashing in a research database project, for example radix clustering, where the n least significant bits of the hash determine the cluster ID. We use std::hash for hashing, which is sufficient for our purposes.
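To make the setup concrete, here is a minimal sketch of radix clustering on top of std::hash. The names cluster_id and radix_cluster are hypothetical and not taken from our project, and the sketch assumes radix_bits is smaller than the bit width of std::size_t.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: cluster ID = the radix_bits least significant
// bits of the std::hash value. Assumes radix_bits < bit width of size_t.
template <typename Key>
std::size_t cluster_id(const Key& key, unsigned radix_bits) {
    const std::size_t mask = (std::size_t{1} << radix_bits) - 1;
    return std::hash<Key>{}(key) & mask;
}

// Hypothetical helper: distributes keys over 2^radix_bits clusters.
template <typename Key>
std::vector<std::vector<Key>> radix_cluster(const std::vector<Key>& keys,
                                            unsigned radix_bits) {
    std::vector<std::vector<Key>> clusters(std::size_t{1} << radix_bits);
    for (const Key& key : keys) {
        clusters[cluster_id(key, radix_bits)].push_back(key);
    }
    return clusters;
}
```

How evenly the keys spread over the clusters depends entirely on the low bits that the std::hash implementation produces.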

However, while we are aware that most implementations hash integers with the identity function, we stumbled over the fact that float hashing (whether hashing floats makes sense is a separate discussion) is implemented differently on different platforms.

Are there any fair assumptions that we can make about std::hash?

MacOS: clang version 6.0.1 (tags/RELEASE_601/final)
std::hash<float>{}(1.0f): 0000000000000000000000000000000000111111100000000000000000000000
std::hash<double>{}(1.0): 0011111111110000000000000000000000000000000000000000000000000000

Ubuntu: clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
std::hash<float>{}(1.0f): 0101001111100101011001010000100100010100111101010010111101001101
std::hash<double>{}(1.0): 0111010001100001101001000101000001001110110011100111101110011011
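Bit strings like the ones above can be produced with a small helper such as the following. This is a sketch, and hash_bits is a hypothetical name; it renders the std::hash value of a key as a binary string with one character per bit of std::size_t.

```cpp
#include <bitset>
#include <climits>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: std::hash value of key as a binary string,
// most significant bit first.
template <typename Key>
std::string hash_bits(const Key& key) {
    constexpr std::size_t width = sizeof(std::size_t) * CHAR_BIT;
    return std::bitset<width>(std::hash<Key>{}(key)).to_string();
}
```

Printing hash_bits(1.0f) and hash_bits(1.0) with std::cout gives one line per value. Note that in the MacOS output the float hash of 1.0f is simply its IEEE 754 bit pattern (0x3F800000), whose 23 low bits are all zero, so clustering on the n least significant bits would behave very differently there than on the Ubuntu build.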


Solution

  • The only things you can assume are those defined by the standard (see cppreference).

    In particular, each specialization of std::hash defines an operator() const that:

    1. Accepts a single parameter of type Key.

    2. Returns a value of type size_t that represents the hash value of the parameter.

    3. Does not throw exceptions when called.

    4. For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2).

    5. For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<std::size_t>::max().

    So hash values can differ between platforms, between compiler or standard library versions on the same platform, or even from one run of the program to the next. In your case, it seems that one build uses libc++ (the default for clang on macOS) and the other libstdc++ (the default for clang on Ubuntu).
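    As a rough sketch, the following shows how to check which standard library a build uses and how to test the one property that does hold everywhere, namely that equal keys hash equally within a single run. The function names are hypothetical; the macros _LIBCPP_VERSION, __GLIBCXX__ and _MSVC_STL_VERSION are the ones libc++, libstdc++ and the MSVC standard library define once any standard header is included.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper: names the standard library this translation
// unit was compiled against, based on its version macro.
inline std::string standard_library_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#elif defined(_MSVC_STL_VERSION)
    return "MSVC STL";
#else
    return "unknown";
#endif
}

// Hypothetical helper for point 4: if k1 == k2, their hashes must match.
// Returns true when the keys differ, since nothing is required then.
template <typename Key>
bool equal_keys_hash_equally(const Key& k1, const Key& k2) {
    return !(k1 == k2) || std::hash<Key>{}(k1) == std::hash<Key>{}(k2);
}
```

    For floats, a useful case to try is equal_keys_hash_equally(0.0f, -0.0f): the two values compare equal although their bit patterns differ, so a conforming std::hash<float> has to map them to the same value. Anything beyond that, such as the concrete bits returned, is an implementation detail.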