Search code examples
c#hashbit-shiftgethashcode

How does bit shifting values in this GetHashCode method improve hashing?


I found two data types with hash code methods in a code base I'm working on, which I don't fully understand why they were chosen:

public override int GetHashCode()
{
    return x.GetHashCode() ^ y.GetHashCode() << 2;
}

public override int GetHashCode()
{
    return x.GetHashCode() ^ y.GetHashCode() << 2 ^ z.GetHashCode() >> 2;
}

How does the bit shifting operation make these hash values any better?


Solution

  • Let's say you have a Point data structure represented by a x and y variable. Without the bit shifting the value of hash code for (1,0) would be 1, and the hash code for (0,1) would also be 1. Now do the same thing with the bit shifting, for (1,0) we get a hash code of 1, but for (0,1) we now get a hash code of 4

    What the bit shifting provides is if you have the same inputs but in a different order you want to get different hash codes, that way (1,0) and (0,1) don't end up falling in to the same hash bucket and degrading your hashset/dictionary performance.

    Normally you would do a much larger offset than just left shifting twice. Bitshifting also causes data to get truncated if dealing with hash codes near Int32.MaxValue. Here is the pattern I normally use

    public override int GetHashCode()
    {
        unchecked
        {
            var hashCode = X;
            hashCode = (hashCode*397) ^ Y;
            hashCode = (hashCode*397) ^ Z;
            return hashCode;
        }
    }
    

    (this is the default implementation that comes with Resharper's "Insert Comparison Method" feature. To add more fields you just keep doing hashCode = (hashCode*397) ^ XXXXXXX)

    By using * with unchecked instead of << any value that is larger than Int32.MaxValue just overflows without error.