I found hash code methods on two data types in a code base I'm working on, and I don't fully understand why they were written this way:
public override int GetHashCode()
{
    return x.GetHashCode() ^ y.GetHashCode() << 2;
}
public override int GetHashCode()
{
    return x.GetHashCode() ^ y.GetHashCode() << 2 ^ z.GetHashCode() >> 2;
}
How does the bit shifting operation make these hash values any better?
Let's say you have a Point data structure represented by x and y fields. Without the bit shifting, the hash code for (1,0) would be 1 ^ 0 = 1, and the hash code for (0,1) would also be 0 ^ 1 = 1. Now do the same thing with the bit shifting: for (1,0) we still get a hash code of 1 ^ (0 << 2) = 1, but for (0,1) we now get 0 ^ (1 << 2) = 4.
What the bit shifting provides is that the same inputs in a different order produce different hash codes, so (1,0) and (0,1) don't end up falling into the same hash bucket and degrading your HashSet/Dictionary performance.
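To make the difference concrete, here is a minimal, self-contained sketch (this Point struct and its field names are assumptions for illustration, not the original code base's type) that prints the hash codes produced by both approaches:
struct Point
{
    public int x, y;
    public Point(int x, int y) { this.x = x; this.y = y; }

    // Plain XOR: symmetric, so swapping x and y produces the same value.
    public int HashPlainXor() => x.GetHashCode() ^ y.GetHashCode();

    // XOR with a shifted y: swapping x and y no longer collides.
    public int HashWithShift() => x.GetHashCode() ^ y.GetHashCode() << 2;
}

class ShiftDemo
{
    static void Main()
    {
        var a = new Point(1, 0);
        var b = new Point(0, 1);
        System.Console.WriteLine($"{a.HashPlainXor()} vs {b.HashPlainXor()}");   // 1 vs 1 -> same bucket
        System.Console.WriteLine($"{a.HashWithShift()} vs {b.HashWithShift()}"); // 1 vs 4 -> different buckets
    }
}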
Normally you would use a much larger offset than a left shift by two. Bit shifting also truncates data when hash codes are near Int32.MaxValue, because the high bits get pushed out. Here is the pattern I normally use:
public override int GetHashCode()
{
    unchecked
    {
        var hashCode = X;
        hashCode = (hashCode * 397) ^ Y;
        hashCode = (hashCode * 397) ^ Z;
        return hashCode;
    }
}
(This is the default implementation that comes with ReSharper's "Insert Comparison Method" feature. To add more fields you just keep appending hashCode = (hashCode * 397) ^ XXXXXXX.)
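For example, folding in a hypothetical fourth field W (not part of the original type, just an illustration of extending the pattern) would look like this:
public override int GetHashCode()
{
    unchecked
    {
        var hashCode = X;
        hashCode = (hashCode * 397) ^ Y;
        hashCode = (hashCode * 397) ^ Z;
        hashCode = (hashCode * 397) ^ W; // one extra line per additional field
        return hashCode;
    }
}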
By using * inside an unchecked block instead of <<, any intermediate value larger than Int32.MaxValue simply wraps around instead of causing an error.
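A quick sketch of that overflow behavior (the constants are arbitrary, chosen only to force an overflow):
class OverflowDemo
{
    static void Main()
    {
        unchecked
        {
            // Without unchecked this constant expression would not even compile
            // (error CS0220: the operation overflows at compile time in checked mode).
            // Inside unchecked, the multiplication wraps around silently.
            int mixed = int.MaxValue * 397;
            System.Console.WriteLine(mixed);

            // A left shift never throws, but it discards the high bits entirely,
            // which is the truncation mentioned above.
            int shifted = int.MaxValue << 2;
            System.Console.WriteLine(shifted); // -4: only the low bits survive
        }
    }
}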