c# ienumerable iequalitycomparer

Correct usage of GetHashCode in IEqualityComparer


Let's say I have this example:

public class Player
{
    public string Username { get; set; }
    
    private sealed class PlayerEqualityComparer : IEqualityComparer<Player>
    {
        public bool Equals(Player x, Player y)
        {
            if (ReferenceEquals(x, y)) return true;
            if (ReferenceEquals(x, null)) return false;
            if (ReferenceEquals(y, null)) return false;
            if (x.GetType() != y.GetType()) return false;
            return x.Username == y.Username;
        }

        public int GetHashCode(Player obj)
        {
            return (obj.Username != null ? obj.Username.GetHashCode() : 0);
        }
    }

    public static IEqualityComparer<Player> Comparer { get; } = new PlayerEqualityComparer();
}

I have a doubt about GetHashCode: its return value depends on the hash of Username, but we know that even if two strings contain the same value, their hash is computed by their reference, generating a different hash.

Now if I have two Players like this:

Player player1 = new Player {Username = "John"};
Player player2 = new Player {Username = "John"};

According to Equals they are the same, but according to GetHashCode they likely are not. What happens when I use this PlayerEqualityComparer in an Except or Distinct call? Thank you.


Solution

  • Two strings with the same value are guaranteed to have the same hash code. Otherwise string.GetHashCode would be broken, and someone would have noticed by now.

    but we know that even if two strings contain the same value, their hash is computed by their reference

    I'm not sure what you mean here, but it's wrong. The hash code is derived from the string's contents (its value), not from its reference. The documentation states:

    Important

    If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code.

    The hash code itself is not guaranteed to be stable. Hash codes for identical strings can differ across .NET implementations, across .NET versions, and across .NET platforms (such as 32-bit and 64-bit) for a single version of .NET. In some cases, they can even differ by application domain. This implies that two subsequent runs of the same program may return different hash codes.

    As a result, hash codes should never be used outside of the application domain in which they were created, they should never be used as key fields in a collection, and they should never be persisted.
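To check the first sentence of that quote yourself, here is a minimal sketch. StringHashDemo and BuildJohn are names made up for this illustration. The second string is assembled from characters at run time so that it is a separate object from the literal.

```csharp
using System;

public static class StringHashDemo
{
    // Hypothetical helper: builds "John" at run time, so the result is
    // a different object from the string literal "John".
    public static string BuildJohn() => new string(new[] { 'J', 'o', 'h', 'n' });

    // Do the literal and the built string share one object?
    public static bool SameReference() => object.ReferenceEquals("John", BuildJohn());

    // Do they hold the same characters?
    public static bool SameValue() => "John" == BuildJohn();

    // Do they hash to the same value within this process?
    public static bool SameHashCode() => "John".GetHashCode() == BuildJohn().GetHashCode();
}
```

SameReference() should return false while SameValue() and SameHashCode() both return true: the hash follows the contents, not the object identity. The actual hash number can still change between runs, as the quote says, so only compare hash codes inside one process.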

    In general, the following must hold:

    • If two objects are equal, GetHashCode must return the same value for both.
    • If two objects are not equal, GetHashCode does not have to return different values (they usually do differ, which helps performance, but correctness does not depend on it).
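As for Except and Distinct, a rough sketch of what to expect is below. The comparer repeats the logic from the question, written as a standalone class to keep the example short. ComparerDemo, ConstantHashComparer and the sample usernames are made up for this illustration. ConstantHashComparer is there only to show the second rule: a hash code that never differs is still legal, just slow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Player
{
    public string Username { get; set; }
}

// Same logic as the comparer in the question.
public sealed class PlayerEqualityComparer : IEqualityComparer<Player>
{
    public bool Equals(Player x, Player y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        if (x.GetType() != y.GetType()) return false;
        return x.Username == y.Username;
    }

    // Equal usernames give equal hash codes, so rule 1 holds.
    public int GetHashCode(Player obj) => obj.Username?.GetHashCode() ?? 0;
}

// Hypothetical comparer for illustration: every player gets the same hash code,
// so Equals alone decides. Still correct (rule 2), but lookups degrade.
public sealed class ConstantHashComparer : IEqualityComparer<Player>
{
    private readonly PlayerEqualityComparer inner = new PlayerEqualityComparer();
    public bool Equals(Player x, Player y) => inner.Equals(x, y);
    public int GetHashCode(Player obj) => 0;
}

public static class ComparerDemo
{
    private static Player[] Sample() => new[]
    {
        new Player { Username = "John" },
        new Player { Username = "John" },
        new Player { Username = "Jane" },
    };

    // The two "John" players collapse into one entry.
    public static int DistinctCount(IEqualityComparer<Player> comparer) =>
        Sample().Distinct(comparer).Count();

    // Removes everyone equal to a separately created "John".
    public static int ExceptCount(IEqualityComparer<Player> comparer) =>
        Sample().Except(new[] { new Player { Username = "John" } }, comparer).Count();
}
```

With either comparer, DistinctCount returns 2 (one John, one Jane) and ExceptCount returns 1 (only Jane is left). So the comparer from the question behaves as intended in both methods.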