Search code examples
c#gethashcode

How to implement GetHashCode for this situation?


I'm trying to implement an IEqualityComparer<string> which basically compares two strings in a way that,(let's assume we have two strings x and y) if x starts with y or y starts with x they should be treated as equal.

public bool Equals(string x, string y)
{
    return x.StartsWith(y) || y.StartsWith(x);
}

public int GetHashCode(string obj)
{
    return obj.GetHashCode();
}

Ofcourse implementing the Equals method is pretty easy.But the GetHashCode is not, I couldn't think any way to implement it correctly.I have written a test program like this:

string[] values = {"hell", "hello", "foo", "fooooo"};

var result = values.Distinct(new StringComparer());

foreach(var x in result)
   Console.WriteLine(x);

And I get the wrong result because of GetHashCode:

hell
hello
foo
fooooo

Obviously I can force calling Equals method by returning same value from the GetHashCode for all values but I wanna know if there is another way to implement it because the performance is critical. Is there a way to implement GetHashCode method correctly for my situation ?

Note: I know it is vague but I couldn't find a better title, if you have a better idea you are free to edit.


Edit: I'm going to use this logic with web urls. In my situation first 20 characters are equal. For example:

http://www.foo.com/bar?id=3
http://www.foo.com/bar?id=3&fooId=23

Solution

  • The issue is in your definition of equality: Equality must be transitive. But it is not in your case. Take the following three values:

    * f
    * freeze
    * foo
    

    Then f == freeze, and foo == f, but freeze != foo.

    See also MSDN on Implementing the Equals Method, which says:

    (x.Equals(y) && y.Equals(z)) returns true if and only if x.Equals(z) returns true.

    A proper definition of equality produces distinct sets of values that are considered equal. If you had those, you could define a "canonical" representation for each set and calculate the hash of the canonical value so each set would have its hash code. But this only works with an operation that is transitive (as well as commutative and reflexive, these two properties are covered by your definition).

    Since your definition of equality is not transitive you cannot define such sets so you can't find a proper hash code either.

    But it raises other questions, too. Taking your example:

    string[] values = { "hell", "hello", "foo", "fooooo" };
    var result = values.Distinct(new StringComparer());
    

    Which values do you expect to go into your result? Do you always want the shortest version? This will not be guaranteed by your code, the result will depend on the internal implementation of Distinct.

    Implementing an EqualityComparer might possibly be a sub-optimal approach to your actual issue. What are you trying to achieve?