Search code examples
c#equalsgethashcodeiequatable

How should I go about implementing Object.GetHashCode() for complex equality?


Basically, I have the following so far:

class Foo {
    public override bool Equals(object obj)
    {
        Foo d = obj as Foo ;
        if (d == null)
            return false;

        return this.Equals(d);
    }

    #region IEquatable<Foo> Members

    public bool Equals(Foo other)
    {
        if (this.Guid != String.Empty && this.Guid == other.Guid)
            return true;
        else if (this.Guid != String.Empty || other.Guid != String.Empty)
            return false;

        if (this.Title == other.Title &&
            this.PublishDate == other.PublishDate &&
            this.Description == other.Description)
            return true;

        return false;
    }
}

So, the problem is this: I have a non-required field Guid, which is a unique identifier. If this isn't set, then I need to try to determine equality based on less accurate metrics as an attempt at determining if two objects are equal. This works fine, but it make GetHashCode() messy... How should I go about it? A naive implementation would be something like:

public override int GetHashCode() {
    if (this.Guid != String.Empty)
        return this.Guid.GetHashCode();

    int hash = 37;
    hash = hash * 23 + this.Title.GetHashCode();
    hash = hash * 23 + this.PublishDate.GetHashCode();
    hash = hash * 23 + this.Description.GetHashCode();
    return hash;
}

But what are the chances of the two types of hash colliding? Certainly, I wouldn't expect it to be 1 in 2 ** 32. Is this a bad idea, and if so, how should I be doing it?


Solution

  • I don't think there is a problem with the approach you have chosen to use. Worrying 'too much' about hash collisions is almost always an indication of over-thinking the problem; as long as the hash is highly likely to be different you should be fine.

    Ultimately you may even want to consider leaving out the Description from your hash anyway if it is reasonable to expect that most of the time objects can be distinguished based on their title and publication date (books?).

    You could even consider disregarding the GUID in your hash function altogether, and only use it in the Equals implementation to disambiguate the unlikely(?) case of hash clashes.