
C# HashSet.Contains with custom EqualityComparer never calls GetHashCode()


I have a very large hashset (hundreds of thousands of entries) of Customer objects from my database. Then I get a newly imported set of Customer objects and have to check, for every new object, whether it is contained in the existing hashset. Performance is very important.

I cannot use the default EqualityComparer, because customers must be compared on only three properties. I also can't override the Equals and GetHashCode methods of the Customer class, for other reasons. So I went for a custom EqualityComparer. I tried both implementing IEqualityComparer<Customer> and inheriting from EqualityComparer<Customer> with overrides, as shown below; both gave the same result.

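using System.Collections.Generic;

public class CustomerComparer : EqualityComparer<Customer>
{
    public CustomerComparer() { }

    // Two customers are equal when Name, Description and AdditionalInfo all match.
    public override bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (x == null || y == null) return false;
        return x.Name == y.Name &&
               x.Description == y.Description &&
               x.AdditionalInfo == y.AdditionalInfo;
    }

    // Must agree with Equals: hash exactly the same three properties.
    public override int GetHashCode(Customer obj)
    {
        if (obj == null) return 0;
        unchecked // the multiply-and-add is expected to overflow
        {
            var hashCode = -1885141022;
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(obj.Name);
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(obj.Description);
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(obj.AdditionalInfo);
            return hashCode;
        }
    }
}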
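For comparison, the same comparer can be written with System.HashCode.Combine, which mixes the three property hashes and treats null strings as 0. This is a minimal sketch, assuming .NET Core 2.1 or later (where System.HashCode exists) and a hypothetical Customer class reduced to the three string properties the comparer uses, since the real class isn't shown. The property that matters for HashSet is that customers which compare equal always produce the same hash code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Customer class:
// only the three properties used for comparison are modelled.
public class Customer
{
    public string Name { get; set; }
    public string Description { get; set; }
    public string AdditionalInfo { get; set; }
}

// Compares customers on Name, Description and AdditionalInfo only.
public sealed class CustomerComparer : EqualityComparer<Customer>
{
    public override bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;   // same instance, or both null
        if (x is null || y is null) return false;
        return x.Name == y.Name
            && x.Description == y.Description
            && x.AdditionalInfo == y.AdditionalInfo;
    }

    public override int GetHashCode(Customer obj)
    {
        if (obj is null) return 0;
        // Combines the same three values that Equals compares, so equal
        // customers are guaranteed to land in the same hash bucket.
        return HashCode.Combine(obj.Name, obj.Description, obj.AdditionalInfo);
    }
}
```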

Now to my problem: when I use the default EqualityComparer, mostly just the GetHashCode method of Customer is called, and performance for my use case is very good (1-2 seconds). When I use my custom EqualityComparer, GetHashCode is never called; only Equals is, and performance is horrible (hours). See the code below:

using System.Collections.Generic;
using System.Linq; // provides the two-argument Contains(value, comparer) extension

public void FilterImportedCustomers(ISet<Customer> dataBase, IEnumerable<Customer> imported)
{
    var equalityComparer = new CustomerComparer();
    foreach (var obj in imported)
    {
        // great performance, always calls Customer.GetHashCode
        if (!dataBase.Contains(obj))
        {
            //...
        }

        // awful performance, only calls CustomerComparer.Equals
        if (!dataBase.Contains(obj, equalityComparer))
        {
            //...
        }
    }
}
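The call pattern described above follows from overload resolution. ISet<T> (and HashSet<T>) only declares Contains(T), so the two-argument call dataBase.Contains(obj, equalityComparer) binds to LINQ's Enumerable.Contains(source, value, comparer) extension method. That method knows nothing about the set's hash buckets: it enumerates the whole sequence and calls comparer.Equals on each element until it finds a match, so every lookup is O(n) and the import as a whole is O(n·m). The sketch below makes this visible by counting calls; CountingCustomerComparer, ContainsDemo and the reduced Customer class are hypothetical names made up for this demonstration, and HashCode.Combine assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Customer with the three compared properties.
public class Customer
{
    public string Name { get; set; }
    public string Description { get; set; }
    public string AdditionalInfo { get; set; }
}

// Same comparison logic as CustomerComparer, plus call counters
// (added only to show which methods each lookup strategy invokes).
public sealed class CountingCustomerComparer : EqualityComparer<Customer>
{
    public int EqualsCalls;
    public int HashCalls;

    public override bool Equals(Customer x, Customer y)
    {
        EqualsCalls++;
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Name == y.Name && x.Description == y.Description && x.AdditionalInfo == y.AdditionalInfo;
    }

    public override int GetHashCode(Customer obj)
    {
        HashCalls++;
        return obj is null ? 0 : HashCode.Combine(obj.Name, obj.Description, obj.AdditionalInfo);
    }
}

public static class ContainsDemo
{
    // Builds n distinct customers, looks up one that is absent, and reports
    // how many Equals/GetHashCode calls each strategy made.
    public static (int linqEquals, int linqHash, int setEquals, int setHash) Run(int n)
    {
        var customers = Enumerable.Range(0, n)
            .Select(i => new Customer { Name = "C" + i, Description = "d", AdditionalInfo = "x" })
            .ToList();
        var missing = new Customer { Name = "missing", Description = "d", AdditionalInfo = "x" };

        // 1) Set built with the default comparer. HashSet has no
        //    Contains(T, IEqualityComparer<T>), so this is Enumerable.Contains,
        //    which walks every element and calls Equals on each one.
        var defaultSet = new HashSet<Customer>(customers);
        var linqCmp = new CountingCustomerComparer();
        defaultSet.Contains(missing, linqCmp);

        // 2) Set built with the custom comparer. Contains(T) hashes the item
        //    once and only calls Equals for entries with a matching hash code.
        var setCmp = new CountingCustomerComparer();
        var customSet = new HashSet<Customer>(customers, setCmp);
        setCmp.EqualsCalls = 0; // discard the calls made while building the set
        setCmp.HashCalls = 0;
        customSet.Contains(missing);

        return (linqCmp.EqualsCalls, linqCmp.HashCalls, setCmp.EqualsCalls, setCmp.HashCalls);
    }
}
```

With n = 1000 and a customer that isn't in the set, the LINQ path makes 1000 Equals calls and no GetHashCode calls, while the HashSet path makes exactly one GetHashCode call and, barring a rare hash-code collision, no Equals calls at all.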

Does anyone have an idea how I can solve this problem? That would be amazing; I'm really stuck on this huge performance problem.

EDIT:

I solved it by passing my EqualityComparer when initializing the HashSet, using the constructor overload that takes an IEqualityComparer<T>: var database = new HashSet<Customer>(new CustomerComparer());

Thank you, guys!


Solution

  • I solved it by passing my EqualityComparer when initializing the HashSet. I used the constructor overload that takes an IEqualityComparer<T>: var database = new HashSet<Customer>(new CustomerComparer());

    Thanks to Lee and NetMage who commented under my original post.
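Applied to the question's FilterImportedCustomers, the fix might look roughly like this. It is a minimal sketch: the comparer-backed set is built once and then queried with HashSet<T>.Contains(T), the set's own hash lookup. If the caller already created the set with CustomerComparer (checked through the HashSet<T>.Comparer property), it is reused; otherwise it is copied once. The CustomerImport class, the List<Customer> return type and the reduced Customer class are assumptions for illustration, and HashCode.Combine assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Customer with the three compared properties.
public class Customer
{
    public string Name { get; set; }
    public string Description { get; set; }
    public string AdditionalInfo { get; set; }
}

public sealed class CustomerComparer : EqualityComparer<Customer>
{
    public override bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Name == y.Name && x.Description == y.Description && x.AdditionalInfo == y.AdditionalInfo;
    }

    public override int GetHashCode(Customer obj) =>
        obj is null ? 0 : HashCode.Combine(obj.Name, obj.Description, obj.AdditionalInfo);
}

public static class CustomerImport
{
    // Returns the imported customers that are not already in the database set.
    public static List<Customer> FilterImportedCustomers(ISet<Customer> dataBase, IEnumerable<Customer> imported)
    {
        // Reuse the set if it already hashes with CustomerComparer;
        // otherwise copy it once into a set that does.
        var lookup = dataBase is HashSet<Customer> hs && hs.Comparer is CustomerComparer
            ? hs
            : new HashSet<Customer>(dataBase, new CustomerComparer());

        // Single-argument Contains is HashSet's own O(1) lookup, not LINQ's linear scan.
        return imported.Where(c => !lookup.Contains(c)).ToList();
    }
}
```

If the database set is always created as new HashSet<Customer>(new CustomerComparer()), as in the solution above, the copy branch never runs. Otherwise the one-off copy costs O(n), after which each lookup is O(1) on average, so the import is roughly O(n + m) instead of O(n·m).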