linq c#-3.0 distinct custom-compare

LINQ Distinct with custom comparer leaves duplicates


I've got the following classes:

public class SupplierCategory : IEquatable<SupplierCategory>
{
    public string Name { get; set; }
    public string Parent { get; set; }

    #region IEquatable<SupplierCategory> Members

    public bool Equals(SupplierCategory other)
    {
        return this.Name == other.Name && this.Parent == other.Parent;
    }

    #endregion
}

public class CategoryPathComparer : IEqualityComparer<List<SupplierCategory>>
{
    #region IEqualityComparer<List<SupplierCategory>> Members

    public bool Equals(List<SupplierCategory> x, List<SupplierCategory> y)
    {
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<SupplierCategory> obj)
    {
        return obj.GetHashCode();
    }

    #endregion
}

And I'm using the following LINQ query:

CategoryPathComparer comparer = new CategoryPathComparer();
List<List<SupplierCategory>> categoryPaths =
    (from i in infoList
     select new List<SupplierCategory>() {
         new SupplierCategory() { Name = i[3] },
         new SupplierCategory() { Name = i[4], Parent = i[3] },
         new SupplierCategory() { Name = i[5], Parent = i[4] }
     }).Distinct(comparer).ToList();

But Distinct does not remove the duplicates: the result still contains paths that the comparer considers equal, as the following code demonstrates:

comparer.Equals(categoryPaths[0], categoryPaths[1]); // returns True

Am I using this the wrong way? Why are they not compared as I intend?

Edit: To demonstrate that the comparer does work, the following returns true as it should:

List<SupplierCategory> list1 = new List<SupplierCategory>() {
    new SupplierCategory() { Name = "Cat1" },
    new SupplierCategory() { Name = "Cat2", Parent = "Cat1" },
    new SupplierCategory() { Name = "Cat3", Parent = "Cat2" }
};
List<SupplierCategory> list2 = new List<SupplierCategory>() {
    new SupplierCategory() { Name = "Cat1" },
    new SupplierCategory() { Name = "Cat2", Parent = "Cat1" },
    new SupplierCategory() { Name = "Cat3", Parent = "Cat2" }
};
CategoryPathComparer comp = new CategoryPathComparer();
Console.WriteLine(comp.Equals(list1, list2).ToString());

Solution

  • Your problem is that you didn't implement IEqualityComparer correctly.

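    When you implement IEqualityComparer<T>, you must implement GetHashCode so that any two objects that compare equal also return the same hash code.

    Otherwise, hash-based operations behave incorrectly, as you're seeing here. Distinct only calls Equals on elements whose hash codes match, and List<T> does not override GetHashCode, so obj.GetHashCode() returns a per-instance value. Two separate lists with identical contents therefore almost never reach your Equals method.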

    You should implement GetHashCode as follows: (courtesy of this answer)

    public int GetHashCode(List<SupplierCategory> obj) {
        int hash = 17;

        // Combine the hash codes of the elements, not of the list itself.
        foreach(var value in obj)
            hash = hash * 23 + value.GetHashCode();

        return hash;
    }
    

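    You also need to override GetHashCode in SupplierCategory so that it is consistent with Equals. Note that Parent is null for the first category in each path, so the null case has to be handled. For example:

    public override int GetHashCode() {
        int hash = 17;
        // Name or Parent may be null; treat null as 0 instead of throwing.
        hash = hash * 23 + (Name != null ? Name.GetHashCode() : 0);
        hash = hash * 23 + (Parent != null ? Parent.GetHashCode() : 0);
        return hash;
    }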
    

    Finally, although you don't need to, you should probably override Equals(object) in SupplierCategory and make it call the Equals method you implemented for IEquatable<SupplierCategory>.
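
Putting the pieces above together, here is a minimal sketch of both classes with consistent Equals and GetHashCode implementations, plus a small check that Distinct now collapses duplicate paths. The Demo class, the BuildPath helper and the sample category names are hypothetical and exist only for illustration. The null guards and the unchecked blocks are additions that are not in the code above, so treat this as one possible arrangement rather than the definitive one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SupplierCategory : IEquatable<SupplierCategory>
{
    public string Name { get; set; }
    public string Parent { get; set; }

    public bool Equals(SupplierCategory other)
    {
        // Null guard is an addition; the original Equals would throw on null.
        return other != null && Name == other.Name && Parent == other.Parent;
    }

    // Route the untyped overload through the typed one.
    public override bool Equals(object obj)
    {
        return Equals(obj as SupplierCategory);
    }

    public override int GetHashCode()
    {
        unchecked // let the multiplication wrap instead of throwing in checked builds
        {
            int hash = 17;
            hash = hash * 23 + (Name != null ? Name.GetHashCode() : 0);
            hash = hash * 23 + (Parent != null ? Parent.GetHashCode() : 0);
            return hash;
        }
    }
}

public class CategoryPathComparer : IEqualityComparer<List<SupplierCategory>>
{
    public bool Equals(List<SupplierCategory> x, List<SupplierCategory> y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(List<SupplierCategory> obj)
    {
        unchecked
        {
            int hash = 17;
            // Hash the elements, so equal sequences produce equal hash codes.
            foreach (var value in obj)
                hash = hash * 23 + (value != null ? value.GetHashCode() : 0);
            return hash;
        }
    }
}

public static class Demo
{
    // Hypothetical helper: builds a three-level path like the query in the question.
    public static List<SupplierCategory> BuildPath(string a, string b, string c)
    {
        return new List<SupplierCategory>
        {
            new SupplierCategory { Name = a },
            new SupplierCategory { Name = b, Parent = a },
            new SupplierCategory { Name = c, Parent = b }
        };
    }

    public static int CountDistinct()
    {
        var paths = new List<List<SupplierCategory>>
        {
            BuildPath("Cat1", "Cat2", "Cat3"),
            BuildPath("Cat1", "Cat2", "Cat3"), // duplicate of the first path
            BuildPath("Cat1", "Cat2", "Cat4")
        };
        return paths.Distinct(new CategoryPathComparer()).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct()); // prints 2
    }
}
```

With the original comparer, which returned obj.GetHashCode(), the same input would normally yield three paths, because each list instance gets its own hash code and Equals is never consulted.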