Tags: c#, linq, group-by, hashset, iequalitycomparer

GroupBy HashSet not grouping, whilst SetEquals is true


I need to group a collection with GroupBy() using a HashSet<myClass> as the key, where myClass overrides Equals(myClass), Equals(object), GetHashCode(), ==, and !=.

When I perform the GroupBy(), however, the results are not grouped. The same happens with Distinct(). Each HashSet is created in a large LINQ query that calls ToHashSet() on values of myClass. The resulting HashSet is then used as the key of a Dictionary<HashSet<myClass>, someOtherCollection>.

I have distilled the problem down to the simplest case, where two HashSet<myClass>, myHashSet1 and myHashSet2, both contain only the same single element. If I call myHashSet1.Equals(myHashSet2) it returns false, while myHashSet1.SetEquals(myHashSet2) returns true.
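
A minimal sketch of that distilled case, using string elements in place of myClass (an assumption made only to keep the example self-contained). HashSet<T> does not override Equals or GetHashCode, so Equals compares references, while SetEquals compares the elements:

```csharp
using System;
using System.Collections.Generic;

public static class ReferenceVsSetEquality
{
    // Compares two distinct HashSet instances that hold the same single element.
    public static (bool ByEquals, bool BySetEquals) Compare()
    {
        var first = new HashSet<string> { "a" };
        var second = new HashSet<string> { "a" };

        // Inherited Object.Equals: true only for the very same instance.
        bool byEquals = first.Equals(second);

        // SetEquals: true when both sets contain the same elements.
        bool bySetEquals = first.SetEquals(second);

        return (byEquals, bySetEquals);
    }

    public static void Main()
    {
        var result = Compare();
        Console.WriteLine(result.ByEquals);    // prints "False"
        Console.WriteLine(result.BySetEquals); // prints "True"
    }
}
```

GroupBy() and Distinct() use the default equality comparer for the key type unless told otherwise, which for HashSet<T> means this reference comparison.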

What am I doing wrong here? What can I do to make GroupBy group HashSets when all elements match?

Possibly one step along the way is the question "HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative?", which explains how to override the default IEqualityComparer for HashSet. But if this is part of the answer, the critical remaining question becomes: how do I let GroupBy know to use this equality comparer?

I assume I should supply it when I call ToHashSet(), maybe as ToHashSet(myHashSetEqualityComparer<myClass>), but ToHashSet() only accepts an IEqualityComparer<myClass>, not an IEqualityComparer<HashSet<myClass>>.

Here's the code of myClass distilled to the essentials:

public class myClass : myBaseClass, IEquatable<myClass>
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public Guid Prop3 { get; set; }

    // Implements IEquatable<myClass>.Equals, so it takes no 'override' modifier.
    public bool Equals(myClass other)
    {
        if (ReferenceEquals(other, null)) return false;

        return Prop1 == other.Prop1 && Prop2 == other.Prop2 && Prop3 == other.Prop3;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(obj, null) || !(obj is myClass))
            return false;

        return Equals((myClass)obj);
    }

    public static bool operator ==(myClass left, myClass right)
    {
        // Two nulls are equal; a null never equals a non-null instance.
        if (ReferenceEquals(left, null))
            return ReferenceEquals(right, null);

        return left.Equals(right);
    }

    public static bool operator !=(myClass left, myClass right)
    {
        return !(left == right);
    }

    public override int GetHashCode()
    {
        return Prop3.GetHashCode() + 31 * (Prop2.GetHashCode() +
            31 * Prop1.GetHashCode());
    }
}

As requested in the comments, this is what I am doing:

var myGroupedResult = myUngroupedCollection
    .GroupBy(x => x.Value)
    .ToDictionary(x => x.Key, x => x.ToList());
// myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>>,
// produced by LINQ
// myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>

I expect the result to produce a dictionary where the keys are HashSet<myClass> and the values are List<someClass>. If I have 5 distinct hashsets each with 10 occurrences of someClass, I expect a Dictionary with 5 keys, each with a value that is a List with 10 elements. Instead I get a Dictionary with 50 keys each with a value being a List that has 1 element.


Solution

  • I was able to solve my issue. Posting an answer here in case anyone else runs into the same issue.

    The solution has two steps. First create a generic IEqualityComparer<HashSet<T>> (from the link in the question):

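       using System.Collections.Generic;

       public class HashSetEqualityComparerBySetEquals<T> : IEqualityComparer<HashSet<T>>
       {
          public bool Equals(HashSet<T> x, HashSet<T> y)
          {
             // Same instance, or both null: equal.
             if (ReferenceEquals(x, y))
                return true;

             // Exactly one is null: not equal (SetEquals(null) would throw).
             if (ReferenceEquals(x, null) || ReferenceEquals(y, null))
                return false;

             return x.SetEquals(y);
          }

          public int GetHashCode(HashSet<T> set)
          {
             int hashCode = 0;

             if (set != null)
             {
                // XOR is order-independent, so sets with equal elements hash alike.
                foreach (T t in set)
                {
                   hashCode = hashCode ^
                       (set.Comparer.GetHashCode(t) & 0x7FFFFFFF);
                }
             }

             return hashCode;
          }
       }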
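
    GetHashCode() has to return the same value for any two sets that Equals() treats as equal, and a HashSet makes no promise about enumeration order. The sketch below isolates the XOR combination used above to show that it does not depend on order. XorHash is a hypothetical helper name, and plain string arrays stand in for the sets:

```csharp
using System;
using System.Collections.Generic;

public static class OrderIndependentHash
{
    // Combines element hash codes with XOR, as the comparer's GetHashCode does.
    public static int XorHash<T>(IEnumerable<T> items, IEqualityComparer<T> elementComparer)
    {
        int hash = 0;
        foreach (T item in items)
        {
            hash ^= elementComparer.GetHashCode(item) & 0x7FFFFFFF;
        }
        return hash;
    }

    public static void Main()
    {
        var comparer = EqualityComparer<string>.Default;

        int forward = XorHash(new[] { "a", "b", "c" }, comparer);
        int shuffled = XorHash(new[] { "c", "a", "b" }, comparer);

        // XOR is commutative and associative, so the order does not matter.
        Console.WriteLine(forward == shuffled); // prints "True"
    }
}
```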
    

    Then pass it to GroupBy(). The hint came from "GroupBy on complex object (e.g. List<T>)", which covers List. The call here also takes an elementSelector as its second parameter, so that each group holds the someClass keys rather than whole KeyValuePair entries:

    HashSetEqualityComparerBySetEquals<myClass> comparer = new HashSetEqualityComparerBySetEquals<myClass>();

    var myGroupedResult = myUngroupedCollection
        .GroupBy(x => x.Value, x => x.Key, comparer)
        .ToDictionary(x => x.Key, x => x.ToList());

    // myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>>
    // produced by LINQ, but it could be a Dictionary or another collection.
    // myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>
    

    The same IEqualityComparer can also be used in other LINQ operations that check for equality, either passed directly, as with Distinct(), or called inside a predicate, as with FirstOrDefault():

    var thisWorksAsExpected = myGroupedResult.FirstOrDefault(x => comparer.Equals(x.Key, aHashSetWithSameElements));
    var thisAlsoWorks = myGroupedResult.FirstOrDefault(x => x.Key.SetEquals(aHashSetWithSameElements));
    
    var thisDoesNotWork = myGroupedResult.FirstOrDefault(x => x.Key == aHashSetWithSameElements);
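    // == on HashSet<T> compares references, so thisDoesNotWork is a default KeyValuePair (null Key and Value)
    // unless aHashSetWithSameElements is the very same instance as one of the keys, even when all elements match.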
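
    A related sketch, again with made-up string data and the built-in HashSet<T>.CreateSetComparer() standing in for the comparer class above (either should be usable here). It passes the comparer to Distinct() and also to ToDictionary(), on the assumption that the resulting dictionary will later be queried with sets that are equal in content but are different instances:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparerInOtherOperations
{
    public static (int DistinctCount, bool FoundWithComparer, bool FoundWithoutComparer) Run()
    {
        IEqualityComparer<HashSet<string>> comparer = HashSet<string>.CreateSetComparer();

        var sets = new List<HashSet<string>>
        {
            new HashSet<string> { "x", "y" },
            new HashSet<string> { "y", "x" },
            new HashSet<string> { "z" },
        };

        // Distinct accepts the set comparer directly.
        var distinctSets = sets.Distinct(comparer).ToList();

        // With the comparer, the dictionary matches keys by content;
        // without it, only the original instances are found.
        var withComparer = distinctSets.ToDictionary(s => s, s => s.Count, comparer);
        var withoutComparer = distinctSets.ToDictionary(s => s, s => s.Count);

        var probe = new HashSet<string> { "x", "y" }; // new instance, same elements

        return (distinctSets.Count,
                withComparer.ContainsKey(probe),
                withoutComparer.ContainsKey(probe));
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine(result.DistinctCount);        // prints "2"
        Console.WriteLine(result.FoundWithComparer);    // prints "True"
        Console.WriteLine(result.FoundWithoutComparer); // prints "False"
    }
}
```

    Applied to the answer's code, that would mean passing comparer as a third argument to ToDictionary() as well, so that myGroupedResult[aHashSetWithSameElements] and ContainsKey() match by content instead of by reference.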