Tags: c#, linq, iequalitycomparer

How to iterate only distinct string values by custom substring equality


Similar to this question, I'm trying to iterate over the given strings while keeping only one string per distinct substring (the part before the last underscore). For example:

List<string> keys = new List<string>()
{
    "foo_boo_1",
    "foo_boo_2",
    "foo_boo_3",
    "boo_boo_1"
};

The output should keep only one string per distinct substring. Which one is arbitrary, so just take the first:

foo_boo_1 (the first one)
boo_boo_1

I tried to implement this with a custom IEqualityComparer<string>:

public class MyEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {            
        int xIndex = x.LastIndexOf("_"); 
        int yIndex = y.LastIndexOf("_");
        if (xIndex > 0 && yIndex > 0)
            return x.Substring(0, xIndex) == y.Substring(0, yIndex);
        else
            return false;
    }

    public int GetHashCode(string obj)
    {
        return obj.GetHashCode();
    }
}

foreach (var key in keys.Distinct(new MyEqualityComparer()))
{
    Console.WriteLine(key);
}

But the actual output is:

foo_boo_1
foo_boo_2
foo_boo_3
boo_boo_1

How can I use an IEqualityComparer to drop the strings whose substring has already been seen (foo_boo_2 and foo_boo_3)?

Note: the "real" keys are a lot longer, something like "1_0_8-B153_GF_6_2", so I have to use LastIndexOf.


Solution

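  • Your current implementation has two flaws:

    1. Equals and GetHashCode should not throw on null input, so you have to check for null (your Equals throws a NullReferenceException if x or y is null).
    2. Whenever Equals(x, y) returns true, GetHashCode(x) must equal GetHashCode(y). Your comparer breaks this rule: for "abc_1" and "abc_2" your Equals returns true, but their whole-string hash codes differ.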

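    The second flaw is what makes Distinct return incorrect results: Distinct keeps the items it has already seen in a hash set, so it compares hash codes first and calls Equals only when two hash codes match. Since "foo_boo_1", "foo_boo_2" and "foo_boo_3" get different hash codes, your Equals is effectively never called for them.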
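To see the second flaw in action, here is a minimal sketch (not part of the original answer) that instruments a copy of the question's comparer with a counter of Equals calls. CountingComparer and HashFirstDemo are hypothetical names, and the sketch uses the char overload LastIndexOf('_') instead of LastIndexOf("_"), which finds the same index for these keys. On the four keys from the question, Distinct should keep all four and record zero Equals calls, barring an unlikely hash collision between the whole strings.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical instrumented copy of the question's comparer: the same prefix
// comparison in Equals, the same whole-string GetHashCode, plus a call counter.
public class CountingComparer : IEqualityComparer<string>
{
    public int EqualsCalls { get; private set; }

    public bool Equals(string x, string y)
    {
        EqualsCalls++;
        int xIndex = x.LastIndexOf('_');
        int yIndex = y.LastIndexOf('_');
        return xIndex > 0 && yIndex > 0
            && x.Substring(0, xIndex) == y.Substring(0, yIndex);
    }

    // The flaw: hashes the whole string, so "foo_boo_1" and "foo_boo_2"
    // almost certainly get different hash codes.
    public int GetHashCode(string obj) => obj.GetHashCode();
}

public static class HashFirstDemo
{
    // Runs Distinct with the instrumented comparer and reports how often
    // Equals was called along the way.
    public static (List<string> Result, int EqualsCalls) Run(IEnumerable<string> keys)
    {
        var comparer = new CountingComparer();
        // ToList() forces Distinct to run before the counter is read.
        List<string> result = keys.Distinct(comparer).ToList();
        return (result, comparer.EqualsCalls);
    }
}
```

Calling HashFirstDemo.Run on the question's keys list shows the symptom directly: because the hash codes never match, Distinct has no reason to consult Equals, so nothing is filtered out.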

    A corrected version could look something like this:

    public class MyEqualityComparer : IEqualityComparer<string> {
      public bool Equals(string x, string y) {            
        if (ReferenceEquals(x, y))
          return true;
        else if ((null == x) || (null == y))
          return false;
    
        int xIndex = x.LastIndexOf('_'); 
        int yIndex = y.LastIndexOf('_');
    
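        if (xIndex >= 0 && yIndex >= 0)
          return x.Substring(0, xIndex) == y.Substring(0, yIndex);
        else if (xIndex >= 0 || yIndex >= 0)
          return false; // only one of the two strings has an underscore
        else
          return x == y; // no underscore in either: compare the whole strings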
      }
    
      public int GetHashCode(string obj) {
        if (null == obj)  
          return 0;
    
        int index = obj.LastIndexOf('_');
    
        return index < 0 
          ? obj.GetHashCode() 
          : obj.Substring(0, index).GetHashCode();
      }
    }
    

    Now you are ready to use it with Distinct:

       foreach (var key in keys.Distinct(new MyEqualityComparer())) {
         Console.WriteLine(key);
       }
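A variation on the same fix, sketched here rather than taken from the answer: derive both Equals and GetHashCode from a single projected key, so the rule "equal items must have equal hash codes" holds by construction. KeyComparer, PrefixKeys, PrefixKey and DistinctByPrefix are hypothetical names. The key is a (bool, string) tuple so that, as in the comparer above, a string without any underscore is compared by its whole text and never matches a key that has one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compares items by a projected key. Because Equals and
// GetHashCode both go through the same key, they cannot disagree.
public sealed class KeyComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> _keySelector;

    public KeyComparer(Func<T, TKey> keySelector) =>
        _keySelector = keySelector ?? throw new ArgumentNullException(nameof(keySelector));

    public bool Equals(T x, T y)
    {
        // Two nulls are equal; a null never equals a non-null item.
        if (x is null || y is null)
            return x is null && y is null;
        return EqualityComparer<TKey>.Default.Equals(_keySelector(x), _keySelector(y));
    }

    // Nulls hash to 0; everything else hashes by its projected key.
    public int GetHashCode(T obj) =>
        obj is null ? 0 : EqualityComparer<TKey>.Default.GetHashCode(_keySelector(obj));
}

public static class PrefixKeys
{
    // Keys with an underscore compare by the text before the last '_';
    // keys without one compare by the whole string.
    public static (bool HasUnderscore, string Text) PrefixKey(string s)
    {
        int index = s.LastIndexOf('_');
        return index < 0 ? (false, s) : (true, s.Substring(0, index));
    }

    // Keeps one string per distinct prefix key.
    public static List<string> DistinctByPrefix(IEnumerable<string> keys) =>
        keys.Distinct(new KeyComparer<string, (bool, string)>(PrefixKey)).ToList();
}
```

Usage mirrors the answer: keys.Distinct(new KeyComparer<string, (bool, string)>(PrefixKeys.PrefixKey)). In practice LINQ to Objects yields the first occurrence of each key in source order, which gives foo_boo_1 and boo_boo_1 here, although the Distinct documentation describes its result as an unordered sequence.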