C# Remove Duplicates Only Checking on The First Element of The String Array


I have a list of string arrays. I want to remove duplicates and entries with an empty string, checking only the first element of each array. I have seen some SO posts that use IEqualityComparer to remove duplicates by comparing whole string arrays, which looks more elegant and is potentially more efficient. However, I could not make it check only the first element of each array, because IEqualityComparer confuses me. How can I achieve this more elegantly? My current code works, but it is neither elegant nor efficient:

void method(List<string[]> contactAndNumber)
{
    List<string[]> contactAndNumberSanitized = new List<string[]>();
    contactAndNumberSanitized.Clear();
    bool rem = false;
    List<int> remList = new List<int>();
    for (int i = 0; i < contactAndNumber.Count; i++)
    {
        contactAndNumberSanitized.Add(new string[] { contactAndNumber[i][0], contactAndNumber[i][1] });
        for (int j = 0; j < contactAndNumberSanitized.Count; j++)
            if (i != j)
                if (contactAndNumber[i][0] == contactAndNumberSanitized[j][0])
                {
                    rem = true;
                    break;
                }
        if (rem || string.IsNullOrEmpty(contactAndNumber[i][0]))
            remList.Add(i);
        rem = false;
    }
    for (int i = remList.Count - 1; i >= 0; i--)
        contactAndNumberSanitized.RemoveAt(remList[i]);
}

And this is the non-working code I wrote to check only the first item of each string array:

sealed class EqualityComparer: IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x[0], y[0]))
            return true;

        if (x == null || y == null)
            return false;

        return x[0].SequenceEqual(y[0]);
    }

    public int GetHashCode(string[] obj)
    {
        if (obj == null)
            return 0;

        int hash = 17;

        unchecked
        {
            foreach (string s in obj)
                hash = hash*23 + ((s == null) ? 0 : s.GetHashCode());
        }

        return hash;
    }
}

I call it from a method like this:

var result = list.Distinct(new EqualityComparer());

Solution

  • Your code can be vastly simplified:

    var input = new List<string[]> { new[] { "a", "b" }, new[] { "a", "c" }, new[] { "c", "d" }};
    var result = input.GroupBy(l => l.FirstOrDefault()).Select(g => g.First());
    

    This will give you the unique arrays, using the first element of each array to determine uniqueness.

    However, because uniqueness is determined by the first element of each array, there is an edge case: FirstOrDefault() returns null for an empty array, so an empty array is treated as equivalent to { null }. Depending on how you want to treat empty arrays, you'll need to either filter the input or change the GroupBy key.
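
The GroupBy above does not drop entries whose first element is empty, which the question also asks for. The comparer in the question also fails for a specific reason: its GetHashCode hashes every element of the array, so two arrays that share only their first element usually get different hash codes, and Distinct never calls Equals on them. The following is a minimal sketch of both fixes. The names FirstElementComparer, ContactSanitizer, SanitizeWithGroupBy and SanitizeWithDistinct are made up for illustration, and it assumes that null arrays, empty arrays, and arrays whose first element is null or empty should all be discarded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two arrays are equal when their first elements are equal.
sealed class FirstElementComparer : IEqualityComparer<string[]>
{
    // A null array and an empty array both produce a null key.
    private static string Key(string[] a) => a == null ? null : a.FirstOrDefault();

    public bool Equals(string[] x, string[] y) =>
        string.Equals(Key(x), Key(y), StringComparison.Ordinal);

    // Hash only what Equals compares, otherwise Distinct never sees a match.
    public int GetHashCode(string[] obj) => Key(obj)?.GetHashCode() ?? 0;
}

static class ContactSanitizer
{
    // Assumption: rows with no usable first element are unwanted.
    private static IEnumerable<string[]> WithFirstElement(IEnumerable<string[]> rows) =>
        rows.Where(r => r != null && r.Length > 0 && !string.IsNullOrEmpty(r[0]));

    // Variant of the GroupBy approach, with the filter applied first.
    public static List<string[]> SanitizeWithGroupBy(IEnumerable<string[]> rows) =>
        WithFirstElement(rows).GroupBy(r => r[0]).Select(g => g.First()).ToList();

    // Variant using Distinct with the first-element comparer.
    public static List<string[]> SanitizeWithDistinct(IEnumerable<string[]> rows) =>
        WithFirstElement(rows).Distinct(new FirstElementComparer()).ToList();
}
```

With the sample input from the answer, SanitizeWithGroupBy keeps { "a", "b" } and { "c", "d" }. GroupBy is documented to yield groups in the order their keys first appear, so it is the safer choice if the order of the result matters; the documentation for Distinct describes its result as unordered, even though current implementations return first occurrences in input order.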