c# list linq

Checking for and removing duplicates in a List of objects in C#


I'm looking for a way to check for duplicates in a list of objects and remove them using a LINQ query.

First of all, I have two classes:

public class ErrorData
{
   public string Severity { get; set; }
   public string Category { get; set; }
   public LocalizationListData Localisations { get; set; }
}

public class LocalizationListData
{
   public List<KeyValuePair<string, string>> LstLocalizationData { get; set; }
}

An ErrorData object has two strings and a LocalizationListData object, which contains a List of KeyValuePair. (Sorry, it's a weird structure.)

Now, in my main, I have a List of ErrorData. Basically, I need to make sure each ErrorData in the List has a unique combination of Severity, Category, and Localisations (a unique list of keys and values). Here is what I did:

List<ErrorData> errList = new List<ErrorData>();

var groupErr = errList.GroupBy(x => new { x.Severity, x.Category, x.Localisations });

bool hasDups = groupErr.Any(g => g.Count() > 1); //Check for Duplicates

if (hasDups)
{
    errList = groupErr.Select(g => g.First()).ToList(); //Remove Duplication
}

However, this doesn't seem to work, because I think it doesn't compare the contents of the List<KeyValuePair<string, string>> in Localisations. Can someone show me how to modify my LINQ query to check for duplicates and remove them? Thank you in advance.


Solution

  • Try converting Localisations to a string, so that GroupBy compares the key/value contents instead of the object reference. Removal then becomes simple:

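    List<ErrorData> errList = new List<ErrorData>();
    
    // Group on Severity, Category, and a canonical string built from the sorted key/value pairs.
    var duplicates = errList.GroupBy(x => new { 
            x.Severity, 
            x.Category, 
            Localisation = string.Join(";", x.Localisations.LstLocalizationData.OrderBy(kv => kv.Key).Select(kv => $"{kv.Key}={kv.Value}"))
        })
        .SelectMany(g => g.Skip(1)) // everything after the first item of a group is a duplicate
        .ToList();
    
    // List<T>.RemoveRange takes (index, count), so RemoveAll is used to drop the collected items.
    errList.RemoveAll(e => duplicates.Contains(e));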
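
    A string key can give false matches if a key or value itself contains ';' or '='. As an alternative, here is a minimal sketch that uses a custom IEqualityComparer<ErrorData> with Distinct. The names ErrorDataComparer and ErrorDataDeduplicator are hypothetical and not from the question. The sketch assumes that the order of the pairs inside LstLocalizationData should not matter and that a missing list counts as empty; adjust it if your data needs different rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class LocalizationListData
{
    public List<KeyValuePair<string, string>> LstLocalizationData { get; set; }
}

public class ErrorData
{
    public string Severity { get; set; }
    public string Category { get; set; }
    public LocalizationListData Localisations { get; set; }
}

// Hypothetical comparer: two ErrorData are equal when Severity, Category,
// and the sorted key/value pairs all match.
public sealed class ErrorDataComparer : IEqualityComparer<ErrorData>
{
    // Sort pairs by key, then by value, so the order inside the list is ignored.
    // Assumption: a null Localisations or null list is treated as an empty list.
    private static List<KeyValuePair<string, string>> Normalize(ErrorData e) =>
        (e.Localisations?.LstLocalizationData ?? new List<KeyValuePair<string, string>>())
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .ThenBy(kv => kv.Value, StringComparer.Ordinal)
            .ToList();

    public bool Equals(ErrorData a, ErrorData b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.Severity == b.Severity
            && a.Category == b.Category
            && Normalize(a).SequenceEqual(Normalize(b));
    }

    public int GetHashCode(ErrorData e)
    {
        if (e is null) return 0;
        unchecked
        {
            // Combine the hashes of every compared field; equal objects must hash equally.
            int hash = 17;
            hash = hash * 31 + (e.Severity?.GetHashCode() ?? 0);
            hash = hash * 31 + (e.Category?.GetHashCode() ?? 0);
            foreach (var kv in Normalize(e))
            {
                hash = hash * 31 + (kv.Key?.GetHashCode() ?? 0);
                hash = hash * 31 + (kv.Value?.GetHashCode() ?? 0);
            }
            return hash;
        }
    }
}

// Hypothetical helper showing how the comparer would be used.
public static class ErrorDataDeduplicator
{
    // Returns a new list with one representative per distinct ErrorData.
    public static List<ErrorData> RemoveDuplicates(List<ErrorData> errList) =>
        errList.Distinct(new ErrorDataComparer()).ToList();
}
```

    With this in place, errList = ErrorDataDeduplicator.RemoveDuplicates(errList); replaces the grouping, and comparing the Count before and after tells you whether any duplicates were present.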