How to find duplicates in a LINQ Group collection


I want to write a LINQ query that will check whether there is duplication of User data in my InfoObject objects.

In the illustration below, I want to group by Code (e.g., XYZ) and check whether any User in that group has more than one Type. The User model contains a Name (e.g., John) and a Value ("1" means selected).

My model classes are as follows:

public class InfoObject
{
    public string Code { get; set; }
    public string Type { get; set; }
    public List<User> Users { get; set; }
}

public class User
{
    public string Name { get; set; }
    public string Value { get; set; }
}

Invalid Case: John is not allowed to have both types A and B

Code          Type          John          Luke          Tim
XYZ           A             1             1             -
XYZ           B             1             -             1

Valid Case: Each user can only have one type

Code          Type          John          Luke          Tim
XYZ           A             -             1             -
XYZ           B             1             -             1

What is the proper LINQ query to achieve this? Thanks!


Solution

  • You can flatten the InfoObject hierarchy, group by Code and user name, and keep the groups that have more than one distinct Type:

    IEnumerable<InfoObject> data = ...;
    
    var invalidUserNames = data
        // flatten with SelectMany into (Name, InfoObject) pairs:
        .SelectMany(o => o.Users
            .Where(user => user.Value == "1") // assuming you want to check only "selected"
            .Select(user => (user.Name, Obj: o)))
        // one group per Code and user; assuming Name uniquely identifies user
        .GroupBy(t => (Code: t.Obj.Code, Name: t.Name))
        // count the distinct Types the user has within that Code
        .Select(gr => (Name: gr.Key.Name, Count: gr.Select(t => t.Obj.Type).Distinct().Count()))
        .Where(t => t.Count > 1)
        .Select(t => t.Name)
        .Distinct() // a user may be invalid under several Codes
        .ToArray();
    

    You can adjust the Select projections to expose only the data you need in the final result.
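
To check the idea against the two tables in the question, here is a minimal, self-contained sketch. It assumes the rule is "within one Code, a selected user may appear under only one Type", so it groups by Code and Name and counts distinct Types. The names DuplicateCheck, FindUsersWithMultipleTypes, InvalidCase and ValidCase are hypothetical helpers, and the "-" cells are modelled as Value "0", which is also an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public class InfoObject
{
    public string Code { get; set; }
    public string Type { get; set; }
    public List<User> Users { get; set; }
}

public static class DuplicateCheck
{
    // Hypothetical helper: returns (Code, Name) pairs where a selected user
    // appears under more than one distinct Type within the same Code.
    public static List<(string Code, string Name)> FindUsersWithMultipleTypes(
        IEnumerable<InfoObject> data)
    {
        return data
            // Flatten to one row per selected user: (Code, Type, Name).
            .SelectMany(o => o.Users
                .Where(u => u.Value == "1")
                .Select(u => (Code: o.Code, Type: o.Type, Name: u.Name)))
            // One group per Code and user (assumes Name identifies the user).
            .GroupBy(t => (Code: t.Code, Name: t.Name))
            // Keep groups that span more than one distinct Type.
            .Where(g => g.Select(t => t.Type).Distinct().Count() > 1)
            .Select(g => (Code: g.Key.Code, Name: g.Key.Name))
            .ToList();
    }

    // Builds one table row; the three values are John, Luke and Tim in that order.
    private static InfoObject Row(string type, string john, string luke, string tim)
    {
        return new InfoObject
        {
            Code = "XYZ",
            Type = type,
            Users = new List<User>
            {
                new User { Name = "John", Value = john },
                new User { Name = "Luke", Value = luke },
                new User { Name = "Tim", Value = tim },
            },
        };
    }

    // "Invalid Case" table from the question ("-" modelled as "0").
    public static List<InfoObject> InvalidCase()
    {
        return new List<InfoObject> { Row("A", "1", "1", "0"), Row("B", "1", "0", "1") };
    }

    // "Valid Case" table from the question ("-" modelled as "0").
    public static List<InfoObject> ValidCase()
    {
        return new List<InfoObject> { Row("A", "0", "1", "0"), Row("B", "1", "0", "1") };
    }

    public static void Main()
    {
        foreach (var hit in FindUsersWithMultipleTypes(InvalidCase()))
            Console.WriteLine($"{hit.Code}: {hit.Name}"); // prints "XYZ: John"

        Console.WriteLine(FindUsersWithMultipleTypes(ValidCase()).Count); // prints 0
    }
}
```

Returning the Code together with the Name makes it easier to report which group violates the rule. If only a yes/no answer is needed, replacing the final Select and ToList with Any() on the filtered groups should be enough.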