Search code examples
linqlinq-to-sql

Entity Framework LINQ Multiple Column Count(Distinct)


I'm not entirely sure how to translate the following SQL into LINQ Fluent API. Any help would be great

select count(distinct thread.Id) as ThreadCount, count(distinct segment.Id) as SegmentCount
from Segment 
inner join SegmentCommunicator as SegmentCommunicator
    on Segment.Id = SegmentCommunicator.SegmentId
inner join Thread
    on Thread.Id = Segment.ThreadId
where SegmentCommunicator.CommunicatorId in (94, 3540, 6226, 10767, 20945)

Currently, I know how to do this in 2 queries but for the life of me I can't figure out how to condense down into one. Any help would be much appreciated

        var aggregate1 = _threadProvider
                .AsQueryable()
                .SelectMany(t => t.Segments)
                .Where(s => s.SegmentCommunicators.Any(sc => communicatorIds.Contains(sc.CommunicatorId)))
                .Select(s => s.ThreadId)
                .Distinct()
                .Count();

        var aggregate2 = _threadProvider
            .AsQueryable()
            .SelectMany(t => t.Segments)
            .Where(s => s.SegmentCommunicators.Any(sc => communicatorIds.Contains(sc.CommunicatorId)))
            .Select(s => s.Id)
            .Distinct()
            .Count();

Solution

  • You can use one "base" query but the two distinct counts will be separate queries, hydrating the query once with as small of a data set as possible:

    var query = _threadProvider
                .AsQueryable()
                .SelectMany(t => t.Segments)
                .Where(s => s.SegmentCommunicators.Any(sc => communicatorIds.Contains(sc.CommunicatorId)))
                .Select(s => new {s.ThreadId, s.Id})
                .Distinct()
                .ToList();
    
    var aggregate1 = query.Select(s => s.ThreadId)
                          .Distinct()
                          .Count();
    
    var aggregate2 = query.Select(s => s.Id)
                          .Distinct()
                          .Count();    
    

    You might be able to use GroupBy to do it in one query:

    var query = _threadProvider
                .AsQueryable()
                .SelectMany(t => t.Segments)
                .Where(s => s.SegmentCommunicators.Any(sc => communicatorIds.Contains(sc.CommunicatorId)))
                .GroupBy(s => 1)  
                .Select(g => new 
                    {
                        ThreadCount  = g.Select(s => s.ThreadId.Distinct().Count()), 
                        SegmentCount = g.Select(s => s.Id.Distinct().Count()), 
                    });
    

    but I doubt that the underlying query provider will support it (at best it will turn it into two sub-queries).

    Note that neither query will likely perform as fast as the raw SQL, since SQL can optimize the query before returning the results to Linq.