Search code examples
c#sql-serverentity-frameworklinqlinq-to-entities

How to make EF efficiently call an aggregate function?


I'm trying to write a LINQ-to-entities query that will take an ICollection navigation property of my main object and attach some metadata to each of them which is determined through joining each of them to another DB table and using an aggregate function. So the main object is like this:

public class Plan
{
    ...
    public virtual ICollection<Room> Rooms { get; set; }
}

And my query is this:

var roomData = (
    from rm in plan.Rooms
    join conf in context.Conferences on rm.Id equals conf.RoomId into cjConf
    select new {
        RoomId = rm.Id,
        LastUsedDate = cjConf.Count() == 0 ? (DateTime?)null : cjConf.Max(conf => conf.EndTime)
    }
).ToList();

What I want is for it to generate some efficient SQL that uses the aggregate function MAX to calculate the LastUsedDate, like this:

SELECT
    rm.Id, MAX(conf.EndTime) AS LastUsedDate
FROM
    Room rm
LEFT OUTER JOIN
    Conference conf ON rm.Id = conf.RoomId
WHERE
    rm.Id IN ('a967c9ce-5608-40d0-a586-e3297135d847', '2dd6a82d-3e76-4441-9a40-133663343d2b', 'bb302bdb-6db6-4470-a24c-f1546d3e6191')
GROUP BY
    rm.id

But when I profile SQL Server it shows this query from EF:

SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[RoomId] AS [RoomId], 
    [Extent1].[ProviderId] AS [ProviderId], 
    [Extent1].[StartTime] AS [StartTime], 
    [Extent1].[EndTime] AS [EndTime], 
    [Extent1].[Duration] AS [Duration], 
    [Extent1].[ParticipantCount] AS [ParticipantCount], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[ServiceType] AS [ServiceType], 
    [Extent1].[Tag] AS [Tag], 
    [Extent1].[InstantMessageCount] AS [InstantMessageCount]
    FROM [dbo].[Conference] AS [Extent1]

So it is selecting everything from Conference and doing the Max() calculation in memory, which is very inefficient. How can I get EF to generate the proper SQL query with the aggregate function in?


Solution

  • The equivalent LINQ to Entities query which closely translates to the SQL query you are after is like this:

    var roomIds = plan.Rooms.Select(rm => rm.Id);
    
    var query =
        from rm in context.Rooms
        join conf in context.Conferences on rm.Id equals conf.RoomId
        into rmConf from rm in rmConf.DefaultIfEmpty() // left join
        where roomIds.Contains(rm.Id)
        group conf by rm.Id into g
        select new
        {
            RoomId = g.Key,
            LastUsedDate = g.Max(conf => (DateTime?)conf.EndTime)
        };
    

    The trick is to start the query from EF IQueryable, thus allowing it to be fully translated to SQL, rather than from plan.Rooms as in the query in question which is IEnumerable and makes the whole query execute in memory (context.Conferences is treated as IEnumerable and causes loading the whole table in memory).

    The SQL IN clause is achieved by in memory IEnumerable<Guid> and Contains method.

    Finally, there is no need to check the count. SQL naturally handles nulls, all you need is to make sure to call the nullable Max overload, which is achieved with the (DateTime?)conf.EndTime cast. There is no need to check conf for null as in LINQ to Objects because LINQ to Entities/SQL handles that naturally as well (as soon the receiver variable is nullable).