Search code examples
c#sql-serverlinqentity-framework-core

EF Core - Cannot perform an aggregate function on an expression containing an aggregate or a subquery


I am trying to write a query that gives me the average time between status changes of an database entity. On each update an entry is created in an audit table, this table contains a Status and PreviousStatus field to allow me to easily query the changes. I am able to get the output I want via the following query but the issue is that this is being executed in memory and not in the database which is not the best solution.

//this is the best one so far but it does the average function in memory
private async Task<double> GetAvgBetweenStatus()
{
    return (await _context.Set<Audit_Opportunity>().Where(x => x.Status == OpportunityStatus.Solutioned && x.PreviousStatus == OpportunityStatus.Qualified).Select(x => new
    {
        Date1 = x.AuditDate,
        Date2 = _context.Set<Audit_Opportunity>().Where(o => o.OpportunityId == x.OpportunityId && o.Id < x.Id && o.PreviousStatus == OpportunityStatus.Identified).OrderByDescending(o => o.Id).First().AuditDate
    })
    .Select(x =>
        new
        {
            Duration = EF.Functions.DateDiffSecond(x.Date2, x.Date1)
        })
    .ToListAsync() // <--- here it is doing the query  ) 
    .Average(x => x.Duration); // <--- this bit is done in memory / not good!
}

If I try and remove the ToListAsync() method and use AverageAsync() for example I get a "Cannot perform an aggregate function on an expression containing an aggregate or a subquery" exception.

It produces the following invalid SQL statement

    SELECT AVG(CAST(DATEDIFF(second, (
    SELECT TOP(1) [a0].[AuditDate]
    FROM [Audit_Opportunity] AS [a0]
    WHERE [a0].[OpportunityId] = [a].[OpportunityId] AND [a0].[Id] < [a].[Id] AND [a0].[PreviousStatus] = 1), [a].[AuditDate]) AS float))
FROM [Audit_Opportunity] AS [a]
WHERE [a].[Status] = 3 AND [a].[PreviousStatus] = 2

What I think I need is to add a SELECT AVG(....) FROM T like this:

** I am using datediff seconds just for testing. ***

SELECT AVG(t.Duration) FROM(
 SELECT DATEDIFF(SECOND, (
   SELECT TOP(1) [a0].[AuditDate]
    FROM [Audit_Opportunity] AS [a0]
    WHERE [a0].[OpportunityId] = [a].[OpportunityId] AND [a0].[Id] < [a].[Id] AND [a0].[PreviousStatus] = 1 order by [a0].Id DESC), [a].[AuditDate]) AS [Duration]
    FROM [Audit_Opportunity] AS [a]
    WHERE [a].[Status] = 3 AND [a].[PreviousStatus] = 2) AS T

No matter what I try it either causes the exception or it works but produces a mess of a query which is most likely very inefficient.

Here are the main columns which are key to this query.

Id OpportunityId Status AuditDate PreviousStatus
10049 5 1 2024-08-08 13:08:11.2210321 0
10050 5 2 2024-08-08 13:08:13.2470991 1
10051 5 3 2024-08-08 13:08:17.3099538 2
10052 5 0 2024-08-08 13:08:22.7923615 3
10053 5 1 2024-08-08 13:08:29.7530627 0
10054 5 2 2024-08-08 13:08:31.5445853 1
10055 5 3 2024-08-08 13:08:32.9252342 2
10056 5 4 2024-08-08 13:08:37.0011645 3
10057 5 2 2024-08-08 13:08:48.6677023 4

Solution

  • The problem is that EF has translated your LINQ expression into a query that is syntactically (and even semantically) correct, but has run into a SQL Server limitation. The trick is to try to find an equivalent form that generates a different but supported SQL query.

    If I were to code this up in SQL Server, I would use a CROSS APPLY (SELECT TOP 1 ...) to look up the prior (starting) audit record. Something like:

    SELECT AVG(CAST(DATEDIFF(second, prior.AuditDate, a.AuditDate) AS float))
    FROM Audit_Opportunity AS a
    CROSS APPLY (
        SELECT TOP 1 a0.*
        FROM Audit_Opportunity a0
        WHERE a0.OpportunityId = a.OpportunityId
        AND a0.Id < a.Id
        AND a0.PreviousStatus = 1
        ORDER BY a0.Id DESC
    ) prior
    WHERE a.Status = 3
    AND a.PreviousStatus = 2
    

    If I use a tool (that I shall leave unnamed) to convert the above SQL into equivalent LINQ, I get the following:

    var result = _context.Set<Audit_Opportunity>()
        .Where(a => a.Status == OpportunityStatus.Solutioned
                    && a.PreviousStatus == OpportunityStatus.Qualified)
        .Select(a => new
        {
            Duration = _context.Set<Audit_Opportunity>()
                .Where(a0 => a0.OpportunityId == a.OpportunityId
                             && a0.Id < a.Id
                             && a0.PreviousStatus == OpportunityStatus.Identified)
                .OrderByDescending(a0 => a0.Id)
                .Select(a0 => EF.Functions.DateDiffSecond(a0.AuditDate, a.AuditDate))
                .FirstOrDefault()
        })
        .Select(t => (double?)t.Duration) // Convert duration to double
        .Average(); // Calculate the average
    

    However, I suspect that the above is sufficiently close to your original query, that it might suffer the same translation fate. The "good" SQL query that you included in your post yielded a similar translation.

    I found the following "How do I write this cross apply query in LINQ-to-SQL?" question that had a promising answer. Adapting that answer to your case I came up with the following LINQ queries (one using LINQ query syntax, and one using LINQ method syntax).

    var result = 
        (from a in _context.Set<Audit_Opportunity>()
         where a.Status == OpportunityStatus.Solutioned
               && a.PreviousStatus == OpportunityStatus.Qualified
         from a0 in (
             from a0 in _context.Set<Audit_Opportunity>()
             where a0.OpportunityId == a.OpportunityId
               && a0.Id < a.Id
               && a0.PreviousStatus == OpportunityStatus.Identified
             orderby a0.Id descending
             select a0
             )
             .Take(1)
         select (double?)EF.Functions.DateDiffSecond(a0.AuditDate, a.AuditDate)
        ).Average();
    
    var result = _context.Set<Audit_Opportunity>()
        .Where(a => a.Status == OpportunityStatus.Solutioned
                    && a.PreviousStatus == OpportunityStatus.Qualified)
        .SelectMany(a => _context.Set<Audit_Opportunity>()
            .Where(a0 => a0.OpportunityId == a.OpportunityId
                        && a0.Id < a.Id
                        && a0.PreviousStatus == OpportunityStatus.Identified)
            .OrderByDescending(a0 => a0.Id)
            .Take(1)
            .Select(a0 => (double?)EF.Functions.DateDiffSecond(a0.AuditDate, a.AuditDate))
        )
        .Average();
    

    These are untested, but hopefully one will translate to a workable SQL form.

    Another idea (that I haven't fleshed out) might be to add a second select value (such as a0.Id) to the subquery so that it can no longer be translated as a scalar subquery. To avoid having it optimized out, we would have to reference that value in some harmless way in the outer query, such as where inner.Id > 0. It isn't pretty, but might force a different translation.

    If all else fails, the fallback is to code up and execute a raw SQL query.

    (For reference, here is a fiddle containing data, the original posted queries, and the cross apply query from above.)