I am trying to write a query that gives me the average time between status changes of an database entity. On each update an entry is created in an audit table, this table contains a Status and PreviousStatus field to allow me to easily query the changes. I am able to get the output I want via the following query but the issue is that this is being executed in memory and not in the database which is not the best solution.
//this is the best one so far but it does the average function in memory
private async Task<double> GetAvgBetweenStatus()
{
return (await _context.Set<Audit_Opportunity>().Where(x => x.Status == OpportunityStatus.Solutioned && x.PreviousStatus == OpportunityStatus.Qualified).Select(x => new
{
Date1 = x.AuditDate,
Date2 = _context.Set<Audit_Opportunity>().Where(o => o.OpportunityId == x.OpportunityId && o.Id < x.Id && o.PreviousStatus == OpportunityStatus.Identified).OrderByDescending(o => o.Id).First().AuditDate
})
.Select(x =>
new
{
Duration = EF.Functions.DateDiffSecond(x.Date2, x.Date1)
})
.ToListAsync() // <--- here it is doing the query )
.Average(x => x.Duration); // <--- this bit is done in memory / not good!
}
If I try and remove the ToListAsync() method and use AverageAsync() for example I get a "Cannot perform an aggregate function on an expression containing an aggregate or a subquery" exception.
It produces the following invalid SQL statement
SELECT AVG(CAST(DATEDIFF(second, (
SELECT TOP(1) [a0].[AuditDate]
FROM [Audit_Opportunity] AS [a0]
WHERE [a0].[OpportunityId] = [a].[OpportunityId] AND [a0].[Id] < [a].[Id] AND [a0].[PreviousStatus] = 1), [a].[AuditDate]) AS float))
FROM [Audit_Opportunity] AS [a]
WHERE [a].[Status] = 3 AND [a].[PreviousStatus] = 2
What I think I need is to add a SELECT AVG(....) FROM T like this:
** I am using datediff seconds just for testing. ***
SELECT AVG(t.Duration) FROM(
SELECT DATEDIFF(SECOND, (
SELECT TOP(1) [a0].[AuditDate]
FROM [Audit_Opportunity] AS [a0]
WHERE [a0].[OpportunityId] = [a].[OpportunityId] AND [a0].[Id] < [a].[Id] AND [a0].[PreviousStatus] = 1 order by [a0].Id DESC), [a].[AuditDate]) AS [Duration]
FROM [Audit_Opportunity] AS [a]
WHERE [a].[Status] = 3 AND [a].[PreviousStatus] = 2) AS T
No matter what I try it either causes the exception or it works but produces a mess of a query which is most likely very inefficient.
Here are the main columns which are key to this query.
Id | OpportunityId | Status | AuditDate | PreviousStatus |
---|---|---|---|---|
10049 | 5 | 1 | 2024-08-08 13:08:11.2210321 | 0 |
10050 | 5 | 2 | 2024-08-08 13:08:13.2470991 | 1 |
10051 | 5 | 3 | 2024-08-08 13:08:17.3099538 | 2 |
10052 | 5 | 0 | 2024-08-08 13:08:22.7923615 | 3 |
10053 | 5 | 1 | 2024-08-08 13:08:29.7530627 | 0 |
10054 | 5 | 2 | 2024-08-08 13:08:31.5445853 | 1 |
10055 | 5 | 3 | 2024-08-08 13:08:32.9252342 | 2 |
10056 | 5 | 4 | 2024-08-08 13:08:37.0011645 | 3 |
10057 | 5 | 2 | 2024-08-08 13:08:48.6677023 | 4 |
The problem is that EF has translated your LINQ expression into a query that is syntactically (and even semantically) correct, but has run into a SQL Server limitation. The trick is to try to find an equivalent form that generates a different but supported SQL query.
If I were to code this up in SQL Server, I would use a CROSS APPLY (SELECT TOP 1 ...)
to look up the prior (starting) audit record. Something like:
SELECT AVG(CAST(DATEDIFF(second, prior.AuditDate, a.AuditDate) AS float))
FROM Audit_Opportunity AS a
CROSS APPLY (
SELECT TOP 1 a0.*
FROM Audit_Opportunity a0
WHERE a0.OpportunityId = a.OpportunityId
AND a0.Id < a.Id
AND a0.PreviousStatus = 1
ORDER BY a0.Id DESC
) prior
WHERE a.Status = 3
AND a.PreviousStatus = 2
If I use a tool (that I shall leave unnamed) to convert the above SQL into equivalent LINQ, I get the following:
var result = _context.Set<Audit_Opportunity>()
.Where(a => a.Status == OpportunityStatus.Solutioned
&& a.PreviousStatus == OpportunityStatus.Qualified)
.Select(a => new
{
Duration = _context.Set<Audit_Opportunity>()
.Where(a0 => a0.OpportunityId == a.OpportunityId
&& a0.Id < a.Id
&& a0.PreviousStatus == OpportunityStatus.Identified)
.OrderByDescending(a0 => a0.Id)
.Select(a0 => EF.Functions.DateDiffSecond(a0.AuditDate, a.AuditDate))
.FirstOrDefault()
})
.Select(t => (double?)t.Duration) // Convert duration to double
.Average(); // Calculate the average
However, I suspect that the above is sufficiently close to your original query, that it might suffer the same translation fate. The "good" SQL query that you included in your post yielded a similar translation.
I found the following "How do I write this cross apply query in LINQ-to-SQL?" question that had a promising answer. Adapting that answer to your case I came up with the following LINQ queries (one using LINQ query syntax, and one using LINQ method syntax).
var result =
(from a in _context.Set<Audit_Opportunity>()
where a.Status == OpportunityStatus.Solutioned
&& a.PreviousStatus == OpportunityStatus.Qualified
from a0 in (
from a0 in _context.Set<Audit_Opportunity>()
where a0.OpportunityId == a.OpportunityId
&& a0.Id < a.Id
&& a0.PreviousStatus == OpportunityStatus.Identified
orderby a0.Id descending
select a0
)
.Take(1)
select (double?)EF.Functions.DateDiffSecond(a0.AuditDate, a.AuditDate)
).Average();
var result = _context.Set<Audit_Opportunity>()
.Where(a => a.Status == OpportunityStatus.Solutioned
&& a.PreviousStatus == OpportunityStatus.Qualified)
.SelectMany(a => _context.Set<Audit_Opportunity>()
.Where(a0 => a0.OpportunityId == a.OpportunityId
&& a0.Id < a.Id
&& a0.PreviousStatus == OpportunityStatus.Identified)
.OrderByDescending(a0 => a0.Id)
.Take(1)
.Select(a0 => (double?)EF.Functions.DateDiffSecond(a0.AuditDate, a.AuditDate))
)
.Average();
These are untested, but hopefully one will translate to a workable SQL form.
Another idea (that I haven't fleshed out) might be to add a second select value (such as a0.Id
) to the subquery so that it can no longer be translated as a scalar subquery. To avoid having it optimized out, we would have to reference that value in some harmless way in the outer query, such as where inner.Id > 0
. It isn't pretty, but might force a different translation.
If all else fails, the fallback is to code up and execute a raw SQL query.
(For reference, here is a fiddle containing data, the original posted queries, and the cross apply query from above.)