I am having a query which groups data twice:
var query = (from a in Context.SetA()
from b in Context.SetB().Where(x => x.aId == a.Id) // 1-to-many
from c in Context.SetC().Where(x => x.bId == b.Id) // 1-to-many
group new {...} by new {...} into g
select new
{
g.Key.X,
g.Key.Y,
g.Sum(x => x....), // etc, lots of Sums
});
from (q in query
group q by true
select new
{
g.Key.Z,
g.Sum(x => x.....), // etc, lots of Sums
});
This all works well, and performance is acceptable. Now when I add All (or Any, makes no difference) in both groupings
g.All(x => x.Flag)
performance drops drastically. It becomes like 10x slower, and goes from 5s to 50s. Numbers are not accurate, its just to get the idea.
I am doing grouping twice, since I have three levels of 1-to-many relationships, and I do summary on all levels (Sum of parent, sum of children and sum of grandchildren values).
Any ideas what can I do to improve performance?
The problem is that there are not natural SQL GROUP BY
aggregates which can be mapped to All
/ Any
applied to the grouping, so the EF SQL translation is inefficient.
The solution is to use their aggregate equivalents.
So instead of
g.All(x => x.Flag)
you could use
g.Min(x => x.Flag ? 1 : 0) == 1
and respectively
g.Any(x => x.Flag)
can be replaced with
g.Max(x => x.Flag ? 1 : 0) == 1
Update: Strangely, EF generates 2 MIN
/ MAX
calls for the above expressions. Which is fixed by adding at the end (after == 1
) the counterintuitive ? true : false