I have a linq query that uses GroupJoin and is always evaluating locally causing to hit SQL server n times where n is # of orders.
I have 3 tables Orders, OrderStatus, OrderStatusCode.
Orders has OrderId, CustomerId, ProductId
OrderStatus has OrderId, OrderStatusId, OrderStatusCodeId,
OrderStatusCode has OrderStatusCodeId, OrderStatusCodeName
OrderId, CustomerId, ProductId
1 5000 100
2 5400 100
OrderId, OrderStatusId, OrderStatusCodeId CreatedDateTime
1 1 1 -- started 12/01/2019
1 2 2 -- completed 12/01/2019
1 3 3 -- shipped 12/03/2019
2 1 1 -- started 12/01/2019
2 2 4 -- canceled 12/01/2019
2 3 5 -- refunded 12/10/2019
OrderStatusCodeId, OrderStatusCodeName
1 started
2 completed
3 shipped
4 canceled
5 refunded
var OrderWithLatestStatus = _dbContext.Orders.Include(h =>
h.OrderStatus).ThenInclude(hs => hs.OrderStatusCode)
.Where(o => o.ProductId == "100")
.GroupJoin(_dbContext.OrderStatus,
order => order.OrderId,
status => status.OrderId,
(o, g) => new
{
Order = o,
OrderStatuses = g
})
.Select(x => new EvalWithStatus
{
OrderId = x.Order.OrderId,
CustomerId = x.Order.CustomerId,
AllStatuses = x.OrderStatuses,
LatestOrderStatusCodeName = x.OrderStatuses.Any() ?
x.OrderStatuses.Any(s =>
s.OrderStatusCode.OrderStatusCodeName.Equals("Canceled"))
? x.OrderStatuses.FirstOrDefault(s =>
s.OrderStatusCode.OrderStatusCodeName.Equals("Canceled"))
.OrderStatusCode.OrderStatusCodeName :
x.OrderStatuses.OrderByDescending(s =>
s.CreatedDateTime).FirstOrDefault()
.OrderStatusCode.OrderStatusCodeName
: "Unknown"
}
Inside select I'm expecting the latest status to show cancelled for cancelled orders and the actual latest status for completed items
OrderId, CustomerId, AllStatuses, LatestOrderStatusCodeName
1 5000 IEnuerable<OrderStatus> Shipped
2 5400 IEnuerable<OrderStatus> Canceled
The linq query says it can't evaluate x.OrderStatuses.Any() so it will be evaluated locally causing two separate calls to the database.
What could be changed in the query to make it evaluate on the server?
There are several issues with this query.
First, it's using projection, so Include
/ ThenInclude
are redundant they are ignored and normally logged as warning, but also can be configured to throw exception.
Second, and that's the main issue is mixing the manual joins and navigation properties. According to the includes, you have the proper navigation properties, so you should not use manual joins at all - reference navigation property represents a join
(left
or inner
depending on whether it's required* or *optional*) and *collection* navigation property represents a
group join`.
In your case, Order.OrderStatus
property (it should have been called OrderStatuses
) represents exactly the OrderStatuses
from your GroupJoin
.
So simply replacing the GroupJoin
with
.Select(o => new { Order = o, OrderStatuses = o.OrderStatus })
or using it directly in the final projection will solve the client evaluation.
However, you can do it better. All these Any
/ FirstOrDefault
even with server evaluation cause several SQL subqueries to related tables. They can be trimmed down to single TOP 1
subquery using a proper ordering, e.g.
var OrderWithLatestStatus = _dbContext.Orders
.Select(order => new EvalWithStatus
{
OrderId = order.OrderId,
CustomerId = order.CustomerId,
AllStatuses = order.OrderStatus.ToList(),
LatestOrderStatus = order.OrderStatus
.OrderBy(s => s.OrderStatusCode.OrderStatusCodeName == "canceled" ? 0 : 1)
.ThenByDescending(s => s.CreatedDateTime)
.Select(s => s.OrderStatusCode.OrderStatusCodeName)
.FirstOrDefault() ?? "Unknown"
});
Also note ToList()
call here
AllStatuses = order.OrderStatus.ToList(),
This is to opt-in for EF Core 2.1 Optimization of correlated subqueries and eliminates the N + 1 query problem with such data.