Search code examples
c#.netlinq.net-corelinq-to-entities

GroupJoin always evaluating locally


I have a linq query that uses GroupJoin and is always evaluating locally causing to hit SQL server n times where n is # of orders.

I have 3 tables Orders, OrderStatus, OrderStatusCode.

Orders has OrderId, CustomerId, ProductId
OrderStatus has OrderId, OrderStatusId, OrderStatusCodeId,
OrderStatusCode has OrderStatusCodeId, OrderStatusCodeName

OrderId, CustomerId, ProductId
1         5000       100
2         5400       100

OrderId, OrderStatusId, OrderStatusCodeId  CreatedDateTime
1        1              1 -- started        12/01/2019
1        2              2 -- completed      12/01/2019
1        3              3 -- shipped        12/03/2019

2        1              1 -- started        12/01/2019
2        2              4 -- canceled       12/01/2019
2        3              5 -- refunded       12/10/2019

OrderStatusCodeId, OrderStatusCodeName
1                  started
2                  completed
3                  shipped
4                  canceled
5                  refunded 




    var OrderWithLatestStatus = _dbContext.Orders.Include(h => 
                         h.OrderStatus).ThenInclude(hs => hs.OrderStatusCode)
                        .Where(o => o.ProductId == "100")
                        .GroupJoin(_dbContext.OrderStatus,
                            order => order.OrderId,
                            status => status.OrderId,
                            (o, g) => new 
                            {
                                Order = o,
                                OrderStatuses = g
                            })
                        .Select(x => new EvalWithStatus
                        {
                            OrderId = x.Order.OrderId,
                            CustomerId = x.Order.CustomerId, 
                            AllStatuses = x.OrderStatuses,
                            LatestOrderStatusCodeName = x.OrderStatuses.Any() ? 
                              x.OrderStatuses.Any(s => 
                           s.OrderStatusCode.OrderStatusCodeName.Equals("Canceled")) 
                ? x.OrderStatuses.FirstOrDefault(s => 
                   s.OrderStatusCode.OrderStatusCodeName.Equals("Canceled"))
                  .OrderStatusCode.OrderStatusCodeName :  
                   x.OrderStatuses.OrderByDescending(s => 
                    s.CreatedDateTime).FirstOrDefault()
                    .OrderStatusCode.OrderStatusCodeName 
                     : "Unknown"

}

Inside select I'm expecting the latest status to show cancelled for cancelled orders and the actual latest status for completed items

OrderId, CustomerId, AllStatuses,              LatestOrderStatusCodeName
1        5000        IEnuerable<OrderStatus>   Shipped
2        5400        IEnuerable<OrderStatus>   Canceled

The linq query says it can't evaluate x.OrderStatuses.Any() so it will be evaluated locally causing two separate calls to the database.

What could be changed in the query to make it evaluate on the server?


Solution

  • There are several issues with this query.

    First, it's using projection, so Include / ThenInclude are redundant they are ignored and normally logged as warning, but also can be configured to throw exception.

    Second, and that's the main issue is mixing the manual joins and navigation properties. According to the includes, you have the proper navigation properties, so you should not use manual joins at all - reference navigation property represents a join (left or inner depending on whether it's required* or *optional*) and *collection* navigation property represents agroup join`.

    In your case, Order.OrderStatus property (it should have been called OrderStatuses) represents exactly the OrderStatuses from your GroupJoin.

    So simply replacing the GroupJoin with

    .Select(o => new { Order = o, OrderStatuses = o.OrderStatus })
    

    or using it directly in the final projection will solve the client evaluation.

    However, you can do it better. All these Any / FirstOrDefault even with server evaluation cause several SQL subqueries to related tables. They can be trimmed down to single TOP 1 subquery using a proper ordering, e.g.

    var OrderWithLatestStatus = _dbContext.Orders
        .Select(order => new EvalWithStatus
        {
            OrderId = order.OrderId,
            CustomerId = order.CustomerId,
            AllStatuses = order.OrderStatus.ToList(),
            LatestOrderStatus = order.OrderStatus
                .OrderBy(s => s.OrderStatusCode.OrderStatusCodeName == "canceled" ? 0 : 1)
                .ThenByDescending(s => s.CreatedDateTime)
                .Select(s => s.OrderStatusCode.OrderStatusCodeName)
                .FirstOrDefault() ?? "Unknown"
        });
    

    Also note ToList() call here

    AllStatuses = order.OrderStatus.ToList(),
    

    This is to opt-in for EF Core 2.1 Optimization of correlated subqueries and eliminates the N + 1 query problem with such data.