Search code examples
c#mysqlperformanceentity-framework-coreef-core-8.0

How to select distinct items from table by a variable and then join another table in LINQ


I have a query where I need to select distinct order numbers from a table of orders. An order number can correspond to multiple entries in the table of orders but I only want distinct orders. I.E. I only want to grab one entry for a given order number. Then I want to join that with a customers table on o.CustomerKey equals c.CustomerKey. I have tried doing this in a couple different ways and haven't found a fast and concise way of doing it. My thought is to select distinct orders by OrderNumber then join the Customers table where conditional criteria is met then select my data I need to return. This is with Entity Framework Core 8.0.6

CustomerOrder is a DTO with the fields that can be seen in the query.

These are the relevant Customer table columns: [Customer_Key] ,[Customer_No],[Last_Name],[First_Name],[Address],[City],[State],[Zip_Code],[eMail],[Phone]

And these are the relevant Order columns, [Order_Number],[Customer_Key]

These tables were scaffolded to create Entities in my code.

This is one query I tried:

public async Task<List<CustomerOrder>> GetOrdersMatchingAsync(string lastName, string address, string city, string state, string zipCode)
        {
            var query = 
            from c in DimensionCustomers
            join o in (
                from o in FactDirectOrders
                select new { o.OrderNumber, o.CustomerKey }
            ).Distinct() on c.CustomerKey equals o.CustomerKey
            where
            (string.IsNullOrEmpty(lastName) || c.LastName == 
            lastName) &&
            (string.IsNullOrEmpty(address) || c.Address == address) 
             &&
            (string.IsNullOrEmpty(zipCode) || c.ZipCode == zipCode) 
             &&
            (string.IsNullOrEmpty(city) || c.City == city) &&
            (string.IsNullOrEmpty(state) || c.State == state)
            select new CustomerOrder
            {
                OrderNumber = o.OrderNumber,
                CustomerNo = c.CustomerNo,
                CustomerKey = c.CustomerKey,
                FirstName = c.FirstName,
                LastName = c.LastName,
                Address = c.Address,
                City = c.City,
                State = c.State,
                ZipCode = c.ZipCode,
            };
            return await query.AsNoTracking().ToListAsync();
        }

This currently works but I am not sure it is optimal. I have a couple queries similar to this where I am wanting to select distinct values before a join but maybe this is not best practice? Any guidance or advice would be appreciated.


Solution

  • I can see a few issues with your query. First the obvious stuff:

    This can be simplified considerably by leveraging navigation properties for the relationships between the entities. EF is not merely a Linq-enabled substitute for ADO and effectively building SQL statements.

    When projecting (using Select) whether using DTOs, ViewModels, or populating "new" entity instances, you don't need AsNoTracking(), CustomerOrder DTOs are not anything that the DbContext would be tracking, nor the entity/table referenced in the query.

    When dealing with conditional search criteria it is better to extract these out of the Linq expression and apply them into the Linq only if they are needed rather than imbedding things like #null checks into the Linq. The way you have them will result in a bunch of #null comparisons in the SQL.

    Using navigation properties I would expect that the Customer entity should have a Collection of Orders, or at the very least the Order has a navigation property back to its Customer. To get the query using navigation properties and extracting the conditional #null checks your query could be simplified to something like:

    var query = _context.DirectOrders.AsQueryable();
    
    // Each filter will only be appended (ANDed together) if the value is present.
    if (!string.IsNullOrEmpty(lastName))
        query = query.Where(o => o.Customer.LastName == lastName);
    
    if (!string.IsNullOrEmpty(address)) 
        query = query.Where(o => o.Customer.Address == address); 
    
    if (!string.IsNullOrEmpty(zipCode))
        query= query.Where( o => o.Customer.ZipCode == zipCode);
                 
    if (!string.IsNullOrEmpty(city) 
        query = query.Where*o => o.Customer.City == city));
    
    if (!string.IsNullOrEmpty(state))
        query = query.Where(o => o.Customer.State == state)
    
    var results = await query.Select(o => new CustomerOrder 
    {
        OrderNumber = o.OrderNumber,
        CustomerNo = o.Customer.CustomerNo,
        CustomerKey = o.Customer.CustomerKey,
        FirstName = o.Customer.FirstName,
        LastName = o.Customer.LastName,
        Address = o.Customer.Address,
        City = o.Customer.City,
        State = o.Customer.State,
        ZipCode = o.Customer.ZipCode,
    }).Distinct().ToListAsync();
    
    return results;
    

    The Distinct() use assumes that your DirectOrders can have multiple rows for the same OrderNumber. Navigation properties between entities can be bi-directional (Order has a Customer reference, AND Customer has an Orders collection) or uni-directional. (Order has a Customer reference OR Customer has an Orders collection) Normally I recommend using uni-directional navigation properties where one entity can serve as an aggregate root to the other. However, in some cases where two entities commonly work as top-level entities (such as a Customer and an Order) then it makes sense to use bi-directional references. If you only have Customer.Orders for example and no Order.Customers then the query changes slightly:

    var query = _context.Customers.AsQueryable();
    
    // Each filter will only be appended (ANDed together) if the value is present.
    if (!string.IsNullOrEmpty(lastName))
        query = query.Where(c => c.LastName == lastName);
    
    if (!string.IsNullOrEmpty(address)) 
        query = query.Where(c => c.Address == address); 
    
    if (!string.IsNullOrEmpty(zipCode))
        query= query.Where( c => c.ZipCode == zipCode);
                 
    if (!string.IsNullOrEmpty(city) 
        query = query.Where*c => c.City == city));
    
    if (!string.IsNullOrEmpty(state))
        query = query.Where(c => c.State == state)
    
    var results = await query.SelectMany(c => c.Orders.Select( o => new CustomerOrder 
    {
        OrderNumber = o.OrderNumber,
        CustomerNo = c.CustomerNo,
        CustomerKey = c.CustomerKey,
        FirstName = c.FirstName,
        LastName = c.LastName,
        Address = c.Address,
        City = c.City,
        State = c.State,
        ZipCode = c.ZipCode,
    })).Distinct().ToListAsync();
    
    return results;
    

    That does a query from the customer level, but then a SelectMany() to defer to the Orders, building the DTO from the combined Customer (c=>) and Order (o=>).