I've found the variety of questions related to loading using .Include
, etc, but it seems like SQL shapes that query into a huge join which means if I'm getting Customer information, and that customer owns 1000 items and I do
context.Customers.Include(c=> c.Inventory).Where(c=>c.ID == id);
//side note, WHY can't I use .Find() after Include?
I'll get 1000 rows of the same customer information duplicated along with the item information rather than 1 customer record and 1000 items in a multi-table set. This seems really inefficient and leads to some really slow queries.
So if I have a set of customers:
//get all customers in TX
var texasCustomers = context.Customers.Where(c=> c.State == "TX");
And I want to loop over them in an export to XLSX:
foreach (var c in texasCustomers) {
//compile row per item SKU
foreach(var sku in c.Inventory.GroupBy(i=>i.SKU)) {
xlsx.SetCellValue(row, col, c.Name);
//output more customer info here
xlsx.SetCellValue(row, col, sku.Key);
xlsx.SetCellValue(row, col, sku.Sum(i=>i.Quantity));
}
}
This generates a query to the 'Inventory' table PER customer. Which is a fast query, but when you do the SAME query 1000 times, it gets annoyingly slow.
So I've done things like this:
//get customer key
var customerids = texasCustomers.Select(c=> c.ID).ToArray();
var inventories = context.Inventories.Where(i=> customerids.Contains(i.CustomerID)).ToList();
...and now my export loop looks more like this... the inner loop operating on the nav property from the first example becomes an in-memory linq filter against the prebuilt list of inventory objects:
foreach (var c in texasCustomers) {
//compile row per item SKU
foreach(var sku in inventories.Where(i=> i.CustomerID == c.ID)) {
xlsx.SetCellValue(row, col, c.Name);
//output more customer info and then the sku info
xlsx.SetCellValue(row, col, sku.Key);
xlsx.SetCellValue(row, col, sku.Sum(i=>i.Quantity));
}
}
This successfully gets around the 'query per loop' issue, but has obvious downsides... and just feels wrong.
So, what am I missing? Where's the secret EF features that let me do something like:
texasCustomers.LoadAll(c=> c.Inventories);
to "populate" all of the collection member's navigational properties in one go? Or am I approaching the problem from the wrong angle?
Is there a way to struture the query to get EF to generate SQL that doesn't turn into a single giant denormalized table?
There is no secret EF feature that allows you to do exactly what you want, but there is something close called navigation property fix up, which populates the navigation properties of the materialized entity even without Include
if the related entities are already loaded in the context.
Hence you can first load the related inventories as follows:
texasCustomers.SelectMany(c => c.Inventories).Load();
and then execute and iterate the main query:
foreach (var c in texasCustomers)
{
var inventories = c.Inventories; // must be there
// ...
}
But to avoid lazy loading when accessing the navigation property, make sure the lazy loading is disabled before doing all the above by inserting the following line at the very beginning:
context.Configuration.ProxyCreationEnabled = false;
One important detail I forgot to mention is that with the aforementioned technique, if there are no related entities, the navigation property will stay null
rather than returning empty list as with normal usages, so make sure to check to include null
checks or use ?? Enumerable.Empty<Inventory>()
when accessing it.