Search code examples
c#entity-framework-corelazy-loading

Is there a way to batch load navigation properties in EF Core?


I'm using EF Core 5, suppose I have the following database model (code first, pseudo code).

Company {
  string Name,
  int PrimaryAddressId,
  Address PrimaryAddress,
  Location[] Locations
}

Address {
  string City,
  string StreetAddress
}

Location {
  int CompanyId,
  int AddressId,
  Address Address
}

I would like to load a bunch of companies (let's say 50) from the database, including all their related data.

I'm aware that I have 2 options to load related data, however, both have some caveats.

  • Using lazy loading proxies: This will result in the N+1 queries issue, it will query the related data entity by entity, so at the end of the day a huge amount of queries will be sent to the server.
  • Using eager loading with Includes: This will result in a huge single query with an anormous amount of joins, possibly fetching a lot of redundant data and most likely it will be quiet slow. Not to mention that I explicitly need to enumerate all the wanted paths with Includes which is quiet error-prone.

What I'm looking for is some tool/library, whatever, what could batch load related data for my entities, in a way similar to this kind of code:

var companies = context.Company.Take(50);
var companyIds = companies.Select(e => e.Id).ToHashSet();

var addressIds = companies.Select(e => e.PrimaryAddressId).ToHashSet();
var addressesById = context.Address.Where(e => addressIds.Contain(e.Id)).ToDictionary(e => e.Id);
companies.ForEach(e => e.PrimaryAddress = addressesById[e.PrimaryAddressId]);

var locationsByCompanyId = context.Location.Where(e => companyIds.Contains(e.CompanyId)).ToLookup(e => e.CompanyId);
companies.ForEach(e => e.Locations.AddRange(locationsByCompanyId[e.Id]));

...

This seems to me as a middle-of-the-road approach with a quiet good performance (queries will use the IN operator and most likely utilize indexes). Also, the code seems to be kind of boilerplate in the sense that one should be able to generalize/automate it somehow with some tooling, using the model metadata, foreign keys, etc.

Any ideas would be appreciated.


Solution

  • Using eager loading with Includes: This will result in a huge single query with an anormous amount of joins (1), possibly fetching a lot of redundant data and most likely it will be quiet slow (2). Not to mention that I explicitly need to enumerate all the wanted paths with Includes which is quiet error-prone.(3)

    Ad (1): Define "enormous amount of joins". In my experience, most self-respecting relational databases can handle several factors more joins than human users, without slowing down perceptibly.

    Ad (2): Don't optimize for things that might possibly happen. Measure it, and if it is slow, analyze why it is slow. This sounds like premature optimization, a known anti-pattern.

    Ad (3): This seems like the kind of low-level data-layer code that needs writing once (with ample tests!) and should be low-maintenance afterwards.

    So my suggestion would be to write it, (stress-)test it, and if there are performance problems, analyze them and look for specific solutions for actual problems.

    If the same data is retrieved repeatedly, I would focus more on a form of caching to counter potential performance issues, for instance.