Search code examples
sqlentity-framework.net-6.0

Clarification about the difference between eager loading, explicit loading and lazy loading


I have searched a lot about the difference between eager loading, explicit loading and lazy loading. I wanted to understand the behavior of the three to be able to decide when to use each one. So after a lot of searching, I just wanted to make sure that I understand things correctly. As, if I misunderstood anything I would appreciate someone to correct me.

1_eager loading:

eager loading will be retrieved within a single query because it will be translated to a sql query which contains join operation. This allow the retrieved data to be cached, which will improve the application performance. This is like a tradeoff between the memory consumption and the database trips (between the database server and the application server).

so eager loading is good for performance but it will not be a good choice if I wanted to retrieve a big amount of data: for example if I have a list of Users, each user have a list of reviews. In this situation the eager loading will not be a good choice because of the big memory consumption.

2-lazy loading

lazy loading is not translated to a sql query containing join like eager loading, instead it will be translated to a group of select queries for example if I have a list of Users, each user have a list of reviews the translated sql queries will be like this:

Select * from Users (to retrive all the users)
//for each user  
Select * from Reviews where Reviews.UserId =userId;
//will need 1+n sql quries, where n is the number of users

this will not result in a large consumption of memory, but instead will increase the database trips (between the database server and the application server). so it less than eager loading in performance, but for large amount of data, lazy loading is better to use to avoid large consumption of memory. so for the above example if the number of users are large using lazy loading is better than eager loading.

3-explicit loading:

explicit loading is like lazy loading, but it just works in a single entity. so if I wanted to use it on a list, I have to make a for loop which iterate over each entity. Also, if I used lazy loading, for every time I wanted to retrieve the entity from the data base the related data will be loaded, even If I didn't want to. So If I wanted the related data to be only loaded in specific situations, explicit loading may be a good choice.

So, for example suppose I want to get the related reviews of a user only when I pass the id of the user, The best approach is to use: _explicit loading: as the lazy loading will load the related reviews for every retrieve of the users from the database (which in this case I only need to retrieve it when the id is passed)

        public async Task<User> GetUserById(int id)
    {
        var user = _context.User.Find(id);
        var userWithReviews =context.Entry(user).
       .Collection(b => b.Reviews).load();

        return userWithReviews;
    }

_or using a simple select: as eager loading uses join (and from which I know join is more costly than select operation)

    public async Task<UserAndReviews> GetUserById(int id)
    {
        UserAndReviews userAndReviws = new UserAndReviews();
        var user = await _context.User.FindByIdAsync(id);
        var Reviews =await context.Reviews.where(r=>r.UserId =id).ToListAsync();         
        userAndReviws.user = user;
        userAndReviws.Reviews =Reviews;
        return userAndReviws;
    }

This is what I understood after a lot of searching, and I would truly appreciate any correction if I misunderstood any thing.


Solution

  • You're on the right track, the summary in MS:Learn - Loading Related Data has the simplest wording:

    • Eager loading means that the related data is loaded from the database as part of the initial query.
    • Explicit loading means that the related data is explicitly loaded from the database at a later time.
    • Lazy loading means that the related data is transparently loaded from the database when the navigation property is accessed.

    It does however miss the most important data retrieval pattern Projection Operations. Which we will discuss last...

    1. Eager Loading

    You are correct that Eager loading is a trade off, but it is one that we generally prefer over Lazy Loading. In distributed and cloud based applications the database connection and requesting data from the database is often a performance bottleneck. We are beholden to bandwidth and the overheads of establishing a connection in the first place. We reduce the number of direct requests to the database by retrieving larger sets of data in a single call and effectively caching this data in memory.

    You should only retrive filtered datasets and not try to load the entire table or databases into memory, unless that is your explicit desire. But in many cases eager can actually provide a higher net performance experience. Importantly from a development point of view once the data is loaded we can continue to process the records without needing to have access to the db context. This allows the logic to be decoupled or even de-referenced from the data access if you need to.

    1.1 Join Performance

    It is true that Eager Loading uses joins, but in many cases it is not helpful to think of joins as costly. From an SQL point of view, they are not, but from an EF point for view there is a penalty, not specifically because of the use of a JOIN but because historically the query would be retrieved in a flattened or de-normalised single result set.

    So a query that returns 3 users that each have 2 reviews would actually return 6 rows with the user fields merged into the same records as the reviews:

    user.Id user.Name review.Id review.UserId review.Notes
    1 John 1 1 John's First Review
    1 John 2 1 John's Second Review
    2 Anne 3 2 Anne's First Review
    2 Anne 4 2 Anne's Second Review
    3 Jamie 5 3 Jamie's First Review
    3 Jamie 6 3 Jamie's Second Review

    If User has a lot of fields, or there are multiple levels of nested joins, the data transferred quickly bloats out to include a lot of redundant bytes or information. Join is not inherently bad, unless you are doing a very simple SELECT *. The database itself is very efficient at querying data with joins, it is the transport back to the application logic and the de-normalising it that can have an impact on performance.

    1.2 Split Queries

    EF has a specific optimisation that can help with this performance issue, Split Queries. This simple technique will actually convert the C# expression into multiple SQL queries, one for each table. Then EF will hydrate the graph from those separate results. It doesn't work for al queries, but in many simple cases, this is the low code solution to give you the SQL and data integrity benefits of joins while minimising the EF default transport overheads.

    My personal experience is that since EF Core 7, eager loading with AsSplitQuery() provides a good compromise between performance and frequency of calls to the database. I would NEVER recommend Lazy Loading and only use Explicit Loading in explicit scenarios where the performance benefit can be measured and the additional code maintenance is worth the effort.

    2. Lazy Loading

    Only the root level entities will be retrieved from the database on the initial call.

    // C# - Select all users
    var users = dbContext.Users.ToList();
    
    SELECT * FROM Users;
    

    Then in your application logic, on a per entity basis, when you access a navigation property for the first time that property will be queried from the database directly:

    // C# - Assume there are 3 users
    foreach(var user in users.Take(3))
    {
       var reviews = users.Reviews.ToList();
    }
    
    SELECT * FROM Reviews WHERE Reviews.UserId = @userId; 
    SELECT * FROM Reviews WHERE Reviews.UserId = @userId; 
    SELECT * FROM Reviews WHERE Reviews.UserId = @userId; 
    

    This might be desirable if there are not many entities in the root collection and your logic only accesses the navigation properties in specific conditions, but in general this pattern leads to leaky performance.

    This is not an opinion, it is fact. I didn't say bad performance, though it is my experience, but in general the idea that your code will query the database from multiple places that are not immediately obvious from the code, and often in high frequency scenarios means that each first access to a lazy loaded navigation property must wait for the overhead of connecting to the database and waiting for the response before the code will continue.

    Lazy loading can end up like each line of code firing a new query at the database api without caching

    Even if you disregard the performance impact, queries to databases are highly susceptible to transient errors that may include concurrency and request throttling issues as well as threading, logical blocks and locks within the rows or tables in the server. This means that there is a risk that accessing any navigation property can fail due to a database issue, so it is advisable that you implement a transient error retry policy or be prepared to catch and handle these issues in your code directly.

    If you are using a cloud hosted database there are usually request frequency limits that you must adhere to. Logic against entities using Lazy Loading are at a much higher risk of reaching request limits than other loading options.

    3. Explicit Loading

    This is very similar to Lazy Loading except that we are controlling explicitly when to query the database and we can refine that query.

    Both Lazy Loading and Explicit Loading operate on individual entities.

    Your example is good enough, but the major long term benefit is that it is obvious when you read the code what is being loaded and when. This allows you to optimise logic and take into account the data access. If an entity with lazy loading enabled is passed to a processing function, it may not be obvious at all that accessing a property is going to cause a database query, nor would it be aware that the query might fail.

    Explicit loading provides some certainty and explicit blocks of logic to wrap with exception handling logic. It also allows us to load multiple nested levels (perhaps with eager loading) and to apply filters. Explicit loading is generally used in controller or repo patterns where an entity reference is passed in to a method but it's related properties that the method needs are not loaded or are assumed to be not loaded.

    4. Projection Operations

    The biggest issue with all of the above data loading patterns in EF is that you are loading ALL of the fields from the root and related entities into the application logic layer... so into memory. You are breaking one of the cardinal rules of SQL Application design, SELECT * is the expensive operation that you want to avoid.

    If Peter Pan is correct in saying "Every time you say you dont believe in fairies, a fairy dies." then this is also true: "Every time you query SELECT *, a DBA dies"

    Filtering the rows you need is super important, but especially if you want to load many entities and many related properties into memory then we want to make sure we only bring back the specific properties that our logic needs.

    Projections can be into anonymous or concrete types and can be more complicated to think about in EF, but the concept is effectively the same as a DTO or ViewModel. We prepare the data into a singular form that only has the fields that are needed.

    Projections make the most sense for read-only logical processing of data but they require you to think about the data requirements of the logic up-front and can be restrictive for down stream processing. So Projections can hinder extreme agile programming.

    My personal design methodology is to encourage and prefer Eager Loading first, then once the logic has been tested and is stable, if performance needs to be addressed I convert that logical process to use Projections. Sometimes you can anticipate what fields will be required, but as a process for working in a larger team adding Projections later allows us to reach our milestones quicker with less regression.