c# .net entity-framework linq lazy-loading

C#: Should I use lazy loading for every related property?

I have been researching lazy loading, but I can't figure out how it works. Do I need to do anything to enable or change it?

Here is my first case: I have User, UserRole and Role classes. User has a many-to-many relationship with Role, so I created the UserRole class to handle that relationship. I don't need a user's roles in every case (ignore the Id in UserRole).

User

public class User : IUser
{
    public bool IsActive { get; set; }
    public int Id { get; private set; }
    public string Email { get; set; }
    public byte[] PasswordHash { get; set; }
    public byte[] PasswordSalt { get; set; }
    public string? Phone { get; set; }

    public List<UserRole> UserRoles { get; set; }
}

UserRole

public class UserRole : IEntity
{
    public int Id { get; private set; }

    [ForeignKey(nameof(User))]
    public int UserId { get; set; }
    public User User { get; set; }

    [ForeignKey(nameof(Role))]
    public int RoleId { get; set; }
    public Role Role { get; set; }
}

Role

public class Role : IEntity
{
    public int Id { get; private set; }
    public string Name { get; set; }
}

Here is my second case: I have an Article class that has a relationship with User. Here, every use of an Article needs Creator.Email (from User) or the creator's name, which means every query on Article has to include User.

Article

public class Article : IEntity
{
    public Article() => CreatedAt = DateTime.Now;

    public bool IsDeleted { get; set; }
    public int Id { get; private set; }
    public string Title { get; set; } = "Başlık";   // "Title"
    public string Content { get; set; } = "İçerik"; // "Content"
    public DateTime CreatedAt { get; set; }
    public DateTime? UpdatedAt { get; set; }
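    // [ForeignKey] on a foreign key property names the navigation property on
    // this class, so it must be Creator/Deleter rather than the type name User.
    [ForeignKey(nameof(Creator))] public int CreatorId { get; set; }
    [ForeignKey(nameof(Deleter))] public int? DeletedBy { get; set; }

    public User? Creator { get; set; }
    public User? Deleter { get; set; }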
    public List<ArticleCategory> ArticleCategories { get; set; }
}

So my question is: should I do anything to lazy load these related properties?

If yes, what should I do? If not, how does this work in EF Core?

In the User example, does EF fetch the data from the database and keep it in memory until it's requested, or does it not touch the UserRoles and Roles tables until they are requested?

If neither, how does it actually work?

Here is an example of how I fetch all the related data from the database in a very simple way:

var users = context.Users
    .Include(u => u.UserRoles)
    .ThenInclude(ur => ur.Role);

In some cases, even if I don't request the user's roles, will they be fetched from the db, and if so, will they be kept in memory?

Solution

Lazy loading in EF gets a fairly bad rap, because if you don't understand what EF is doing behind the scenes, your code can end up relying heavily on lazy loading, which can cause significant performance problems.

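The two flip sides of loading related data in EF are lazy loading and eager loading. Eager loading is done with Include, while lazy loading is commonly done with proxies, though EF Core offers a second alternative in which an ILazyLoader service is injected into your entities and your own code takes control of the lazy loads. Note that in EF Core neither form of lazy loading is enabled by default: a navigation property you don't Include is simply not loaded (it stays null or empty) unless the related entities are already being tracked by the DbContext. Each option comes with potential pitfalls to consider.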
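As a minimal sketch of opting in to proxy-based lazy loading, the two requirements are the UseLazyLoadingProxies() call and virtual navigation properties on public, non-sealed entity classes. This assumes the Microsoft.EntityFrameworkCore.Proxies and Microsoft.EntityFrameworkCore.Sqlite packages are referenced; Post, Comment, BlogContext and the blog.db connection string are hypothetical names chosen to match the Posts/Comments scenario used in this answer.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities for illustration. Proxies require public,
// non-sealed classes and virtual navigation properties.
public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; } = "";

    // The generated proxy overrides this getter and queries the database the
    // first time it is read, unless the comments are already loaded.
    public virtual ICollection<Comment> Comments { get; set; } = null!;
}

public class Comment
{
    public int CommentId { get; set; }
    public int PostId { get; set; }
    public string Text { get; set; } = "";

    public virtual Post Post { get; set; } = null!;
}

public class BlogContext : DbContext
{
    public DbSet<Post> Posts => Set<Post>();
    public DbSet<Comment> Comments => Set<Comment>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options
            .UseLazyLoadingProxies()           // the opt-in; without it nothing is lazy-loaded
            .UseSqlite("Data Source=blog.db"); // hypothetical connection string
}
```

For the question's model, the same approach would require every navigation to be virtual (User.UserRoles, UserRole.User, UserRole.Role, Article.Creator, Article.ArticleCategories and so on); EF reports an error when building the model if proxies are enabled and a navigation is not virtual.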

Lazy loading is a form of deferred querying. Say you want to load a list of 100 Posts, each of which can have zero-to-many Comments, in something like a WPF application. You want to list just the Posts, so you fetch them from dbContext.Posts and display them; then, when the user expands one of the Posts, you want to show its Comments. Eager loading all the comments for all the posts up front would be excessive, so lazy loading here only fetches the Comments for a post if and when it is expanded. EF does this entirely behind the scenes.

Where this falls down and bites you is when you unintentionally introduce code that "touches" lazy-loaded properties. Take that example of loading Posts, and say you decide to add a column that displays the Comment count using post.Comments.Count. As your rendering code iterates through the Posts, it accesses each one's Comments, which triggers a lazy load for every single Post. You might not notice this at development time with a small test database, but in production, as the system grows, it quickly gets noticed. In a database profiler you would see a slew of queries being sent to the database. For instance, if you fetched 100 posts for a given date range, you would see queries like:

SELECT * FROM Posts WHERE PostDate >= @p0 AND PostDate < @p1
SELECT * FROM Comments WHERE PostId = @p0 -- Post ID #1
SELECT * FROM Comments WHERE PostId = @p0 -- #2
SELECT * FROM Comments WHERE PostId = @p0 -- #3
SELECT * FROM Comments WHERE PostId = @p0 -- #4
SELECT * FROM Comments WHERE PostId = @p0 -- #5
-- ... 100 times

That is 100 separate SELECT statements, one per Post, just to get the Comments. This is bad, and it compounds with every additional reference that gets "touched". This is the dreaded SELECT N+1 problem.
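Here is a sketch of what that innocent-looking comment-count code might look like, assuming lazy-loading proxies are enabled. Post, Comment, BlogContext, CommentCountReport and the connection string are hypothetical names; LogTo writes each executed SQL command to the console, which gives roughly the same view as the profiler trace above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

public class Post
{
    public int PostId { get; set; }
    public DateTime PostDate { get; set; }
    public virtual ICollection<Comment> Comments { get; set; } = null!;
}

public class Comment
{
    public int CommentId { get; set; }
    public int PostId { get; set; }
}

public class BlogContext : DbContext
{
    public DbSet<Post> Posts => Set<Post>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options
            .UseLazyLoadingProxies()
            .UseSqlite("Data Source=blog.db")                // hypothetical
            .LogTo(Console.WriteLine, LogLevel.Information); // prints each SQL command
}

public static class CommentCountReport
{
    // Looks harmless, but every post.Comments access below is a lazy load:
    // 1 query for the posts, then N more queries, one per post.
    public static List<(int PostId, int CommentCount)> Build(
        BlogContext db, DateTime startDate, DateTime endDate)
    {
        var posts = db.Posts
            .Where(x => x.PostDate >= startDate && x.PostDate < endDate)
            .ToList(); // query #1

        return posts
            .Select(post => (PostId: post.PostId, CommentCount: post.Comments.Count)) // query per post
            .ToList();
    }
}
```

Running Build over 100 posts should log one SELECT for the Posts followed by one SELECT per post for its Comments.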

When stung by lazy loading like this, the usual recommendation is to use eager loading:

var posts = dbContext.Posts
    .Include(x => x.Comments)
    .Where(x => x.PostDate >= startDate && x.PostDate < endDate)
    .ToList();

This loads those 100 posts together with their comments in a single SQL SELECT statement, something like the following (EF Core uses a LEFT JOIN so that posts without any comments are still returned):

SELECT * FROM Posts p
LEFT JOIN Comments c ON c.PostId = p.PostId
WHERE p.PostDate >= @p0 AND p.PostDate < @p1

However, this introduces a potential eager-loading performance pitfall: the Cartesian product. In a simple example like this there isn't necessarily a reason for concern, but when you have several Include and nested ThenInclude calls, things can get out of hand quickly, and EF Core will log a warning when a single query eager-loads more than one collection. The eager-loading example above produces a query that JOINs Posts and Comments. To load all posts and comments in a single query, it has to return the columns from both tables. When loading just Posts, the total data size returned is:

Posts.Columns x Posts.Rows

When joining on Comments, the total data size returned is:

(Posts.Columns + Comments.Columns) x Comments.Rows

The query returns a row for each comment. If each Post had an average of 5 Comments, then to get our 100 Posts we would return 500 rows, each containing all the columns from both the Comment and its Post. This can easily add up to a LOT more data coming back as you join more entities.
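To put rough numbers on the two formulas above, here is a small sketch that treats result size as rows times columns ("cells"). ResultSetSize is a hypothetical helper, the column counts in the usage note are made up, and real payloads also depend on column types and sizes.

```csharp
// Rough result-set size as rows x columns ("cells"), following the formulas above.
public static class ResultSetSize
{
    // Posts only: Posts.Columns x Posts.Rows
    public static long PostsOnly(int postColumns, int postRows) =>
        (long)postColumns * postRows;

    // Posts JOIN Comments: (Posts.Columns + Comments.Columns) x Comments.Rows
    public static long PostsWithComments(int postColumns, int commentColumns, int commentRows) =>
        (long)(postColumns + commentColumns) * commentRows;
}
```

With a hypothetical 10 Post columns and 6 Comment columns, PostsOnly(10, 100) gives 1,000 cells, while PostsWithComments(10, 6, 500) gives 8,000, and 5,000 of those are the same 100 posts' columns repeated once per comment.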

EF Core (5.0 and later) offers a solution to this in the form of AsSplitQuery(), which runs separate queries for the Posts and the Comments, so you end up with something like:

SELECT * FROM Posts WHERE PostDate >= @p0 AND PostDate < @p1
SELECT c.* FROM Comments c
INNER JOIN Posts p ON p.PostId = c.PostId
WHERE p.PostDate >= @p0 AND p.PostDate < @p1

That second SELECT looks a lot like the eager-loading one, but it returns only the columns from Comments. This works in most cases, but EF still has to stitch all of the related entities together, which takes some time, and split queries have limitations: sorting, pagination, and summarizing related data can be problematic with them. So it is a tool that can help, but it should not be relied on as a crutch.
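Here is a minimal sketch of the eager-loading query from earlier switched to split mode, assuming the Microsoft.EntityFrameworkCore.Sqlite package; Post, Comment, BlogContext, PostQueries and the connection string are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Post
{
    public int PostId { get; set; }
    public DateTime PostDate { get; set; }
    public List<Comment> Comments { get; set; } = new();
}

public class Comment
{
    public int CommentId { get; set; }
    public int PostId { get; set; }
    public string Text { get; set; } = "";
}

public class BlogContext : DbContext
{
    public DbSet<Post> Posts => Set<Post>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlite("Data Source=blog.db"); // hypothetical connection string
}

public static class PostQueries
{
    // Same eager load as before, but EF sends one SELECT for the Posts and a
    // second one for their Comments instead of a single wide JOIN.
    public static List<Post> PostsWithComments(BlogContext db, DateTime startDate, DateTime endDate) =>
        db.Posts
            .Include(x => x.Comments)
            .Where(x => x.PostDate >= startDate && x.PostDate < endDate)
            .AsSplitQuery()
            .ToList();
}
```

Split querying can also be made the default for a context with UseSqlite(connectionString, o => o.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery)), after which AsSingleQuery() opts an individual query back into a single JOIN.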

Going back to the example where we want a Comment count: fetching all the comments just to get a count is a big waste. Even when we want a few details from related entities, eager loading those entire entities can waste time and memory. This is where projection comes in, letting us be more selective about the data coming back. With projection, you look at what data you actually need and use Select to populate an object that serves that need, rather than returning entities. The advantages of projection are that EF can produce more efficient queries, and it avoids bloating the change tracker with tracked entity instances. For instance, if we want to display a list of Posts with their title, post date, author name, and comment count, we might create a PostSummaryViewModel like:

public class PostSummaryViewModel
{
    public int PostId { get; set; } 
    public string Title { get; set; }
    public DateTime PostDate { get; set; }
    public string AuthorName { get; set; }
    public int CommentCount { get; set; }
}

Then, to read it:

var posts = dbContext.Posts
    .Where(x => x.PostDate >= startDate && x.PostDate < endDate)
    .Select(x => new PostSummaryViewModel
    {
        PostId = x.PostId,
        Title = x.Title,
        PostDate = x.PostDate,
        AuthorName = x.Author.FirstName + " " + x.Author.LastName,
        CommentCount = x.Comments.Count()
    }).ToList();

EF generates a single query that fetches the requested data, joining tables as needed but returning only the columns required. Note that we reference the Comments collection and the Author reference on each Post without needing to Include them; EF translates those navigation properties into the SQL automatically. This does mean that if we want behavior such as expanding a post to load its comments, or displaying more information, we need a separate call to the DbContext to fetch that data, so we can't lean on lazy loading as a crutch as easily. Overall, though, the benefits should outweigh that loss of convenience.
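That "separate call" could look something like the sketch below, assuming the Microsoft.EntityFrameworkCore.Sqlite package; PostDetails, its method names and the connection string are hypothetical. The first method is another small projection; the second uses EF Core's explicit loading, a different technique from both lazy and eager loading, for when you already hold a tracked Post entity.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; } = "";
    public List<Comment> Comments { get; set; } = new();
}

public class Comment
{
    public int CommentId { get; set; }
    public int PostId { get; set; }
    public string Text { get; set; } = "";
}

public class BlogContext : DbContext
{
    public DbSet<Post> Posts => Set<Post>();
    public DbSet<Comment> Comments => Set<Comment>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlite("Data Source=blog.db"); // hypothetical connection string
}

public static class PostDetails
{
    // Called when the user expands a post: one small query that returns
    // only the column the UI shows.
    public static List<string> GetCommentTexts(BlogContext db, int postId) =>
        db.Comments
            .Where(c => c.PostId == postId)
            .Select(c => c.Text)
            .ToList();

    // Explicit loading: fills post.Comments for an already-tracked Post
    // with a single query, only when you ask for it.
    public static void LoadComments(BlogContext db, Post post) =>
        db.Entry(post).Collection(p => p.Comments).Load();
}
```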

Lazy loading, eager loading, and projection are all tools EF provides; each offers benefits in certain situations, but each also has costs to consider.
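Applied to the question's User example, a projection might look like the sketch below. The entities are trimmed copies of the question's classes (the IUser/IEntity interfaces and unrelated properties are left out), while AppDbContext, UserSummary, UserQueries and the connection string are hypothetical names. No Include is needed, and no lazy loading is involved: the UserRoles and Roles tables are only queried because the projection asks for role names, and only those names come back.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Trimmed copies of the question's entities.
public class User
{
    public int Id { get; private set; }
    public string Email { get; set; } = "";
    public List<UserRole> UserRoles { get; set; } = new();
}

public class UserRole
{
    public int Id { get; private set; }
    public int UserId { get; set; }
    public User User { get; set; } = null!;
    public int RoleId { get; set; }
    public Role Role { get; set; } = null!;
}

public class Role
{
    public int Id { get; private set; }
    public string Name { get; set; } = "";
}

public class AppDbContext : DbContext
{
    public DbSet<User> Users => Set<User>();

    protected override void OnConfiguring(DbContextOptionsBuilder options) =>
        options.UseSqlite("Data Source=app.db"); // hypothetical connection string
}

// Hypothetical view model holding only what a user list needs.
public class UserSummary
{
    public int Id { get; set; }
    public string Email { get; set; } = "";
    public List<string> RoleNames { get; set; } = new();
}

public static class UserQueries
{
    public static List<UserSummary> GetUserSummaries(AppDbContext db) =>
        db.Users
            .Select(u => new UserSummary
            {
                Id = u.Id,
                Email = u.Email,
                RoleNames = u.UserRoles.Select(ur => ur.Role.Name).ToList()
            })
            .ToList();
}
```

A query that only needs users, such as db.Users.ToList(), never touches the role tables at all.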