Search code examples
c#entity-frameworklinqentity-framework-core

.ToList() is not saving virtual Icollection to memory, which is causing multiple calls to the db


The project I am currently working on requires lazy loading to be done. I am attempting to create a method that gets passed in a loaded list property. Instead, it seems that the list items are passed as a proxy instead of in-memory items. I need this list to be in-memory as I am unable to call the database from within this method.

 var test = loan.LoanStatuses.ToList();

            if (test.Any(ls => ls.Status.Code == CodeTypes.LoanStatusType.Settled && ls.Active))
                messages.Add("Loan has already been settled.");

looking at the output the first test variable generates one call to the DB through EF Core. However when test is acted on with .Any() more ef queries are created!

I am under the assumption toList(); creates one single query and then saves the result in memory under that test variable. why would EF continue to query on information its already got.


Solution

  • Lazy loading should be treated as a failsafe rather than relied on for loading data like this. The issue you will be running into will be this part of the statement:

    ls.Status.Code == CodeTypes.LoanStatusType.Settled
    

    This is due to the nature of how lazy loading works. say we loaded a loan like:

    var loan = context.Loans.Single(x => x.LoanId == loanId);
    

    this builds an SQL Statement to load the loan, nothing else is eager loaded. Later you do the following:

    var test = loan.LoanStatuses.ToList();
    

    Since we access the LoanStatuses collection on the loan for the first time, this will generate a second SQL statement effectively doing something like:

    SELECT * FROM LoanStatuses WHERE LoanId = @loanId;
    

    And that in itself is fine, it is just one extra query to get the LoanStatuses. However, lets say this loaded 10 LoanStatus records, IDs 1-10 all associated to our single Loan ID of 1. We now want to access the related Status entity on each LoanStatus. This will result in 10 lazy load calls to get the status for each LoanStatus we touch due to the `Any' call:

     SELECT * FROM Statuses WHERE StatusID = @loanStatus(1).StatusId;
     SELECT * FROM Statuses WHERE StatusID = @loanStatus(2).StatusId;
     SELECT * FROM Statuses WHERE StatusID = @loanStatus(3).StatusId;
     ...
     SELECT * FROM Statuses WHERE StatusID = @loanStatus(10).StatusId;
    

    If you're going to have business logic based on entities then you need the foresight in the code to ensure that all required information is eager loaded before calling code that is dependent on those related entities.

    For example, if you are going to load a loan and will need to determine logic based on it's associated statuses then:

    var loan = context.Loans
        .Include(x => x.LoanStatuses)
            .ThenInclude(x => x.Status)
        .Single(x => x.LoanId == loanId);
    

    From here passing the loan into the method will ensure that the applicable statuses are all eager loaded and nothing should need to lazy load. Lazy Loading is a safety net to ensure anything that might not have been loaded can be loaded rather than result in a NullReferenceException, but you should be profiling the database regularly for evidence of lazy loading getting tripped and working to remove any lazy loading calls. The problem with the approach of "just remember to eager load" is that this produces a Cartesian Product with the more relationships you decide you need to add to the eager loading. Gradually developers just start "including" everything all of the time and watch as performance starts to degrade back towards what it was when lazy loading everything.

    A better approach than passing entities around is to project the information your business logic will need at the time of reading from the DbContext so that EF can build an efficient query to ensure that all data needed, and only the data that is needed is retrieved from the database. For instance, if we need a few values from the loan, and a flag to indicate whether the loan is settled or not:

    var loanDetails = context.Loans
        .Where(x => x.LoanId == loanId)
        .Select(x => new LoanDetail
        {
            LoanId = x.LoanId,
            //  ... other details.
    
            IsSettled = x.LoanStatuses.Any(s => s.IsActive 
                && s.Status.Code == CodeTypes.LoanStatusType.Settled)
        )).Single();
    

    This loads just the information we need and this can safely be passed around to methods because it is clear from the definition of this DTO class what information is loaded. There is no risk of tripping lazy loading or situations where the DbContext might be disposed so information cannot be lazy loaded. Ideally entities should never be passed around beyond the scope of the DbContext that loaded them as this leads to all kinds of problems with stale data, lazy loading costs and issues with potentially multiple tracked references. This does come at a cost of needing to define a class structure for the fields the business logic is going to need, but this is the trade off. This structure should only need to change if the business logic it serves changes. It may be tempting to "just pass the entity since it has everything" but ensuring that "everything" is accessible attracts a significant cost.