How does caching work in Entity Framework?


I see tons of posts about people struggling to get EF to NOT send cached data. I am sitting here wondering how they are getting it to send cached data...

Here are the details: the latest ASP.NET Core on .NET 6.0, using Entity Framework Core 6.0.6. I am testing the controller methods through the Swagger UI page. The DbContext is scoped and dependency injected. AsNoTracking is NOT being used.

The method is pretty simple: _DbContext.Set<TEntity>().ToListAsync(), where TEntity stands for the entity type.

  1. I call that method.
  2. Directly update a value in the database on one of the records previously returned from that method (via SQL Server Management Studio, not this same service).
  3. Call the method again.

I would expect the 2nd call to have the "stale" data aka the same data as the first call. However, it has my updated data from the database!? The 2nd call is much faster which I believe is because EF has cached my query and metadata, but it doesn't appear to be caching the actual results of the query.

Based on my tests it looks like it would only cache the results in the same request. Once the request is completed, DbContext is disposed, then the results cache is gone. Correct? So, if I want to persist the results cache across different web requests then I would have to implement that caching myself either via NCache, some other package, or writing the code myself?


Solution

  • Entity Framework relies on change tracking. It is not caching per se, but it can have similar effects. One of its main goals is to track changes so that they can be propagated to the database.

    Change tracking is tied to a concrete instance of the context (i.e. it is not shared across different instances or globally). One of the important capabilities of change tracking is identity resolution:

    Since a tracking query uses the change tracker, EF Core will do identity resolution in a tracking query. When materializing an entity, EF Core will return the same entity instance from the change tracker if it's already being tracked. If the result contains the same entity multiple times, you get back same instance for each occurrence. No-tracking queries don't use the change tracker and don't do identity resolution. So you get back a new instance of the entity even when the same entity is contained in the result multiple times.

    This can lead to stale data if the database is updated from "outside" between two queries on the same context:

    using (var ctx1 = new AppContext())
    {
        var entity = await ctx1.SomeEntity
            .Where(e => e.Id == 1)
            .FirstOrDefaultAsync();
    
        using (var ctx2 = new AppContext())
        {
            var theSameEntity = await ctx2.SomeEntity
                .Where(e => e.Id == 1)
                .FirstOrDefaultAsync();
            theSameEntity.SomeTextField = "Updated"; // assuming it had another value
            await ctx2.SaveChangesAsync();
        }
    
        var entity1 = await ctx1.SomeEntity
            .Where(e => e.Id == 1)
            .FirstOrDefaultAsync();
        var referenceEquals = object.ReferenceEquals(entity, entity1); // True
        var field = entity1.SomeTextField; // field will have the original value, not "Updated"
    }
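
    If a long-lived context does need current values, there are a few ways to get them. The following is a minimal sketch, not a definitive implementation. It assumes the Microsoft.EntityFrameworkCore.InMemory package purely to stay self-contained, and the SomeEntity class, the AppContext constructor and the StaleDataDemo helper are hypothetical names modeled on the snippet above. It shows AsNoTracking, EntityEntry.ReloadAsync and ChangeTracker.Clear:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity and context, named after the snippet above.
public class SomeEntity
{
    public int Id { get; set; }
    public string SomeTextField { get; set; } = "";
}

public class AppContext : DbContext
{
    private readonly string _databaseName;

    public AppContext(string databaseName) => _databaseName = databaseName;

    public DbSet<SomeEntity> SomeEntity => Set<SomeEntity>();

    // Assumption: the in-memory provider, used only to keep the sketch runnable.
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.UseInMemoryDatabase(_databaseName);
}

public static class StaleDataDemo
{
    public static async Task<(string Untracked, string Reloaded, bool SameInstanceAfterClear)> RunAsync()
    {
        var db = Guid.NewGuid().ToString(); // a fresh store for every run

        using (var seed = new AppContext(db))
        {
            seed.SomeEntity.Add(new SomeEntity { Id = 1, SomeTextField = "Original" });
            await seed.SaveChangesAsync();
        }

        using var ctx1 = new AppContext(db);
        var entity = await ctx1.SomeEntity.FirstAsync(e => e.Id == 1);

        // Simulate an "outside" update through a second context.
        using (var ctx2 = new AppContext(db))
        {
            var same = await ctx2.SomeEntity.FirstAsync(e => e.Id == 1);
            same.SomeTextField = "Updated";
            await ctx2.SaveChangesAsync();
        }

        // Option 1: a no-tracking query skips the change tracker and
        // materializes a new instance from the current row.
        var untracked = await ctx1.SomeEntity.AsNoTracking().FirstAsync(e => e.Id == 1);

        // Option 2: reload overwrites the tracked instance's values in place.
        await ctx1.Entry(entity).ReloadAsync();
        var reloaded = entity.SomeTextField;

        // Option 3: stop tracking everything; the next query materializes new instances.
        ctx1.ChangeTracker.Clear();
        var requeried = await ctx1.SomeEntity.FirstAsync(e => e.Id == 1);

        return (untracked.SomeTextField, reloaded, ReferenceEquals(entity, requeried));
    }
}

public static class Program
{
    public static async Task Main()
    {
        var result = await StaleDataDemo.RunAsync();
        Console.WriteLine(result);
    }
}
```

    In a web application with a scoped DbContext none of this is normally needed, because every request starts with an empty change tracker.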
    

    "The 2nd call is much faster which I believe is because EF has cached my query and metadata"

    There are multiple potential reasons for it:

    • If this was the first time the query was executed in this app run, EF Core caches the query translation so that it can be reused by subsequent executions (across context instances). Also, the current version of EF Core will not re-create and re-map the fetched data if the entity is already tracked. I may be misremembering, but previous versions of EF / EF Core could potentially skip querying the database altogether when an already tracked entity was requested.

    There are also several factors affecting "startup time" i.e. the time to perform the first operation on a DbContext when that DbContext type is used for the first time in the application.

    • Model initialization - see the compiled models section of the docs:

      creating a DbContext instance does not cause the EF model to be initialized, typical first operations that cause the model to be initialized include calling DbContext.Add or executing the first query.

    • Connection management - EF Core needs to establish a connection to the database server, which is then pooled and reused. In some cases this can take a considerable amount of time, which subsequent queries do not pay (as long as the underlying physical connection pool has enough connections available).
    • JIT compilation - the first time a method is called, the runtime needs to compile it. This should be negligible in the context of working with EF Core (though in theory it can be noticeable in combination with EF Core compiled models), but it is still worth mentioning. Obviously, it applies only to the first call of a particular method.

    "but it doesn't appear to be caching the actual results of the query."

    This is partly true and partly false. As shown above, EF will not update a tracked entity even if the data in the database has changed. But if the row was deleted, EF will notice that:

    using (var ctx1 = new AppContext())
    {
        var entity = await ctx1.SomeEntity
            .Where(e => e.Id == 1)
            .FirstOrDefaultAsync();
    
        using (var ctx2 = new AppContext())
        {
            var theSameEntity = await ctx2.SomeEntity
                .Where(e => e.Id == 1)
                .FirstOrDefaultAsync();
            ctx2.SomeEntity.Remove(theSameEntity); // remove the row via the second context
            await ctx2.SaveChangesAsync();
        }
    
        var entity1 = await ctx1.SomeEntity
            .Where(e => e.Id == 1)
            .FirstOrDefaultAsync();
        var isNull = entity1 == null; // True
    }
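
    On the last point of the question: the change tracker lives and dies with its context instance, so with a scoped DbContext nothing carries over to the next request, and keeping results across requests needs a separate cache. Below is a minimal sketch, assuming ASP.NET Core's IMemoryCache (Microsoft.Extensions.Caching.Memory). The CachedEntityReader class, the cache key and the 30 second lifetime are hypothetical choices for illustration, and SomeEntity / AppContext are modeled on the snippets above:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Caching.Memory;

// Hypothetical entity and context, named after the snippets above.
public class SomeEntity
{
    public int Id { get; set; }
    public string SomeTextField { get; set; } = "";
}

public class AppContext : DbContext
{
    public AppContext(DbContextOptions<AppContext> options) : base(options) { }

    public DbSet<SomeEntity> SomeEntity => Set<SomeEntity>();
}

// Hypothetical scoped service that wraps the query in a process-wide cache.
public class CachedEntityReader
{
    private const string CacheKey = "SomeEntity:all"; // hypothetical key

    private readonly AppContext _db;
    private readonly IMemoryCache _cache;

    public CachedEntityReader(AppContext db, IMemoryCache cache)
    {
        _db = db;
        _cache = cache;
    }

    public async Task<List<SomeEntity>> GetAllAsync()
    {
        var cached = await _cache.GetOrCreateAsync(CacheKey, async entry =>
        {
            // Assumption: 30 seconds of staleness is acceptable.
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromSeconds(30);

            // No-tracking, so the cached objects are not tied to this context's change tracker.
            return await _db.SomeEntity.AsNoTracking().ToListAsync();
        });

        return cached ?? new List<SomeEntity>();
    }
}

// Registration in Program.cs (sketch):
//   builder.Services.AddMemoryCache();
//   builder.Services.AddScoped<CachedEntityReader>();
```

    The cached list is shared between requests, so it should be treated as read-only. An in-memory cache like this is local to one process; with several servers a distributed cache would be needed instead.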