Search code examples
sql-serverperformanceentity-frameworklinq-to-sqllinq-to-entities

Linq to entites (EF6) return latest records of each group using Row number in T-SQL server


I'm using Linq to entities to get most recent updated record of each group. But actually when I checked in sql profiler my Ling query generated many sub-query so that It really take too much time to complete. To solve this performance problem, I already wrote native T-Sql mentioned below so that I'm looking for solution to use Linq query that entity framework is generating the same my query using (ROW_NUMBER() OVER(PARTITION BY ...) . Below is my sample data :

  • Parent and Child tables:

Parent child relationship

  • Sample data of Parent and Child tables:

Sample data of Parent  and Child tables

  • Below is my query result:

My query result

TSQL query:

WITH summary AS (
SELECT  a.ParentId 
       ,a.Name
       ,a.Email
       ,p.Created 
       ,p.[Status], 
       ROW_NUMBER() OVER(PARTITION BY p.ParentId
                             ORDER BY p.Created DESC) AS rk
  FROM Parent a
    LEFT JOIN Child p
        ON a.ParentId = P.ParentId
    )
SELECT s.*
FROM summary s
WHERE s.rk = 1

My sample C# using Linq:

using (DbContext context = new DbContext())
{
  return context.Parents.Where(p => p.ParentId == parentId)
                .Include(a => a.Childs)
                .Select(x => new ObjectDto()
                {
                  ParentId = x.ParentId,
                  Status = x.Childs.OrderByDescending(a => a.Created).FirstOrDefault(p => p.ParentId).Status,
                  ChildName = x.Childs.OrderByDescending(a => a.Created).FirstOrDefault(p => p.ParentId).ChildName
                })
                .ToList();
}

Solution

  • There are a couple of things to improve your C# query:

    using (DbContext context = new DbContext())
    {
        return context.Parents.Where(p => p.ParentId == parentId)
                    .Include(a => a.Childs)
                    .Select(x => new ObjectDto()
                    {
                      ParentId = x.ParentId,
                      Status = x.Childs.OrderByDescending(a => a.Created).FirstOrDefault(p => p.ParentId).Status,
                      ChildName = x.Childs.OrderByDescending(a => a.ChildName).FirstOrDefault(p => p.ParentId).ChildName
                    })
                   .ToList();
    }
    

    Firstly the Include call does nothing here as you're not just returning EF entities (and thus the lazy loading semantics don't apply).

    Secondly: avoid the repeated subqueries with a let clause.

    (Also the lambda passed to FirstOrDefault must be an error as it takes a Func<T, bool> which that isn't.)

    Thus

    using (DbContext context = new DbContext()) {
        return await (from p in context.Parents
                      where p.ParentId == parentId
                      let cs = p.Childs.OrderByDescending(a => a.Created).FirstOrDefault()
                      select new ObjectDto {
                           ParentId = p.ParentId,
                           Status = cs.Status,
                           ChildName = cs.ChildName                      
                      }).ToListAsync();
    }
    

    Otherwise it looks reasonable. You would need to look at the generated query plan and see what you can do with respect to indexing.

    If that doesn't work, then use a stored procedure where you have full control. (Mechanical generation of code – without a lot of work in the code generator and optimiser – can always be beaten by hand written code).