Search code examples
c#entity-framework-coreazure-sql-databaseef-core-8.0

How to reduce the no. of columns on join when accessing a navigation property?


I have an ASP.NET Core Web API based project with Entity Framework Core and an Azure SQL database.

Here are my data models, removed extra properties: I have two tables in the database context, Parent and Children, with a parent-children relationship. I have give easy names for the type to explain the problem:

[Table("Parent")]
public record Parent
{
    [Key]
    public string ParentId { get; set; }

    public virtual ICollection<Children1> Children1s { get; set; }

    // other properties
}

[Table("Children1")]
public record Children1
{
    [Key]
    public string Children1Id { get; set; }

    [ForeignKey("ParentId")]
    public virtual Parent Parent { get; set; }

    public virtual ICollection<Children2> Children2s { get; set; }

    // many properties
}

I'm trying to get all of the Parent items and linked Children1 item where Children1. Plus, I'd also like to fetch Children2s for Children1.

I'd like to get the children1 item among other children1s for each parent ordered by desc on a date (e.g. joining date, joined last/recent).

Repository code in a method using projection via Select:

var query = context.Parents
            .AsNoTracking()
            .Select(p => new
            {
                p.ParentId,
                p.Description,
                Children1s = d.Children1s
                .OrderByDescending(c => c.JoiningDate)
                .Take(1)
                .Select(c => new
                {
                    c.ChildrenId,
                    c.Description
                    // other properties
                }).ToList()
            });

Repository code in a method using Include and ThenInclude:

var result = context.Parents
            .AsNoTracking()
            .Include(d => d.Children1s
                .OrderByDescending(l => l.JoiningDate)
                .Take(1))
            .ThenInclude(l => l.Children2s)
            .ToList();

Plus, I'm also using UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery); at global level.

My database has about 20,000 records for Parent, 45,000 records for Children1 and roughly 20,000 for Children2.

I also have indexes for each table and field that I'm using for where, orderby calls.

The Children1 has roughly 200 columns, this is an old table and I can't change the schema.

When I look at the generated SQL for both of the options ie. via projection and Include, although I use select to target just the columns I need but the generated SQL has all of the columns from Children1 for joining - LEFT JOIN.

Due to this issue, the query in Azure SQL database is very slow.

How can I control that EF Core uses only some columns for LEFT JOIN to speed up the query? Is there any other alternative to achieve this with/without navigation property?

Seems like the navigation property is good for 1:1 relation but the operation that I need to do on 1:n relationship is slow due to LEFT JOIN SQL EF Core generates.

I have tried various ways to project by using Select but I'm not able to reduce the no. of columns for joining. While I was researching online, I got to know about Single query vs Split query. So, I'm using the latter now which has improved the performance little bit.


Solution

  • The first consideration would be the reason for fetching all parents at once. Whether displaying information or even to iterate through data to process you should consider paginating the data so that you are fetching only a limited number of top-level parent records at a time, via Skip and Take.

    If you are leveraging Split Queries that will help, but you only need a few columns from Children1 and Children2 then you can also leverage projection to just fetch those details. For example:

    var query = context.Parents
            .AsNoTracking()
            .Select(p => new
            {
                p.ParentId,
                p.Description,
                Child1 = d.Children1s
                .OrderByDescending(c => c.JoiningDate)
                .Select(c => new
                {
                    c.ChildrenId,
                    c.Description
                    // other properties
                    Children2 = c.Children2.Select(c2 => new 
                    {
                       c2.Children2Id,
                       c2.Description,
                       // other properties
                    }).ToList()
                }).FirstOrDefault()
            });
    

    This assumes you want just the most recent Children1 record, and then select a set of properties from the related Children2 of that child. If you intend to send this data to a view then I would recommend populating a ViewModel class with the details rather than the anonymous types. (new { }}

    Further investigation into performance cost would involve running the query through profiling tools for inspecting the execution plan and potential indexing improvements. I wouldn't normally get too hung up on what the SQL looks like in terms of the joins it uses. EF is usually pretty good at composing effective queries but it can get hung up on parameter sniffing or stale execution plans. These usually show up when you execute the resulting SQL and it's reasonably fast but EF is significantly slower to fetch the data.

    The other key factor that can affect performance is mixing projections with entities. For instance you need to be careful if doing something like:

            .Select(p => new
            {
                p.ParentId,
                p.Description,
                Child1 = d.Children1s
                .OrderByDescending(c => c.JoiningDate)
                .Select(c => new
                {
                    c.ChildrenId,
                    c.Description
                    // other properties
                    Children2 = c.Children2
                }).FirstOrDefault()
            });
    

    Here not only would the Child.Children2 fetch all columns from Children2, but if serializing and Child2 has navigation properties, if lazy loading is enabled then this can introduce extra Select N+1 costs.

    Edit: I have checked with both AsSplitQuery() and with the UseQuerySplittingBehaviour and bot resulted in the projections being respected. Running the following test:

    var data = context.Projects.Select(x => new
    {
        x.ProjectId,
        x.Name,
        Jobs = x.Jobs.Select(j => new { j.Name })
    }).AsSplitQuery()
    .ToList();
    

    the generated SQL was two statements:

    SELECT [p].[ProjectId], [p].[Name]
    FROM [Projects] AS [p]
    ORDER BY [p].[ProjectId]
    
    SELECT [j].[Name], [p].[ProjectId]
    FROM [Projects] AS [p]
    INNER JOIN [Jobs] AS [j] ON [p].[ProjectId] = [j].[ProjectId]
    ORDER BY [p].[ProjectId]
    

    It did not return all columns from either the parent (Project) or the child (Job).

    The updated example you provided:

    var query = context.Parent             
        .AsNoTracking()             
        .Select(d => new             
        {                 
            d.ParentId,
            d.Description,
            children = d.Children.Select(l => new
            {                     
                l.ChildrenId,
                l.Type
            })             
        });
    

    ... This statement does not yet execute a query as it lacks a .ToList() or .First() etc. so it wouldn't result in SQL being generated yet.

    To help narrow down what is going on, add a .First() to that test line and set a breakpoint on that line. Run a profiler to capture the SQL then run your code up to that line with a debugger, stopping on that breakpoint. Next, flush your profiler trace to clear the window and step over that single line to capture the SQL executed for that statement. You should see only two SQL Statements being run and that the projections are being preserved. Also be sure that this is while connected to a database and not a substitute like an in-memory database provider. If there are additional SQL statements appearing after you continue execution then you have other code/operations running that are ignoring your projection code and fetching entire entities.