Search code examples
linqef-core-6.0

Load only some elements of a nested collection efficiently with LINQ


I have the following LINQ query (using EF Core 6 and MS SQL Server):

var resultSet = dbContext.Systems
            .Include(system => system.Project)
            .Include(system => system.Template.Type)
            .Select(system => new
            {
                System = system,
                TemplateText = system.Template.TemplateTexts.FirstOrDefault(templateText => templateText.Language == locale.LanguageIdentifier),
                TypeText = system.Template.Type.TypeTexts.FirstOrDefault(typeText => typeText.Language == locale.LanguageIdentifier)
            })
            .FirstOrDefault(x => x.System.Id == request.Id);

The requirement is to retrieve the system matching the requested ID and load its project, template and template's type info. The template has multiple TemplateTexts (one for each translated language) but I only want to load the one matching the requested locale, same deal with the TypeTexts elements of the template's type.

The LINQ query above does that in one query and it gets converted to the following SQL query (I edited the SELECT statements to use * instead of the long list of columns generated):

SELECT [t1].*, [t2].*, [t5].*
FROM (
    SELECT TOP(1) [p].*, [t].*, [t0].*
    FROM [Systems] AS [p]
    LEFT JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
    LEFT JOIN [Types] AS [t0] ON [t].[TypeId] = [t0].[Id]
    LEFT JOIN [Projects] AS [p0] ON [p].[Project_ProjectId] = [p0].[ProjectId]
    WHERE [p].[SystemId] = @__request_Id_1
) AS [t1]
LEFT JOIN (
    SELECT [t3].*
    FROM (
        SELECT [t4].*, ROW_NUMBER() OVER(PARTITION BY [t4].[ReferenceId] ORDER BY [t4].[Id]) AS [row]
        FROM [TemplateTexts] AS [t4]
        WHERE [t4].[Language] = @__locale_LanguageIdentifier_0
    ) AS [t3]
    WHERE [t3].[row] <= 1
) AS [t2] ON [t1].[Id] = [t2].[ReferenceId]
LEFT JOIN (
    SELECT [t6].*
    FROM (
        SELECT [t7].*, ROW_NUMBER() OVER(PARTITION BY [t7].[ReferenceId] ORDER BY [t7].[Id]) AS [row]
        FROM [TypeTexts] AS [t7]
        WHERE [t7].[Language] = @__locale_LanguageIdentifier_0
    ) AS [t6]
    WHERE [t6].[row] <= 1
) AS [t5] ON [t1].[Id0] = [t5].[ReferenceId]

which is not bad, it's not a super complicated query, but I feel like my requirement can be solved with a much simpler SQL query:

SELECT * 
FROM [Systems] AS [p]
JOIN [Templates] AS [t] ON [p].[TemplateId] = [t].[Id]
JOIN [TemplateTexts] AS [tt] ON [p].[TemplateId] = [tt].[ReferenceId]
JOIN [Types] AS [ty] ON [t].[TypeId] = [ty].[Id]
JOIN [TemplateTexts] AS [tyt] ON [ty].[Id] = [tyt].[ReferenceId]
WHERE [p].[SystemId] = @systemId and tt.[Language] = 2 and tyt.[Language] = 2

My question is: is there a different/simpler LINQ expression (either in Method syntax or Query syntax) that produces the same result (get all info in one go) because ideally I'd like to not have to have an anonymous object where the filtered sub-collections are aggregated. For even more brownie points, it'd be great if the generated SQL would be simpler/closer to what I think would be a simple query.


Solution

  • Is there a different/simpler LINQ expression (...) that produces the same result

    Yes (maybe) and no.

    No, because you're querying dbContext.Systems, therefore EF will return all systems that match your filter, also when they don't have TemplateTexts etc. That's why it has to generate outer joins. EF is not aware of your apparent intention to skip systems without these nested data or of any guarantee that these systems don't occur in the database. (Which you seem to assume, seeing the second query).

    That accounts for the left joins to subqueries.

    These subqueries are generated because of FirstOrDefault. In SQL it always requires some sort of subquery to get "first" records of one-to-many relationships. This ROW_NUMBER() OVER construction is actually quite efficient. Your second query doesn't have any notion of "first" records. It'll probably return different data.

    Yes (maybe) because you also Include data. I'm not sure why. Some people seem to think Include is necessary to make subsequent projections (.Select) work, but it isn't. If that's your reason to use Includes then you can remove them and thus remove the first couple of joins.

    OTOH you also Include system.Project which is not in the projection, so you seem to have added the Includes deliberately. And in this case they have effect, because the entire entity system is in the projection, otherwise EF would ignore them.

    If you need the Includes then again, EF has to generate outer joins for the reason mentioned above.

    EF decides to handle the Includes and projections separately, while hand-crafted SQL, aided by prior knowledge of the data could do that more efficiently. There's no way to affect that behavior though.