Search code examples
sql-serverperformanceentity-frameworkdatabase-performance

SQL Server - JOIN isn't using the optimal order in Entity Framework's generated query


I'm using Entity Framework to execute a rather complex query on my DB. The query consists of the .Union() of a couple of other IQueryables.

In an attempt to optimize said query, I've broken down the different sub-queries and tested them individually to measure performance and found out that one of them (let's call it joinRes) is quite slower than the others.

joinRes is a join between two (filtered) tables where one of the tables has a huge amount of records, while the other only has a couple of them.

var smallDataSet = RepositoryForSmallDataSet.Where(smDs => ... );

var joinRes = smallDataSet.Join(largeDataSet, 
                                 smDs => smDs.SomeID, 
                                 bigDs => bigDs.SomeID, 
                                 (smDs, bigDs) => bigDs)
                          .OrderByDescending(bigDs => bigDs.SomeDate);

// ... IQueryable joinRes will be used in some .Union() operations with other IQueryables into a largeQuery IQueryable...

var result = largeQuery.Take(10).ToList();

Looking at the actual execution plan in SQL Server, I can see that the Nested Loops step in the joinRes subquery isn't choosing the optimal order (smallDataSet first then largeDataSet). If I add a OPTION (FORCE ORDER) hint to the generated SQL, the sub query is much faster.

SQL Server Execution Plan join

The problem is that I can't seem to find a way to add that hint via Entity Framework, and simply moving the complex query (remember that joinRes is part of a large complex one) onto a stored procedure would be a major hassle (it's a heavily dynamically generated query, and it would likely require a lot of dynamic SQL).

Any suggestions on how to tackle this problem?

EDIT:

Evaldas Buinauskas put me on the right track with his answer. Poor performance was caused by Key Lookup as well as a bunch of other JOINS caused by a TPT inheritance mechanism (not mentioned in this question). Fixed the indexes (based on SQL Server's execution plan) and refactored inheritance to use a TPH approach instead.


Solution

  • Are your indexes right? Because Key Lookup indicates that your index doesn't cover join condition or it's outputting columns that are not included in your index.

    It's adviced to eliminate them (lookups). This is a very detailed article explaining how to do it.

    Key lookups occur when you have an index seek against a table, but your query requires additional columns that are not in that index. This causes SQL Server to have to go back and retrieve those extra columns.