Search code examples
sql-serversql-order-byquery-performancesql-execution-plan

SQL Server Query - ORDER BY killing query performance on small result set


I have a query in SQL Server 2008 R2 in the following form:

SELECT TOP (2147483647) *
FROM (
    SELECT *
    FROM sub_query_a
) hierarchy
LEFT JOIN (
    SELECT *
    FROM sub_query_b
) expenditure
ON hierarchy.x = expenditure.x AND hierarchy.y = expenditure.y
ORDER BY hierarchy.c, hierarchy.d, hierarchy.e

The hierarchy subquery contains UNIONS and INNER JOINS. The expenditure subquery is based on several levels of sub-subqueries, and contains UNIONS, INNER and LEFT JOINS, and ultimately, a PIVOT aggregate.

The hierarchy subquery by itself runs in 2 seconds and returns 467 rows. The expenditure subquery by itself runs in 7 seconds and returns 458 rows. Together, without the ORDER BY clause, the query runs in 11 seconds. However, with the ORDER BY clause, the query runs in 11 minutes.

The Actual Execution Plan reveals what's different. Without the ORDER BY clause, both the hierarchy and expenditure subqueries are running once each, with the results being Merge Join (Right Outer Join) joined together. When the ORDER BY clause is included, the hierarchy query is still run once, but the expenditure portion is run once per row from the hierarchy query, and the results are Nested Loops (Left Outer Join) joined together. Its as if the ORDER BY clause is causing the expenditure subquery to become a correlated subquery (which it is not).

To verify that SQL Server was actually capable of doing the query and producing a sorted result set in 11 seconds, as a test, I created a temp table and inserted the results of the query without the ORDER BY clause into it. Then I did a SELECT * FROM #temp_table ORDER BY c, d, e. The entire script took the expected 11 seconds, and returned the desired results.

I want to make the query work efficiently with the ORDER BY clause as one query--I don't want to have to create a stored procedure just to enable the #temp_table hacky solution.

Any ideas on the cause of this issue, or a fix?


Solution

  • Thanks to @MartinSmith's comment, I got looking at what could cause the major discrepancies between the estimated and actual rows delivered by the expenditure subquery in the non-ORDER BY version, even though I eventually wanted to ORDER it. I thought that perhaps if I can optimize that version a bit, perhaps that would also benefit the ORDER BY version as well.

    As I mentioned in the OP, the expenditure subquery contains a PIVOT aggregation across yet another subquery (let's call it unaggregated_expenditure). I added a layer between the PIVOT and the unaggregated_expenditure subquery which aggregated the required column before PIVOTing the same column across the required few pivot columns. This added a bit of conceptual complexity, yet was able to reduce the estimated number of rows coming from the PIVOT from 106,245,000 to 10,307. This change, when applied to the ORDER BY version of the whole query, resulted in a different Actual Execution Plan that was able to process and deliver the query within the desired 11 seconds.