Search code examples
sqlsql-servercommon-table-expressionquery-performance

"With" command slower than using temporary table Select * into #table1 from


Why is the following query a lot faster?

Query 1:

    select distinct ID mid
INTO #t1
from A_Position a where a.situationdate=@SituationDate and 
    a.Portfolio_Name=@portfolio and a.Purpose=@purpose and ID!='TOTAL'

select distinct ID gid 
INTO #t2
from B_Position a where a.situationdate=@SituationDate and 
    a.Purpose=@purpose

select @check=COUNT(mid) from #t1 A INNER JOIN #t2 B ON A.mid = 
    B.gid

Query 1 is much faster than query 2.

Query 2:

;With 
A as (
select distinct ID mid 
    from A_Position a where 
    a.situationdate=@SituationDate and a.Portfolio_Name=@portfolio and 
    a.Purpose=@purpose and ID!='TOTAL'), 

    B as(
select distinct ID gid 
    from B_Position a where 
    a.situationdate=@SituationDate and a.Purpose=@purpose)

select @check=COUNT(mid) from A INNER JOIN B ON A.mid = 
    B.gid

Query 3:

select @check=COUNT(*)  
        from (
    select distinct ID mid 
        from A_Position a where a.situationdate=@SituationDate and 
        a.Portfolio_Name=@portfolio and a.Purpose=@purpose and 
        ID!='TOTAL') A
    inner join (  select distinct ID gid 
        from B_Position a where 
        a.situationdate=@SituationDate and a.Purpose=@purpose) B on mid=gid

Basically, all three queries have the same result but query 1 only takes 1-2 seconds to execute. Query 2 or 3 on the other hand take more than 10 minutes to execute. Why is there such a huge difference in the way the code has been written? (Why is "With" slower)


Solution

  • This is an optimization issue. If you looked at the execution plans, you would see why one of them is so much faster than the others.

    First, the second two are the same. Representing subqueries as CTEs or subqueries does not change the execution plan in SQL Server.

    Why is the temporary table version faster? The simple answer is that it gets a better execution plan.

    But that begs the question. The reason is because of the algorithm used for joining the two tables. In the CTE/subquery versions, SQL Server has to guess how many rows are generated. Based on this number, it chooses what it thinks is the best algorithm.

    In the temporary table version, the data is already in a table, so SQL Server does not need to guess.

    So, temporary tables can produce better execution plans. Let me caveat a few things though. There is more overhead to using temporary tables -- the data actually needs to be stored somewhere. It also limits the possibilities for optimization (which happens to be good in this case, but may not be in other cases).

    You should be able to add a hint to speed the other versions. I would guess something like OPTION (HASH JOIN).

    You might also be able to set up indexes to optimize all three versions.