Tags: sql, sql-server, sql-execution-plan, intersect, except

Compare (estimated) query/solution performance. Return differences between two tables (heaps)


Two queries return the same result.

How can I determine which query performs better:

a. before execution

b. after execution?

Example 1: the following queries return the difference between two heaps

1st approach:

(   SELECT * FROM TABLE1
    EXCEPT
    SELECT * FROM TABLE2
)
UNION ALL
(   SELECT * FROM TABLE2
    EXCEPT
    SELECT * FROM TABLE1
)
;

2nd approach (the 3rd is the same with UNION ALL instead of UNION):

(   SELECT * FROM TABLE1
    UNION --ALL --?!
    SELECT * FROM TABLE2
)
EXCEPT
(   SELECT * FROM TABLE2
    INTERSECT
    SELECT * FROM TABLE1
)
;

Which approach is better?

Or does it depend (advantages/disadvantages)?

Example 2: almost the same, but additionally returns a column identifying the source table

1st approach:

SELECT 'TABLE1-ONLY' AS SRC, T1.*
FROM (
      SELECT * FROM TABLE1
      EXCEPT
      SELECT * FROM TABLE2
      ) AS T1
UNION ALL
SELECT 'TABLE2-ONLY' AS SRC, T2.*
FROM (
      SELECT * FROM TABLE2
      EXCEPT
      SELECT * FROM TABLE1
      ) AS T2
;

2nd approach (the 3rd is the same with UNION ALL instead of UNION):

(   SELECT SRC='TABLE1-ONLY', * FROM TABLE1
    UNION --ALL --?!
    SELECT SRC='TABLE2-ONLY', * FROM TABLE2
)
EXCEPT
(   SELECT *
    FROM (  SELECT SRC='TABLE1-ONLY'
            UNION ALL
            SELECT SRC='TABLE2-ONLY'
         ) s
    CROSS JOIN
         (  SELECT * FROM TABLE2
            INTERSECT
            SELECT * FROM TABLE1
         ) i
);

Which approach is better?

Or does it depend (advantages/disadvantages)?

NB! The following questions are outside the scope of this topic:

  1. How to define Primary Key for the table
  2. How to identify duplicates in the table

P.S.

  1. The PRIMARY question is about performance comparison.
  2. Solution improvement is secondary.
  3. The SET STATISTICS IO ON output is almost the same for each approach.
  4. SET STATISTICS TIME ON can give different results on each run.
  5. I don't see a big difference in the execution plans (but I'm not a professional in this field).
  6. I have a feeling that the queries differ in performance, but no knowledge to back it up. If someone has that knowledge, please share. ;)
  7. Could you please specify scenarios in which one solution is definitely better than the other, and vice versa?
  8. Regarding the 2nd vs. 3rd approach: where does duplicate removal work faster, within UNION or within EXCEPT/INTERSECT? And if duplicates are removed during UNION, will EXCEPT/INTERSECT search for them again?

Solution

  • You should test on your data.

    But absent other information, I would expect the first to be better. Each approach performs three set operations over the full tables. In the first, the "intermediate" results are smaller, because each EXCEPT returns at most the distinct rows of its left input, so I would expect better performance.

    That is, UNION/UNION ALL in the second solution produces a larger intermediate result (up to the rows of both tables combined), which the outer EXCEPT then has to process.

    Note that in SQL Server, EXCEPT and INTERSECT also remove duplicates, which might add overhead.
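
As a rough way to "test on your data", the two approaches from Example 1 can be compared both before and after execution. The script below is only a sketch. It assumes TABLE1 and TABLE2 already exist with identical column lists, and that it is run from SSMS or sqlcmd (GO is a batch separator of those tools, not a T-SQL statement).

```sql
-- (a) Before execution: request estimated plans only.
-- While SHOWPLAN_XML is ON the statements are compiled but not executed.
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO
-- 1st approach
(SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2)
UNION ALL
(SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1);
GO
-- 2nd approach
(SELECT * FROM TABLE1 UNION SELECT * FROM TABLE2)
EXCEPT
(SELECT * FROM TABLE2 INTERSECT SELECT * FROM TABLE1);
GO
SET SHOWPLAN_XML OFF;
GO

-- (b) After execution: run both queries and collect actual I/O and timing.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- 1st approach
(SELECT * FROM TABLE1 EXCEPT SELECT * FROM TABLE2)
UNION ALL
(SELECT * FROM TABLE2 EXCEPT SELECT * FROM TABLE1);
GO
-- 2nd approach
(SELECT * FROM TABLE1 UNION SELECT * FROM TABLE2)
EXCEPT
(SELECT * FROM TABLE2 INTERSECT SELECT * FROM TABLE1);
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

For (a), the StatementSubTreeCost attribute in each returned plan gives the optimizer's estimated cost, which is an estimate and not a measurement. For (b), timings vary between runs, so it is probably worth repeating each query several times and comparing logical reads and CPU time rather than a single elapsed-time figure.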