Two queries return the same result.
How can I determine which query performs better:
a. before execution
b. after execution
?
Example 1: the following queries return the difference between two heaps
( SELECT * FROM TABLE1 EXCEPT
SELECT * FROM TABLE2
)
UNION ALL
( SELECT * FROM TABLE2 EXCEPT
SELECT * FROM TABLE1
)
;
2nd approach (and a 3rd with UNION ALL):
( SELECT * FROM TABLE1 UNION --ALL --?!
SELECT * FROM TABLE2
)
EXCEPT
( SELECT * FROM TABLE2 INTERSECT
SELECT * FROM TABLE1
)
;
Which approach is better?
Or does it depend (advantages/disadvantages)?
Example 2: almost the same, but additionally returns a column naming the source table
SELECT 'TABLE1-ONLY' AS SRC, T1.*
FROM (
SELECT * from TABLE1
EXCEPT
SELECT * from TABLE2
) AS T1
UNION ALL
SELECT 'TABLE2-ONLY' AS SRC, T2.*
FROM (
SELECT * from TABLE2
EXCEPT
SELECT * from TABLE1
) AS T2
;
2nd approach (and a 3rd with UNION ALL):
( SELECT SRC='TABLE1-ONLY', * FROM TABLE1 UNION --ALL --?!
SELECT SRC='TABLE2-ONLY', * FROM TABLE2
)
EXCEPT
( SELECT * FROM (select SRC='TABLE1-ONLY' UNION ALL
select SRC='TABLE2-ONLY'
) s
,(SELECT * FROM TABLE2 INTERSECT
SELECT * FROM TABLE1
) i
);
Which approach is better?
Or does it depend (advantages/disadvantages)?
NB! The following questions are outside the scope of this topic:
P.S.
The results of set statistics io on; are almost the same.
set statistics time on; could provide different results after each run.
Is it better to remove duplicates in UNION or within EXCEPT/INTERSECT? And if duplicates are removed during UNION, will EXCEPT/INTERSECT search for them again?
You should test on your data.
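One way to run such a test in SQL Server is sketched below, covering both the "before execution" and the "after execution" case from the question. It assumes a client such as SSMS or sqlcmd, where GO separates batches, and it uses the first approach from Example 1 as the query under test; repeat it with the second approach and compare.

```sql
-- Before execution: request the estimated plan only.
-- While SHOWPLAN_XML is ON, statements are compiled but not executed.
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO
( SELECT * FROM TABLE1 EXCEPT
  SELECT * FROM TABLE2
)
UNION ALL
( SELECT * FROM TABLE2 EXCEPT
  SELECT * FROM TABLE1
);
GO
SET SHOWPLAN_XML OFF;
GO

-- After execution: run the query and collect actual I/O and timing figures.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
( SELECT * FROM TABLE1 EXCEPT
  SELECT * FROM TABLE2
)
UNION ALL
( SELECT * FROM TABLE2 EXCEPT
  SELECT * FROM TABLE1
);
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Timing figures vary between runs (caching, concurrent load), so it is probably worth running each variant several times and comparing logical reads alongside elapsed time.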
But absent other information, I would expect the first to be better. Each is doing three set operations on the entire table. In the first, the "intermediate" tables are smaller, so I would expect better performance.
That is, UNION/UNION ALL is going to create a bigger table (in the second solution), which then needs to be processed.
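The size argument can be checked on your own data with a rough sketch like the following. It only counts the rows of the intermediate results that each approach feeds into its outer operation; the column aliases and derived-table aliases are illustrative, and it assumes the column names within each table are unique (required for a derived table using SELECT *).

```sql
-- Intermediate results of the first approach: one difference per direction.
SELECT COUNT(*) AS t1_except_t2_rows
FROM ( SELECT * FROM TABLE1 EXCEPT
       SELECT * FROM TABLE2 ) AS d1;

SELECT COUNT(*) AS t2_except_t1_rows
FROM ( SELECT * FROM TABLE2 EXCEPT
       SELECT * FROM TABLE1 ) AS d2;

-- Intermediate results of the second approach: the combined set
-- and the intersection that is later subtracted from it.
SELECT COUNT(*) AS union_all_rows
FROM ( SELECT * FROM TABLE1 UNION ALL
       SELECT * FROM TABLE2 ) AS u;

SELECT COUNT(*) AS intersect_rows
FROM ( SELECT * FROM TABLE2 INTERSECT
       SELECT * FROM TABLE1 ) AS i;
```

If the two tables are mostly identical, the EXCEPT results will be small while the UNION ALL result is roughly the sum of both tables, which is the situation the expectation above has in mind.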
Note that in SQL Server, EXCEPT and INTERSECT also remove duplicates, which might add additional overhead.
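A small illustration of that duplicate removal, using hypothetical table variables @t1 and @t2 with a single id column (not the tables from the question). Because EXCEPT returns distinct rows, the second approach gives the same result whether the inner operator is UNION or UNION ALL:

```sql
-- Hypothetical sample data containing duplicates on both sides.
DECLARE @t1 TABLE (id int);
DECLARE @t2 TABLE (id int);
INSERT INTO @t1 (id) VALUES (1), (1), (2), (3);
INSERT INTO @t2 (id) VALUES (2), (3), (3), (4);

-- Second approach with UNION: duplicates are removed by the UNION itself.
( SELECT id FROM @t1 UNION
  SELECT id FROM @t2
)
EXCEPT
( SELECT id FROM @t2 INTERSECT
  SELECT id FROM @t1
);

-- Third approach with UNION ALL: duplicates survive the inner step
-- and are removed by the outer EXCEPT instead.
( SELECT id FROM @t1 UNION ALL
  SELECT id FROM @t2
)
EXCEPT
( SELECT id FROM @t2 INTERSECT
  SELECT id FROM @t1
);

-- Both queries return the ids 1 and 4 (row order is not guaranteed).
```

So the choice between UNION and UNION ALL here does not change the output; it only changes where the duplicate removal happens, and which placement is cheaper is something to measure on real data.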