Search code examples
sqlsql-serverperformancedatabase-performancesql-tuning

SQL Server query: Union vs Distinct union all performance


Does SQL have a difference in performance between these two statements?

SELECT distinct 'A' as TableName, Col1, Col2, Col3 FROM A 
UNION ALL 
SELECT distinct 'B' as TableName, Col1, Col2, Col3 from B

versus

SELECT 'A' as TableName, Col1, Col2, Col3 FROM A 
UNION 
SELECT 'B' as TableName, Col1, Col2, Col3 from B

The difference between this and similar questions such as UNION vs DISTINCT in performance is that I can confirm ahead of time that the individual tables I'm using won't have any duplicate records between them, only within the individual tables.

The execution plans look the same to me, in that it sorts the individual tables before concatenating. However, if I remove the scalar from them both, the plan of the UNION ALL stays basically the same but the UNION changes to concatenating before the distinct. I'll be concatenating about 20 tables together, and it's not clear whether doing 20 individual DISTINCTs is faster than doing one large DISTINCT at the end, since I can still confirm that the tables would not share any duplicates between them (only within the same table).


Solution

  • DISTINCT is not necessarily implemented by sort, it can also be implemented by hashing.

    Both of these are memory consuming operations and reducing the size of data being distinctified can help reduce the amount of required memory which is good for concurrency.

    The algorithmic complexity of sorting is n log n, meaning that the work required grows linearitmically as n grows. On that basis sorting 10 smaller sets of size s should generally be fast than sorting one larger set of size 10*s.