Search code examples
sql-serverperformancedatabase-performancequery-performance

Find rows from table A without record in joined table B


I have two tables called Employee (columns: Id, Name) and DataSource (columns: Id, EmployeeId, DataSourceName).

Each employee can be exported to zero or multiple datasources and imagine following situation:

Employee table

+----+-------------+
| Id | Name        |
+----+-------------+
| 1  | Ivan        |
| 2  | Adam        |
+----+-------------+

DataSource table:

+----+---------------------------------+
| Id | EmplpoyeeId   | DataSourceName  |
+----+---------------------------------+
| 1  | 1             | Source1         |
| 2  | 1             | Source2         |
| 3  | 2             | Source2         |
+----+---------------------------------+

I need a query to determine which employee is not exported to 'Source1' (the result should be 'Adam' in this case as he is exported to 'Source2' only).

Tables Employee and DataSource can have large amount of records (thousands).

There are a few techniques to determine it and we need to find the one with best performance. There are few which came to my mind:

Left JOIN:

SELECT Employee.Id 
FROM Employee 
LEFT JOIN DataSource ON DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1'
WHERE DataSource.Id IS NULL

INNER SELECT:

SELECT Employee.Id
FROM Employee
WHERE NOT EXIST (SELECT NULL FROM DataSource WHERE DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1')

EXCEPTION:

SELECT Employee.ID 
FROM Employee

EXCEPT

SELECT Employee.Id 
FROM Employee 
INNER JOIN DataSource ON DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1'

Before starting benchmarking them I'd like to ask whether there are more ways which I should take into account (and might perform well). Could you please share your ideas for the best performant query.


Solution

  • If you require further reading on the subject this article is good;

    http://www.sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    It suggests that the NOT EXISTS will perform better as it doesn't need to complete the full join (does an Anti-Semi join instead of a Semi Join);

    "That’s the major difference between these two. When using the LEFT OUTER JOIN … IS NULL technique, SQL can’t tell that you’re only doing a check for nonexistance. Optimiser’s not smart enough (yet). Hence it does the complete join and then filters. The NOT EXISTS filters as part of the join."