I have two tables called Employee (columns: Id, Name) and DataSource (columns: Id, EmployeeId, DataSourceName).
Each employee can be exported to zero or multiple datasources and imagine following situation:
Employee table
+----+-------------+
| Id | Name |
+----+-------------+
| 1 | Ivan |
| 2 | Adam |
+----+-------------+
DataSource table:
+----+---------------------------------+
| Id | EmplpoyeeId | DataSourceName |
+----+---------------------------------+
| 1 | 1 | Source1 |
| 2 | 1 | Source2 |
| 3 | 2 | Source2 |
+----+---------------------------------+
I need a query to determine which employee is not exported to 'Source1' (the result should be 'Adam' in this case as he is exported to 'Source2' only).
Tables Employee and DataSource can have large amount of records (thousands).
There are a few techniques to determine it and we need to find the one with best performance. There are few which came to my mind:
Left JOIN:
SELECT Employee.Id
FROM Employee
LEFT JOIN DataSource ON DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1'
WHERE DataSource.Id IS NULL
INNER SELECT:
SELECT Employee.Id
FROM Employee
WHERE NOT EXIST (SELECT NULL FROM DataSource WHERE DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1')
EXCEPTION:
SELECT Employee.ID
FROM Employee
EXCEPT
SELECT Employee.Id
FROM Employee
INNER JOIN DataSource ON DataSource.EmployeeId = Employee.Id AND DataSource.DataSourceName = 'Source1'
Before starting benchmarking them I'd like to ask whether there are more ways which I should take into account (and might perform well). Could you please share your ideas for the best performant query.
If you require further reading on the subject this article is good;
http://www.sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/
It suggests that the NOT EXISTS will perform better as it doesn't need to complete the full join (does an Anti-Semi join instead of a Semi Join);
"That’s the major difference between these two. When using the LEFT OUTER JOIN … IS NULL technique, SQL can’t tell that you’re only doing a check for nonexistance. Optimiser’s not smart enough (yet). Hence it does the complete join and then filters. The NOT EXISTS filters as part of the join."