Let's start with the following two tables:
CREATE TABLE people AS (
SELECT * FROM VALUES
('david', 10),
('george', 20)
AS tmp(name, age)
);
CREATE TABLE position AS (
SELECT * FROM VALUES
('george', 'c++'),
('george', 'frontend')
AS tmp(name, job)
);
Is the following two equivalent ways to write the SEMI JOIN
or ANTI JOIN
pattern? If not, what is missing?
[SEMI-JOIN == WHERE EXISTS]
SELECT * FROM people SEMI JOIN position ON (people.name=position.name)
SELECT * FROM people WHERE EXISTS (SELECT * FROM position WHERE people.name=position.name)
// or like this?
SELECT * FROM people WHERE name IN (SELECT name FROM position)
[ANTI-JOIN == WHERE NOT EXISTS]
SELECT * FROM people ANTI JOIN position ON (people.name=position.name)
SELECT * FROM people WHERE NOT EXISTS (SELECT * FROM position WHERE people.name=position.name)
// or like this?
SELECT * FROM people WHERE name NOT IN (SELECT name FROM position)
In each example (semi and anti joins), the following pair of options are not the same because of the SELECT *
at the start - which will get all columns from all the joined tables.
If you used SELECT people.*
as the start, they are functionally the same as far as I can see.
In SQL Server at least, the bottom two options of each have identical plans in an example I tried (SQL Server referred to them as Left semi join
and Left anti semi join
respectively).
Edit: ANTI JOIN Note that a more analagous way may be to use LEFT OUTER JOIN and then WHERE the joined table doesn't have any rows e.g.,
SELECT *
FROM people LEFT OUTER JOIN position ON (people.name=position.name)
WHERE position.name IS NULL
All will give the same results, but the EXISTS
versions tend to be closer to the theoretical approach.