I have read many answers related to this and all are saying both works almost same except in the case of null values ,and not in works for a single column.
The task is to find each account number that has never been assigned to any line item in invoice_line_items table:
Correct query is:
SELECT
account_number, account_description
FROM
general_ledger_accounts gl
WHERE
NOT EXISTS( SELECT
account_number
FROM
invoice_line_items
WHERE
gl.account_number = account_number);
And if I remove gl.account_number = account_number it is returning zero rows.
I want to know :
1 ) Why there is a need for the statement gl.account_number = account_number in the subquery.
2 ) How the selection process is different in Not In and Not Exists.
You should write this using qualified column references throughout the query. Also, the select
in the subquery doesn't matter so I usually use 1
:
SELECT gl.account_number, gl.account_description
FROM general_ledger_accounts gl
WHERE NOT EXISTS (SELECT 1
FROM invoice_line_items ili
WHERE gl.account_number = ili.account_number
);
The subquery is a correlated subquery because the where
clause connects the inner subquery on ili
to the outer subquery on gl
. Conceptually, this is going through each account in gl
. The where
clause evaluates to true
when no rows match a particular account_number
.
This is similar to NOT IN
, except -- as you yourself point out -- when the NOT IN
subquery returns NULL
for any row.