I try a query which test NULL NOT IN Empty_Relation on Postrgresql, Spark and I got different results.
select count(*) from
(select 1)
where null not in
(a empty relation)
Postgresql outputs 1. The other outputs 0.
I understand the NULL behaviour of NOT IN, but my the subquery is empty relation, this situation seems more interesting. There are a lot of posts discussing NOT IN but I don't find anything related to NOT IN Empty_Relation.
So my question is more like does ANSI SQL define this behavior or this is actually a grey area, both answers could be accepted.
tl;dr: PostgreSQL is correct.
This is what the SQL specification says about this behavior:
4) The expression
RVC NOT IN IPV
is equivalent toNOT ( RVC IN IPV )
5) The expression
RVC IN IPV
is equivalent toRVC = ANY IPV
So, NULL NOT IN (<empty relation>)
is equivalent to NOT (NULL = ANY (<empty relation>))
Then, it goes on to say:
The result of
R <comp op> <quantifier> T
is derived by the application of the implied<comparison predicate>
R <comp op> RT
to every rowRT
inT
.[...]
d) If
T
is empty or if the implied<comparison predicate>
is False for every rowRT
inT
, thenR <comp op> <some> T
is False.
(Note: <some>
is either ANY
or SOME
-- they both mean the same).
By this rule, since T
is empty, NULL = ANY (<empty>)
is False, so NOT (NULL = ANY (<empty relation>)
is True.