Search code examples
sqlpostgresqlsql-null

NULL NOT IN (Empty_Relation) of SQL query shows different behaviors on different engines


I try a query which test NULL NOT IN Empty_Relation on Postrgresql, Spark and I got different results.

select count(*) from
(select 1)
where null not in
(a empty relation)

Postgresql outputs 1. The other outputs 0.

I understand the NULL behaviour of NOT IN, but my the subquery is empty relation, this situation seems more interesting. There are a lot of posts discussing NOT IN but I don't find anything related to NOT IN Empty_Relation.

So my question is more like does ANSI SQL define this behavior or this is actually a grey area, both answers could be accepted.


Solution

  • tl;dr: PostgreSQL is correct.

    This is what the SQL specification says about this behavior:

    4) The expression RVC NOT IN IPV is equivalent to NOT ( RVC IN IPV )

    5) The expression RVC IN IPV is equivalent to RVC = ANY IPV

    So, NULL NOT IN (<empty relation>) is equivalent to NOT (NULL = ANY (<empty relation>))

    Then, it goes on to say:

    The result of R <comp op> <quantifier> T is derived by the application of the implied <comparison predicate> R <comp op> RT to every row RT in T.

    [...]

    d) If T is empty or if the implied <comparison predicate> is False for every row RT in T, then R <comp op> <some> T is False.

    (Note: <some> is either ANY or SOME -- they both mean the same).

    By this rule, since T is empty, NULL = ANY (<empty>) is False, so NOT (NULL = ANY (<empty relation>) is True.