Search code examples
sqlnullleft-joinin-subquery

SQL select with "IN" subquery returns no records if the sub-query contains NULL


I came across this interesting behavior. I see left-join is the way to go, but would still like to have this cleared. Is it a bug or behavior by-design? Any explanations?

When I select records from left table, where a value is not present in the result of a subquery on the right table, the expected "missing" record is not returned if the subquery result has nulls. I expected the two ways to write this query to be equivalent.

Thanks!

declare @left table  (id int not null primary key identity(1,1), ref int null)
declare @right table (id int not null primary key identity(1,1), ref int null)

insert @left (ref) values (1)
insert @left (ref) values (2)

insert @right (ref) values (1)
insert @right (ref) values (null)

print 'unexpected empty resultset:'
select * from @left
where ref not in (select ref from @right)

print 'expected result - ref 2:'
select * from @left
where ref not in (select ref from @right where ref is not null)

print 'expected result - ref 2:'
select l.* from @left l
  left join @right r on r.ref = l.ref
where r.id is null

print @@version

gives:

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)
unexpected empty resultset:
id          ref
----------- -----------

(0 row(s) affected)

expected result - ref 2:
id          ref
----------- -----------
2           2

(1 row(s) affected)

expected result - ref 2:
id          ref
----------- -----------
2           2

(1 row(s) affected)

Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (X64) 
    Apr  2 2010 15:48:46 
    Copyright (c) Microsoft Corporation
    Standard Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2) (Hypervisor)

Solution

  • This is by design. If the match fails and the set contains NULL the result is NULL, as specified by the SQL standard.

    '1' IN ('1', '3') => true
    '2' IN ('1', '3') => false
    '1' IN ('1', NULL) => true
    '2' IN ('1', NULL) => NULL
    
    '1' NOT IN ('1', '3') => false
    '2' NOT IN ('1', '3') => true
    '1' NOT IN ('1', NULL) => false
    '2' NOT IN ('1', NULL) => NULL
    

    Informally, the logic behind this is that NULL can be thought of as an unknown value. For example here it doesn't matter what the unknown value is - '1' is clearly in the set, so the result is true.

    '1' IN ('1', NULL) => true
    

    In the following example we can't be sure that '2' is in the set, but since we don't know all the values we also can't be sure that it isn't in the set. So the result is NULL.

    '2' IN ('1', NULL) => NULL
    

    Another way of looking at it is by rewriting x NOT IN (Y, Z) as X <> Y AND X <> Z. Then you can use the rules of three-valued logic:

    true AND NULL => NULL
    false AND NULL => false