Search code examples
c#sqlnullthree-valued-logic

SQL that check for Duplicates with NULL doesn't not return


I am using 2 separate query to check for duplicates and both query were supposed to give the same answer when executed, however it did not. Below is an example of my database table data:

id   Name   Age  Group SeatNo
------------------------------------------
1    Alpha  11    A    NULL
2    Bravo  12    A    1
3    Alpha  11    B    NULL

This is my first query where it shows there are duplicates

SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND ta.age=tb.age
WHERE ta.GroupName='A'
AND tb.GroupName='B'

This is my second Query that shows zero duplicates

SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND ta.age=tb.age AND ta.SeatNo=tb.SeatNo
WHERE ta.GroupName='A'
AND tb.GroupName='B

After looking through both query and the data in the table, it seems that NULL in the seatNo affected the second query and cause it to return 0 duplicates. Is there any solution to search for duplicates even though it is NULL?


Solution

  • You have learned that NULL <> NULL. What can you do about it?

    Well, ANSI SQL has a comparator for this purpose IS DISTINCT FROM. So, you could write:

    SELECT count(*)
    FROM Test AS ta JOIN
         Test AS tb
         ON ta.name = tb.name AND ta.age = tb.age AND
            NOT ta.SeatNo IS DISTINCT FROM tb.SeatNo
    WHERE ta.GroupName = 'A' AND
          tb.GroupName = 'B';
    

    However, most databases do not support that. A more general form is:

    SELECT count(*)
    FROM Test AS ta JOIN
         Test AS tb
         ON ta.name = tb.name AND ta.age = tb.age AND
            (ta.SeatNo = tb.SeatNo OR (ta.SeatNo IS NULL AND tb.SeatNo IS NULL) )
    WHERE ta.GroupName = 'A' AND
          tb.GroupName = 'B';
    

    Another method uses UNION ALL and aggregation. The following gets the duplicates:

    select name, age, SeatNo
    from ((select name, age, SeatNo, 'a' as which
           from test
           where GroupName = 'A'
          ) union all
          (select name, age, SeatNo, 'B' as which
           from test
           where GroupName = 'B'
          )
         ) ab
    group by name, age, seatno
    having count(distinct which) = 2;
    

    You can then use this as a subquery to get the count:

    select count(*)
    from (select name, age, SeatNo
          from ((select name, age, SeatNo, 'a' as which
                 from test
                 where GroupName = 'A'
                ) union all
                (select name, age, SeatNo, 'B' as which
                 from test
                 where GroupName = 'B'
                )
               ) ab
          group by name, age, seatno
          having count(distinct which) = 2  
         ) ab;