I am using 2 separate query to check for duplicates and both query were supposed to give the same answer when executed, however it did not. Below is an example of my database table data:
id Name Age Group SeatNo
------------------------------------------
1 Alpha 11 A NULL
2 Bravo 12 A 1
3 Alpha 11 B NULL
This is my first query where it shows there are duplicates
SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND ta.age=tb.age
WHERE ta.GroupName='A'
AND tb.GroupName='B'
This is my second Query that shows zero duplicates
SELECT count(*)
FROM Test AS ta
JOIN Test AS tb ON ta.name=tb.name AND ta.age=tb.age AND ta.SeatNo=tb.SeatNo
WHERE ta.GroupName='A'
AND tb.GroupName='B
After looking through both query and the data in the table, it seems that NULL in the seatNo affected the second query and cause it to return 0 duplicates. Is there any solution to search for duplicates even though it is NULL?
You have learned that NULL <> NULL
. What can you do about it?
Well, ANSI SQL has a comparator for this purpose IS DISTINCT FROM
. So, you could write:
SELECT count(*)
FROM Test AS ta JOIN
Test AS tb
ON ta.name = tb.name AND ta.age = tb.age AND
NOT ta.SeatNo IS DISTINCT FROM tb.SeatNo
WHERE ta.GroupName = 'A' AND
tb.GroupName = 'B';
However, most databases do not support that. A more general form is:
SELECT count(*)
FROM Test AS ta JOIN
Test AS tb
ON ta.name = tb.name AND ta.age = tb.age AND
(ta.SeatNo = tb.SeatNo OR (ta.SeatNo IS NULL AND tb.SeatNo IS NULL) )
WHERE ta.GroupName = 'A' AND
tb.GroupName = 'B';
Another method uses UNION ALL
and aggregation. The following gets the duplicates:
select name, age, SeatNo
from ((select name, age, SeatNo, 'a' as which
from test
where GroupName = 'A'
) union all
(select name, age, SeatNo, 'B' as which
from test
where GroupName = 'B'
)
) ab
group by name, age, seatno
having count(distinct which) = 2;
You can then use this as a subquery to get the count:
select count(*)
from (select name, age, SeatNo
from ((select name, age, SeatNo, 'a' as which
from test
where GroupName = 'A'
) union all
(select name, age, SeatNo, 'B' as which
from test
where GroupName = 'B'
)
) ab
group by name, age, seatno
having count(distinct which) = 2
) ab;