I have two tables that each hold the same fields but from two different data entry systems, which I've joined using UNION so have no true duplicate records. However some records have been entered on both systems with slightly different data, and I need to remove rows where they were entered with null values on one system but have values on the other - I've used COALESCE and in 99% of the cases this has worked perfectly. The problem is where there are two rows which both have genuine nulls (see below for sample data):
PersonID Location1 Location2 Date
1 NULL NULL 2016-05-01
1 NULL NULL 2014-10-01
1 Home Home 2016-05-01
2 HospitalA HospitalB 2016-07-23
2 NULL NULL 2016-07-23
3 HospitalA HospitalA 2014-12-19
3 HospitalB HospitalB 2016-08-16
If the dates are the same, I only want the row with location values but if the dates are different I want one row for each date even if one of the dates has NULL as the location - PersonID 1 has two rows, one NULL location dated 2014-10-01 and one with location dated 2016-05-01; PersonID 2 has one row with a location dated 2016-07-23 and PersonID 3 has two rows both with locations and different dates.
The query I'm using is
select *
from
(select PersonID, Location1, Location2, Date
from tablea
union
select PersonID, Location1, Location2, Date
from tableb) as PID
where Location1 = coalesce(Location1,'')
Thanks in advance for any pointers
The problem is that UNION
lists the content of the two tables one after the other, so each row only has the fields of its original table. This means that where Location1 = coalesce(Location1,'')
is checking a field against itself.
To achieve what you want you will probably need an aggregation on the result of your UNION
select PersonID, max(Location1), max(Location2), Date
from (
select PersonID, Location1, Location2, Date
from tablea
union all
select PersonID, Location1, Location2, Date
from tableb
) as PID
group by PersonID, Date