Search code examples
sqlcoalesce

Coalesce in where clause removing too many rows


I have two tables that each hold the same fields but from two different data entry systems, which I've joined using UNION so have no true duplicate records. However some records have been entered on both systems with slightly different data, and I need to remove rows where they were entered with null values on one system but have values on the other - I've used COALESCE and in 99% of the cases this has worked perfectly. The problem is where there are two rows which both have genuine nulls (see below for sample data):

PersonID    Location1   Location2           Date
1           NULL            NULL            2016-05-01
1           NULL            NULL            2014-10-01
1           Home            Home            2016-05-01

2           HospitalA       HospitalB       2016-07-23
2           NULL            NULL            2016-07-23

3           HospitalA       HospitalA       2014-12-19
3           HospitalB       HospitalB       2016-08-16

If the dates are the same, I only want the row with location values but if the dates are different I want one row for each date even if one of the dates has NULL as the location - PersonID 1 has two rows, one NULL location dated 2014-10-01 and one with location dated 2016-05-01; PersonID 2 has one row with a location dated 2016-07-23 and PersonID 3 has two rows both with locations and different dates.

The query I'm using is

select *
from 
(select PersonID, Location1, Location2, Date
from tablea
union
select PersonID, Location1, Location2, Date
from tableb) as PID
where Location1 = coalesce(Location1,'')

Thanks in advance for any pointers


Solution

  • The problem is that UNION lists the content of the two tables one after the other, so each row only has the fields of its original table. This means that where Location1 = coalesce(Location1,'') is checking a field against itself.

    To achieve what you want you will probably need an aggregation on the result of your UNION

    select  PersonID, max(Location1), max(Location2), Date
    from    (
              select  PersonID, Location1, Location2, Date
              from    tablea
              union all
              select  PersonID, Location1, Location2, Date
              from tableb
            ) as PID
    group by PersonID, Date