I have a table of people that I need to have a way to find duplicate records by multiple possible scenarios. For instance, group together if fname, lname and address are the same OR if fname, lname, dob are the same, OR if fname, lname, and Email are the same. I can't figure out a way to do this in SQL. I am using the above examples solely as examples, because the grouping criteria will eventually be much more strict. I've set up an example with data in SQL Fiddle. My desired result would group records 2-5 together and 1 and 6 would be unique rows.
CREATE TABLE Persons (
ID int IDENTITY(1,1),
FirstName varchar(255),
LastName varchar(255),
Address1 varchar(255),
City varchar(255),
State varchar(255),
BDay Varchar(255),
Email Varchar(255)
);
INSERT INTO Persons
SELECT 'RICK', 'ALLEN', '44 Street', 'Minneapolis', 'MN', '1/2/1970','[email protected]'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1980','[email protected]'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1981','[email protected]'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '42 Street', 'Minneapolis', 'MN', '4/8/1980','[email protected]'
UNION ALL
SELECT 'JENNIFER', 'ALLEN', '123 Street', 'Minneapolis', 'MN', '4/8/1980','[email protected]'
UNION ALL
SELECT 'STEVEN', 'ALLEN', '555 Street', 'Minneapolis', 'MN', '2/8/1980','[email protected]'
You could use a not exists
clause:
select p1.*
from Persons p1
where not exists
(
select *
from Persons p2
where p1.id < p2.id and
p1.FirstName = p2.FirstName and
p1.LastName = p2.LastName and
(
p1.Address1 = p2.Address1 or
p1.BDay = p2.BDay or
p1.Email = p2.Email
)
)
Working example on SQL Fiddle.
Replying to your comment, you can mark duplicates in the table with an update query:
with dupe as
(
select min(p1.ID) as OriginalID
, p2.ID as DupeID
from Persons p1
join Persons p2
on p1.id < p2.id and
p1.FirstName = p2.FirstName and
p1.LastName = p2.LastName and
(
p1.Address1 = p2.Address1 or
p1.BDay = p2.BDay or
p1.Email = p2.Email
)
group by
p2.ID
)
update p1
set DupeOfID = dupe.OriginalID
from Persons p1
join dupe
on dupe.DupeID = p1.ID