I'm looking to find all duplicate records and then select all duplicates minus the oldest record from each set so that I can delete duplicates and keep one unique record.
When I run this query, I get the results I want. Leaves me with a unique email address and the oldest create date.
SELECT
EmailAddress,
MIN(CreatedDate)
FROM [_ListSubscribers]
WHERE EmailAddress IN
(
SELECT EmailAddress
FROM _ListSubscribers
GROUP BY EmailAddress
HAVING COUNT(EmailAddress) > 1
)
GROUP BY EmailAddress
When I add SubscriberKey to the query, the results DOUBLE! Why is that? I just want to see the SubscriberKey tied to the EmailAddress I found that has the oldest date in the subquery.
SELECT
EmailAddress,
SubscriberKey,
MIN(CreatedDate)
FROM [_ListSubscribers]
WHERE EmailAddress IN
(
SELECT EmailAddress
FROM _ListSubscribers
GROUP BY EmailAddress
HAVING COUNT(EmailAddress) > 1
)
GROUP BY EmailAddress, SubscriberKey
You're getting multiple records because you're grouping by SubscriberKey
. You'll need to match by EmailAddress
and CreatedDate
. Try doing a sub query and joining it back to your original table.
select
[_ListSubscribers].EmailAddress,
[_ListSubscribers].SubscriberKey,
[_ListSubscribers].CreatedDate,
from
(
SELECT
EmailAddress,
MIN(CreatedDate) as CreatedDate
FROM [_ListSubscribers]
GROUP BY EmailAddress, SubscriberKey
Having count(EmailAddress)>1
) SubTbl
inner join
[_ListSubscribers] on
[_ListSubscribers].EmailAddress = SubTbl.EmailAddress
and
[_ListSubscribers].CreatedDate = SubTbl.CreatedDate