I am trying to identify a list of duplicates from a table and my table looks like this:
Column1-Column2
5-4
Column 2 really is a varchar column, but I made everything numbers for simplicity sack.
I have been playing with CheckSum_Agg, but it has false positives. :(
My output would look something like this:
Where I select the min ID for the first column and all of the other values for the second column. Non-duplicates are omitted.
Another example might look like:
I'm using SQL Server 2012. Thanks!
--This code produced the results I was looking for in the original post.
WITH t AS (
SELECT
column1,
COUNT(*) c
FROM #tbl
GROUP BY column1
),
tt AS(
SELECT
t1.column1 as 'winner',
t2.column1 as 'loser'
FROM t t1
INNER JOIN t t2 ON (
t1.c = t2.c AND
t1.column1 < t2.column1
)
WHERE NOT EXISTS (
SELECT column2 FROM #tbl WHERE column1 = t1.column1
EXCEPT
SELECT column2 FROM #tbl WHERE column1 = t2.column1
)
)
SELECT fullList.winner, fullList.loser
FROM
( SELECT winner FROM tt tt1
EXCEPT
SELECT loser FROM tt tt2
) winnerList
JOIN tt fullList on winnerList.winner = fullList.winner
ORDER BY fullList.winner, fullList.loser