sql, sql-server, hash, aggregate, checksum

SQL Table with a list of repeating values (duplicates) to find


I am trying to identify a list of duplicates from a table and my table looks like this:

Column1-Column2

  1. 1-1
  2. 1-2
  3. 1-3
  4. 2-1
  5. 2-2
  6. 2-3
  7. 3-1
  8. 3-2
  9. 3-4
  10. 4-1
  11. 4-2
  12. 4-3
  13. 4-4
  14. 5-1
  15. 5-2
  16. 5-4

    • 1 has a group of {1,2,3}
    • 2 has a group of {1,2,3}
    • So 1 and 2 are duplicates
    • 3 has a group of {1,2,4}
    • 5 has a group of {1,2,4}
    • So 3 and 5 are duplicates
    • 4 has a group of {1,2,3,4}
    • And has no friends ;)

Column 2 really is a varchar column, but I made everything numbers for simplicity's sake.
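To reproduce the sample data, a setup along these lines should work. It is only a sketch: the temp table name `#tbl` matches the one used in the solution, but the column types (`int` for column1, `varchar(10)` for column2) are assumptions, not the real schema.

```sql
-- Assumed schema: column1 as int, column2 as varchar(10); adjust to the real table.
IF OBJECT_ID('tempdb..#tbl') IS NOT NULL DROP TABLE #tbl;

CREATE TABLE #tbl (
  column1 int         NOT NULL,
  column2 varchar(10) NOT NULL
);

-- The 16 sample rows from the list above, as (column1, column2) pairs.
INSERT INTO #tbl (column1, column2) VALUES
  (1, '1'), (1, '2'), (1, '3'),
  (2, '1'), (2, '2'), (2, '3'),
  (3, '1'), (3, '2'), (3, '4'),
  (4, '1'), (4, '2'), (4, '3'), (4, '4'),
  (5, '1'), (5, '2'), (5, '4');
```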

I have been playing with CHECKSUM_AGG, but it produces false positives. :(
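A checksum-based attempt would look roughly like the sketch below (the exact query is not shown here, so its shape is an assumption). Both `CHECKSUM` and `CHECKSUM_AGG` return a 32-bit `int`, so two different sets of values can produce the same aggregate. Equal checksums can therefore only mark candidate duplicates, which still need a real set comparison.

```sql
-- Assumed sketch of a CHECKSUM_AGG attempt; not necessarily the query that was run.
-- Groups with equal group_checksum are only *candidate* duplicates.
SELECT
  column1,
  CHECKSUM_AGG(CHECKSUM(column2)) AS group_checksum
FROM #tbl
GROUP BY column1
ORDER BY group_checksum, column1;
```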

My output would look something like this:

  • 1,2
  • 3,5

The first column is the minimum ID of each duplicate set, and the second column lists each of the other IDs in that set. Non-duplicates are omitted.

Another example might look like:

  • 1,2
  • 1,6
  • 3,5
  • 3,7
  • 3,8
  • (Notice there is no "4" in the list. I just added other "pairs" to show that 1 and 3 are the lowest. If 4 has to be in the list, as 4,0 or 4,null, I can make that work too.)

I'm using SQL Server 2012. Thanks!


Solution

  • --This code produced the results I was looking for in the original post.  
    
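    -- Count the rows in each column1 group. COUNT(*) equals the number of
    -- distinct column2 values only if no (column1, column2) row is repeated.
    WITH t AS (
      SELECT
        column1,
        COUNT(*) AS c
      FROM #tbl
      GROUP BY column1
    ),
    -- Pair each group with every higher-numbered group of the same size.
    -- Same size plus an empty EXCEPT (nothing in t1 is missing from t2)
    -- means the two groups hold the same set of column2 values.
    tt AS (
      SELECT
        t1.column1 AS winner,
        t2.column1 AS loser
      FROM t t1
      INNER JOIN t t2 ON (
        t1.c = t2.c AND
        t1.column1 < t2.column1
      )
      WHERE NOT EXISTS (
        SELECT column2 FROM #tbl WHERE column1 = t1.column1
        EXCEPT
        SELECT column2 FROM #tbl WHERE column1 = t2.column1
      )
    )
    -- Keep only winners that never appear as a loser, i.e. the lowest
    -- column1 in each duplicate set, then list all of their losers.
    SELECT fullList.winner, fullList.loser
    FROM (
      SELECT winner FROM tt tt1
      EXCEPT
      SELECT loser FROM tt tt2
    ) winnerList
    JOIN tt fullList ON winnerList.winner = fullList.winner
    ORDER BY fullList.winner, fullList.loser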
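As an alternative sketch (a different technique from the query above, not a replacement for it), each group can be reduced to a sorted, comma-separated signature with `FOR XML PATH`, which is available in SQL Server 2012, and groups with equal signatures are then duplicates. The names `sig`, `signature`, and `dup` are made up for illustration. The sketch assumes column2 values never contain a comma, since otherwise two different sets could yield the same signature.

```sql
-- Build one signature per column1: its distinct column2 values, sorted and comma-joined.
-- Assumption: column2 values contain no commas.
WITH sig AS (
  SELECT
    g.column1,
    STUFF((
      SELECT ',' + t.column2
      FROM #tbl t
      WHERE t.column1 = g.column1
      GROUP BY t.column2          -- de-duplicate repeated rows
      ORDER BY t.column2
      FOR XML PATH(''), TYPE
    ).value('.', 'varchar(max)'), 1, 1, '') AS signature
  FROM (SELECT DISTINCT column1 FROM #tbl) AS g
),
-- For every signature shared by more than one group, pick the lowest column1.
dup AS (
  SELECT signature, MIN(column1) AS winner
  FROM sig
  GROUP BY signature
  HAVING COUNT(*) > 1
)
-- Pair each winner with the other groups that share its signature.
SELECT d.winner, s.column1 AS loser
FROM dup d
JOIN sig s
  ON s.signature = d.signature
 AND s.column1 > d.winner
ORDER BY d.winner, s.column1;
```

On the sample data from the question this should return the pairs 1,2 and 3,5. Unlike a checksum, the signature is compared as a full string, so it does not suffer from hash collisions, at the cost of building one string per group.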