Search code examples
postgresqlduplicatesmultiple-columns

duplicate multi column entries postgresql


I have a bunch of data in a postgresql database. I think that two keys should form a unique pair, so want to enforce that in the database. I try

create unique index key1_key2_idx on table(key1,key2)

but that fails, telling me that I have duplicate entries.

How do I find these duplicate entries so I can delete them?


Solution

  • Assuming you only want to delete the duplicates and keep the original, the accepted answer is inaccurate -- it'll delete your originals as well and only keep records that have one entry from the start. This works on 9.x:

    SELECT * FROM tblname WHERE ctid IN 
    (SELECT ctid FROM 
      (SELECT ctid, ROW_NUMBER() OVER 
        (partition BY col1, col2, col3 ORDER BY ctid) AS rnum 
      FROM tblname) t
    WHERE t.rnum > 1);
    

    https://wiki.postgresql.org/wiki/Deleting_duplicates