Search code examples
sqlduplicatesamazon-redshiftrownum

Remove duplicates based on two columns SQL


Hi everyone,

I need to remove duplicates based on two columns. ANON ID and USER ID. They have a many to many relationship. i.e. an anon id can have several user id's and vice versa. I need to leave just one instance. Anywhere anon id OR user id appears as a duplicate, this needs to be removed.

Sample data

enter image description here

Only rows 1, 4, 6, 7 should remain.

I know I can use rownum() and delete where rownum > 1 for ONE duplicate column. However in this case I need to remove any row where EITHER ANON id or USER ID has already appeared.

Any help would be appreciated.


Solution

  • You can have two rownum() functions and delete based on either results. If for some reason you can't have 2 rownum functions in one query, you can use dense_rank too.