Search code examples
sqlduplicatesuniquesql-delete

Delete all but one duplicate record


I have a table that is supposed to keep a trace of visitors to a given profile (user id to user id pair). It turns out my SQL query was a bit off and is producing multiple pairs instead of single ones as intended. With hindsight I should have enforced a unique constraint on each id+id pair.

Now, how could I go about cleaning up the table? What I want to do is delete all duplicate pairs and leave just one.

So for example change this:

23515 -> 52525 date_visited
23515 -> 52525 date_visited
23515 -> 52525 date_visited
12345 -> 54321 date_visited
12345 -> 54321 date_visited
12345 -> 54321 date_visited
12345 -> 54321 date_visited
23515 -> 52525 date_visited
...

Into this:

23515 -> 52525 date_visited
12345 -> 54321 date_visited

Update: Here is the table structure as requested:

id  int(10)         UNSIGNED    Non     Aucun   AUTO_INCREMENT
profile_id  int(10)         UNSIGNED    Non     0 
visitor_id  int(10)         UNSIGNED    Non     0
date_visited    timestamp           Non     CURRENT_TIMESTAMP   

Solution

  • ANSI SQL Solution

    Use group by in a subquery:

    delete from my_tab where id not in 
    (select min(id) from my_tab group by profile_id, visitor_id);
    

    You need some kind of unique identifier(here, I'm using id).

    MySQL Solution

    As pointed out by @JamesPoulson, this causes a syntax error in MySQL; the correct solution is (as shown in James' answer):

    delete from `my_tab` where id not in
    ( SELECT * FROM 
        (select min(id) from `my_tab` group by profile_id, visitor_id) AS temp_tab
    );