Search code examples
sqlmysqldirty-data

Is there a clean way of cleaning up duplicate entries in MySQL?


In a table, I have three columns - id, name, and count. A good number of name columns are identical (due to the lack of a UNIQUE early on) and I want to fix this. However, the id column is used by other (4 or 5, I think - I would have to check the docs) tables to look up the name and just removing them would break things. So is there a good, clean way of saying "find all identical records and merge them together"?


Solution

  • This kind of question comes up from time to time. No, there's not a really clean way to do it. You have to change all the rows in the child table that depend on unwanted values in the parent table before you can eliminate the unwanted rows in the parent table.

    MySQL supports multi-table UPDATE and DELETE statements (unlike other brands of database) and so you can do some pretty neat tricks like the following:

    UPDATE names n1
      JOIN names n2 ON (n1.id < n2.id AND n1.name = n2.name)
      JOIN child_table c ON (n2.id = c.id)
    SET c.name_id = n1.id
    ORDER BY n1.id DESC;
    

    Once you have done this on all the child table(s), you can use MySQL's multi-table DELETE syntax to remove unwanted rows in the parent table:

    DELETE FROM n2
      USING names n1 JOIN names n2 ON (n1.id < n2.id AND n1.name = n2.name);