
Removing duplicate rows from a large MySQL table?


I am a web developer so my knowledge of manipulating mass data is lacking.

A coworker is looking for a solution to our data problems. We have a table of about 400k rows with company names listed.

Whoever designed this didn't realize there needed to be some kind of unique identifier for a company, so there are duplicate entries for company names.

What method would one use to match these records up by company name and delete the duplicates based on some criterion (another column)?

I was thinking of writing a PHP script to do this, but I have a hard time believing it could finish while comparing so many rows against each other. Any advice?


Solution


    1) DELETE FROM table1

    2) USING table1, table1 AS vtable

    3) WHERE (table1.ID > vtable.ID)

    4) AND (table1.field_name = vtable.field_name)

    1. This names table1 as the table that rows are deleted from.
    2. This joins table1 to itself; vtable is only an alias for a second reference to the same table.
    3. This matches a row only against rows with a smaller ID, so MySQL never compares a record with itself and the row with the lowest ID in each group is kept.
    4. This restricts the match to rows with the same field_name, so only the duplicates are deleted.
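
To check the rule on a small sample before touching the real 400k-row table, here is a minimal sketch in Python. It assumes SQLite (via the standard sqlite3 module) as a stand-in for MySQL, and the sample rows are made up. SQLite does not accept MySQL's DELETE ... USING form, so the same condition is written as a correlated EXISTS subquery: a row is deleted when another row with the same field_name and a smaller ID exists.

```python
import sqlite3

# In-memory stand-in for the real table; the rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER PRIMARY KEY, field_name TEXT)")
conn.executemany(
    "INSERT INTO table1 (ID, field_name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Acme"), (4, "Acme"), (5, "Initech")],
)

# Same rule as the self-join: delete a row when another row with the
# same field_name and a smaller ID exists, keeping the lowest ID per name.
cur = conn.execute(
    """
    DELETE FROM table1
    WHERE EXISTS (
        SELECT 1 FROM table1 AS vtable
        WHERE vtable.field_name = table1.field_name
          AND vtable.ID < table1.ID
    )
    """
)
deleted = cur.rowcount  # number of rows removed by the DELETE
conn.commit()

remaining = conn.execute(
    "SELECT ID, field_name FROM table1 ORDER BY ID"
).fetchall()
print(deleted, remaining)
```

To keep a different row from each group, for example the most recently updated one, the ID comparison could be swapped for a comparison on that other column. On a table of this size, an index on field_name will likely make the self-comparison much faster, and running the statement against a backup copy first is a sensible precaution.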