Search code examples
mysqldeduplication

Remove all duplicates from mysql table


I have a table which shows product ID's and how many times they have been given, 1 star, 2 stars, 3 stars, 4 stars and 5 stars when reviewed by customers along with the average rating for that product. There are some duplicate rows appearing in this table. How can I remove the rows which are complete duplicates e.g. ProductId 1196585. An example of the table is below.

ProductId  | rate1 | rate2 | rate3 | rate4 | rate5 | aveRate

1294518    |   4   |   1   |   0   |   0   |   0   |   1

9226582    |   0   |   0   |   3   |   0   |   0   |   3

3946583    |   0   |   0   |   0   |   1   |   0   |   4

7392588    |   1   |   0   |   0   |   0   |   6   |   5

1196585    |   0   |   0   |   2   |   3   |   9   |   5

1196585    |   0   |   0   |   2   |   3   |   9   |   5

For clarification I want to modify the table and I will make a copy of it first.


Solution

  • Further to my earlier comment about a unique ID field, I had a play trying to come up with a normal SQL way of doing this in one statement and without the need to temporarily alter the table if you had such an id.

    DELETE ProductRate 
    FROM ProductRate
    INNER JOIN (
    SELECT id, ProductId, rate1, rate2, rate3, rate4, rate5, aveRate, 
    @Counter := (
                CASE
                    WHEN ProductId != @ProductId THEN 0
                    WHEN rate1 != @rate1 THEN 0
                    WHEN rate2 != @rate2 THEN 0
                    WHEN rate3 != @rate3 THEN 0
                    WHEN rate4 != @rate4 THEN 0
                    WHEN rate5 != @rate5 THEN 0
                    WHEN aveRate != @aveRate THEN 0
                    ELSE @Counter + 1
                END
                ) AS RecCounter,
    @ProductId:=ProductId, 
    @rate1:=rate1, 
    @rate2:=rate2, 
    @rate3:=rate3, 
    @rate4:=rate4, 
    @rate5:=rate5, 
    @aveRate:=aveRate
    FROM ProductRate,
    (SELECT @Counter:=0, @ProductId:=0, @rate1:=0, @rate2:=0, @rate3:=0, @rate4:=0, @rate5:=0, @aveRate:=0) Deriv1
    ORDER BY ProductId, rate1, rate2, rate3, rate4, rate5, aveRate) Deriv2
    ON ProductRate.id = Deriv2.id
    WHERE Deriv2.RecCounter >0
    

    Note that this assumes you do have a unique id on the table. You can easily add one by:-

    ALTER TABLE `ProductRate` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST
    

    I have only done this out of interest and for a one off job I would be happy using the suggestions from either Er. Nikhil Agrawal or Devart. However if this is a regular issue which can occur but only occasionally needs cleaning up, adding the unique meaningless id and using this SQL might be worthwhile.