Search code examples
mysqlsqlgroup-byaggregate-functionswindow-functions

MySQL - How to find duplicate rows, based on single column?


  1. I am trying to find all the rows in my table, which a certain column is duplicated in them. and identify them by the unique id column that each row has.

  2. I want to delete them and leave only one copy(row), it doesn't matter which one. (if it can be done through SQL, it is preferred, if note, i will code it..)

(I'm using workbench)

I tried this:

SELECT 
  *
FROM
  table_name
GROUP BY `field_name`
HAVING COUNT(*) > 1;

It failed.
My first problem was the sql_mode=only_full_group_by, so I fixed it by adding field names to the query. and it worked.. well, sort of. Since now I saw only the aggregated result without the id column.

SELECT 
    field_name
FROM
    table_name
GROUP BY `field_name`
HAVING COUNT(*) > 1;

So, here I am.. How can I do it?


Solution

  • If I follow you correctly, you can use window functions:

    select *
    from (
        select t.*, count(*) over(partition by field_name) cnt
        from mytable t
    ) t
    where cnt > 1
    

    For each row, the subquery does a window count that computes how many rows share the same field_name value. Then, the outer query filters on rows whose field_name is not unique.

    Note that this requires MySQL 8.0. In earlier version, an alternative uses exists:

    select *
    from mytable t
    where exists (
        select 1 
        from mytable t1 
        where t1.field_value = t.field_value and t1.id != t.id
    )