Search code examples
sqlsql-servert-sqlsql-delete

Which way is faster to delete duplicate rows in sql?


I need to find and delete duplicate rows in a table with more than 4 million records. I would like to know which way is the best from speed point of view. I read this article, where several solutions are given, including using GROUP BY, PARTITION BY. But there is no explanation about which one is faster and most efficient.

My table has 23 columns (different data types), no Primary KEY and no FOREIGN KEYS but I was wondering if generally there is a better way to do it.

The article can be seen here.


Solution

  • Not having a primary key for your table is a general bad idea. Here is one way you can delete duplicates, with the record retained per 23 columns is arbitrary:

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, ..., col22, col23
                                     ORDER BY (SELECT NULL)) rn
        FROM yourTable
    )
    
    DELETE
    FROM cte
    WHERE rn > 1;