Search code examples
sqlsql-servert-sql

COUNT and delete of the number of duplicates from table with billion rows


I need to know how many duplicate rows are in a table which has a billion rows

And then I need to delete these by chunks if possible.

This table has 32 columns and I want to check if there are any repeated rows with the same values in all the 32 columns.

I have tried 1)

SELECT COUNT(*) 
FROM 
    (SELECT COLUMN1,COLUMN2,COLUMN3....COLUMN32 ,COUNT(*)
     FROM TABLE 
     GROUP BY COLUMN1,COLUMN2,COLUMN3....COLUMN32
     HAVING COUNT(*) > 1)

and 2)

I have also tried with ROW_NUMBER

Could not allocate space for object 'dbo.SORT temporary run storage: 141881590939648' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup

This takes forever and this fail with tempdb error.


Solution

  • I used the below

    SELECT -COUNT_BIG(*) Dup
    into #T
    FROM (
        SELECT DISTINCT Table.*
        FROM Table
    ) X
    
    Insert into #T
    SELECT COUNT_BIG(*)
    FROM Table
    
    SELECT SUM(DUP)
    FROM #T
    
    DROP TABLE #T
    

    Took few hours to run as this table has billions of records.