I need to know how many duplicate rows are in a table which has a billion rows
And then I need to delete these by chunks if possible.
This table has 32 columns and I want to check if there are any repeated rows with the same values in all the 32 columns.
I have tried 1)
SELECT COUNT(*)
FROM
(SELECT COLUMN1,COLUMN2,COLUMN3....COLUMN32 ,COUNT(*)
FROM TABLE
GROUP BY COLUMN1,COLUMN2,COLUMN3....COLUMN32
HAVING COUNT(*) > 1)
and 2)
I have also tried with ROW_NUMBER
Could not allocate space for object 'dbo.SORT temporary run storage: 141881590939648' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup
This takes forever and this fail with tempdb error.
I used the below
SELECT -COUNT_BIG(*) Dup
into #T
FROM (
SELECT DISTINCT Table.*
FROM Table
) X
Insert into #T
SELECT COUNT_BIG(*)
FROM Table
SELECT SUM(DUP)
FROM #T
DROP TABLE #T
Took few hours to run as this table has billions of records.