Search code examples
sql-serverperformancesql-delete

Why delete rows seems to be slow in SQL Server?


I have 2 tables, one of them student name has 35 M rows, and another table (name is index) has 1500 rows.

When I use this query:

SELECT DISTINCT family 
FROM [DBName].[dbo].[student] 

it takes about 3 seconds to run and return results (given that it has 35 million rows, but soon return the result).

But when I run

DELETE FROM [DBName].[dbo].[index] 
WHERE family NOT IN (SELECT DISTINCT family FROM [DBName].[dbo].[student]) 

that's very slow and I waited for thirty minutes but it was not finished yet.

Why?


Solution

  • This is an apples to oranges comparison, since a DELETE is writing to the transaction log and SELECT is not.

    The DISTINCT works by virtue of sorting the output rows in order to find duplicates. This will be made really slow if you are missing an index (for what it's worth 3 seconds seems slow to me for output). Even slower when performing a NOT IN on the same column. This doesn't take into consideration any constraints you might have on that column, which could further slow it down.

    You can likely make this faster using GROUP BY.

    DELETE 
    FROM      [DBName].[dbo].[index] 
    WHERE     family NOT IN (
                SELECT      family 
                FROM        [DBName].[dbo].[student]            
                GROUP BY    family) 
    

    Likely faster still using NOT EXISTS instead of NOT IN:

    DELETE 
    FROM      [DBName].[dbo].[index] AS outer
    WHERE     family NOT EXISTS (
                SELECT      family 
                FROM        [DBName].[dbo].[student] AS inner
                WHERE       inner.family = outer.family
                GROUP BY    family)