Search code examples
sql-servert-sql

How to optimize this TSQL script (delete by iterating)?


I have 70 000 tokens in an excel file. I have to use them to find and delete rows from a table. Table has about 4 million rows.

When I tried to run this script CPU usage went over the top. What is the best way to optimize it (like maybe split it into smaller transactions, batch delete, cursor etc.)?

Here is an example record from the table: enter image description here

IF EXISTS(SELECT [name] FROM tempdb.sys.objects WHERE [name] like '#tokens%') 
BEGIN
   DROP TABLE #tokens
END

CREATE TABLE #tokens(token nvarchar(50))

INSERT INTO #tokens(token) SELECT t.token FROM (VALUES ('aaa'), ('bbb'), ('ccc')) AS t(token) --- here would be more tokens inserted

DECLARE @command nvarchar(max)
DECLARE @token nvarchar(50)

WHILE ((select count(1) from #tokens) > 0)
BEGIN
    SELECT TOP 1 @token = token FROM #tokens

    SET @command = 'SET ANSI_NULLS ON; SET QUOTED_IDENTIFIER ON; delete from MyTable where token like ''%' + @token + '%''' --- has to be 'like' due to some different prefixes so it can be 'x.aaa' or 'y.aaa' or both
    EXECUTE sp_executesql @command
    
    SET ANSI_NULLS ON
    SET QUOTED_IDENTIFIER ON

    DELETE FROM #tokens WHERE token = @token
END

DROP TABLE #tokens

Solution

  • You clarified in the comments that the motivation for the LIKE was to match {guid} with x.{guid} when the prefix is unknown.

    Assuming this is a one off task so you might not want to change the table structure to facilitate a one off operation then you can do.

    --Primary key added for index
    CREATE TABLE #tokens(token varchar(36) PRIMARY KEY)
    
    INSERT INTO #tokens(token) 
    SELECT t.token
    FROM (VALUES ('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'), 
                 ('bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb'), 
                 ('cccccccc-cccc-cccc-cccc-cccccccccccc')) AS t(token)
    
    DELETE mt
    FROM MyTable mt
    WHERE RIGHT(mt.token,36) IN (SELECT t.token FROM #tokens t);
    

    This changes the condition from a LIKE to an equality condition and means that hash join, merge join, or nested loops with MyTable as the outer table and an index seek on #tokens are all viable options rather than having to scan all 4 million rows in MyTable for every row in #tokens.