I have 70 000 tokens in an excel file. I have to use them to find and delete rows from a table. Table has about 4 million rows.
When I tried to run this script CPU usage went over the top. What is the best way to optimize it (like maybe split it into smaller transactions, batch delete, cursor etc.)?
Here is an example record from the table:
IF EXISTS(SELECT [name] FROM tempdb.sys.objects WHERE [name] like '#tokens%')
BEGIN
DROP TABLE #tokens
END
CREATE TABLE #tokens(token nvarchar(50))
INSERT INTO #tokens(token) SELECT t.token FROM (VALUES ('aaa'), ('bbb'), ('ccc')) AS t(token) --- here would be more tokens inserted
DECLARE @command nvarchar(max)
DECLARE @token nvarchar(50)
WHILE ((select count(1) from #tokens) > 0)
BEGIN
SELECT TOP 1 @token = token FROM #tokens
SET @command = 'SET ANSI_NULLS ON; SET QUOTED_IDENTIFIER ON; delete from MyTable where token like ''%' + @token + '%''' --- has to be 'like' due to some different prefixes so it can be 'x.aaa' or 'y.aaa' or both
EXECUTE sp_executesql @command
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
DELETE FROM #tokens WHERE token = @token
END
DROP TABLE #tokens
You clarified in the comments that the motivation for the LIKE
was to match {guid}
with x.{guid}
when the prefix is unknown.
Assuming this is a one off task so you might not want to change the table structure to facilitate a one off operation then you can do.
--Primary key added for index
CREATE TABLE #tokens(token varchar(36) PRIMARY KEY)
INSERT INTO #tokens(token)
SELECT t.token
FROM (VALUES ('aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa'),
('bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb'),
('cccccccc-cccc-cccc-cccc-cccccccccccc')) AS t(token)
DELETE mt
FROM MyTable mt
WHERE RIGHT(mt.token,36) IN (SELECT t.token FROM #tokens t);
This changes the condition from a LIKE
to an equality condition and means that hash join, merge join, or nested loops with MyTable
as the outer table and an index seek on #tokens
are all viable options rather than having to scan all 4 million rows in MyTable
for every row in #tokens
.