I have a SQL query that is taking hours to run. Basically, I'm looking to find documents (PDF) in our "page" table that has special characters. I find those PDFs in the PDFFile table. Executing just the first AND clause returns in 16 seconds. Adding a second clause makes the SQL take 3hrs. Trying to figure out what I'm doing wrong. Any help is greatly appreciated.
The query:
select b.bookletname, b.trackingID, b.[version], s.name
from page p
inner join section s on s.id = p.sectionid
inner join booklet b on b.id = s.bookletid
INNER JOIN [user] u ON b.CreatedBy = u.id
INNER JOIN client c ON c.id = u.clientID
WHERE u.clientID = 2
AND p.[filename] IN (
SELECT DISTINCT pdf.[FileName]
FROM PDFFile pdf
WHERE pdf.fileName LIKE '%\<%' ESCAPE '\'
OR pdf.fileName LIKE '%\>%' ESCAPE '\'
OR pdf.fileName LIKE '%\"%' ESCAPE '\'
OR pdf.fileName LIKE '%\%%' ESCAPE '\'
OR pdf.fileName LIKE '%''%' ESCAPE '\'
OR pdf.fileName LIKE '%\*%' ESCAPE '\'
OR pdf.fileName LIKE '%\+%' ESCAPE '\'
OR pdf.fileName LIKE '%\\%' ESCAPE '\'
OR pdf.fileName LIKE '%\/%' ESCAPE '\'
OR pdf.fileName LIKE '%\:%' ESCAPE '\'
OR pdf.fileName LIKE '%\?%' ESCAPE '\'
OR pdf.fileName LIKE '%\[%' ESCAPE '\'
OR pdf.fileName LIKE '%\]%' ESCAPE '\'
OR pdf.fileName LIKE '%\|%' ESCAPE '\'
)
OR p.[PDF_File_Name] IN (
SELECT DISTINCT pdf.[FileName]
FROM PDFFile pdf
WHERE pdf.fileName LIKE '%\<%' ESCAPE '\'
OR pdf.fileName LIKE '%\>%' ESCAPE '\'
OR pdf.fileName LIKE '%\"%' ESCAPE '\'
OR pdf.fileName LIKE '%\%%' ESCAPE '\'
OR pdf.fileName LIKE '%''%' ESCAPE '\'
OR pdf.fileName LIKE '%\*%' ESCAPE '\'
OR pdf.fileName LIKE '%\+%' ESCAPE '\'
OR pdf.fileName LIKE '%\\%' ESCAPE '\'
OR pdf.fileName LIKE '%\/%' ESCAPE '\'
OR pdf.fileName LIKE '%\:%' ESCAPE '\'
OR pdf.fileName LIKE '%\?%' ESCAPE '\'
OR pdf.fileName LIKE '%\[%' ESCAPE '\'
OR pdf.fileName LIKE '%\]%' ESCAPE '\'
OR pdf.fileName LIKE '%\|%' ESCAPE '\'
)
OR p.[PDFName] IN (
SELECT DISTINCT pdf.[FileName]
FROM PDFFile pdf
WHERE pdf.fileName LIKE '%\<%' ESCAPE '\'
OR pdf.fileName LIKE '%\>%' ESCAPE '\'
OR pdf.fileName LIKE '%\"%' ESCAPE '\'
OR pdf.fileName LIKE '%\%%' ESCAPE '\'
OR pdf.fileName LIKE '%''%' ESCAPE '\'
OR pdf.fileName LIKE '%\*%' ESCAPE '\'
OR pdf.fileName LIKE '%\+%' ESCAPE '\'
OR pdf.fileName LIKE '%\\%' ESCAPE '\'
OR pdf.fileName LIKE '%\/%' ESCAPE '\'
OR pdf.fileName LIKE '%\:%' ESCAPE '\'
OR pdf.fileName LIKE '%\?%' ESCAPE '\'
OR pdf.fileName LIKE '%\[%' ESCAPE '\'
OR pdf.fileName LIKE '%\]%' ESCAPE '\'
OR pdf.fileName LIKE '%\|%' ESCAPE '\'
)
Rewrite this using exists
and simplify the like
s. SQL Server allows you to have lists of characters in the pattern, and most of the character you have do not need escaping (I hope I got the resulting pattern exactly right).
I would recommend:
select b.bookletname, b.trackingID, b.[version], s.name
from page p inner join
section s
on s.id = p.sectionid inner join
booklet b
on b.id = s.bookletid inner join
[user] u
on b.CreatedBy = u.id inner join
client c
on c.id = u.clientID
where u.clientID = 2 and
(exists (select 1
from PDFFile pdf
where pdf.fileName LIKE '%[<>"''*+\\/:?\[\]|]%' ESCAPE '\' and
pdf.fileName = p.filename
) or
exists (select 1
from PDFFile pdf
where pdf.fileName LIKE '%[<>"''*+\\/:?\[\]|]%' ESCAPE '\' and
pdf.fileName = p.PDF_File_Name
) or
exists (select 1
from PDFFile pdf
where pdf.fileName LIKE '%[<>"''*+\\/:?\[\]|]%' ESCAPE '\' and
pdf.fileName = p.PDFName
)
);
You want to be sure that you have an index on PDFFile(fileName)
for performance.
I also changed the logic. Your original logic was like where A and B or C or D
which is parsed as where (A and B) or C or C
. I changed it to where A and (B or C or D)
. This might also have accounted for a performance bottleneck.