Search code examples
sqlsql-serverdatabasequery-performance

SQL command running slow. Need help identifying slowness


I have a SQL query that is taking hours to run. Basically, I'm looking to find documents (PDF) in our "page" table that has special characters. I find those PDFs in the PDFFile table. Executing just the first AND clause returns in 16 seconds. Adding a second clause makes the SQL take 3hrs. Trying to figure out what I'm doing wrong. Any help is greatly appreciated.

The query:

select b.bookletname, b.trackingID, b.[version], s.name
from page p
inner join section s on s.id = p.sectionid
inner join booklet b on b.id = s.bookletid
INNER JOIN [user] u ON b.CreatedBy = u.id
INNER JOIN client c ON c.id = u.clientID
WHERE u.clientID = 2
AND p.[filename] IN (
    SELECT DISTINCT pdf.[FileName]
    FROM PDFFile pdf
    WHERE pdf.fileName LIKE '%\<%' ESCAPE '\'
    OR pdf.fileName LIKE '%\>%' ESCAPE '\'
    OR pdf.fileName LIKE '%\"%' ESCAPE '\'
    OR pdf.fileName LIKE '%\%%' ESCAPE '\'
    OR pdf.fileName LIKE '%''%' ESCAPE '\'
    OR pdf.fileName LIKE '%\*%' ESCAPE '\'
    OR pdf.fileName LIKE '%\+%' ESCAPE '\'
    OR pdf.fileName LIKE '%\\%' ESCAPE '\'
    OR pdf.fileName LIKE '%\/%' ESCAPE '\'
    OR pdf.fileName LIKE '%\:%' ESCAPE '\'
    OR pdf.fileName LIKE '%\?%' ESCAPE '\'
    OR pdf.fileName LIKE '%\[%' ESCAPE '\'
    OR pdf.fileName LIKE '%\]%' ESCAPE '\'
    OR pdf.fileName LIKE '%\|%' ESCAPE '\'
)
OR p.[PDF_File_Name] IN (
    SELECT DISTINCT pdf.[FileName]
    FROM PDFFile pdf
    WHERE pdf.fileName LIKE '%\<%' ESCAPE '\'
    OR pdf.fileName LIKE '%\>%' ESCAPE '\'
    OR pdf.fileName LIKE '%\"%' ESCAPE '\'
    OR pdf.fileName LIKE '%\%%' ESCAPE '\'
    OR pdf.fileName LIKE '%''%' ESCAPE '\'
    OR pdf.fileName LIKE '%\*%' ESCAPE '\'
    OR pdf.fileName LIKE '%\+%' ESCAPE '\'
    OR pdf.fileName LIKE '%\\%' ESCAPE '\'
    OR pdf.fileName LIKE '%\/%' ESCAPE '\'
    OR pdf.fileName LIKE '%\:%' ESCAPE '\'
    OR pdf.fileName LIKE '%\?%' ESCAPE '\'
    OR pdf.fileName LIKE '%\[%' ESCAPE '\'
    OR pdf.fileName LIKE '%\]%' ESCAPE '\'
    OR pdf.fileName LIKE '%\|%' ESCAPE '\'
)
OR p.[PDFName] IN (
    SELECT DISTINCT pdf.[FileName]
    FROM PDFFile pdf
    WHERE pdf.fileName LIKE '%\<%' ESCAPE '\'
    OR pdf.fileName LIKE '%\>%' ESCAPE '\'
    OR pdf.fileName LIKE '%\"%' ESCAPE '\'
    OR pdf.fileName LIKE '%\%%' ESCAPE '\'
    OR pdf.fileName LIKE '%''%' ESCAPE '\'
    OR pdf.fileName LIKE '%\*%' ESCAPE '\'
    OR pdf.fileName LIKE '%\+%' ESCAPE '\'
    OR pdf.fileName LIKE '%\\%' ESCAPE '\'
    OR pdf.fileName LIKE '%\/%' ESCAPE '\'
    OR pdf.fileName LIKE '%\:%' ESCAPE '\'
    OR pdf.fileName LIKE '%\?%' ESCAPE '\'
    OR pdf.fileName LIKE '%\[%' ESCAPE '\'
    OR pdf.fileName LIKE '%\]%' ESCAPE '\'
    OR pdf.fileName LIKE '%\|%' ESCAPE '\'
)

Solution

  • Rewrite this using exists and simplify the likes. SQL Server allows you to have lists of characters in the pattern, and most of the character you have do not need escaping (I hope I got the resulting pattern exactly right).

    I would recommend:

    select b.bookletname, b.trackingID, b.[version], s.name
    from page p inner join
         section s
         on s.id = p.sectionid inner join
         booklet b
         on b.id = s.bookletid inner join
         [user] u
         on b.CreatedBy = u.id inner join
         client c 
         on c.id = u.clientID
    where u.clientID = 2 and
          (exists (select 1
                   from PDFFile pdf
                   where pdf.fileName LIKE '%[<>"''*+\\/:?\[\]|]%' ESCAPE '\' and
                         pdf.fileName = p.filename
                  ) or
           exists (select 1
                   from PDFFile pdf
                   where pdf.fileName LIKE '%[<>"''*+\\/:?\[\]|]%' ESCAPE '\' and
                         pdf.fileName = p.PDF_File_Name
                  ) or
           exists (select 1
                   from PDFFile pdf
                   where pdf.fileName LIKE '%[<>"''*+\\/:?\[\]|]%' ESCAPE '\' and
                         pdf.fileName = p.PDFName
                  )
          );
    

    You want to be sure that you have an index on PDFFile(fileName) for performance.

    I also changed the logic. Your original logic was like where A and B or C or D which is parsed as where (A and B) or C or C. I changed it to where A and (B or C or D). This might also have accounted for a performance bottleneck.