Search code examples
sql-serverfunctionperformancewhere-clause

Why there is a significant performance difference when using function in WHERE clause


I have the same part of where clause used across different stored procedures and I am trying to get rid of duplication and ensure that code doesn't go out of sync. So I decided to extract this check into a function.

I am aware that generally this is not a good practice as this prevents SQL Server to use index seek efficiently, however in my scenario there is a wildcard search from both ends which would result in index scan anyways so I was not expecting this huge performance difference.

Original query:

SELECT COUNT(1) 
FROM MyTable 
WHERE MyField LIKE '%term1%' 
   OR MyField LIKE '%term2%' 
   OR MyField LIKE '%term3%'

Function:

CREATE FUNCTION dbo.fn_MyFunc
    (@CodeResult VARCHAR(MAX))
RETURNS BIT
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT  
AS
BEGIN
    RETURN 
        CASE 
            WHEN @CodeResult LIKE '%term1%' 
                 OR @CodeResult LIKE '%term2%' 
                 OR @CodeResult LIKE '%term3%' 
                THEN 1
                ELSE 0
        END;
END;

Updated query:

SELECT COUNT(1) 
FROM MyTable 
WHERE dbo.fn_MyFunc(MyField) = 1

Query execution plan:

enter image description here

Am I missing something? Why there is almost 10x difference?


Solution

  • The obvious difference is that one plan is executing in parallel and the other is serial. Non inlined scalar UDFs block parallel plans. So they are both equally inefficient in this case in that they both have a full index scan but the elapsed time is less in the parallel plan as it has multiple workers processing in parallel (as well as the time saved from not invoking a scalar UDF 6.3 million times in a separate execution context - which can also be significant).

    If you wanted to make this inlineable but in a module and aren't on a version/compatibility level where inline scalar UDFs are available you can use an inline TVF (the inlining of these is more reliable even in cases where inline scalar UDFs should be available as the implementation of these has had some issues).

    One example approach is

    CREATE FUNCTION dbo.fn_MyFilterFunc 
    (@CodeResult VARCHAR(MAX))
    RETURNS TABLE
    AS
        RETURN
          SELECT 1 AS Result
          WHERE  @CodeResult LIKE '%term1%'
                  OR @CodeResult LIKE '%term2%'
                  OR @CodeResult LIKE '%term3%' 
    

    And then you can use it as

    SELECT COUNT(1) 
    FROM MyTable 
    CROSS APPLY dbo.fn_MyFilterFunc(MyField)
    

    Query execution plan (original query with where clause vs TVF approach):

    enter image description here

    Depending on your scenario you could also consider a computed column to encapsulate this logic.

    That can be indexed and considerably improve performance

    ALTER TABLE MyTable ADD MatchResult AS  CASE 
                WHEN MyField LIKE '%term1%' 
                     OR MyField LIKE '%term2%' 
                     OR MyField LIKE '%term3%' 
                    THEN 1
                    ELSE 0
            END;
    
    
    CREATE INDEX IX_MatchResult ON MyTable(MatchResult)
    
    SELECT COUNT(*)
    FROM MyTable
    WHERE MatchResult = 1;