Why does the Execution Plan include a user-defined function call for a computed column that is persisted?

I have a table with two computed columns, both of which have "Is Persisted" set to true. However, when I use them in a query, the execution plan shows the UDF used to compute the columns as part of the plan. Since the column data is calculated by the UDF when the row is added or updated, why would the plan include it?

The query is incredibly slow (>30s) when these columns are included in the query, and lightning fast (<1s) when they are excluded. This leads me to conclude that the query is actually calculating the column values at run time, which shouldn't be the case since they are set to persisted.

Am I missing something here?

UPDATE: Here's a little more info regarding our reasoning for using the computed column.

We are a sports company and have a customer who stores full player names in a single column. They require us to let them search player data by first and/or last name separately. Thankfully, they use a consistent format for player names, LastName, FirstName (NickName), so parsing them is relatively easy. I created a UDF that calls into a CLR function to parse the name parts using a regular expression. Obviously, calling the UDF, which in turn calls a CLR function, is very expensive. But since it is only used in a persisted column, I figured it would only be called the few times a day that we import data into the database.


  • The reason is that the query optimizer does not do a very good job of costing user-defined functions. In some cases, it decides that it would be cheaper to re-evaluate the function for each row than to incur the disk reads that might otherwise be necessary.

    SQL Server's costing model does not inspect the structure of the function to see how expensive it really is, so the optimizer does not have accurate information in this regard. Your function could be arbitrarily complex, so it is perhaps understandable that costing is limited this way. The effect is worst for scalar and multi-statement table-valued functions, since these are extremely expensive to call per-row.

    You can tell whether the optimizer has decided to re-evaluate the function (rather than using the persisted value) by inspecting the query plan. If there is a Compute Scalar iterator with an explicit reference to the function name in its Defined Values list, the function will be called once per row. If the Defined Values list references the column name instead, the function will not be called.
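
    Besides reading the plan, a run-time cross-check is to compare the function's execution count before and after the query. The following is a minimal sketch, not part of the original script. It assumes SQL Server 2016 or later (where sys.dm_exec_function_stats is available), VIEW SERVER STATE permission, and the dbo.fn_Expensive function from the reproduction script in this answer; substitute your own function name.

    ```sql
    -- Run once before and once after the query under test.
    -- A row is returned only while the function's plan is in cache,
    -- so an empty result before the first call is expected.
    SELECT
        OBJECT_NAME(fs.object_id, fs.database_id) AS function_name,
        fs.execution_count,
        fs.total_worker_time
    FROM sys.dm_exec_function_stats AS fs
    WHERE fs.database_id = DB_ID()
      AND fs.object_id = OBJECT_ID(N'dbo.fn_Expensive');
    ```

    If execution_count grows by roughly the number of rows the query reads, the function is being re-evaluated per row rather than the persisted value being used.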

    My advice is generally not to use functions in computed column definitions at all.
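
    For the name format described in the question, LastName, FirstName (NickName), one possible alternative is to build the computed columns from built-in string functions only. This is a sketch under assumptions: the table dbo.PlayerDemo, its columns, and the index names are hypothetical, and the expressions handle only that one format, without the flexibility of a regular expression.

    ```sql
    -- Hypothetical table; no UDF or CLR call appears in the computed columns
    CREATE TABLE dbo.PlayerDemo
    (
        player_id  INTEGER NOT NULL PRIMARY KEY,
        full_name  NVARCHAR(200) NOT NULL,
        -- Text before the first comma; NULL when there is no comma
        last_name  AS LTRIM(RTRIM(LEFT(full_name,
                       NULLIF(CHARINDEX(N',', full_name), 0) - 1))),
        -- Text between the first comma and the next '(' (or the end of the string)
        first_name AS LTRIM(RTRIM(SUBSTRING(
                       full_name,
                       NULLIF(CHARINDEX(N',', full_name), 0) + 1,
                       ISNULL(NULLIF(CHARINDEX(N'(', full_name,
                                  CHARINDEX(N',', full_name) + 1), 0),
                              LEN(full_name) + 1)
                           - CHARINDEX(N',', full_name) - 1)))
    );
    GO
    -- The expressions are deterministic and precise,
    -- so the columns can be indexed without being persisted
    CREATE INDEX ix_PlayerDemo_last_name ON dbo.PlayerDemo (last_name);
    CREATE INDEX ix_PlayerDemo_first_name ON dbo.PlayerDemo (first_name);
    GO
    INSERT dbo.PlayerDemo (player_id, full_name)
    VALUES (1, N'Smith, John (Johnny)');

    SELECT player_id, last_name, first_name
    FROM dbo.PlayerDemo
    WHERE last_name = N'Smith';
    GO
    -- Clean up
    DROP TABLE dbo.PlayerDemo;
    ```

    Because the optimizer can see and cost these expressions, the per-row penalty of a scalar function call does not arise, whichever access path it chooses.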

    The reproduction script below demonstrates the issue. Notice that the PRIMARY KEY defined for the table is nonclustered, so fetching the persisted value would require a bookmark lookup from the index, or a table scan. The optimizer decides it is cheaper to read the source column for the function from the index and re-compute the function per row, rather than incur the cost of a bookmark lookup or table scan.

    Indexing the persisted column speeds the query up in this case. In general, the optimizer tends to favour an access path that avoids re-computing the function, but the decision is cost-based so it is still possible to see a function re-computed for each row even when indexed. Nevertheless, providing an 'obvious' and efficient access path to the optimizer does help to avoid this.

    Notice that the column does not have to be persisted in order to be indexed. This is a very common misconception; persisting is only required when the column's expression is deterministic but imprecise (it uses floating-point arithmetic or values). Persisting the column in the present case adds no value and only increases the base table's storage requirement.
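
    Whether a particular computed column can be indexed as it stands can be checked from metadata. The query below is a minimal sketch that assumes the dbo.Demo table and sum_n column from the reproduction script in this answer still exist; replace the names with your own.

    ```sql
    -- 1 = true, 0 = false, NULL = object or column not found
    SELECT
        COLUMNPROPERTY(OBJECT_ID(N'dbo.Demo'), N'sum_n', 'IsComputed')      AS is_computed,
        COLUMNPROPERTY(OBJECT_ID(N'dbo.Demo'), N'sum_n', 'IsDeterministic') AS is_deterministic,
        COLUMNPROPERTY(OBJECT_ID(N'dbo.Demo'), N'sum_n', 'IsPrecise')       AS is_precise,
        COLUMNPROPERTY(OBJECT_ID(N'dbo.Demo'), N'sum_n', 'IsIndexable')     AS is_indexable;

    -- Shows the definition and whether the column is currently persisted
    SELECT cc.name, cc.definition, cc.is_persisted
    FROM sys.computed_columns AS cc
    WHERE cc.object_id = OBJECT_ID(N'dbo.Demo');
    ```

    A column that reports is_deterministic = 1 and is_precise = 1 can be indexed without PERSISTED.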

    Paul White

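    -- An expensive scalar function
    CREATE FUNCTION dbo.fn_Expensive(@n INTEGER)
    RETURNS BIGINT
    WITH SCHEMABINDING -- needed for the function to count as deterministic
    AS
    BEGIN
        DECLARE @sum_n BIGINT;
        SET @sum_n = 0;
        WHILE @n > 0
        BEGIN
            SET @sum_n = @sum_n + @n;
            SET @n = @n - 1;
        END;
        RETURN @sum_n;
    END;
    GO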
    -- A table that references the expensive
    -- function in a PERSISTED computed column
    CREATE TABLE dbo.Demo
    (
        n       INTEGER PRIMARY KEY NONCLUSTERED,
        sum_n   AS dbo.fn_Expensive(n) PERSISTED
    );
    GO
    -- Generate 8000 numbers and add the rows
    -- with n from 1 to 5000 inclusive to the table
    WITH Numbers AS
    (
        SELECT TOP (8000)
            n = ROW_NUMBER() OVER (ORDER BY (SELECT 0))
        FROM master.sys.columns AS C1
        CROSS JOIN master.sys.columns AS C2
        CROSS JOIN master.sys.columns AS C3
    )
    INSERT dbo.Demo (n)
    SELECT
        N.n
    FROM Numbers AS N
    WHERE
        N.n >= 1
        AND N.n <= 5000;
    GO
    -- This is slow
    -- Plan includes a Compute Scalar with:
    -- [dbo].[Demo].sum_n = Scalar Operator([dbo].[fn_Expensive]([dbo].[Demo].[n]))
    -- QO estimates calling the function is cheaper than the bookmark lookup
    -- (MAX(sum_n) is an assumed select list; the query needs to read sum_n)
    SELECT MAX(sum_n)
    FROM dbo.Demo;
    GO
    -- Index the computed column
    -- Notice the actual plan for this statement also calls the function for every row, and includes:
    -- [dbo].[Demo].sum_n = Scalar Operator([dbo].[fn_Expensive]([dbo].[Demo].[n]))
    CREATE UNIQUE INDEX uq1 ON dbo.Demo (sum_n);
    GO
    -- Query now uses the index, and is fast
    SELECT MAX(sum_n)
    FROM dbo.Demo;
    GO
    -- Drop the index
    DROP INDEX uq1 ON dbo.Demo;
    GO
    -- Don't persist the column
    ALTER TABLE dbo.Demo
    ALTER COLUMN sum_n DROP PERSISTED;
    GO
    -- Slow again, as you would expect
    -- QO has no option but to call the function for each row
    SELECT MAX(sum_n)
    FROM dbo.Demo;
    GO
    -- Index the non-persisted column
    CREATE UNIQUE INDEX uq1 ON dbo.Demo (sum_n);
    GO
    -- Fast again
    -- Persisting the column bought us nothing
    -- and used extra space in the table
    SELECT MAX(sum_n)
    FROM dbo.Demo;
    GO
    -- Clean up
    DROP TABLE dbo.Demo;
    DROP FUNCTION dbo.fn_Expensive;