Search code examples
sql-serverquery-optimizationsql-server-2016

How to prevent bad optimization of CROSS APPLY with Inline Table-Valued Function


I have an inline table valued function which runs a query on an OLE-DB linked server, defined as follows:

 CREATE FUNCTION [dbo].[fnGetResultsForTag]
 (  
    @elapsedTimeTag NVARCHAR(50)
 )
 RETURNS TABLE 
 AS
 RETURN 
 (
    SELECT tag, time, value
    FROM PI.piarchive..picount
    WHERE tag = @elapsedTimeTag
        AND filterexpr = QUOTENAME(@elapsedTimeTag, '''') + ' > NEXTVAL(' + QUOTENAME(@elapsedTimeTag, '''') + ', ''*'')'
        AND time BETWEEN (SELECT result FROM PI.pifunction..date WHERE arg1='t-20h') AND (SELECT result FROM PI.pifunction..date WHERE arg1='*')
        AND timestep = (SELECT result FROM PI.pifunction..time WHERE arg1='12h')
        AND calcbasis = 'EventWeighted'
 )
 GO

It is called from a stored procedure like this:

SELECT results.*
FROM PI.pipoint..pipoint2 points
CROSS APPLY dbo.fnGetResultsForTag(points.tag) AS results
WHERE points.tag LIKE @TagPattern 
ORDER BY time ASC, tag ASC

MS documentation says that:

The APPLY operator works in the following way to produce the table source for the FROM clause:

  1. Evaluates right_table_source against each row of the left_table_source to produce rowsets.

    The values in the right_table_source depend on left_table_source. right_table_source can be represented approximately this way: TVF(left_table_source.row), where TVF is a table-valued function.

  2. Combines the result sets that are produced for each row in the evaluation of right_table_source with the left_table_source by performing a UNION ALL operation.

    The list of columns produced by the result of the APPLY operator is the set of columns from the left_table_source that is combined with the list of columns from the right_table_source.

That's exactly the behavior that I want. But it's not what SQL Server is actually doing.

What actually happens is that the SQL Server query optimizer is trying to optimize the TVF subqueries by running one SELECT FROM PI.piarchive..picount query on the remote server, without the tag = @elapsedTimeTag and filterexpr = <stuff> expressions in the WHERE clause, and then do a join within SQL Server to select only the data for the specified tags.

Unfortunately this is an incorrect optimization. The picount table is a view on an underlying database which contains thousands of tags. If the tag = @elapsedTimeTag filter criteria is not supplied to the remote server, the query times out. And the filterexpr=<stuff> criteria needs to match the same tagname as the tag criteria, in order to get the correct answers. So I really need to run one TVF remote subquery for each row in the left-hand table.

How do I hint/force SQL Server to actually run the table-valued function's query once for each row in the left-hand table of the CROSS APPLY, instead of what it's currently doing?

I have tried refactoring the outer query to explicitly select the tag list first, but it doesn't seem to help:

DECLARE @TagNames AS TABLE (tag NVARCHAR(50) NOT NULL)

INSERT INTO @TagNames 
SELECT tag FROM PI.pipoint..pipoint2 WHERE tag LIKE @TagPattern

SELECT results.*
FROM @TagNames t
CROSS APPLY dbo.fnGetResultsForTag(t.tag) AS results
ORDER BY time ASC, tag ASC

Solution

  • Rewriting dbo.fnGetResultsForTag as a Multistatement Table Value Function (MTVF), instead of an Inline Table Value Function (ITVF), seems to have the desired effect of prohibiting this "optimization".

    But it feels awkwardly like relying on an implementation detail, so I'm still interested in alternative solutions which explicitly instruct the query optimizer not to "flatten" the ITVF sub-queries.