Search code examples
sqlperformanceviewsql-server-2019

Poor Performance of Table Driven Parameterized View due to poor cardinality estimation


We are trying to create a number of views which return data which is filtered by parameters stored in a parameter table.

This is so that a large number of views do not need to be redeployed in the scenario that some filtering criteria changes and means we don't need to update reports / front end code if some filtering criteria changes - it is all data driven by changing the parameters in the parameter table.

Below is a StackOverflow2010 database example representative of what is in place

Create a parameters table and add some values

CREATE TABLE Parameters
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    GroupName NVARCHAR(30),
    ParameterName NVARCHAR(30),
    ParameterValue NVARCHAR(255),
)

INSERT INTO Parameters VALUES ('Filtering Rules','LocationEqualityFilter','Paris, France')
INSERT INTO Parameters VALUES ('Filtering Rules','TitleLikeFilter','sql')

Create some covering indexes for the view

CREATE INDEX IX_Location ON Users
(
    [Location]
)
INCLUDE
(
    DisplayName,
    Reputation
)

CREATE INDEX IX_Title ON Posts
(
    Title
)
INCLUDE
(
    Tags,
    OwnerUserId
)

Create our view - this looks up posts whos titles start with a given string by users in a given location

CREATE OR ALTER VIEW vw_PostsByLocationWithTitle
AS
SELECT  u.DisplayName,
        u.Reputation,
        pos.tags
FROM    Users u
        JOIN Parameters p
            ON u.Location = p.ParameterValue AND
                p.ParameterName = 'LocationEqualityFilter'
        JOIN Posts pos
            ON pos.OwnerUserId = u.Id
        JOIN Parameters p1
            ON pos.Title LIKE p1.ParameterValue + N'%' AND
                p1.ParameterName = N'TitleEqualityFilter' 

Select from the view:

SET STATISTICS IO ON
SELECT * FROM vw_PostsByLocationWithTitle

The actual plan is here and the logical reads are

Table 'Users'. Scan count 7, logical reads 1873, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Parameters'. Scan count 2, logical reads 4, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 222, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

If I run the equivalent query with hardcoded literals, I get a much simpler plan, built on far better estimates:

SELECT  u.DisplayName,
        u.Reputation,
        p.Tags
FROM    Users u
        JOIN posts p
            ON p.OwnerUserId = u.Id
WHERE   Location = N'Paris, France' AND
        p.Title LIKE N'SQL%'

Plan

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 222, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 10, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

Even the far simpler

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
        JOIN Parameters p
            ON u.Location = p.ParameterValue AND
                p.ParameterName = N'LocationEqualityFilter'

has pretty terrible estimations vs its literal counterpart below

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
WHERE   Location = N'Paris, France'

We are having a number of performance issues when users use views with this pattern and in each case, this can be tracked down to this poor estimation which can be fixed by replacing with literals

I can fix the estimate on this simpler query in two ways:

Using a variable:

DECLARE @Location NVARCHAR(100) = (SELECT ParameterValue FROM Parameters WHERE ParameterName = N'LocationEqualityFilter')

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
WHERE   Location = @Location

Using a temp table

SELECT  ParameterValue 
INTO    #Location
FROM    Parameters 
WHERE   ParameterName = N'LocationEqualityFilter'

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
        JOIN #Location l
            ON u.Location = l.ParameterValue

DROP TABLE #Location

I believe the reason these fix the estimate is that in the original query, SQL Server can estimate how many rows in Parameters match N'LocationEqualityFilter' but it doesn't know what the value for the ParameterValue column is so cannot use that for the histogram lookup required to estimate the number of rows in Users that match that value. However, declaring a variable means SQL Server can use the value of that variable for the histogram lookup and the temp table variant means SQL Server knows the ParameterValue by means of statistics on the temp table to be able to use it for the Histogram lookup on Users

Whilst both of these fixes solve the problem, neither are permitted in views.

I also tried this pattern which also gives estimation issues

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
WHERE   Location = (SELECT CAST(ParameterValue AS NVARCHAR(100)) FROM Parameters WHERE ParameterName = N'LocationEqualityFilter')

Finally, I tried to create an indexed view which returns the parameter value forLocationEqualityFilter

CREATE OR ALTER VIEW dbo.LocationEqualityFilter
WITH SCHEMABINDING AS
SELECT  ParameterValue
FROM    dbo.Parameters
WHERE   ParameterName = N'LocationEqualityFilter' 
GO

CREATE UNIQUE CLUSTERED INDEX IX_LocationEqualityFilter ON dbo.LocationEqualityFilter
(
    ParameterValue
)

The Optimizer indeed scans the index but the estimation on the Users table seek is still off.

I can see that the stats that were created for this view represent that the view returns one row and shows us the value:

enter image description here

In the same way the temp table solution does however, I can see that the plan does not load this stat:

enter image description here

The same is true if I rewrite my view to reference the new Indexed view explicitly:

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
        JOIN LocationEqualityFilter p
            ON u.Location = p.ParameterValue

However, If I add the WITH (NOEXPAND) hint. It does then get the correct estimate

Is there a way to fix this estimation in the view or some other method to create this data driven parameterisation?

The only options I can think of are:

  • Break each ParameterName into its own table, to replicate the temp table fix. There would have to be a trigger on the Parameters table to update this new table as records change
  • Create the views with literal values, alter the view definition each time the relevant value in the Parameters table is updated (by means of a trigger)
  • Create a scalar UDF which returns the literal value of the ParameterValue and use in the WHERE clause and update this definition by a trigger each time the Parameter table is updated. (Assuming SQL Server 2019 and scalar function inlining)
  • Indexed views as above

Solution

  • You are probably best off creating indexed views. Given that you only have inner join that should be no problem.

    Add the primary/unique keys of Users and Posts to the view, and create an index on those columns. The reason to use those columns is so that the compiler can efficiently locate them in the indexed view to update them when the base tables are updated.

    CREATE UNIQUE CLUSTERED INDEX ux ON dbo.vw_PostsByLocationWithTitle
      (UserId, PostId);
    

    This hopefully should allow the server to create statistics on the actual values, at the obvious cost of actually storing and maintaining all that data a second time.


    Note that you should always query such views using the WITH (NOEXPAND) hint, even on Enterprise Edition. To enforce that, you may wish to create another view which queries this view using NOEXPAND, and only allow the user to use that view.

    Also note that indexed views have many restrictions. Among the most notable:

    • Particular SET options (such as ANSI_NULLS) must be enabled when querying the view, or modifying the base tables. Older clients may not have these options set by default.
    • Only INNER JOIN is allowed, no LEFT RIGHT FULL or APPLY.
    • No TOP or OFFSET.
    • No DISTINCT UNION or other set operators.
    • GROUP BY is allowed, but only if the index covers all the non-aggregated columns, and a COUNT_BIG is included. The only other aggregation function allowed is SUM. HAVING is not allowed.