We are trying to create a number of views which return data which is filtered by parameters stored in a parameter table.
This is so that a large number of views do not need to be redeployed in the scenario that some filtering criteria changes and means we don't need to update reports / front end code if some filtering criteria changes - it is all data driven by changing the parameters in the parameter table.
Below is a StackOverflow2010 database example representative of what is in place
Create a parameters table and add some values
CREATE TABLE Parameters
(
ID INT IDENTITY(1,1) PRIMARY KEY,
GroupName NVARCHAR(30),
ParameterName NVARCHAR(30),
ParameterValue NVARCHAR(255),
)
INSERT INTO Parameters VALUES ('Filtering Rules','LocationEqualityFilter','Paris, France')
INSERT INTO Parameters VALUES ('Filtering Rules','TitleLikeFilter','sql')
Create some covering indexes for the view
CREATE INDEX IX_Location ON Users
(
[Location]
)
INCLUDE
(
DisplayName,
Reputation
)
CREATE INDEX IX_Title ON Posts
(
Title
)
INCLUDE
(
Tags,
OwnerUserId
)
Create our view - this looks up posts whos titles start with a given string by users in a given location
CREATE OR ALTER VIEW vw_PostsByLocationWithTitle
AS
SELECT u.DisplayName,
u.Reputation,
pos.tags
FROM Users u
JOIN Parameters p
ON u.Location = p.ParameterValue AND
p.ParameterName = 'LocationEqualityFilter'
JOIN Posts pos
ON pos.OwnerUserId = u.Id
JOIN Parameters p1
ON pos.Title LIKE p1.ParameterValue + N'%' AND
p1.ParameterName = N'TitleEqualityFilter'
Select from the view:
SET STATISTICS IO ON
SELECT * FROM vw_PostsByLocationWithTitle
The actual plan is here and the logical reads are
Table 'Users'. Scan count 7, logical reads 1873, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Parameters'. Scan count 2, logical reads 4, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 222, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
If I run the equivalent query with hardcoded literals, I get a much simpler plan, built on far better estimates:
SELECT u.DisplayName,
u.Reputation,
p.Tags
FROM Users u
JOIN posts p
ON p.OwnerUserId = u.Id
WHERE Location = N'Paris, France' AND
p.Title LIKE N'SQL%'
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Posts'. Scan count 1, logical reads 222, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Users'. Scan count 1, logical reads 10, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Even the far simpler
SELECT u.DisplayName,
u.Reputation
FROM Users u
JOIN Parameters p
ON u.Location = p.ParameterValue AND
p.ParameterName = N'LocationEqualityFilter'
has pretty terrible estimations vs its literal counterpart below
SELECT u.DisplayName,
u.Reputation
FROM Users u
WHERE Location = N'Paris, France'
We are having a number of performance issues when users use views with this pattern and in each case, this can be tracked down to this poor estimation which can be fixed by replacing with literals
I can fix the estimate on this simpler query in two ways:
DECLARE @Location NVARCHAR(100) = (SELECT ParameterValue FROM Parameters WHERE ParameterName = N'LocationEqualityFilter')
SELECT u.DisplayName,
u.Reputation
FROM Users u
WHERE Location = @Location
SELECT ParameterValue
INTO #Location
FROM Parameters
WHERE ParameterName = N'LocationEqualityFilter'
SELECT u.DisplayName,
u.Reputation
FROM Users u
JOIN #Location l
ON u.Location = l.ParameterValue
DROP TABLE #Location
I believe the reason these fix the estimate is that in the original query, SQL Server can estimate how many rows in Parameters
match N'LocationEqualityFilter'
but it doesn't know what the value for the ParameterValue
column is so cannot use that for the histogram lookup required to estimate the number of rows in Users
that match that value. However, declaring a variable means SQL Server can use the value of that variable for the histogram lookup and the temp table variant means SQL Server knows the ParameterValue
by means of statistics on the temp table to be able to use it for the Histogram lookup on Users
Whilst both of these fixes solve the problem, neither are permitted in views.
I also tried this pattern which also gives estimation issues
SELECT u.DisplayName,
u.Reputation
FROM Users u
WHERE Location = (SELECT CAST(ParameterValue AS NVARCHAR(100)) FROM Parameters WHERE ParameterName = N'LocationEqualityFilter')
Finally, I tried to create an indexed view which returns the parameter value forLocationEqualityFilter
CREATE OR ALTER VIEW dbo.LocationEqualityFilter
WITH SCHEMABINDING AS
SELECT ParameterValue
FROM dbo.Parameters
WHERE ParameterName = N'LocationEqualityFilter'
GO
CREATE UNIQUE CLUSTERED INDEX IX_LocationEqualityFilter ON dbo.LocationEqualityFilter
(
ParameterValue
)
The Optimizer indeed scans the index but the estimation on the Users table seek is still off.
I can see that the stats that were created for this view represent that the view returns one row and shows us the value:
In the same way the temp table solution does however, I can see that the plan does not load this stat:
The same is true if I rewrite my view to reference the new Indexed view explicitly:
SELECT u.DisplayName,
u.Reputation
FROM Users u
JOIN LocationEqualityFilter p
ON u.Location = p.ParameterValue
However, If I add the WITH (NOEXPAND) hint. It does then get the correct estimate
Is there a way to fix this estimation in the view or some other method to create this data driven parameterisation?
The only options I can think of are:
ParameterName
into its own table, to replicate the temp table fix. There would have to be a trigger on the Parameters
table to update this new table as records changeParameters
table is updated (by means of a trigger)ParameterValue
and use in the WHERE clause and update this definition by a trigger each time the Parameter table is updated. (Assuming SQL Server 2019 and scalar function inlining)You are probably best off creating indexed views. Given that you only have inner join that should be no problem.
Add the primary/unique keys of Users
and Posts
to the view, and create an index on those columns. The reason to use those columns is so that the compiler can efficiently locate them in the indexed view to update them when the base tables are updated.
CREATE UNIQUE CLUSTERED INDEX ux ON dbo.vw_PostsByLocationWithTitle
(UserId, PostId);
This hopefully should allow the server to create statistics on the actual values, at the obvious cost of actually storing and maintaining all that data a second time.
Note that you should always query such views using the WITH (NOEXPAND)
hint, even on Enterprise Edition. To enforce that, you may wish to create another view which queries this view using NOEXPAND
, and only allow the user to use that view.
Also note that indexed views have many restrictions. Among the most notable:
SET
options (such as ANSI_NULLS
) must be enabled when querying the view, or modifying the base tables. Older clients may not have these options set by default.INNER JOIN
is allowed, no LEFT
RIGHT
FULL
or APPLY
.TOP
or OFFSET
.DISTINCT
UNION
or other set operators.GROUP BY
is allowed, but only if the index covers all the non-aggregated columns, and a COUNT_BIG
is included. The only other aggregation function allowed is SUM
. HAVING
is not allowed.