I have a SQL Server table with a recurring query that I need to improve the performance of it. I need to retrieve rows matching the WHERE
clause shown here. All 3 columns are defined as datetime2
columns.
WHERE GREATEST(dateColumn1, dateColumn2) > dateColumn3
What is the best way to index the table to improve performance?
Should I create a computed column and index that?
What is the best way to index the table to improve performance?
Make your WHERE
predicate clause expression into a computed-column and then index that column.
Last time I checked, SQL Server (incl. Azure SQL) does not support using arbitrary expressions in a CREATE INDEX
statement, so any expressions you want to index must be defined as computed (but not necessarily PERSISTED
) columns.
...I was doing this myself a few months ago, but I ran into a 😩-inducing bug: you actually need to use PERSISTED
if you use LEAST
or GREATEST
in a computed-column expression in SQL Server 2022 or Azure SQL.
Now, because you'll need to define a computed-column that'll be indexed as-is then consider that you could either...
Define the computed-column simply as the result of GREATEST( col1, col2, etc )
- which means you'll still need the >
comparison in your SELECT
query, and potentially experience poor query-plan selection if even the slightest change happens to your DB.
...or: define another computed-column using the result of the >
expression, and INDEX
over that; you'll need to update your SELECT
queries to directly reference the new computed-columns, but this significantly reduces the odds of SQL Server generating a bad query-plan (but YMMV, ofc).
(I assume your dateCol1
and dateCol2
columns are both datetime2(7) NOT NULL
. (If they actually are NULL
-able then this answer probably won't work for you because the computed-column and index here won't handle dateCol1 IS NOT NULL
or dateCol1 IS NOT DISTINCT FROM dateCol2
-style predicates)
Anyway, like so:
-- Before doing anything, ensure we're in a deterministic and standards-compliant environment, which is important for PERSISTED columns:
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON
SET XACT_ABORT ON; /* See https://stackoverflow.com/questions/1150032/what-is-the-benefit-of-using-set-xact-abort-on-in-a-stored-procedure */
BEGIN TRANSACTION ddlTxn;
RAISERROR( 'Now running ALTER TABLE...', 0, 1) WITH NOWAIT;
ALTER TABLE dbo.MyTable
ADD
Greatest_1_or_2 AS GREATEST( dateCol1, dateCol2 ) PERSISTED NOT NULL,
DateCol_1_or_2_is_gt_3 AS ( CASE WHEN GREATEST( dateCol1, dateCol2 ) > dateCol3 THEN CONVERT(bit,1) ELSE CONVERT(bit,0) END ) PERSISTED NOT NULL,
DateCol_1_or_2_diff_3 AS ( DATEDIFF( microsecond, GREATEST( dateCol1, dateCol2 ) - dateCol3 ) PERSISTED NOT NULL;
GO /* This `GO` is to stop SSMS complaining that the computed cols ( defined above, but used below) won't exist at parse-time. */
RAISERROR( 'Now running CREATE INDEX...', 0, 1) WITH NOWAIT;
CREATE INDEX IX_Greatest12 ON dbo.MyTable ( Greatest_1_or_2 ) INCLUDE ( dateCol3 );
CREATE INDEX IX_Greatest12IsGT3 ON dbo.MyTable ( DateCol_1_or_2_is_gt_3 );
CREATE INDEX IX_DateCol12Diff3 ON dbo.MyTable ( DateCol_1_or_2_diff_3 );
RAISERROR( 'COMMITing TRANSACTION...', 0, 1) WITH NOWAIT;
COMMIT TRANSACTION ddlTxn;
RAISERROR( 'Ding!', 0, 1) WITH NOWAIT;
Notice that I defined DateCol_1_or_2_is_gt_3
and DateCol_1_or_2_diff_3
- and associated IX_
indexes - so you'll likely want to choose one bunch or the other (or both?) based on your actual queries.
So your queries might now look like this:
SELECT /* ... */ FROM dbo.MyTable AS t WHERE t.Greatest_1_or_2 > t.dateCol3
or:
SELECT /* ... */ FROM dbo.MyTable AS t WHERE t.DateCol_1_or_2_is_gt_3 = 1;
or:
SELECT /* ... */ FROM dbo.MyTable AS t WHERE t.DateCol_1_or_2_diff_3 < 0;
You'll need to test-out and benchmark these 3 different variations for yourself because whichever is faster or preferable overall depends entirely on how the rest of your DB works, and from my vantage-point here, on the Internet, I can't see anything :)