I need to create statistics from several log tables. Most of the time every hour but sometimes more frequently every 5 minutes.
Selecting rows only by datetime
isn't fast enough for larger logs so I thought I select only rows that are new since the last query by storing the max Id
and reusing it next time:
SELECT TOP(1000) * -- so that it's not too much
FROM [dbo].[Log]
WHERE Id > lastId AND [Timestamp] >= timestampMin
ORDER BY [Id] DESC
My question: is the SQL Server smart enough to:
Id
and then by the Timestamp
even if I change the order of the conditions or does the condition order matter or Id
and then filter them by the Timestamp
.with subquery:
SELECT *
FROM (
SELECT TOP(1000) * FROM [dbo].[Log]
WHERE Id > lastId
ORDER BY [Id] DESC
) t
WHERE t.[TimeStamp] >= timestampMin
The table schema is:
CREATE TABLE [dbo].[Log](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Timestamp] [datetime2](7) NOT NULL,
-- other columns
CONSTRAINT [PK_dbo_Log] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I tried to use the query plan to find out how it works but it turns out that I cannot read it and I don't understand it.
In your case you don't have an index on TimeStamp
so SQL Server will always use the Clustered Index (Id
) first (the Clustered index seek you see in the query plan) to find the first row matching Id > lastId
and then perform a scan on the remaining rows with the predicate [Timestamp] >= timestampMin
(actually is the other way around since you are sorting in reverse order with DESC
).
If you were to add a index on TimeStamp
SQL Server might use it based on:
the cardinality of the predicate [Timestamp] >= timestampMin
. Please note that cardinality is always an estimate based on statistics (see https://msdn.microsoft.com/en-us/library/ms190397.aspx) and the cardinality estimator (it changed from SQL 2012 to 2014+, see https://msdn.microsoft.com/en-us/library/dn600374.aspx).
how covering the non-clustered index is (since you are using the wildcard it would hardly matters anyway). If the non-clustered index is non covering SQL Server would have to add a Key Lookup
(see https://technet.microsoft.com/en-us/library/bb326635(v=sql.105).aspx) operator in order to retrieve all the fields (or perform a join). This will likely make the index not worthwhile for this query.
Also note that your two queries - the one with subplan and the one without - are functionally different. The first will give you the first 1000 rows the have both Id > lastId AND [Timestamp] >= timestampMin
. The second will give you only the rows having [Timestamp] >= timestampMin
from the first 1000 rows having Id > lastId
. So, for example, you might get 1000 rows from the first query but less than that on the second one.