Search code examples
sql-servert-sqldatetimedatabase-performance

Selecting rows first by Id then by datetime - with or without a subquery?


I need to create statistics from several log tables. Most of the time every hour but sometimes more frequently every 5 minutes.

Selecting rows only by datetime isn't fast enough for larger logs so I thought I select only rows that are new since the last query by storing the max Id and reusing it next time:

SELECT TOP(1000) * -- so that it's not too much
FROM [dbo].[Log]
WHERE Id > lastId AND [Timestamp] >= timestampMin
ORDER BY [Id] DESC

select1

My question: is the SQL Server smart enough to:

  • first filter the rows by Id and then by the Timestamp even if I change the order of the conditions or does the condition order matter or
  • do I need a subquery to first select the rows by Id and then filter them by the Timestamp.

with subquery:

SELECT *
FROM (
    SELECT TOP(1000) * FROM [dbo].[Log]
    WHERE Id > lastId
    ORDER BY [Id] DESC
) t
WHERE t.[TimeStamp] >= timestampMin

select2

The table schema is:

CREATE TABLE [dbo].[Log](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Timestamp] [datetime2](7) NOT NULL,
    -- other columns    
CONSTRAINT [PK_dbo_Log] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

I tried to use the query plan to find out how it works but it turns out that I cannot read it and I don't understand it.


Solution

  • In your case you don't have an index on TimeStamp so SQL Server will always use the Clustered Index (Id) first (the Clustered index seek you see in the query plan) to find the first row matching Id > lastId and then perform a scan on the remaining rows with the predicate [Timestamp] >= timestampMin (actually is the other way around since you are sorting in reverse order with DESC).

    If you were to add a index on TimeStamp SQL Server might use it based on:

    1. the cardinality of the predicate [Timestamp] >= timestampMin. Please note that cardinality is always an estimate based on statistics (see https://msdn.microsoft.com/en-us/library/ms190397.aspx) and the cardinality estimator (it changed from SQL 2012 to 2014+, see https://msdn.microsoft.com/en-us/library/dn600374.aspx).

    2. how covering the non-clustered index is (since you are using the wildcard it would hardly matters anyway). If the non-clustered index is non covering SQL Server would have to add a Key Lookup (see https://technet.microsoft.com/en-us/library/bb326635(v=sql.105).aspx) operator in order to retrieve all the fields (or perform a join). This will likely make the index not worthwhile for this query.

    Also note that your two queries - the one with subplan and the one without - are functionally different. The first will give you the first 1000 rows the have both Id > lastId AND [Timestamp] >= timestampMin. The second will give you only the rows having [Timestamp] >= timestampMin from the first 1000 rows having Id > lastId. So, for example, you might get 1000 rows from the first query but less than that on the second one.