Search code examples
databaseoptimizationindexingsql-server-2000

Does this optimisation always work


A database guru has suggested refactoring a query:

SELECT * FROM MyTable 
WHERE UnIndexedDate 
BETWEEN '2012-08-01' AND '2012-09-01'

to

SELECT * FROM MyTable
WHERE IndexedID 
BETWEEN (SELECT MIN(IndexedID) FROM MyTable WHERE UnIndexedDate BETWEEN '2012-08-01' AND  '2012-08-30')
AND (SELECT MAX(IndexedID) FROM MyTable WHERE UnIndexedDate BETWEEN '2012-08-01' AND  '2012-08-30')

Note that the table is not indexed on the UnIndexedDate column but is indexed on the IndexedID column. This table has several millions of records in it.

It clearly does improve the speed of the query and I suspect it is because the subqueries will only be performed once and may even be far more efficient in some way because they involve an indexed field.

My question is, does this apply generally across most databases or just the SQL2000 one here.

Added: BTW IndexedID is numeric, unique and strictly increasing.


Solution

  • Your optimization will only work if the unindexed date column increases in conjuntion with the indexed id column; this is an assumption of DESIGN, not of the platform on which it runs. In other words, it sounds like the unindexed date is correlated with the indexed ID, as in the ID is an autonumber that is inserted at the time of insertion, and the unindexed date is the time of insertion. In that narrow case, then @MartinSmith's comment will hold true; your results may vary if the unindexed date column and the indexed column have no correlation.

    Without looking at the execution plans, I'd bet that you're seeing two scans on the table to retrieve the ID's (a very narrow and small data set), and then an index seek to return the actual rows. It'll work as long as the assumed relationship holds true. However, if someone ever updates as date column (and breaks the relationship between date and id), your results will no longer be accurate because the MIN or MAX id values may no longer be in the range.