Tags: sql, sql-server, t-sql, database-performance

Is there any performance benefit to excluding NULL rows in the WHERE clause of a SELECT query?


Table Foo structure:

ID – PK
SampleCol – Can have null and is not indexed
SampleCol2, SampleCol3, etc

Table Foo has more than 100,000 rows, and SampleCol is NULL in many of them.

SQL query #1:

select * 
from Foo 
where SampleCol = 'Test'

SQL query #2:

select * 
from Foo 
where SampleCol is not null and SampleCol = 'Test'

Does query #2 have any performance benefit over query #1? Or any suggestions on how to improve performance of these SQL queries?

Thanks!


Solution

  • No, it will not help -- although it could make things slightly (probably unmeasurably) worse.

    The condition SampleCol = 'Test' is exactly the comparison you want to make. So, the database has to make this comparison, in some fashion, for every row that is returned.
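
    The extra check is also logically redundant: in SQL, NULL = 'Test' evaluates to UNKNOWN rather than TRUE, so rows where SampleCol is NULL never pass the equality test anyway. A minimal sketch of this, using a throwaway table variable (@t and its three rows are made up for illustration, not taken from the question):

    ```sql
    -- Hypothetical sample data standing in for Foo.
    DECLARE @t TABLE (ID int PRIMARY KEY, SampleCol varchar(20) NULL);

    INSERT INTO @t (ID, SampleCol)
    VALUES (1, 'Test'), (2, NULL), (3, 'Other');

    -- Query #1 shape: the NULL row fails the comparison (UNKNOWN is not TRUE).
    SELECT ID FROM @t WHERE SampleCol = 'Test';

    -- Query #2 shape: the IS NOT NULL test removes nothing that '=' had not already removed.
    SELECT ID FROM @t WHERE SampleCol IS NOT NULL AND SampleCol = 'Test';
    ```

    Both statements return only the row with ID 1, so the two forms differ, at most, in how much work they take.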

    There are basically two situations. Without an index, your query needs to do a full table scan. Two comparisons on each row (one for NULL and one for the value) take longer than a single comparison. Some databases might simplify the predicate down to just the equality comparison, in which case the two queries would cost the same. I don't think SQL Server does this elimination, but it might.
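
    If you would rather measure than guess, one way (a sketch, assuming the table Foo from the question exists in the current database) is to run both queries with I/O and timing statistics switched on and compare what appears in the Messages output:

    ```sql
    -- Report logical reads and CPU/elapsed time for each statement.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    -- Query #1
    SELECT * FROM Foo WHERE SampleCol = 'Test';

    -- Query #2
    SELECT * FROM Foo WHERE SampleCol IS NOT NULL AND SampleCol = 'Test';

    SET STATISTICS IO OFF;
    SET STATISTICS TIME OFF;
    ```

    With no index on SampleCol, both queries have to read the whole table, so I would expect the logical reads to match and any timing difference to be lost in the noise. Comparing the actual execution plans side by side is another way to check whether the optimizer treats the two predicates differently.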

    With an index on SampleCol, SQL Server can use the index for the = comparison. It might then do an additional comparison against NULL (even though that is redundant). You run into a bigger issue here, though: the more complicated the predicate, the more likely it is that the optimizer gets confused and doesn't use the index.
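
    As for improving performance, the usual first step is an index on the filtered column. A sketch, assuming the table from the question; the index names are made up for illustration:

    ```sql
    -- Hypothetical index name; lets the optimizer seek on SampleCol instead of scanning Foo.
    CREATE NONCLUSTERED INDEX IX_Foo_SampleCol
        ON Foo (SampleCol);

    -- Possible alternative when most rows are NULL: a filtered index that leaves the NULL rows out.
    -- Whether a given query can use it depends on how the predicate is written (for example,
    -- literals versus parameters), so check the plan before relying on it.
    -- CREATE NONCLUSTERED INDEX IX_Foo_SampleCol_NotNull
    --     ON Foo (SampleCol)
    --     WHERE SampleCol IS NOT NULL;

    -- The simple predicate is all the optimizer needs to consider the index.
    SELECT * FROM Foo WHERE SampleCol = 'Test';
    ```

    Because the query uses SELECT *, each match found in the index still needs a lookup into the base table for the other columns, so the optimizer may prefer a scan if 'Test' matches a large share of the rows.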

    There is a third case where your column is used for partitioning. I do not know if the redundant comparison would have an impact on partition pruning.

    You want your where comparisons to be simple. In general, you want to let the optimizer do its work. On very rare occasions, you might want to give the optimizer some help, but that is very, very, very rare -- and generally involves functions that are much more expensive to run than simple comparisons.