Search code examples
sql-servert-sqldatabase-performance

select top 1 * vs select top 1 1


I know there's a lot of these questions, but I can't find one that relates to my question.

Looking at this question, Is Changing IF EXIST(SELECT 1 FROM ) to IF EXIST(SELECT TOP 1 FROM ) has any side effects?

Specifically referring to this section in the answer:

select * from sys.objects
select top 1 * from sys.objects
select 1 where exists(select * from sys.objects)
select 1 where exists(select top 1 * from sys.objects)

I'm running some of my own tests to properly understand it. As indicated in the answer:

select 1 where exists(select top 1 * from sys.objects)
select 1 where exists(select top 1 1 from sys.objects)

both cause the same execution plan and also causes the same plan as

select 1 where exists(select * from sys.objects)
select 1 where exists(select 1 from sys.objects)

From my research into questions like this one, “SELECT TOP 1 1” VS “IF EXISTS(SELECT 1”. I'm deducing that this is the agreed best practice:

select 1 where exists(select * from sys.objects)

My first question is why is this preferred over this:

select 1 where exists(select 1 from sys.objects)

In trying to understand it, I broke them down to their more basic expressions (I'm using 'top 1' to mimic an execution plan resembling exists):

select top 1 * from sys.objects
select top 1 1 from sys.objects

I now see that the first is 80% of the execution time (relative to the batch of 2) whilst the second is only 20%. Would it then not be better practice to use

select 1 where exists(select 1 from sys.objects)

as it can be applied to both scenarios and thereby reduce possible human error?


Solution

  • SQL Server detects EXISTS predicate relatively early in the query compilation / optimisation process, and eliminates actual data retrieval for such clauses, replacing them with existence checks. So your assumption:

    I now see that the first is 80% of the execution time (relative to the batch of 2) whilst the second is only 20%.

    is wrong, because in the preceding comparison you have actually retrieved some data, which doesn't happen if the query is put into the (not) exists predicate.

    Most of the time, there is no difference how to test for the existence of rows, except for a single yet important catch. Suppose you say:

    if exists (select * from dbo.SomeTable)
    ...
    

    somewhere in the code module (view, stored procedure, function etc.). Then, later, when someone else will decide to put WITH SCHEMABINDING clause into this code module, SQL Server will not allow it and instead of possibly binding to the current list of columns it will throw an error:

    Msg 1054, Level 15, State 7, Procedure BoundView, Line 6
    Syntax '*' is not allowed in schema-bound objects.

    So, in short:

    if exists (select 0 from ...)
    

    is a safest, fastest and one-size-fits-all way for existence checks.