Search code examples
sql-serverquery-optimization

How can a query with ORDER BY run faster than the same query without ordering?


I have a SQL Server database with EventJournal table with the following columns:

Ordering (bigint, primary key)
PersistenceID (nvarchar(255))
SequenceNr (bigint)
Payload (varbinary(max))

Other columns are omitted for clarity. In addition to the primary key on Ordering there is a unique constraint on PersistenceID+SequenceNr.

If I run a query

select top 100 * from EventJournal where PersistenceID like 'msc:%'

... it takes very long time to execute (the table contains more than 100M rows)

But if I add ordering to results:

select top 100 * from EventJournal where PersistenceID like 'msc:%' order by Ordering

... then it returns the result immediately.

The execution plan for both queries are the same and in essence is the clustered index scan on PK. Then why does the first query take long time to execute?

Here's the table definition:

CREATE TABLE [dbo].[EventJournal](
    [PersistenceID] [nvarchar](255) NOT NULL,
    [SequenceNr] [bigint] NOT NULL,
    [IsDeleted] [bit] NOT NULL,
    [Manifest] [nvarchar](500) NOT NULL,
    [Payload] [varbinary](max) NOT NULL,
    [Timestamp] [bigint] NOT NULL,
    [Tags] [nvarchar](100) NULL,
    [Ordering] [bigint] IDENTITY(1,1) NOT NULL,
    [SerializerId] [int] NULL,
 CONSTRAINT [PK_EventJournal] PRIMARY KEY CLUSTERED 
(
    [Ordering] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
 CONSTRAINT [QU_EventJournal] UNIQUE NONCLUSTERED 
(
    [PersistenceID] ASC,
    [SequenceNr] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

This it 1st plan:

https://www.brentozar.com/pastetheplan/?id=SJ3kCo-Fv

And here's the 2nd one:

https://www.brentozar.com/pastetheplan/?id=Hy1B0ibtP


Solution

  • As I mentioned in my comments, the plans are different, the difference is in access method:

    The first plan uses unordered scan:

    enter image description here

    And the second plan uses ordered scan:

    enter image description here

    By the way, the other answer suggests useless index.

    SQL Server will NOT use this index, as it's equivalent to the non-clustered index already in place. As the index QU_EventJournal on ([PersistenceID], [SequenceNr]) was not used, the same way the index on (PersistenceID, Ordering) will not be used. Both of these indexes has PersistenceID, Ordering in the index as Ordering is clustered index key, so it is presented in index on ([PersistenceID], [SequenceNr]) even if you don't see it in the definition. The suggested index will be also bigger as it is not defined as unique, and the sizes of other fields are the same: Ordering is bigint, SequenceNr is bigint.

    It's wrong to think that in index on 2 fields the second field(Ordering) can be used to avoid the sort in the second query, it's not true.

    For example the index on PersistenceID, Ordering can have rows like these:

    • msc:123, 100
    • msc:124, 5
    • msc:124, 6
    • msc:125, 1

    I hope you clearly see that the index is ordered by PersistenceID, Ordering, but the result of the second query is expected to be

    • msc:125, 1
    • msc:124, 5
    • msc:124, 6
    • msc:123, 100

    So the SORT operator is needed, so this index will not be used.

    Now to your question:

    shouldn't lack of ORDER BY be used by the query analyzer as an opportunity to build more efficient execution plan without ordering constraints

    Yes you are correct, without order by server is free to choose both the ordered and unordered scan, and yes you are right in this:

    I also don't understand why using TOP without ORDER BY is a bad practice in case I want ANY N rows from the result

    When you don't need top N ordered by, because you just want to see what kind of records have 'msc:' in them, you should not add order by because it could cause a SORT in your plan.

    And to your main question:

    Then why does the first query take long time to execute?

    The answer is: this was pure coincidence. Your data is laying in way that the rows with 'msc:' in them go first, in the order defined by Ordering. And if you scan your index not in order they are just in the middle or at the end of the table.

    If you seek for another pattern in PersistenceID the unordered scan will be faster