I've been messing around with query performance for a system with pagination to make the data selection as fast as possible, but I've come across something I don't quite understand. To my knowledge, when a limit with an offset is used, MySQL has to iterate through each row before the offset and then discard them, so in theory a query with an offset of 10,000 would be much slower than one without, which is normally true as in this case
select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 0, 100;
/* finishes in 2.497 seconds */
select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 10000, 100;
/* finishes in 2.702 seconds */
But, if I use an inner join to join the table to itself with only UserID
column for doing the sorting and limiting, it's consistently faster with the offset of 10,000 than without one, which completely stumps me. Example here would be
select SQL_NO_CACHE * from `customers`
inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 100)
as `Results` using(`UserID`)
/* finishes in 1.133 seconds */
select SQL_NO_CACHE * from `customers`
inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()'
order by `DateTimeAdded` desc limit 10000, 100)
as `Results` using(`UserID`)
/* finishes in 1.120 seconds */
Why is the query using the offset always faster than the query without the offset?
Explains:
I have posted a Google Docs spreadsheet here with the explains
content here
Note: The tests above were done in PHP looping 20 times each
Note2: customers
is a view, not a base table
Case 1: The optimizer can use an index on the ORDER BY
. LIMIT 10
will be faster than LIMIT 10000,10
because it can stop reading rows sooner.
Case 2: The optimizer cannot (or chooses not to) use an index for the ORDER BY
. In this case, the entire set of rows (after WHERE
) is collected, that set is sorted, and only then the OFFSET
and LIMIT
are applied. In this case the value of OFFSET
makes little difference; most of the time was consumed fetching rows, filtering them, and sorting them.
INDEX(x,y)
SELECT ... WHERE x=2 ORDER BY y LIMIT ... -- case 1
SELECT ... WHERE x=2 AND deleted=0 ORDER BY y LIMIT ... -- case 2
INDEX(NetworkID, DateTimeAdded) -- composite
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 1
INDEX(NetworkID), INDEX(DateTimeAdded) -- separate
SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 3
Case 3 might be like Case 1 because it might use INDEX(DateTimeAdded)
. Or, of the optimizer chooses to use the other index, then it is a slow Case 2. Anyway, it is not as good as using the composite index that can handle both the WHERE
and the ORDER BY
.
If you can manage to get to Case 1, I recommend you also "remember where you left off" to make Pagination even more efficient. See my Pagination blog.