Tags: mysql, performance, inner-join, limit, offset

MySQL Select with Offset is Faster Than no Offset


I've been tuning query performance for a paginated system to make the data selection as fast as possible, but I've come across something I don't quite understand. To my knowledge, when a LIMIT with an OFFSET is used, MySQL has to read each row before the offset and then discard it, so in theory a query with an offset of 10,000 should be much slower than one without. That is normally true, as in this case:

select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()' 
    order by `DateTimeAdded` desc limit 0, 100;
/* finishes in 2.497 seconds */

select SQL_NO_CACHE * from `customers` where `NetworkID`='\func uuid()' 
    order by `DateTimeAdded` desc limit 10000, 100;
/* finishes in 2.702 seconds */

But if I use an inner join to join the table to itself, with a subquery that selects only the UserID column to do the sorting and limiting, the query is consistently faster with the offset of 10,000 than without one, which completely stumps me. For example:

select SQL_NO_CACHE * from `customers` 
    inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()' 
        order by `DateTimeAdded` desc limit 100) 
    as `Results` using(`UserID`)
/* finishes in 1.133 seconds */

select SQL_NO_CACHE * from `customers` 
    inner join (select `UserID` from `customers` where `NetworkID`='\func uuid()' 
        order by `DateTimeAdded` desc limit 10000, 100) 
    as `Results` using(`UserID`)
/* finishes in 1.120 seconds */

Why is the query using the offset always faster than the query without the offset?


Explains:

I have posted a Google Docs spreadsheet with the EXPLAIN output here

Note: The tests above were done in PHP looping 20 times each

Note2: customers is a view, not a base table
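The two query shapes above can be sketched side by side. This is a minimal, hypothetical reproduction using Python's built-in sqlite3 with a simplified `customers` table (only the `UserID`, `NetworkID`, and `DateTimeAdded` columns from the question; all data is invented). It shows that the plain LIMIT/OFFSET form and the deferred-join form return the same page:

```python
import sqlite3

# Simplified stand-in for the `customers` view: 1000 rows in one network.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (UserID INTEGER PRIMARY KEY, "
    "NetworkID TEXT, DateTimeAdded INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, "net-1", i) for i in range(1, 1001)],
)

# Plain LIMIT/OFFSET: the engine must produce and discard the first 100 rows.
plain = conn.execute(
    "SELECT UserID FROM customers WHERE NetworkID='net-1' "
    "ORDER BY DateTimeAdded DESC LIMIT 100 OFFSET 100"
).fetchall()

# Deferred join: paginate over the narrow key list in the subquery first,
# then join back to fetch the full rows only for the selected page.
deferred = conn.execute(
    "SELECT c.UserID FROM customers AS c "
    "INNER JOIN (SELECT UserID FROM customers WHERE NetworkID='net-1' "
    "            ORDER BY DateTimeAdded DESC LIMIT 100 OFFSET 100) AS r "
    "USING (UserID) ORDER BY c.DateTimeAdded DESC"
).fetchall()

assert plain == deferred  # both shapes select the same page of rows
```

SQLite's planner differs from MySQL's, so this only demonstrates that the two shapes are equivalent in result, not their relative timing.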


Solution

  • Case 1: The optimizer can use an index for the ORDER BY. LIMIT 10 will be faster than LIMIT 10000,10 because it can stop reading rows sooner.

  • Case 2: The optimizer cannot (or chooses not to) use an index for the ORDER BY. In this case, the entire set of rows (after WHERE) is collected, that set is sorted, and only then the OFFSET and LIMIT are applied. In this case the value of OFFSET makes little difference; most of the time was consumed fetching rows, filtering them, and sorting them.

    INDEX(x,y)
    SELECT ... WHERE x=2               ORDER BY y LIMIT ... -- case 1
    SELECT ... WHERE x=2 AND deleted=0 ORDER BY y LIMIT ... -- case 2
    
    INDEX(NetworkID, DateTimeAdded)         -- composite
    SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 1
    
    INDEX(NetworkID), INDEX(DateTimeAdded)  -- separate
    SELECT ... WHERE NetworkID='...' ORDER BY DateTimeAdded DESC ... -- Case 3
    

    Case 3 might be like Case 1 because it might use INDEX(DateTimeAdded). Or, if the optimizer chooses to use the other index, then it is a slow Case 2. Either way, it is not as good as using the composite index, which can handle both the WHERE and the ORDER BY.

    If you can manage to get to Case 1, I recommend you also "remember where you left off" to make Pagination even more efficient. See my Pagination blog.

    More on creating INDEXes.
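The "remember where you left off" idea (keyset pagination) can be sketched as follows. This is a hypothetical example using Python's sqlite3 with the same simplified schema as the question (`UserID`, `NetworkID`, `DateTimeAdded`; all data invented): instead of an OFFSET, each page request passes the last `DateTimeAdded` value it saw, so the index seek starts exactly where the previous page ended and the cost stays flat no matter how deep you page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (UserID INTEGER PRIMARY KEY, "
    "NetworkID TEXT, DateTimeAdded INTEGER)"
)
# The composite index from Case 1: handles both the WHERE and the ORDER BY.
conn.execute("CREATE INDEX idx_net_dt ON customers (NetworkID, DateTimeAdded)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(i, "net-1", i) for i in range(1, 1001)],
)

def page(last_seen=None, size=100):
    """Fetch one page; a seek predicate replaces OFFSET entirely."""
    if last_seen is None:  # first page
        sql = ("SELECT UserID, DateTimeAdded FROM customers "
               "WHERE NetworkID=? ORDER BY DateTimeAdded DESC LIMIT ?")
        args = ("net-1", size)
    else:  # subsequent pages: start right after the last row seen
        sql = ("SELECT UserID, DateTimeAdded FROM customers "
               "WHERE NetworkID=? AND DateTimeAdded < ? "
               "ORDER BY DateTimeAdded DESC LIMIT ?")
        args = ("net-1", last_seen, size)
    return conn.execute(sql, args).fetchall()

first = page()                         # newest 100 rows
second = page(last_seen=first[-1][1])  # next 100, no rows skipped or repeated
```

Note that the key you remember must be unique (or include a tiebreaker column such as `UserID`) so that rows sharing a `DateTimeAdded` value are neither skipped nor repeated across pages.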