Search code examples
mysqlsqlnested-sets

Avoid Full Table Scan On Paged Nested Set


I have a nested set setup like so:

Node (Id, ParentId, LeftBounds, RightBounds, Level, Name)

LeftBounds has an index on it.

But when I attempt to select paged results,

SELECT * FROM Node ORDER BY LeftBounds ASC LIMIT 500000, 1000

Sql does a full table scan. Is there something else I should look at to avoid a full table scan?

This normally wouldn't be a huge problem, but with a table of several million rows, it takes ~3-5 seconds for the last page to load.


Solution

  • Your LIMIT 5000000, 1000 clause requires MySQL to order your results in your result set, skip a half-million of them, and then display 1000. It seems likely that MySQL has decided that's best done with a table scan. That's not surprising.

    You might try a deferred join operation. The purpose of this is to reduce the size of the result set needing to be ordered. It works like this.

    SELECT Node.*
      FROM Node
      JOIN (
             SELECT id
               FROM Node
              ORDER BY LeftBounds ASC
              LIMIT 500000, 1000
           ) Subset ON Node.id = Subset.id
      ORDER BY Node.LeftBounds ASC
    

    As you can see, this limits the big result set you need to wrangle to fewer columns, specifically id and LeftBounds. It then uses the set of 1000 different id values it finds to retrieve the full records.

    If you make yourself a compound index on (LeftBounds, id) you may very well speed up this query by a lot. But it will still have to skip half a million rows, so your EXPLAIN may say you're doing a full index scan.

    The next thing you might be able to do with this query to speed it up is get rid of SELECT *, instead naming the columns you need. Why does that help? Because it gives a hint at a compound covering index that might help satisfy the query completely. You've mentioned that LeftBounds is unique and therefore a candidate for the JOIN criterion. So, let's explore this with an example. Let's say you want ParentId, LeftBounds, RightBounds, Level, Name in your resultset. Then you could use this query:

    SELECT Node.ParentId, Node.LeftBounds, 
           Node.RightBounds, Node.Level, Node.Name
      FROM Node
      JOIN (
             SELECT LeftBounds
               FROM Node
              ORDER BY LeftBounds ASC
              LIMIT 500000, 1000
           ) Subset ON Node.LeftBounds = Subset.LeftBounds
      ORDER BY Node.LeftBounds ASC
    

    If you have an index on the columns you need, MySQL can satisfy the query from the index. That index should incorporate these columns in this order.

    LeftBounds, ParentId, RightBounds, Level, Name
    

    LeftBounds needs to be first in the index, because that's the column you're using for random access to the index. The point here is to omit having to use the id column to access the table.