I have a nested set setup like so:
Node (Id, ParentId, LeftBounds, RightBounds, Level, Name)
LeftBounds
has an index on it.
But when I attempt to select paged results,
SELECT * FROM Node ORDER BY LeftBounds ASC LIMIT 500000, 1000
Sql does a full table scan. Is there something else I should look at to avoid a full table scan?
This normally wouldn't be a huge problem, but with a table of several million rows, it takes ~3-5 seconds for the last page to load.
Your LIMIT 5000000, 1000
clause requires MySQL to order your results in your result set, skip a half-million of them, and then display 1000. It seems likely that MySQL has decided that's best done with a table scan. That's not surprising.
You might try a deferred join operation. The purpose of this is to reduce the size of the result set needing to be ordered. It works like this.
SELECT Node.*
FROM Node
JOIN (
SELECT id
FROM Node
ORDER BY LeftBounds ASC
LIMIT 500000, 1000
) Subset ON Node.id = Subset.id
ORDER BY Node.LeftBounds ASC
As you can see, this limits the big result set you need to wrangle to fewer columns, specifically id
and LeftBounds
. It then uses the set of 1000 different id
values it finds to retrieve the full records.
If you make yourself a compound index on (LeftBounds, id)
you may very well speed up this query by a lot. But it will still have to skip half a million rows, so your EXPLAIN
may say you're doing a full index scan.
The next thing you might be able to do with this query to speed it up is get rid of SELECT *
, instead naming the columns you need. Why does that help? Because it gives a hint at a compound covering index that might help satisfy the query completely. You've mentioned that LeftBounds
is unique and therefore a candidate for the JOIN
criterion. So, let's explore this with an example. Let's say you want ParentId, LeftBounds, RightBounds, Level, Name
in your resultset. Then you could use this query:
SELECT Node.ParentId, Node.LeftBounds,
Node.RightBounds, Node.Level, Node.Name
FROM Node
JOIN (
SELECT LeftBounds
FROM Node
ORDER BY LeftBounds ASC
LIMIT 500000, 1000
) Subset ON Node.LeftBounds = Subset.LeftBounds
ORDER BY Node.LeftBounds ASC
If you have an index on the columns you need, MySQL can satisfy the query from the index. That index should incorporate these columns in this order.
LeftBounds, ParentId, RightBounds, Level, Name
LeftBounds
needs to be first in the index, because that's the column you're using for random access to the index. The point here is to omit having to use the id
column to access the table.