Search code examples
mysqlperformancecorrelated-subquerymysql-dependent-subquery

How do I improve the performance of a MySQL query that has a dependent subquery that isn't?


I am working with a table in MySQL that defines a tree hierarchy using the "adjacency list" method, which should be fine for my purposes.

I need to compute the maximum of some value over all children using the (fast) query

SELECT MAX(N.someValue) AS rate
FROM `nodes` N
WHERE N.parent = <some node id>;

Sometimes I am not as lucky and have to work with descendants of the children (its defined, and always references some leaf node in the branch of that tree).

SELECT MAX(N.someValue) AS rate
FROM `nodes` N
WHERE N.id IN (SELECT N2.descendant FROM `nodes` N2 WHERE N2.parent = <some node id>);

This second query is quite slow. The number of children for a given parent is quite low, rarely more than 10, never more than 20. It does not appear to be a correlated subquery by my eye, though EXPLAIN says the subquery is dependent. I am testing in MySQL 5.1. nodes.id is the primary key and there is a BTREE index on nodes.parent. Is there a way I can improve the speed of this query?


Solution

  • I don't see anything that specifically explains why this query is slow, but I do see that it could be restructured using a JOIN instead of a subquery. Something like this (I've reversed the table aliases because that's how my brain works):

    SELECT MAX(n2.someValue) AS `rate`
    FROM `nodes` n1
    JOIN `nodes` n2 ON n1.descendant = n2.id
    WHERE n1.parent = '<some static id>'
    

    I don't know why your example query was slow, but this structure shouldn't pose performance issues as long as the proper fields are indexed and of course depending on how many rows are being fetched.