Search code examples
mysqlsqlquery-optimization

where like and order by on different tables/columns


For information, on the following examples, big_table is composed of millions of rows and small_table of hundreds.

Here is the basic query i'm trying to do:

SELECT b.id 
    FROM big_table b 
    LEFT JOIN small_table s 
    ON b.small_id=s.id
    WHERE s.name like 'something%' 
    ORDER BY b.name 
    LIMIT 10, 10;

This is slow and I can understand why both index can't be used.

My initial idea was to split the query into parts.

This is fast:

SELECT id FROM small_table WHERE name like 'something%';

This is also fast:

SELECT id FROM big_table WHERE small_id IN (1, 2) ORDER BY name LIMIT 10, 10;

But, put together, it becomes slow:

SELECT id FROM big_table 
    WHERE small_id 
    IN (
        SELECT id 
        FROM small_table WHERE name like 'something%'
    ) 
    ORDER BY name 
    LIMIT 10, 10;

Unless the subquery is re-evaluated for every row, it shouldn't be slower than executing both query separately right?

I'm looking for any help optimizing the initial query and understanding why the second one doesn't work.


EXPLAIN result for the last query :

| id   | select_type | table | type | possible_keys | key | key_len | ref | rows  | Extra
| 1 | PRIMARY | small_table | range | PRIMARY, ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | big_table | ref | ix_join_foreign_key   | ix_join_foreign_key | 9 | small_table.id | 11870 | |

temporary solution :

SELECT id FROM big_table ignore index(ix_join_foreign_key)
    WHERE small_id 
    IN (
        SELECT id 
        FROM small_table ignore index(PRIMARY)
        WHERE name like 'something%'
    ) 
    ORDER BY name 
    LIMIT 10, 10;

(result & explain is the same with an EXISTS instead of IN)

EXPLAIN output becomes:

| 1 | PRIMARY | big_table | index  | NULL | ix_big_name | 768 | NULL | 20 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 8 | func | 1 | |
| 2 | MATERIALIZED | small_table | range | ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index |

if anyone has a better solution, I'm still interested.


Solution

  • The problem that you are facing is that you have conditions on the small table but are trying to avoid a sort in the large table. In MySQL, I think you need to do at least a full table scan.

    One step is to write the query using exists, as others have mentioned:

    SELECT b.id
    FROM big_table b
    WHERE EXISTS (SELECT 1
                  FROM small_table s
                  WHERE s.name LIKE 'something%' AND s.id = b.small_id
                 )
    ORDER BY b.name;
    

    The question is: Can you trick MySQL into doing the ORDER BY using an index? One possibility is to use the appropriate index. In this case, the appropriate index is: big_table(name, small_id, id) and small_table(id, name). The ordering of the keys in the index is important. Because the first is a covering index, MySQL might read through the index in order by name, choosing the appropriate ids.