Search code examples
sqlmysqljoinsubquery

optimize join using equality comparison on 2 columns with results of dependent subquery


I'm trying to optimize join by stopping when finding one record for each nested loop.

MySQL version is 8.0.29.

I found out from MySQL JOIN with LIMIT 1 on joined table that I can do that with a one-column equality comparison, but I am struggling to do it with a composite primary key.

create table foo (
  id int unsigned, 
  fooValue int, 
  barValue binary(32), 
  primary key (id), 
  index fooV (fooValue)
);

create table bar (
  id int unsigned, 
  barValueChecksum int unsigned, 
  barValue binary(32), 
  primary key (id, barValueChecksum), 
  index checksum (barValueChecksum)
);

I'm trying to find out all bar s for foo s with certain value

explain select * from foo
    left join bar on (bar.id, bar.barValueChecksum)
        = (select b.id, b.barValueChecksum from bar b 
                   where b.barValueChecksum = crc32(foo.barValue)
                     and b.barValue = foo.barValue)
where foo.fooValue = 10;

but it returns

id|select_type       |table|partitions|type|possible_keys|key     |key_len|ref  |rows|filtered|Extra                                     |
--+------------------+-----+----------+----+-------------+--------+-------+-----+----+--------+------------------------------------------+
 1|PRIMARY           |foo  |          |ref |fooV         |fooV    |5      |const|   1|   100.0|                                          |
 1|PRIMARY           |bar  |          |ALL |             |        |       |     |   1|   100.0|Using where; Using join buffer (hash join)|
 2|DEPENDENT SUBQUERY|b    |          |ref |checksum     |checksum|4      |func |   1|   100.0|Using index condition; Using where        |

The optimizer tries to use hash join, and, even worse, PRIMARY does not appear in section possible_keys for table bar.

I can achieve a result similar to what I want by separating comparisons, but it runs the subquery twice

explain select * from foo
    left join bar on (bar.id 
                            = (select b.id from bar b 
                                where b.barValueChecksum = crc32(foo.barValue) 
                                      and b.barValue = foo.barValue))
                 and (bar.barValueChecksum 
                            = (select b.barValueChecksum from bar 
                                b where b.barValueChecksum = crc32(foo.barValue) 
                                        and b.barValue = foo.barValue))
where foo.fooValue = 10;

which results in

id|select_type       |table|partitions|type  |possible_keys   |key     |key_len|ref      |rows|filtered|Extra                             |
--+------------------+-----+----------+------+----------------+--------+-------+---------+----+--------+----------------------------------+
 1|PRIMARY           |foo  |          |ref   |fooV            |fooV    |5      |const    |   1|   100.0|                                  |
 1|PRIMARY           |bar  |          |eq_ref|PRIMARY,checksum|PRIMARY |8      |func,func|   1|   100.0|Using where                       |
 3|DEPENDENT SUBQUERY|b    |          |ref   |checksum        |checksum|4      |func     |   1|   100.0|Using index condition; Using where|
 2|DEPENDENT SUBQUERY|b    |          |ref   |checksum        |checksum|4      |func     |   1|   100.0|Using index condition; Using where|

What would be a solution?


Solution

  • Starting in MySQL 8.0.14 you can use a lateral join for this. It could be quite efficient with the right indexes.

    For example you can do:

    select * 
    from foo
    left join lateral (
      select b.*
      from bar b 
      where b.barValueChecksum = crc32(foo.barValue)
        and b.barValue = foo.barValue
      limit 1
    ) on true
    where foo.fooValue = 10;
    

    The following indexes can speed up the filter on the main table and the join:

    create index ix1 on foo (fooValue); -- You already have this index
    
    create index ix2 on bar (barValueChecksum, barValue); -- You don't have this one