I'm trying to optimize join by stopping when finding one record for each nested loop.
MySQL version is 8.0.29.
I found out from MySQL JOIN with LIMIT 1 on joined table that I can do that with a one-column equality comparison, but I am struggling to do it with a composite primary key.
create table foo (
id int unsigned,
fooValue int,
barValue binary(32),
primary key (id),
index fooV (fooValue)
);
create table bar (
id int unsigned,
barValueChecksum int unsigned,
barValue binary(32),
primary key (id, barValueChecksum),
index checksum (barValueChecksum)
);
I'm trying to find out all bar
s for foo
s with certain value
explain select * from foo
left join bar on (bar.id, bar.barValueChecksum)
= (select b.id, b.barValueChecksum from bar b
where b.barValueChecksum = crc32(foo.barValue)
and b.barValue = foo.barValue)
where foo.fooValue = 10;
but it returns
id|select_type |table|partitions|type|possible_keys|key |key_len|ref |rows|filtered|Extra |
--+------------------+-----+----------+----+-------------+--------+-------+-----+----+--------+------------------------------------------+
1|PRIMARY |foo | |ref |fooV |fooV |5 |const| 1| 100.0| |
1|PRIMARY |bar | |ALL | | | | | 1| 100.0|Using where; Using join buffer (hash join)|
2|DEPENDENT SUBQUERY|b | |ref |checksum |checksum|4 |func | 1| 100.0|Using index condition; Using where |
The optimizer tries to use hash join, and, even worse, PRIMARY
does not appear in section possible_keys
for table bar
.
I can achieve a result similar to what I want by separating comparisons, but it runs the subquery twice
explain select * from foo
left join bar on (bar.id
= (select b.id from bar b
where b.barValueChecksum = crc32(foo.barValue)
and b.barValue = foo.barValue))
and (bar.barValueChecksum
= (select b.barValueChecksum from bar
b where b.barValueChecksum = crc32(foo.barValue)
and b.barValue = foo.barValue))
where foo.fooValue = 10;
which results in
id|select_type |table|partitions|type |possible_keys |key |key_len|ref |rows|filtered|Extra |
--+------------------+-----+----------+------+----------------+--------+-------+---------+----+--------+----------------------------------+
1|PRIMARY |foo | |ref |fooV |fooV |5 |const | 1| 100.0| |
1|PRIMARY |bar | |eq_ref|PRIMARY,checksum|PRIMARY |8 |func,func| 1| 100.0|Using where |
3|DEPENDENT SUBQUERY|b | |ref |checksum |checksum|4 |func | 1| 100.0|Using index condition; Using where|
2|DEPENDENT SUBQUERY|b | |ref |checksum |checksum|4 |func | 1| 100.0|Using index condition; Using where|
What would be a solution?
Starting in MySQL 8.0.14 you can use a lateral join for this. It could be quite efficient with the right indexes.
For example you can do:
select *
from foo
left join lateral (
select b.*
from bar b
where b.barValueChecksum = crc32(foo.barValue)
and b.barValue = foo.barValue
limit 1
) on true
where foo.fooValue = 10;
The following indexes can speed up the filter on the main table and the join:
create index ix1 on foo (fooValue); -- You already have this index
create index ix2 on bar (barValueChecksum, barValue); -- You don't have this one