When I perform EXPLAIN on the following query to fetch users with bookings
SELECT user.id
FROM user
LEFT JOIN booking
ON booking.user_id = user.id
AND booking.end_timestamp > 1706878800
AND booking.end_timestamp <= 1706882400
AND booking.status IN ('pending', 'progress', 'done');
I get the following result:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | user | NULL | index | NULL | status | 1 | NULL | 17644 | 100 | Using index |
1 | SIMPLE | booking | NULL | ref | status,user_id_start_timestamp,user_id_status_end_timestamp,end_timestamp | user_id_status_end_timestamp | 5 | db.user.id | 403 | 0.01 | Using where; Using index |
As you can see, it automatically uses the index user_id_status_end_timestamp
for the booking
table.
But when I add another join to the query to include address
which depends on the booking
table:
SELECT user.id
FROM user
LEFT JOIN booking
ON booking.user_id = user.id
AND booking.end_timestamp > 1706878800
AND booking.end_timestamp <= 1706882400
AND booking.status IN ('pending', 'progress', 'done')
LEFT JOIN address ON booking.address_id = address.id;
now it no longer uses any index for the booking
table making the query very slow even though it shows the same list of possible keys:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | user | NULL | index | NULL | status | 1 | NULL | 17644 | 100 | Using index |
1 | SIMPLE | booking | NULL | ALL | status,user_id_start_timestamp,user_id_status_end_timestamp,end_timestamp | NULL | NULL | NULL | 9509525 | 100 | Range checked for each record (index map: 0x98040) |
1 | SIMPLE | address | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db.booking.address_id | 1 | 100 | Using index |
Of course, if I add USE INDEX (user_id_status_end_timestamp)
, it works, but why is MySQL not using the index automatically in this case?
I am using MySQL version 5.7.42.
EDIT
Please note that I need both the joins to be left join. The original query is much bigger. I have trimmed it down to focus only on the part where the problem lies for easier understanding and debugging. To explain why they both need to be left joins, here is what one of the where clauses look like:
(booking.id IS NULL OR address.country_id = :countryId)
For your first query:
SELECT user.id
FROM user
LEFT JOIN booking
ON booking.user_id = user.id
AND booking.end_timestamp > 1706878800
AND booking.end_timestamp <= 1706882400
AND booking.status IN ('pending', 'progress', 'done');
the index user_id_status_end_timestamp (user_id, status, end_timestamp)
is covering, meaning there is no additional read from the clustered index required. When you add the join to the address
table, the booking.address_id
is now required but not available in the index. This now requires an additional read from the clustered index (table data), so MySQL decides the index is not suitable for the whole join (it may still be used for per row range scans).
If you add address_id
to the index, it becomes covering again and should be chosen by the optimizer. Switching the order of status
and end_timestamp
may also improve performance.
ALTER TABLE booking
ADD INDEX idx_user_endtime_stat_address (user_id, end_timestamp, status, address_id);