Search code examples
mysqlleft-joinquery-optimization

MySQL join not using index when adding a dependent join


When I perform EXPLAIN on the following query to fetch users with bookings

SELECT user.id
FROM user
    LEFT JOIN booking
        ON booking.user_id = user.id
            AND booking.end_timestamp > 1706878800
            AND booking.end_timestamp <= 1706882400
            AND booking.status IN ('pending', 'progress', 'done');

I get the following result:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE user NULL index NULL status 1 NULL 17644 100 Using index
1 SIMPLE booking NULL ref status,user_id_start_timestamp,user_id_status_end_timestamp,end_timestamp user_id_status_end_timestamp 5 db.user.id 403 0.01 Using where; Using index

As you can see, it automatically uses the index user_id_status_end_timestamp for the booking table.

But when I add another join to the query to include address which depends on the booking table:

SELECT user.id
FROM user
    LEFT JOIN booking
        ON booking.user_id = user.id
            AND booking.end_timestamp > 1706878800
            AND booking.end_timestamp <= 1706882400
            AND booking.status IN ('pending', 'progress', 'done')
    LEFT JOIN address ON booking.address_id = address.id;

now it no longer uses any index for the booking table making the query very slow even though it shows the same list of possible keys:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE user NULL index NULL status 1 NULL 17644 100 Using index
1 SIMPLE booking NULL ALL status,user_id_start_timestamp,user_id_status_end_timestamp,end_timestamp NULL NULL NULL 9509525 100 Range checked for each record (index map: 0x98040)
1 SIMPLE address NULL eq_ref PRIMARY PRIMARY 4 db.booking.address_id 1 100 Using index

Of course, if I add USE INDEX (user_id_status_end_timestamp), it works, but why is MySQL not using the index automatically in this case?

I am using MySQL version 5.7.42.

EDIT

Please note that I need both the joins to be left join. The original query is much bigger. I have trimmed it down to focus only on the part where the problem lies for easier understanding and debugging. To explain why they both need to be left joins, here is what one of the where clauses look like:

(booking.id IS NULL OR address.country_id = :countryId)

Solution

  • For your first query:

    SELECT user.id
    FROM user
        LEFT JOIN booking
            ON booking.user_id = user.id
                AND booking.end_timestamp > 1706878800
                AND booking.end_timestamp <= 1706882400
                AND booking.status IN ('pending', 'progress', 'done');
    

    the index user_id_status_end_timestamp (user_id, status, end_timestamp) is covering, meaning there is no additional read from the clustered index required. When you add the join to the address table, the booking.address_id is now required but not available in the index. This now requires an additional read from the clustered index (table data), so MySQL decides the index is not suitable for the whole join (it may still be used for per row range scans).

    If you add address_id to the index, it becomes covering again and should be chosen by the optimizer. Switching the order of status and end_timestamp may also improve performance.

    ALTER TABLE booking
        ADD INDEX idx_user_endtime_stat_address (user_id, end_timestamp, status, address_id);