Search code examples
mysqlsqlselectleft-joininnodb

MySql - How to optimize query with indexes?


We're trying to get the latest 10 notifications for a follower from the database. There are a few joins we do to ensure we get the correct set of notifications for the follower. If the person they follow (their leader) added a new post, the follower should only get the notification for posts that were added after they started following the leader (no sense in showing them their leader's older posts as new notifications). The other join is to make sure we get the notification's read_at time,so the follower knows if it was already read or not. Here is the query, but it takes ~9 secs which is too slow. It should ideally only take a few ms, specially with the indexes:

Query:

SELECT nf.id, nf.uuid, nf.leader_id, nf.data, nf.created_at, nfr.read_at
FROM notification_followers nf
LEFT JOIN user_follows uf ON uf.leader_id = nf.leader_id AND uf.follower_id = 14 AND uf.follow_status = 'follow'
LEFT JOIN notification_followers_read nfr ON nf.id = nfr.notification_followers_id AND nfr.follower_id = 14
WHERE (nf.created_at > uf.created_at)
ORDER BY nf.id DESC
LIMIT 10

Indexes:

ALTER TABLE `notification_followers` ADD INDEX `nf_lid_ca_id_idx` (`leader_id`,`created_at`,`id`);
ALTER TABLE `user_follows` ADD KEY`uf_fid_lid_fs_ca_idx` (`follower_id`,`leader_id`,`follow_status`,`created_at`)
ALTER TABLE `notification_followers_read` ADD INDEX `nfr_fid_nfid_ra_idx` (`follower_id`,`notification_followers_id`,`read_at`);

Explain:

enter image description here

Correct Results (take ~9 secs):

enter image description here

SQL DUMP:

SQL DUMP TO REPRODUCE LOCALLY just create speed_test database locally and import file to see the slow query issue live with all the table data (~100K rows).

How can we optimize the above to get the correct results within a few ms?


Solution

  • For this query:

    SELECT nf.id, nf.uuid, nf.leader_id, nf.data, nf.created_at, nfr.read_at
    FROM notification_followers nf JOIN
         user_follows uf 
         ON uf.leader_id = nf.leader_id AND uf.follower_id = 14 AND
            uf.follow_status = 'follow' LEFT JOIN 
         notification_followers_read nfr
         ON nf.id = nfr.notification_followers_id AND nfr.follower_id = 14
    WHERE nf.created_at > uf.created_at
    ORDER BY nf.id DESC
    LIMIT 10;
    

    I would recommend indexes on user_follower(leader_id, follower_id, follow_status, created_at) and notification_followers_read(notification_followers_id, follower_id, read_at). The order of the columns in the indexes matters.

    Notice that I changed the first JOIN to an inner join, because the WHERE clause turns it into one anyway.

    Hmmm, let's try rewriting the query:

    SELECT nf.id, nf.uuid, nf.leader_id, nf.data, nf.created_at,
           (SELECT nfr.read_at
            FROM notification_followers_read nfr
            WHERE nf.id = nfr.notification_followers_id AND nfr.follower_id = 14
           ) nfr
    FROM (SELECT nf.*
          FROM notification_followers nf 
          WHERE EXISTS (SELECT 1
                        FROM user_follows uf 
                        WHERE uf.leader_id = nf.leader_id AND uf.follower_id = 14 AND
                              uf.follow_status = 'follow' AND nf.created_at > uf.created_at
                       )
          ORDER BY nf.id DESC
          LIMIT 10
         ) nf;
    

    For this, you want to be sure you have an index on notification_followers(id) as well.

    Depending on your data, the inner subquery might be faster with this approach:

    FROM (SELECT nf.*
          FROM user_follows uf JOIN
               notification_followers nf 
               ON uf.leader_id = nf.leader_id AND nf.created_at > uf.created_at
          WHERE uf.follower_id = 14 AND uf.follow_status = 'follow' 
          ORDER BY nf.id DESC
          LIMIT 10
         ) nf
    

    For this, the indexes are user_follows(follower_id, follow_status, leader_id, created_at) and notification_followers(leader_id, created_at, id). This might be faster.