We're trying to get the latest 10 notifications for a follower from the database. There are a few joins we do to ensure we get the correct set of notifications for the follower. If the person they follow (their leader) added a new post, the follower should only get the notification for posts that were added after they started following the leader (no sense in showing them their leader's older posts as new notifications). The other join is to make sure we get the notification's read_at
time,so the follower knows if it was already read or not. Here is the query, but it takes ~9 secs which is too slow. It should ideally only take a few ms, specially with the indexes:
Query:
SELECT nf.id, nf.uuid, nf.leader_id, nf.data, nf.created_at, nfr.read_at
FROM notification_followers nf
LEFT JOIN user_follows uf ON uf.leader_id = nf.leader_id AND uf.follower_id = 14 AND uf.follow_status = 'follow'
LEFT JOIN notification_followers_read nfr ON nf.id = nfr.notification_followers_id AND nfr.follower_id = 14
WHERE (nf.created_at > uf.created_at)
ORDER BY nf.id DESC
LIMIT 10
Indexes:
ALTER TABLE `notification_followers` ADD INDEX `nf_lid_ca_id_idx` (`leader_id`,`created_at`,`id`);
ALTER TABLE `user_follows` ADD KEY`uf_fid_lid_fs_ca_idx` (`follower_id`,`leader_id`,`follow_status`,`created_at`)
ALTER TABLE `notification_followers_read` ADD INDEX `nfr_fid_nfid_ra_idx` (`follower_id`,`notification_followers_id`,`read_at`);
Explain:
Correct Results (take ~9 secs):
SQL DUMP:
SQL DUMP TO REPRODUCE LOCALLY just create speed_test
database locally and import file to see the slow query issue live with all the table data (~100K rows).
How can we optimize the above to get the correct results within a few ms?
For this query:
SELECT nf.id, nf.uuid, nf.leader_id, nf.data, nf.created_at, nfr.read_at
FROM notification_followers nf JOIN
user_follows uf
ON uf.leader_id = nf.leader_id AND uf.follower_id = 14 AND
uf.follow_status = 'follow' LEFT JOIN
notification_followers_read nfr
ON nf.id = nfr.notification_followers_id AND nfr.follower_id = 14
WHERE nf.created_at > uf.created_at
ORDER BY nf.id DESC
LIMIT 10;
I would recommend indexes on user_follower(leader_id, follower_id, follow_status, created_at)
and notification_followers_read(notification_followers_id, follower_id, read_at)
. The order of the columns in the indexes matters.
Notice that I changed the first JOIN
to an inner join, because the WHERE
clause turns it into one anyway.
Hmmm, let's try rewriting the query:
SELECT nf.id, nf.uuid, nf.leader_id, nf.data, nf.created_at,
(SELECT nfr.read_at
FROM notification_followers_read nfr
WHERE nf.id = nfr.notification_followers_id AND nfr.follower_id = 14
) nfr
FROM (SELECT nf.*
FROM notification_followers nf
WHERE EXISTS (SELECT 1
FROM user_follows uf
WHERE uf.leader_id = nf.leader_id AND uf.follower_id = 14 AND
uf.follow_status = 'follow' AND nf.created_at > uf.created_at
)
ORDER BY nf.id DESC
LIMIT 10
) nf;
For this, you want to be sure you have an index on notification_followers(id)
as well.
Depending on your data, the inner subquery might be faster with this approach:
FROM (SELECT nf.*
FROM user_follows uf JOIN
notification_followers nf
ON uf.leader_id = nf.leader_id AND nf.created_at > uf.created_at
WHERE uf.follower_id = 14 AND uf.follow_status = 'follow'
ORDER BY nf.id DESC
LIMIT 10
) nf
For this, the indexes are user_follows(follower_id, follow_status, leader_id, created_at)
and notification_followers(leader_id, created_at, id)
. This might be faster.