The following query is intended to receive a list of unread messages by user. It involves 3 tables: recipients
contains a relation of users to message IDs, messages
contains the messages themselves, and message_readers
contains a list of which users have read which messages.
The query reliably takes 4.9 seconds - this is seriously hurting our performance, and is especially worrisome since we hope the database will eventually be several orders of magnitude larger. Granted, it's an inherently heavy query, but the data set is tiny, and intuitively it seems that it should be much faster. The server has enough memory (32gb) that the entire database should be loaded in RAM at all times, and there's nothing else running on the box.
The tables are all tiny:
recipients: 23581
messages: 9679
message_readers: 2685
The query itself:
SELECT
m.*
FROM
messages m
INNER JOIN recipients r ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.message_id = m.id
WHERE
r.id = $user_id
AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)
The explain plan is pretty straightforward:
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| 1 | SIMPLE | r | ref | index_recipients_on_id | index_recipients_on_id | 768 | const | 11908 | Using where |
| 1 | SIMPLE | m | eq_ref | PRIMARY | PRIMARY | 4 | db.r.message_id | 1 | Using index |
| 1 | SIMPLE | mr | ALL | NULL | NULL | NULL | NULL | 2498 | Using where |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
There IS an index on message_readers.read_by_id
, but I guess it can't really use it because of the IS NULL condition.
I'm using all default settings except for the following:
key_buffer=4G
query_cache_limit = 256M
query_cache_size = 1G
innodb_buffer_pool_size=12G
Thanks!
Assuming that message_readers
is a subset of recipients
, I recommend making the following changes:
Get rid of the message_readers
table and replace it with a flag on the recipients
table. This will eliminiate the null check and remove a join.
It probably already is, but make sure your clustered index for recipients
is id, message_id
rather than message_id, id
, since nearly all searches for messages will be based on the recipients.
Here is the SELECT that results:
SELECT
r.whatever,
m.whatever,
-- ...
FROM
recipients r
INNER JOIN messages m ON m.id = r.message_id
WHERE
r.id = $user_id
AND r.read_flag = 'N'
UPDATE
Here is the correct version of your query using the existing scheme:
SELECT
r.whatever,
m.whatever,
-- ...
FROM
recipients r
INNER JOIN messages m ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.read_by_id = r.id
AND mr.message_id = m.id
WHERE
r.id = $user_id
AND mr.read_by_id IS NULL
This assumes that your clustered indexes are what would be expected:
recipients: id, message_id
messages: id
message_readers: read_by_id, message_id