Search code examples
mysqlsqlperformancesql-tuning

Is MySQL naturally slow at this kind of query, or do I have it misconfigured?


The following query is intended to receive a list of unread messages by user. It involves 3 tables: recipients contains a relation of users to message IDs, messages contains the messages themselves, and message_readers contains a list of which users have read which messages.

The query reliably takes 4.9 seconds - this is seriously hurting our performance, and is especially worrisome since we hope the database will eventually be several orders of magnitude larger. Granted, it's an inherently heavy query, but the data set is tiny, and intuitively it seems that it should be much faster. The server has enough memory (32gb) that the entire database should be loaded in RAM at all times, and there's nothing else running on the box.

The tables are all tiny:

recipients: 23581
messages: 9679
message_readers: 2685

The query itself:

SELECT 
    m.*
FROM 
    messages m
INNER JOIN recipients r ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.message_id = m.id
WHERE
    r.id = $user_id
    AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)

The explain plan is pretty straightforward:

+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type   | possible_keys                     | key                               | key_len | ref                            | rows  | Extra       |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
|  1 | SIMPLE      | r     | ref    | index_recipients_on_id            | index_recipients_on_id            | 768     | const                          | 11908 | Using where |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY                           | PRIMARY                           | 4       | db.r.message_id                |     1 | Using index |
|  1 | SIMPLE      | mr    | ALL    | NULL                              | NULL                              | NULL    | NULL                           |  2498 | Using where |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+

There IS an index on message_readers.read_by_id, but I guess it can't really use it because of the IS NULL condition.

I'm using all default settings except for the following:

key_buffer=4G
query_cache_limit = 256M
query_cache_size = 1G
innodb_buffer_pool_size=12G

Thanks!


Solution

  • Assuming that message_readers is a subset of recipients, I recommend making the following changes:

    1. Get rid of the message_readers table and replace it with a flag on the recipients table. This will eliminiate the null check and remove a join.

    2. It probably already is, but make sure your clustered index for recipients is id, message_id rather than message_id, id, since nearly all searches for messages will be based on the recipients.

    Here is the SELECT that results:

    SELECT
        r.whatever,
        m.whatever,
        -- ...
    FROM
        recipients r
        INNER JOIN messages m ON m.id = r.message_id
    WHERE
        r.id = $user_id
        AND r.read_flag = 'N'
    

    UPDATE

    Here is the correct version of your query using the existing scheme:

    SELECT
        r.whatever,
        m.whatever,
        -- ...
    FROM
        recipients r
        INNER JOIN messages m ON r.message_id = m.id
        LEFT JOIN message_readers mr ON mr.read_by_id = r.id 
                                     AND mr.message_id = m.id
    WHERE
        r.id = $user_id
        AND mr.read_by_id IS NULL
    

    This assumes that your clustered indexes are what would be expected:

    recipients: id, message_id
    messages: id
    message_readers: read_by_id, message_id