Search code examples
mysqlperformancequery-optimization

How do I fix this extremely slow MYSQL query


The purpose of this query is to list the distinct users someone has connections to (ie. users that are followed by or are following the user with id 256 but excludes users who are either blocking or are blocked by the current user making the request (user with id 2)

The relationships table is pretty simple. The status column can be one of two values: "following" or "blocked":

mysql> describe relationships;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| follower_id | int(11)      | YES  | MUL | NULL    |                |
| followee_id | int(11)      | YES  | MUL | NULL    |                |
| created_at  | datetime     | YES  |     | NULL    |                |
| updated_at  | datetime     | YES  |     | NULL    |                |
| status      | varchar(191) | YES  | MUL | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

This query currently takes about 58 seconds to complete! User 256 only has 1500 connections. To put this is context, there are roughly 10,000 user rows, 5500 relationships rows.

SELECT DISTINCT `users`.*, 
    -- "followed" is just a flag indicating if user #2 is currently following a given user
    (
      SELECT COUNT(*) FROM `relationships`  
      WHERE `relationships`.`followee_id` = `users`.`id` 
        AND `relationships`.`follower_id` = 2
    ) AS 'followed'
FROM `users` 
INNER JOIN `relationships` 
ON (
  (`users`.`id` = `relationships`.`follower_id` 
    AND `relationships`.`followee_id` = 256
  ) 
  OR (`users`.`id` = `relationships`.`followee_id` 
    AND `relationships`.`follower_id` = 256
  )
)
WHERE `relationships`.`status` = 'following' 
  AND (
    -- Ensure we don't return users who are blocked by user #2 
    `users`.`id` NOT IN (
      SELECT `relationships`.`followee_id` 
      FROM `relationships` 
      WHERE `relationships`.`follower_id` = 2
        AND `relationships`.`status` = 'blocked'
    )
  )
  AND (
    -- Ensure we don't return users who are blocking user #2 
    `users`.`id` NOT IN (
      SELECT `relationships`.`follower_id` 
      FROM `relationships` 
      WHERE `relationships`.`followee_id` = 2 
        AND `relationships`.`status` = 'blocked'
    )
  )
ORDER BY `users`.`id` ASC 
LIMIT 10

Here's are the current indexes on relationships:

mysql> show index from relationships;
+---------------+------------+---------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name                                                      | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+---------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| relationships |          0 | PRIMARY                                                       |            1 | id          | A         |        3002 |     NULL | NULL   |      | BTREE      |         |               |
| relationships |          0 | index_relationships_on_status_and_follower_id_and_followee_id |            1 | status      | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |               |
| relationships |          0 | index_relationships_on_status_and_follower_id_and_followee_id |            2 | follower_id | A         |        3002 |     NULL | NULL   | YES  | BTREE      |         |               |
| relationships |          0 | index_relationships_on_status_and_follower_id_and_followee_id |            3 | followee_id | A         |        3002 |     NULL | NULL   | YES  | BTREE      |         |               |
| relationships |          1 | index_relationships_on_followee_id                            |            1 | followee_id | A         |        3002 |     NULL | NULL   | YES  | BTREE      |         |               |
| relationships |          1 | index_relationships_on_follower_id                            |            1 | follower_id | A         |        3002 |     NULL | NULL   | YES  | BTREE      |         |               |
| relationships |          1 | index_relationships_on_status_and_followee_id_and_follower_id |            1 | status      | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |               |
| relationships |          1 | index_relationships_on_status_and_followee_id_and_follower_id |            2 | followee_id | A         |        3002 |     NULL | NULL   | YES  | BTREE      |         |               |
| relationships |          1 | index_relationships_on_status_and_followee_id_and_follower_id |            3 | follower_id | A         |        3002 |     NULL | NULL   | YES  | BTREE      |         |               |
+---------------+------------+---------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

explain results:

mysql> EXPLAIN SELECT DISTINCT `users`.*, (SELECT COUNT(*) FROM `relationships` WHERE `relationships`.`followee_id` = `users`.`id` AND `relationships`.`follower_id` = 2) AS 'followed' FROM `users` INNER JOIN `relationships` ON(`users`.`id` = `relationships`.`follower_id` AND `relationships`.`followee_id` = 256) OR (`users`.`id` = `relationships`.`followee_id` AND `relationships`.`follower_id` = 256) WHERE `relationships`.`status` = 'following' AND (`users`.`id` NOT IN (SELECT `relationships`.`followee_id` FROM `relationships` WHERE `relationships`.`follower_id` = 2 AND `relationships`.`status` = 'blocked')) AND (`users`.`id` NOT IN (SELECT `relationships`.`follower_id` FROM `relationships` WHERE `relationships`.`followee_id` = 2 AND `relationships`.`status` = 'blocked')) ORDER BY `users`.`id` ASC LIMIT 10;
+----+--------------------+---------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+-------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------+
| id | select_type        | table         | type        | possible_keys                                                                                                                                                                                     | key                                                                   | key_len | ref                           | rows | Extra                                                                                                                            |
+----+--------------------+---------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+-------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------+
|  1 | PRIMARY            | relationships | index_merge | index_relationships_on_status_and_follower_id_and_followee_id,index_relationships_on_followee_id,index_relationships_on_follower_id,index_relationships_on_status_and_followee_id_and_follower_id | index_relationships_on_followee_id,index_relationships_on_follower_id | 5,5     | NULL                          |    2 | Using union(index_relationships_on_followee_id,index_relationships_on_follower_id); Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | users         | ALL         | PRIMARY                                                                                                                                                                                           | NULL                                                                  | NULL    | NULL                          | 1534 | Range checked for each record (index map: 0x1)                                                                                   |
|  4 | SUBQUERY           | relationships | ref         | index_relationships_on_status_and_follower_id_and_followee_id,index_relationships_on_followee_id,index_relationships_on_follower_id,index_relationships_on_status_and_followee_id_and_follower_id | index_relationships_on_status_and_follower_id_and_followee_id         | 767     | const                         |    1 | Using where; Using index                                                                                                         |
|  3 | SUBQUERY           | relationships | ref         | index_relationships_on_status_and_follower_id_and_followee_id,index_relationships_on_followee_id,index_relationships_on_follower_id,index_relationships_on_status_and_followee_id_and_follower_id | index_relationships_on_status_and_follower_id_and_followee_id         | 772     | const,const                   |    1 | Using where; Using index                                                                                                         |
|  2 | DEPENDENT SUBQUERY | relationships | ref         | index_relationships_on_followee_id,index_relationships_on_follower_id                                                                                                                             | index_relationships_on_followee_id                                    | 5       | development.users.id |    1 | Using where                                                                                                                      |
+----+--------------------+---------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+-------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)

Solution

  • It's hard to give you a concrete answer without testing this but I think this part of the query is the problem

    SELECT DISTINCT `users`.*, (
          SELECT COUNT(*) FROM `relationships`  
          WHERE `relationships`.`followee_id` = `users`.`id` 
            AND `relationships`.`follower_id` = 2
        ) AS 'followed'
    

    You're also using order by. Remove DISTINCT and order by and see if things speed up. I know it changes the query but I suspect that distinct is basically building a bunch of temporary tables and throwing them away for every row that it needs to check. Have a look here

    http://dev.mysql.com/doc/refman/5.7/en/distinct-optimization.html

    Counts can be slow. make sure that the count is working from the fastest column. See this...

    https://www.percona.com/blog/2007/04/10/count-vs-countcol/

    A good way to think about SQL is in SETS. Luckily MySQL supports sub queries.

    https://dev.mysql.com/doc/refman/5.7/en/from-clause-subqueries.html

    Some pseudo SQL follows...

    select user_id
    from relationships as follower, relationships as followee
    where ...
    

    In the above we have two sets that we can then manipulate. Using sub queries this gets really interesting

    select user_id
    from (select user_id as f1 from relationships where ...) as follower, 
         (select user_id as f2 from relationships where ...) as followee
    where ...
    

    I've always found something like the above an easy way to think about self referencing tables.