Search code examples
mysqloptimizationquery-performance

Optimize query with group by, inner query and count


I'm trying to fetch relative hashtag.

I have many to many relation between hashtag and post table.

For example for hashtag 'Love', I try to get all hashtag where they post has Love hashtag.

here is my query ( hashtag 67 is for 'Love' )

SELECT hashtag_id, count(hashtag_id) as count
from post_hashtag 
where 
   # where posts has hashtag '67' 
   post_id in ( SELECT post_id FROM post_hashtag WHERE hashtag_id = 67 ) 
   # remove hashtag 67 from result
   and hashtag_id != 67
# group them and sort by count, so the must repeated hashtag is the best relative hashtag
GROUP by hashtag_id
ORDER by count desc
limit 4

I try to optimize my query but I can't optimize it more ( for now it took 2 - 12 second base on number of post )

Is there anyway to optimize it?

Explain query

+----+-------------+-----------------+------------+--------+---------------+---------------------------------------------------+---------+-------------------------------------------+---------+----------+-------------+
| id | select_type | table           | partitions | type   | possible_keys | key                                               | key_len | ref                                       | rows    | filtered | Extra       |
+----+-------------+-----------------+------------+--------+---------------+---------------------------------------------------+---------+-------------------------------------------+---------+----------+-------------+
|  1 | SIMPLE      | post_hashtag    | NULL       | index  | NULL          | fk_np_account_post_has_np_hashtag_np_hashtag1_idx | 4       | NULL                                      | 4623584 |   100.00 | Using index |
|  1 | SIMPLE      | hashtag         | NULL       | eq_ref | PRIMARY       | PRIMARY                                           | 4       | graphicj_novin.np_post_hashtag.hashtag_id |       1 |   100.00 | NULL        |
+----+-------------+-----------------+------------+--------+---------------+---------------------------------------------------+---------+-------------------------------------------+---------+----------+-------------+

post_hashtag has these field

post_id,hashtag_id

Both field is foreign key


Solution

  • MySQL often optimizes WHERE IN (SELECT ...) poorly. Use a JOIN instead.

    SELECT p1.hashtag_id, count(*) as count
    from post_hashtag AS p1
    JOIN post_hashtag AS p2 ON p1.post_id = p2.post_id
    WHERE p1.hashtag_id != 67
    AND p2.hashtag_id = 67
    GROUP by p1.hashtag_id
    ORDER by count desc
    limit 4