MySQL - How can I interpret my EXPLAIN results and optimize this query?


Looking to understand what my EXPLAIN results mean here, and to optimize this query and my tables as best as I can.

The query:

SELECT i.pending,
       i.itemid, 
       i.message,
       i.cid, 
       i.dateadded, 
       i.entrypoint,  
       SUM(CASE WHEN v.direction = 1 THEN 1
                     WHEN v.direction = 2 THEN -1
                     ELSE 0 END) AS votes,
       c.name AS cname,
       c.tag AS ctag,
       i.userid,
       (SELECT COUNT(commentid) FROM `comments` WHERE comments.itemid = i.itemid) AS commentcount,
       CASE WHEN NOT EXISTS (SELECT voteid FROM `votes` WHERE votes.itemid = i.itemid AND votes.userid = @userid) THEN '0' ELSE '1' END AS hasVoted,
       CASE WHEN NOT EXISTS (SELECT voteid FROM `user_favorites` WHERE user_favorites.itemid = i.itemid AND user_favorites.userid = @userid) THEN '0' ELSE '1' END AS isFavorite
    FROM `contentitems` i
      LEFT JOIN votes v ON i.itemid = v.itemid
      LEFT JOIN `user_favorites` uv ON i.itemid = uv.itemid AND (uv.userid = @userid)
      INNER JOIN  `categories` c ON i.cid = c.cid
    GROUP BY i.itemid
    HAVING SUM(CASE WHEN v.direction = 1 THEN 1
                    WHEN v.direction = 2 THEN -1
                    ELSE 0 END) > -3 AND i.pending = 0
    ORDER BY i.dateadded DESC

The explain results:

+----+--------------------+----------------+--------+-----------------------------------+-----------------------------------+---------+-----------------------+------+---------------------------------+
| id | select_type        | table          | type   | possible_keys                     | key                               | key_len | ref                   | rows | Extra                           |
+----+--------------------+----------------+--------+-----------------------------------+-----------------------------------+---------+-----------------------+------+---------------------------------+
|  1 | PRIMARY            | i              | ALL    | NULL                              | NULL                              | NULL    | NULL                  |  121 | Using temporary; Using filesort |
|  1 | PRIMARY            | v              | ref    | fk_contentitemsitemid_votesitemid | fk_contentitemsitemid_votesitemid | 4       | db33481_mydb.i.itemid |    2 |                                 |
|  1 | PRIMARY            | uv             | ALL    | NULL                              | NULL                              | NULL    | NULL                  |    7 |                                 |
|  1 | PRIMARY            | c              | eq_ref | PRIMARY                           | PRIMARY                           | 4       | db33481_mydb.i.cid    |    1 |                                 |
|  4 | DEPENDENT SUBQUERY | user_favorites | ALL    | NULL                              | NULL                              | NULL    | NULL                  |    7 | Using where                     |
|  3 | DEPENDENT SUBQUERY | votes          | ref    | fk_contentitemsitemid_votesitemid | fk_contentitemsitemid_votesitemid | 4       | func                  |    2 | Using where                     |
|  2 | DEPENDENT SUBQUERY | comments       | ALL    | NULL                              | NULL                              | NULL    | NULL                  |   26 | Using where                     |
+----+--------------------+----------------+--------+-----------------------------------+-----------------------------------+---------+-----------------------+------+---------------------------------+

Solution

  • First, you run a NOT EXISTS subquery against votes, then LEFT JOIN votes in the FROM clause, and finally SUM over votes in the HAVING clause. That hits your votes table three times. IF each vote is associated with a single "ItemID", it would be best to pre-aggregate the votes on their own, so the "Sum" is done ONCE.

    Additionally, since your final "HAVING" clause is based directly on the votes, the LEFT JOIN on votes is largely moot and, for every item that has at least one vote, ends up behaving like a normal JOIN.

    All that being said, I would pre-query the votes FIRST, applying the qualifying HAVING condition up front, and only then join to the content items and the other tables. The query against User_Favorites is a count and, assuming a user can favorite a given item only once, will be either 0 (not found) or 1 (found), so there should be no need for a CASE/WHEN.

    The inner query, aliased "PQ", is the "PreQuery":

    SELECT
          PQ.ItemID,
          PQ.VSum as Votes,
          PQ.HasVoted,
          i.pending,
          i.itemid, 
          i.message,
          i.cid, 
          i.dateadded, 
          i.entrypoint,  
          i.userid,
          c.name AS cname,
          c.tag AS ctag,
          ( SELECT COUNT(commentid) 
               FROM `comments` 
               WHERE comments.itemid = PQ.itemid) AS commentcount,
          ( SELECT COUNT(*) FROM user_favorites uf
                  WHERE uf.itemid = PQ.itemid 
                    AND uf.userid = @userid ) AS isFavorite
       from 
          ( SELECT
                  v.itemid,
                  SUM( case when v.Direction = 1 then 1
                            when v.Direction = 2 then -1
                            ELSE 0 end ) as VSum,
                  MAX( if( v.userid = @userid, 1, 0 ) ) AS HasVoted 
               from 
                  votes v
               group by 
                  v.itemid
               having
                  VSum > -3 ) PQ
    
             JOIN ContentItems i
                ON PQ.ItemID = i.ItemID
                and i.Pending = 0
    
             JOIN Categories c
                ON i.cid = c.cid
    
       ORDER BY 
          i.dateadded DESC
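
    One thing worth checking against your data: in the original query, an item with no votes at all sums to 0 and passes the "> -3" test, whereas a plain JOIN to the pre-aggregated votes drops such items. If unvoted items should still be listed (an assumption on my part), a variant along these lines might do it. It is only a sketch: the comment count and favorite subqueries are left out for brevity, and the vote filter moves to the outer query so that heavily downvoted items are not mistaken for unvoted ones.

    ```sql
    -- Sketch only: keeps items that have no votes yet (vote total treated as 0).
    SELECT
          i.itemid,
          i.pending,
          i.message,
          i.cid,
          i.dateadded,
          i.entrypoint,
          i.userid,
          COALESCE( PQ.VSum, 0 ) AS votes,
          COALESCE( PQ.HasVoted, 0 ) AS hasVoted,
          c.name AS cname,
          c.tag AS ctag
       FROM
          contentitems i
             JOIN categories c
                ON i.cid = c.cid
             -- LEFT JOIN so items without any votes survive
             LEFT JOIN
                ( SELECT
                        v.itemid,
                        SUM( CASE WHEN v.direction = 1 THEN 1
                                  WHEN v.direction = 2 THEN -1
                                  ELSE 0 END ) AS VSum,
                        MAX( IF( v.userid = @userid, 1, 0 ) ) AS HasVoted
                     FROM
                        votes v
                     GROUP BY
                        v.itemid ) PQ
                ON i.itemid = PQ.itemid
       WHERE
              i.pending = 0
          -- filter here, not inside PQ, so an item at -3 or below is excluded
          AND COALESCE( PQ.VSum, 0 ) > -3
       ORDER BY
          i.dateadded DESC;
    ```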
    

    Others have pointed out the need for indexes, and I agree. I would make sure each table has an index on the user ID, the Item ID, or both, as appropriate.
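
    As a rough illustration of that, the EXPLAIN output shows "ALL" (a full scan, no usable key) for user_favorites and comments, so those are the obvious candidates. The index names below are made up, and the column choices assume the tables look the way the query implies, so treat this as a starting point and confirm with EXPLAIN rather than as a definitive set.

    ```sql
    -- Hypothetical index names; adjust to your own naming scheme.

    -- votes already has an index on itemid (the foreign key seen in EXPLAIN);
    -- adding userid lets the per-user "has voted" check use the index too.
    CREATE INDEX idx_votes_item_user ON votes ( itemid, userid );

    -- user_favorites is looked up by item and user together.
    CREATE INDEX idx_userfav_item_user ON user_favorites ( itemid, userid );

    -- comments is counted per item.
    CREATE INDEX idx_comments_item ON comments ( itemid );

    -- May help the pending filter and the ORDER BY on dateadded;
    -- whether the optimizer actually uses it depends on the final query.
    CREATE INDEX idx_items_pending_date ON contentitems ( pending, dateadded );
    ```

    After adding them, re-run EXPLAIN and check whether the "ALL" rows for uv, user_favorites and comments change to "ref" with a populated key column.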

    A couple of other points. You originally start the query from all ContentItems and left-join to votes, but then bring a user ID into it. This DEFINITELY smells like a query for a specific user. That being said, I would ADDITIONALLY start the whole query with a select of only the ItemIDs that user has done anything with, THEN continue with the rest of the query.
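
    A minimal sketch of that starting point, assuming "done anything with" means the user has voted on or favorited the item (you may also want comments or items the user posted; that is your call). The alias "UserItems" is just a name I picked. Note that this narrows the result to that user's items, so it only applies if that is really what the page is meant to show.

    ```sql
    SELECT
          i.itemid,
          i.message,
          i.dateadded
       FROM
          -- item IDs the user has touched; UNION (not UNION ALL) removes duplicates
          ( SELECT v.itemid
               FROM votes v
               WHERE v.userid = @userid
            UNION
            SELECT uf.itemid
               FROM user_favorites uf
               WHERE uf.userid = @userid ) UserItems

             JOIN contentitems i
                ON UserItems.itemid = i.itemid
               AND i.pending = 0

       ORDER BY
          i.dateadded DESC;
    ```

    From there, the vote totals, category and comment count can be joined on the same way as in the main query, but against a much smaller set of items.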