Search code examples
mysqlcasequery-optimizationmysql-dependent-subquery

How to prevent dependant subqueries within CASE WHEN x THE (subquery)


I have a very complex query which uses some subqueries within a CASE statement.

For this question the complete query isn't needed and would just prevent people from getting into the problem quick.

So this post uses pseudocode to work with. If wanted I could post the query but it is a monster and of no use for this question.

What I want are cacheable subqueries within the CASE statement.

SELECT * FROM posts posts
INNER JOIN posts_shared_to shared_to
      ON shared_to.post_id = posts.post_id
INNER JOIN channels.channels 
      ON channels.channel_id = shared_to.channel_id
WHERE posts.parent_id IS NULL
AND MATCH (post.text) AGAINST (:keyword IN BOOLEAN MODE) 
AND CASE(
    WHEN channel.read_access IS NULL THEN 1
    WHEN channel.read_access = 1 THEN
    (
      SELECT count(*) FROM channel_users 
      WHERE user_id = XXX AND channel_id = channels.channel_id
    ) 
    WHEN shared_to.read_type = 2 THEN
    (
      /* another subquery with a join */
      /* check if user is in friendlist of post_author */
    )
   ELSE 0
   END;
)
GROUP BY post.post_id
ORDER BY post.post_id
DESC LIMIT n,n

As statet above this is just a simplified pseudocode.

MySql EXPLAIN says that all used subqueries within the CASE are DEPENDANT which means ( if I'm correct ) that they need to run everytime and aren't cached.

Any solution that helps speeding up this query is welcome.

EDITED PART: Now the true query looks like this:

SELECT a.id, a.title, a.message AS post_text, a.type, a.date, a.author AS uid, 
b.a_name as name, b.avatar, 
shared_to.to_circle AS circle_id, shared_to.root_circle,
c.circle_name, c.read_access, c.owner_uid, c.profile,
MATCH(a.title,a.message) AGAINST (:keyword IN BOOLEAN MODE) AS score

FROM posts a 

/** get userdetails for post_author **/
JOIN authors b ON b.id = a.author

/** get circles posts was shared to **/
JOIN posts_shared_to shared_to ON shared_to.post_id = a.id AND shared_to.deleted IS NULL

/** 
* get circle_details note: at the moment shared_to can contain NULL and 1 too and doesnt need to be a circle_id 
* if to_circle IS NULL post was shared public
* if to_circle = 1 post was shared to private circles
* since we use md5 keys as circle ids this can be a string insetad of (int) ... ugly.. 
*
**/
LEFT JOIN circles c ON c.circle_id = shared_to.to_circle 
    /*AND c.circle_name IS NOT NULL */
    AND ( c.profile IS NULL OR c.profile = 6 OR c.profile = 1 ) 
    AND c.deleted IS NULL

LEFT JOIN (
    /** if post is within a channel that requires membership we use this to check if requesting user is member **/
    SELECT COUNT(*) users_count, user_id, circle_id FROM circles_users
    GROUP BY user_id, circle_id
    ) counts ON counts.circle_id = shared_to.to_circle
             AND counts.user_id = :me

LEFT JOIN (
    /** if post is shared private we check if requesting users exists within post authors private circles **/
    SELECT count(*) in_circles_count, ci.owner_uid AS circle_owner, cu1.user_id AS user_me 
    FROM circles ci 
    INNER JOIN circles_users cu1 ON cu1.circle_id = ci.circle_id 
                                 AND cu1.deleted IS NULL 
    WHERE ci.profile IS NULL AND ci.deleted IS NULL
    GROUP BY user_me, circle_owner
) users_in_circles ON users_in_circles.user_me = :me 
                   AND users_in_circles.circle_owner = a.id

/** make sure post is a topic **/
WHERE a.parent_id IS NULL AND a.deleted IS NULL

/** search title and post body **/
AND MATCH (a.title,a.message) AGAINST (:keyword IN BOOLEAN MODE) 

AND (
    /** own circle **/
    c.owner_uid = :me
    /** site member read_access ( this query is for members, for guests we use a different query ) **/
    OR ( c.read_access = 1 OR c.read_access = "1" )
    /** public read_access **/
    OR ( shared_to.to_circle IS NULL OR ( c.read_access IS NULL AND c.owner_uid IS NOT NULL ) )
    /** channel/circle member read_access**/
    OR ( c.read_access = 3 OR c.read_access = "3" AND counts.users_count > 0 )
    /** for users within post creators private circles **/
    OR ( 
    ( 
    /** use shared_to to determine if post is private **/
    shared_to.to_circle = "1" OR shared_to.to_circle = 1 
    /** use circle settings to determine global privacy **/
    OR ( c.owner_uid IS NOT NULL AND c.read_access = 2 OR c.read_access = "2" )
    ) AND users_in_circles.circle_owner = a.author AND users_in_circles.user_me = :me
    )
)

GROUP BY a.id ORDER BY a.id DESC LIMIT n,n

Question: Is this really the better way? If I look at how many rows the derived tables can contain I'm not sure about it.

And maybe someone can help me changing the query like mentioned by @Ollie-Jones :

SELECT stuff, stuff, stuff
  FROM (
         SELECT post.post_id
           FROM your whole query
          ORDER BY post_id DESC
          LIMIT n,n
       ) ids
  JOIN whatever ON whatever.post_id = ids.post_id
  JOIN whatelse ON whatelse

Sry if this sound slazy but I'm not really a mysqlguy and I got headaches for years just from building this query. :D


Solution

  • The best way to eliminate your dependent subquery is to refactor it so it's a virtual table (an independent subquery), then JOIN or LEFT JOIN it to the rest of your tables.

    In your CASE, you have

         SELECT count(*) FROM channel_users 
          WHERE user_id = XXX AND channel_id = channels.channel_id
    

    So, the independent-subquery casting of this is

                       SELECT COUNT(*) users_count,
                              user_id, channel_id
                        FROM channel_users
                       GROUP BY user_id, channel_id
    

    Do you see how that virtual table contains one row for each distinct combination of user_id and channel_id values? Each row has the users_count value you need. You can then JOIN that into the rest of your query, like so. (Notice that INNER JOIN === JOIN in MySQL, so I used JOIN to shorten it a bit.)

    SELECT * FROM posts posts
      JOIN posts_shared_to shared_to ON shared_to.post_id = posts.post_id
      JOIN channels.channels  ON channels.channel_id = shared_to.channel_id
      LEFT JOIN (
                       SELECT COUNT(*) users_count,
                              user_id, channel_id
                        FROM channel_users
                       GROUP BY user_id, channel_id
           ) counts ON counts.channel_id = shared_to.channel_id
                   AND counts.user_id = channels.user_id
      LEFT JOIN (  /* your other refactored subquery */
                ) friendcounts ON whatever
     WHERE posts.parent_id IS NULL
       AND channels.user_id = XXX
       AND MATCH (post.text) AGAINST (:keyword IN BOOLEAN MODE) 
       AND (          channel.read_access IS NULL
                   OR (channel.read_access = 1 AND counts.users_count > 0)
                   OR (shared_to.read_type = AND friendcount.users_count > 0)
           )
     GROUP BY post.post_id
     ORDER BY post.post_id DESC
     LIMIT n,n
    

    The MySQL query planner is generally smart enough to generate an appropriate subset of each independent subquery.

    Pro tip: SELECT lots of columns ... ORDER BY something LIMIT n is generally considered a wasteful antipattern. It kills performance because it sorts a whole bunch of columns of data then discards most of the result.

    Pro tip: SELECT * in JOIN query is also wasteful. You are much better off if you give a list of the columns you actually need in your result set.

    So, you can refactor your query again to do

        SELECT stuff, stuff, stuff
          FROM (
                 SELECT post.post_id
                   FROM your whole query
                  ORDER BY post_id DESC
                  LIMIT n,n
               ) ids
          JOIN whatever ON whatever.post_id = ids.post_id
          JOIN whatelse ON whatelse.
    

    The idea is to only sort the post_id values, then use the LIMITed subset to pull the rest of the data you need.