Search code examples
mysqlsqlquery-optimization

Filter inner query by the results of the outer query


Let me start in plain english first Query: Get top 100 paying users and their current active item (just one item)

Here is a drafted query

SELECT `user_id`, SUM(p.`amount`) as `total`
FROM `users_purcahse` AS p
     LEFT JOIN (SELECT `ui`.`item_id` as `item_id`,  `ui`.`user_id` as `user_id`
                FROM `user_items` AS `ui`
                         LEFT OUTER JOIN `items` AS `i` ON  `ui`.`item_id` = `i`.`id`
                         LEFT OUTER JOIN `categories` AS `cat` ON `i`.`category_id` = `cat`.`id`
                WHERE `ui`.isActive = 1
) AS `ui` ON p.`user_id` = `ui`.`user_id`


GROUP BY `user_id`, `ui`.`item_id`
ORDER BY `total` DESC
LIMIT 0, 100;

The problem with this is that the inner query is getting all users items table and then it will join it with the top 100 paying users

user items is a very large table, the query is taking too long I simply want to attach the current active items for each user after doing the calculations

Note: a user can have so many items but only 1 active item Note2: it's not enforced on the DB level that user_items can have one column with is_active per user


Solution

  • This is a job for some well-chosen subqueries.

    First, let's find the user_id values of your top-paying users.

                    SELECT user_id, SUM(amount) total
                      FROM users_purcahse
                     ORDER BY SUM(amount) DESC
                     LIMIT 100
    

    Next, let's find the item_id values for your users. If more than one item is active, we'll take the one with the smallest item_id value to get just one.

                   SELECT user_id, MIN(item_id) item_id
                     FROM user_items
                    WHERE isActive = 1
                    GROUP BY user_id
    

    Then, in an outer query we can fetch the details of your items.

    SELECT top_users.user_id, top_users.total, 
           active_items.item_id,
           items.*, categories.*
      FROM (
                    SELECT user_id, SUM(amount) total
                      FROM users_purcahse
                     ORDER BY SUM(amount) DESC
                     LIMIT 100
           ) top_users
      LEFT JOIN (
                   SELECT user_id, MIN(item_id) item_id
                     FROM user_items
                    WHERE isActive = 1
                    GROUP BY user_id
           ) active_items ON top_users.user_id = active_items.user_id
      LEFT JOIN items ON active_items.item_id = item.id
      LEFT JOIN categories ON item.category_id = categories.id
     ORDER BY top_users.total DESC, top_users.user_id
    

    The trick here is to use GROUP BY subqueries to get the data items where you need just one value per user_id.

    Once you have the resultset you need, you can use EXPLAIN to help you sort out any performance problems.