Let me start in plain english first Query: Get top 100 paying users and their current active item (just one item)
Here is a drafted query
SELECT `user_id`, SUM(p.`amount`) as `total`
FROM `users_purcahse` AS p
LEFT JOIN (SELECT `ui`.`item_id` as `item_id`, `ui`.`user_id` as `user_id`
FROM `user_items` AS `ui`
LEFT OUTER JOIN `items` AS `i` ON `ui`.`item_id` = `i`.`id`
LEFT OUTER JOIN `categories` AS `cat` ON `i`.`category_id` = `cat`.`id`
WHERE `ui`.isActive = 1
) AS `ui` ON p.`user_id` = `ui`.`user_id`
GROUP BY `user_id`, `ui`.`item_id`
ORDER BY `total` DESC
LIMIT 0, 100;
The problem with this is that the inner query is getting all users items table and then it will join it with the top 100 paying users
user items is a very large table, the query is taking too long I simply want to attach the current active items for each user after doing the calculations
Note: a user can have so many items but only 1 active item Note2: it's not enforced on the DB level that user_items can have one column with is_active per user
This is a job for some well-chosen subqueries.
First, let's find the user_id
values of your top-paying users.
SELECT user_id, SUM(amount) total
FROM users_purcahse
ORDER BY SUM(amount) DESC
LIMIT 100
Next, let's find the item_id
values for your users. If more than one item is active, we'll take the one with the smallest item_id
value to get just one.
SELECT user_id, MIN(item_id) item_id
FROM user_items
WHERE isActive = 1
GROUP BY user_id
Then, in an outer query we can fetch the details of your items.
SELECT top_users.user_id, top_users.total,
active_items.item_id,
items.*, categories.*
FROM (
SELECT user_id, SUM(amount) total
FROM users_purcahse
ORDER BY SUM(amount) DESC
LIMIT 100
) top_users
LEFT JOIN (
SELECT user_id, MIN(item_id) item_id
FROM user_items
WHERE isActive = 1
GROUP BY user_id
) active_items ON top_users.user_id = active_items.user_id
LEFT JOIN items ON active_items.item_id = item.id
LEFT JOIN categories ON item.category_id = categories.id
ORDER BY top_users.total DESC, top_users.user_id
The trick here is to use GROUP BY subqueries to get the data items where you need just one value per user_id
.
Once you have the resultset you need, you can use EXPLAIN
to help you sort out any performance problems.