I'm working on MySQL workbench and I would like to find the top-selling products by types of buyers. First I categorized customers based on their buying activity. If a person made a purchase only on 1/1, it's a first-time buyer. If a person made a purchase on 1/1 and 1/2, it's a second-time buyer. If a person made a purchase on 1/1, 1/2, and 1/3, it's a third-time buyer. And so on.
The following is sample data, similar to my original data.
user_id | PurchaseDate | Product | Total | order_id |
---|---|---|---|---|
1 | 2023-01-01 16:46 | yogurt | 3.72 | 1000 |
1 | 2023-01-01 18:23 | milk | 4.38 | null |
2 | 2023-01-01 12:34 | egg | 4.57 | 1001 |
1 | 2023-01-01 11:54 | butter | 3.69 | null |
3 | 2023-01-01 10:41 | cheese | 3.68 | 1003 |
2 | 2023-01-01 13:45 | milk | 4.38 | null |
3 | 2023-01-03 18:56 | yogurt | 3.72 | 1005 |
3 | 2023-01-02 19:12 | cream | 4.98 | 1004 |
2 | 2023-01-02 12:47 | egg | 4.57 | 1002 |
If a customer makes more than one purchase on the same date, the order_id is null. So, I used that to categorize the buyers.
SELECT user_id, product, sum(Total) as Total,
CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer"
WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer"
WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
END AS TypeofBuyer
FROM myData
GROUP BY user_id
ORDER BY Total DESC;
And I get the following result
user_id | Product | Total | TypeofBuyer |
---|---|---|---|
2 | egg | 13.52 | SecondTimeBuyer |
3 | cheese | 12.38 | ThirdTimeBuyer |
1 | yogurt | 11.79 | FirstTimeBuyer |
But the problem is since I aggregated the total and grouped it by user_id, I cannot find the top-selling product for each user_id.
For instance, for user_id 1, the top-selling product is milk and the customer is a first-time buyer.
How can I extract the above information?
I'd appreciate your help!
Edit: Eventually, I'm aiming to visualize the top 10 selling products for 1st-time buyers, 2nd-time buyers, and so forth (my original data is much bigger than the sample)
Try this:
/*first cte*/
WITH cte AS (
SELECT user_id, Product, Total,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY Total DESC) Rnum,
SUM(Total) OVER (PARTITION BY user_id) AS GTotal,
order_id
FROM myData),
/*second cte*/
cte2 AS (
SELECT user_id,
CASE WHEN Count(DISTINCT(order_id)) = 1 THEN "FirstTimeBuyer"
WHEN Count(DISTINCT(order_id)) = 2 THEN "SecondTimeBuyer"
WHEN Count(DISTINCT(order_id)) = 3 THEN "ThirdTimeBuyer"
END AS TypeofBuyer
FROM cte
GROUP BY user_id)
/*join the ctes*/
SELECT cte.user_id, cte.Product, cte.Total, cte.GTotal, cte2.TypeofBuyer
FROM cte
JOIN cte2 ON cte.user_id=cte2.user_id
WHERE Rnum=1
ORDER BY GTotal DESC;
Using ROW_NUMBER()
to get top sales and assign with row numbering and SUM() OVER ()
to get total sales by user_id
in the first cte
. Then re-using your method of getting TypeofBuyer
in the second cte
. Lastly, a query to join both of the ctes
and filter only rows with row number=1 to get your desired result.