Search code examples
sqlmysqlgreatest-n-per-group

Top Reviewed Customers per Product


Top Reviewed Customers per Product

Using the table playground.product_reviews, write a SQL query to identify, for each product, the customer who provided the highest review score. If there are ties in review score, the customer with the most helpful votes should be considered top. The output should include columns for product_id, customer_id, review_score, and helpful_votes, capturing the details of the top review for each product ordered in ascending order of product_id

These are the tables to query for this question:

CREATE TABLE playground.product_reviews
(
    seller_id int,
    customer_id int,
    product_id int,
    review_date date,
    review_score double,
    helpful_votes int
)

INSERT INTO playground.product_reviews
(
    seller_id int,
    customer_id int,
    product_id int,
    review_date date,
    review_score double,
    helpful_votes int
)
values
(301, 202, '2024-01-04', 4.5, 1)
,(301, 202, '2024-02-05', 4.6, 12)
,(302, 203, '2024-02-05', 4.8, 5)
,(303, 204, '2024-02-05', 3.5, 12)
,(301, 202, '2024-03-05', 2.5, 12)
,(302, 203, '2024-01-04', 4.5, 7)
,(303, 204, '2024-01-04', 4.5, 8)
,(301, 204, '2024-01-04', 4.5, 12)
,(302, 203, '2024-03-04', 4.5, 8)
,(303, 204, '2024-03-04', 4.5, 12)

Your answer should include these columns:

product_id integer
customer_id integer
review_score double
helpful_votes integer

The SQL I have is -

with Ordered_Data AS
(
SELECT 
*,
RANK() OVER (Partition by customer_id order by review_score, helpful_votes desc) as r
FROM playground.product_reviews
) 
select 
product_id,
customer_id,
review_score,
helpful_votes
from Ordered_Data
where r = 1
order by product_id asc

But I am getting wrong answers with lowest review scores getting the highest ranks. What am i missing?


Solution

  • The issue in your code is you have DESC only after helpful_votes. How rank() function works is by ordering on all the order by columns. So in your query reviews will be ranked primarily based on the review_score in ascending order.

    WITH ordered_data
         AS (SELECT *,
                    Rank()
                      OVER (
                        partition BY customer_id
                        ORDER BY review_score DESC, helpful_votes DESC) AS r
             FROM   playground.product_reviews)
    SELECT product_id,
           customer_id,
           review_score,
           helpful_votes
    FROM   ordered_data
    WHERE  r = 1
    ORDER  BY product_id ASC