Search code examples
sqlapache-spark-sqlgreatest-n-per-groupranking

SQL Top N per item group and M for each user


Sample Data:

user_id item_type item_count
11 A 10
11 A 9
11 A 2
11 B 4
11 B 1
11 C 2
12 A 2
12 B 4
12 B 1
12 D 1

Desired Output:

user_id item_type item_count
11 A 10
11 B 4
11 C 2
12 A 2
12 B 4
12 D 1

For each user, I want the top item by item count within each item type they have. So For user 11, they should get the top record for item A, the top record for item B, and the top record for c. I'm a little stuck. I think it should be a two-fold ranking problem, but most examples I can find are selecting for the top user/item, regardless of item type, but I want the top item of each item type per user.

select * from (
    select user_id, 
           item_type,
           item_count, 
           row_number() over (partition by user order by item_count desc) as item_rank 
    from table) ranks
where item_rank <= 2;

this just gets the top items per user, but I want the top items per item type per user.


Solution

  • You want to partition by user and item here:

    WITH cte AS (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY user_id, item_type
                                       ORDER BY item_count DESC) rn
        FROM yourTable t
    )
    
    SELECT user_id, item_type, item_count
    FROM cte
    WHERE rn <= 2
    ORDER BY user_id, item_type;