Search code examples
mysqlwindow-functionsrankingdense-rank

how to choose top 5 values out of 6 if rank() repeats some values in sql?


The task is to

return the top 5 customer ids and their rankings based on their spend for each store.

There are only 2 tables - payment and customer. There are 2 stores in total.

For the store_id = 2, the rank() gives repeating 1,2,2,3,4,5 values which is 6. I dont know how to choose the 5 with sql code. Since it is actually 6 - i can't "limit 5"

the sqlfiddle is here. I can't make it do the row_number() in sqlfiddle.

My own query:

with help2 as(
    select 
    store_id, 
    customer.customer_id, 
    sum(amount) as revenue
        
    from  customer inner join payment on 
    customer.customer_id = payment.customer_id
    where store_id = 2
    group by
    customer.customer_id
    order by revenue desc)              
    
    select 
    store_id, customer_id, revenue,
    
     rank() over(order by revenue desc) as ranking2
    
    from help2
    
     order by   ranking2  limit 6 -- i dont think there should be limit 6, this is hard coding

the expected answer is :

store_id  customer_id revenue ranking

2       526     221.55      1
2       178     194.61      2
2       137     194.61      2
2       469     177.60      3
2       181     174.66      4
2       259     170.67      5

Solution

  • The short answer is to use DENSE_RANK(), not RANK(). This will give you the correct ranking numbers that match the linked exercise's sample output.

    From What’s the Difference Between RANK and DENSE_RANK in SQL?:

    Unlike DENSE_RANK, RANK skips positions after equal rankings. The number of positions skipped depends on how many rows had an identical ranking. For example, Mary and Lisa sold the same number of products and are both ranked as #2. With RANK, the next position is #4; with DENSE_RANK, the next position is #3.

    WITH CustomerSpending AS (
        SELECT customer_id
        ,SUM(amount) as revenue
        FROM payment
        GROUP BY customer_id
    )
    
    ,CustomerSpendingRanking AS (
        SELECT customer.store_id
        ,customer.customer_id
        ,CustomerSpending.revenue
        ,DENSE_RANK() OVER (PARTITION BY customer.store_id ORDER BY CustomerSpending.revenue DESC) as ranking
        FROM customer
        JOIN CustomerSpending
        ON customer.customer_id = CustomerSpending.customer_id
    )
    
    SELECT store_id
    ,customer_id
    ,revenue
    ,ranking
    FROM CustomerSpendingRanking
    WHERE ranking < 6;