Search code examples
sqlamazon-redshiftwindow-functions

redshift: how to find row_number after grouping and aggregating?


Suppose I have a table of customer purchases ("my_table") like this:

--------------------------------------
customerid | date_of_purchase | price
-----------|------------------|-------
     1     |  2019-09-20      | 20.23
     2     |  2019-09-21      | 1.99
     1     |  2019-09-21      | 123.34
 ...

I'd like to be able to find the nth highest spending customer in this table (say n = 5). So I tried this:

with cte as (
  select customerid, sum(price) as total_pay, 
    row_number() over (partition by customerid order by total_pay desc) as rn 
 from my_table group by customerid order by total_pay desc)
select * from cte where rn = 5;

But this gives me nonsense results. For some reason rn doesn't seem to be unique (for example there are a bunch of customers with rn = 1). I don't understand why. Isn't rn supposed to be just a row number?


Solution

  • Remove the partition by in the definition of row_number():

    with cte as (
          select customerid, sum(price) as total_pay, 
                row_number() over (order by total_pay desc) as rn 
          from my_table
          group by customerid
         )
    select *
    from cte
    where rn = 5;
    

    You are already aggregating by customerid, so each customer has only one row. So the value of rn will always be 1.