Search code examples
sqlsubquerygreatest-n-per-groupimpala

Impala SQL return top 3 grouped records


I have this table example on impala.I want to get max top number_of_call for each caller

caller and group_id are unique grouped together

caller | group_id |number_of_call
  23   | 4433     | 12
  45   |  342     | 5
  23   |  475     | 33
  45   |  732     | 9
  46   |  4433    | 2
  23   |  522     | 5
  45   |  475     | 54
  23   |  342     | 32
  45   |  342     | 43

Expected output

  caller   | group_id | number_of_call
     23    |   475    | 33
     23    |   342    | 32
     23    |   4433   | 12
     46    |   4433   | 2
     45    |   475    | 54
     45    |   342    | 43
     45    |   732    | 9

Solution

  • You can use window functions to adress this greatest-n-per-group problem:

    select *
    from 
        select t.*, row_number() over(partition by caller order by number_of_calls desc) rn
        from mytable t
    ) t
    where rn <= 3
    order by caller, rn
    

    In the subquery, row_number() ranks records having the same caller by descending number_of_calls. Then, the outer query filters on the top 3 records per caller.