I have this table example on impala.I want to get max top number_of_call for each caller
caller and group_id are unique grouped together
caller | group_id |number_of_call
23 | 4433 | 12
45 | 342 | 5
23 | 475 | 33
45 | 732 | 9
46 | 4433 | 2
23 | 522 | 5
45 | 475 | 54
23 | 342 | 32
45 | 342 | 43
Expected output
caller | group_id | number_of_call
23 | 475 | 33
23 | 342 | 32
23 | 4433 | 12
46 | 4433 | 2
45 | 475 | 54
45 | 342 | 43
45 | 732 | 9
You can use window functions to adress this greatest-n-per-group problem:
select *
from
select t.*, row_number() over(partition by caller order by number_of_calls desc) rn
from mytable t
) t
where rn <= 3
order by caller, rn
In the subquery, row_number()
ranks records having the same caller
by descending number_of_calls
. Then, the outer query filters on the top 3 records per caller
.