Hive query tuning want to get rid of union all

I want to tune a query that unions all based on each rank.

= current query

select
        '111.111.111.111' agent_ip,
        tbl.source_ip,
        sum(tbl.cnt) t_count
from
        event tbl
where

tbl.agent_ip    = '111.111.111.111'

group by
        tbl.source_ip
order by
        t_count desc limit 3  
union all
select
        '222.222.222.222' agent_ip,
        tbl.source_ip,
        sum(tbl.cnt) t_count
from
        event tbl
where

tbl.agent_ip    = '222.222.222.222'

group by
        tbl.source_ip
order by
        t_count desc limit 3  
                


[result] 
111.111.111.111 39.126.169.47   1
111.111.111.111 45.67.97.19 1
222.222.222.222 45.134.144.113  222
222.222.222.222 94.102.61.31    111
222.222.222.222 146.88.240.4    4

When I use the query above, the question and answer are too long. Union all seems to be a problem, is there any way to reduce it?

I hope you don't do each query separately...

tbl.agent_ip in ('111.111.111.111', '222.222.222.222')

I guess there must be a way to do this

Solution

You can give below SQL a try -

select agent_ip,source_ip,t_count 
from (
select subqry1.*
row_number() over( partition by agent_ip order by agent_ip, t_count) as rn -- create a rownumber for each row
from (
select  -- select base data
        agent_ip,
        source_ip,
        sum(tbl.cnt) t_count
from event tbl
WHERE tbl.agent_ip in ('111.111.111.111', '222.222.222.222')
group by 1,2) subqry1
) subqry2
Where rn<=3  -- to select top 3 rows for agent_ip

Explanation is in the SQL. I captured both queries in one. Should be faster than union all but i dont know your data volume or partition scheme so not 100% sure.

It has 2 stages -

compute base data using subquery 1.
Give them rank for each agent IP.
Then pick top 3 count.