I have a table like below
year int
month int
symbol string
company_name string
sector string
sub_industry string
state string
avg_open double
avg_close double
avg_low double
avg_high double
avg_volume double
The field starting with avg_
refers to the average value in a month for a year. I need to find for each sector the year in which average of avg_close
is the lowest.
I tried to do something like below
SELECT sector, year FROM
(
SELECT sector, year, RANK() OVER (ORDER BY s2.yearly_avg_close) AS RANK FROM
( SELECT year,sector, AVG(avg_close) AS yearly_avg_close FROM stock_summary GROUP BY sector, year) s2
) s1
WHERE
s1.RANK = 1;
But this is printing just one sector and year like below
Telecommunications Services 2010
I am new to hive and playing around with some toy schemas. Can someone let me know what should be the correct way of solving this?
Hive Version - 1.1.0
Include sector
into the partition by
in the rank()
function:
SELECT sector, year, RANK() OVER (partition by sector ORDER BY s2.yearly_avg_close) AS RANK
Add year
as well if you need rank per each sector
and year
Read also this explanation how rank works: https://stackoverflow.com/a/55909947/2700344