I have dataset in hive table like below
date col1 col2 col3
2016-02-01 A X 5
2016-02-03 A X 5
2016-02-04 A X 5
2016-03-01 A X 6
2016-04-01 A X 5
2016-04-02 A Y 5
2016-04-03 A Y 5
I have to do selective grouping on col 1 and col2 where in one group is formed whenever col3 value changes , for example col3 value is changed from 5 to 6 in row4 , I have to take the date column and get the min and max values out of it. the output should be like this .
col1 col2 col3 minDate maxDate
A X 5 2016-02-01 2016-02-04
A X 6 2016-03-01 2016-03-01
A X 5 2016-04-01 2016-04-01
A Y 5 2016-04-02 2016-04-03
I am sure simple group by on col1 and col2 can not work. Can anyone please suggest a way to achieve this please ?
select col1,col2,col3
,min(date) as minDate
,max(date) as maxDate
from (select *
,row_number () over
(
partition by col1,col2
order by date
) as rn_part_1_2
,row_number () over
(
partition by col1,col2,col3
order by date
) as rn_part_1_2_3
from mytable
) t
group by col1,col2,col3
,rn_part_1_2 - rn_part_1_2_3
order by col1,col2
,minDate
;
+------+------+------+------------+------------+
| col1 | col2 | col3 | mindate | maxdate |
+------+------+------+------------+------------+
| A | X | 5 | 2016-02-01 | 2016-02-04 |
| A | X | 6 | 2016-03-01 | 2016-03-01 |
| A | X | 5 | 2016-04-01 | 2016-04-01 |
| A | Y | 5 | 2016-04-02 | 2016-04-03 |
+------+------+------+------------+------------+