Search code examples
sqlgroup-byhivedata-analysisbigdata

Complex Group by Query in hive


I have dataset in hive table like below

date           col1      col2  col3
2016-02-01      A         X     5 
2016-02-03      A         X     5
2016-02-04      A         X     5
2016-03-01      A         X     6
2016-04-01      A         X     5
2016-04-02      A         Y     5
2016-04-03      A         Y     5

I have to do selective grouping on col 1 and col2 where in one group is formed whenever col3 value changes , for example col3 value is changed from 5 to 6 in row4 , I have to take the date column and get the min and max values out of it. the output should be like this .

col1 col2 col3    minDate           maxDate
A     X    5     2016-02-01         2016-02-04
A     X    6     2016-03-01         2016-03-01
A     X    5     2016-04-01         2016-04-01
A     Y    5     2016-04-02         2016-04-03  

I am sure simple group by on col1 and col2 can not work. Can anyone please suggest a way to achieve this please ?


Solution

  • select      col1,col2,col3
               ,min(date) as minDate
               ,max(date) as maxDate
    
    from       (select      *
                           ,row_number () over 
                            (
                                partition by    col1,col2
                                order by        date
                            ) as rn_part_1_2
    
                           ,row_number () over 
                            (
                                partition by    col1,col2,col3
                                order by        date
                            ) as rn_part_1_2_3                            
    
                from        mytable
                ) t
    
    group by    col1,col2,col3
               ,rn_part_1_2 - rn_part_1_2_3
    
    order by    col1,col2
               ,minDate
    ;           
    

    +------+------+------+------------+------------+
    | col1 | col2 | col3 |  mindate   |  maxdate   |
    +------+------+------+------------+------------+
    | A    | X    |    5 | 2016-02-01 | 2016-02-04 |
    | A    | X    |    6 | 2016-03-01 | 2016-03-01 |
    | A    | X    |    5 | 2016-04-01 | 2016-04-01 |
    | A    | Y    |    5 | 2016-04-02 | 2016-04-03 |
    +------+------+------+------------+------------+