Search code examples
datehivewindow

hive window function - row change in value


I have data which has an id /flag and date fields I need to populate flag_date field in below way

login_date      id      flag    flag_date
5/1/2018        100     N       NULL
5/2/2018        100     N       NULL
5/3/2018        100     Y       5/3/2018
5/4/2018        100     Y       5/3/2018
5/5/2018        100     Y       5/3/2018
5/6/2018        100     N       NULL
5/7/2018        100     N       NULL
5/8/2018        100     Y       5/8/2018
5/9/2018        100     Y       5/8/2018
5/10/2018       100     Y       5/8/2018

When Flag value changes to N to Y, flag_date value changes accordingly. please help.


Solution

  • select login_date
          ,id
          ,flag
          ,case when flag = 'Y' then min(login_date) over(partition by id,grp) end as flag_date
    from (select login_date,id,flag
                ,row_number() over(partition by id order by login_date) - 
                 row_number() over(partition by id,flag order by login_date) as grp
          from tbl 
         ) t
    
    • First classify rows into groups, i.e. consecutive 'Y's and 'N's starting a new value when the series is broken. This can be done with a difference of row numbers approach. (Run the inner query to see how group numbers are assigned)
    • Once the groups are assigned, it is trivial to compute flag_date with conditional aggregation.

    One more approach to solve this involves generating a new group whenever a 'N' value is encountered. The outer query remains the same, only the inner one changes.

    select login_date
          ,id
          ,flag
          ,case when flag = 'Y' then min(login_date) over(partition by id,grp) end as flag_date 
    from (select login_date,id,flag
                ,sum(case when flag = 'N' then 1 else 0 end) over(partition by id order by login_date) as grp
          from tbl 
         ) t