Search code examples
sqlhivewindow-functions

Generate timestamp columns based on tracked state changes


I have a table in Hive, as below:

id state timestamp
A 1 2022-01-01 23:01
A 0 2022-02-13 09:00
A 1 2022-04-04 11:21
A 0 2022-04-15 09:32
A 0 2022-04-27 10:05
B 1 2022-02-03 04:51
B 0 2022-02-14 05:01

I need to output two new columns, 'actioned' and 'completed'. Column actioned would be where state = 1 whilst completed would be where state = 0. So from the above table, I'd want to create this output:

id actioned completed
A 2022-01-01 23:01 2022-02-13 09:00
A 2022-04-04 11:21 2022-04-15 09:32
B 2022-02-03 04:51 2022-02-14 05:01

What's the most efficient way of doing this?


Solution

  • You can use window functions to identify transitions from state 1 (actioned) to 0 (completed) on the same id. Here is an approach using lead():

    select id, timestamp actioned, lead_timestamp completed
    from (
        select t.*,
            lead(state)     over(partition by id order by timestamp) lead_state,
            lead(timestamp) over(partition by id order by timestamp) lead_timestamp
        from mytable t
    ) t
    where state = 1 and lead_state = 0