I have a table in Hive, as below:
id | state | timestamp |
---|---|---|
A | 1 | 2022-01-01 23:01 |
A | 0 | 2022-02-13 09:00 |
A | 1 | 2022-04-04 11:21 |
A | 0 | 2022-04-15 09:32 |
A | 0 | 2022-04-27 10:05 |
B | 1 | 2022-02-03 04:51 |
B | 0 | 2022-02-14 05:01 |
I need to output two new columns, 'actioned' and 'completed'. Column actioned would be where state = 1 whilst completed would be where state = 0. So from the above table, I'd want to create this output:
id | actioned | completed |
---|---|---|
A | 2022-01-01 23:01 | 2022-02-13 09:00 |
A | 2022-04-04 11:21 | 2022-04-15 09:32 |
B | 2022-02-03 04:51 | 2022-02-14 05:01 |
What's the most efficient way of doing this?
You can use window functions to identify transitions from state 1
(actioned) to 0
(completed) on the same id
. Here is an approach using lead()
:
select id, timestamp actioned, lead_timestamp completed
from (
select t.*,
lead(state) over(partition by id order by timestamp) lead_state,
lead(timestamp) over(partition by id order by timestamp) lead_timestamp
from mytable t
) t
where state = 1 and lead_state = 0