For example I have a dataset like the following:
time | action |
---|---|
03:00:00 | block |
04:00:00 | unblock |
05:00:00 | block |
06:00:00 | unblock |
07:00:00 | unblock |
08:00:00 | block |
Now for each row, I want to get the last time when the column action
equals to "block" before the time of current row. For example, for the fifth row whose time
equals to "07:00:00" and action
equals to "unblock", the last time before it when action
equals to "block" should be the third row, and the expected time is "05:00:00".
My final expected result would be:
time | action | last_time |
---|---|---|
03:00:00 | block | 03:00:00 |
04:00:00 | unblock | 03:00:00 |
05:00:00 | block | 05:00:00 |
06:00:00 | unblock | 05:00:00 |
07:00:00 | unblock | 05:00:00 |
08:00:00 | block | 08:00:00 |
How can I get the above result by using a window function without joining by itself?
(p.s. if the above result cannot be reached, the following output is also okay:
time | action | last_time |
---|---|---|
03:00:00 | block | NULL |
04:00:00 | unblock | 03:00:00 |
05:00:00 | block | 03:00:00 |
06:00:00 | unblock | 05:00:00 |
07:00:00 | unblock | 05:00:00 |
08:00:00 | block | 05:00:00 |
Once we determine that action = 'block'
should begin a new block and we identify those blocks (block_no
), we can then use a window function partitioned by that block_no
to find the minimum time
within each block.
If the times aren't monotonically increasing, we can use the FIRST_VALUE
window function instead if we have another way to order, or just use another case expression to grab the time only when action = 'block', which would leave the other rows with null, which is easily ignored via MAX/MIN/etc.
Given the current data, however, I don't think we can work around that assumption that time needs to be monotonically increasing or at least increasing from block to block for all rows between each block.
Try this:
WITH cte1 AS (
SELECT *, SUM(CASE WHEN action = 'block' THEN 1 END) OVER (ORDER BY time) AS block_no FROM test
)
SELECT *, MIN(time) OVER (PARTITION BY block_no) AS block_time FROM cte1
ORDER BY time
;
The result:
time | action | block_no | block_time |
---|---|---|---|
03:00:00 | block | 1 | 03:00:00 |
04:00:00 | unblock | 1 | 03:00:00 |
05:00:00 | block | 2 | 05:00:00 |
06:00:00 | unblock | 2 | 05:00:00 |
07:00:00 | unblock | 2 | 05:00:00 |
08:00:00 | block | 3 | 08:00:00 |
The setup:
CREATE TABLE test (time varchar(20), action varchar(20));
INSERT INTO test VALUES
('03:00:00', 'block')
, ('04:00:00', 'unblock')
, ('05:00:00', 'block')
, ('06:00:00', 'unblock')
, ('07:00:00', 'unblock')
, ('08:00:00', 'block')
;