I have below SQL query returning me some list of timestamps based on input range of timestamps.
select data_timestamp
from table1
where data_timestamp > :input_start_ts
and data_timestamp < :input_end_ts;
**Output : **
2024-07-10 10:21:10
2024-07-10 10:21:18
2024-07-10 10:21:21
2024-07-10 10:21:22
2024-07-10 10:21:23
2024-07-10 10:21:25
2024-07-10 10:21:26
2024-07-10 10:21:29
2024-07-10 10:21:30
2024-07-10 10:21:31
2024-07-10 10:21:32
2024-07-10 10:21:40
2024-07-10 10:21:49
2024-07-10 10:21:55
2024-07-10 10:21:56
2024-07-10 10:22:01
2024-07-10 10:22:40
I am trying to group these timestamp for 20 seconds window and add a new columns for that group indicator :
Expected Output:
2024-07-10 10:21:10 1
2024-07-10 10:21:18 1
2024-07-10 10:21:21 1
2024-07-10 10:21:22 1
2024-07-10 10:21:23 1
2024-07-10 10:21:25 1
2024-07-10 10:21:26 1
2024-07-10 10:21:29 1
2024-07-10 10:21:30 2
2024-07-10 10:21:31 2
2024-07-10 10:21:32 2
2024-07-10 10:21:40 2
2024-07-10 10:21:49 2
2024-07-10 10:21:55 3
2024-07-10 10:21:56 3
2024-07-10 10:22:01 3
2024-07-10 10:22:40 4
Can you please guide me to right direction
It depends on what you mean by a 20-second window.
If you want to perform row-by-row pattern matching and match the first row and all the rows within 20 seconds of it as the first group and then repeat for the next unmatched row then you can use MATCH_RECOGNIZE
:
SELECT *
FROM table1
MATCH_RECOGNIZE(
ORDER BY data_timestamp
MEASURES
MATCH_NUMBER() AS grp
ALL ROWS PER MATCH
PATTERN (within_20_seconds+)
DEFINE
within_20_seconds AS data_timestamp < FIRST(data_timestamp)
+ INTERVAL '20' SECOND
)
Which, for the sample data:
CREATE TABLE table1 (data_timestamp TIMESTAMP(0))
INSERT INTO table1 (data_timestamp)
SELECT TIMESTAMP '2024-07-10 10:21:10' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:18' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:21' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:22' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:23' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:25' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:26' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:29' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:30' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:31' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:32' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:40' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:49' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:55' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:21:56' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:22:01' FROM DUAL UNION ALL
SELECT TIMESTAMP '2024-07-10 10:22:40' FROM DUAL;
Outputs:
DATA_TIMESTAMP | GRP |
---|---|
2024-07-10 10:21:10. | 1 |
2024-07-10 10:21:18. | 1 |
2024-07-10 10:21:21. | 1 |
2024-07-10 10:21:22. | 1 |
2024-07-10 10:21:23. | 1 |
2024-07-10 10:21:25. | 1 |
2024-07-10 10:21:26. | 1 |
2024-07-10 10:21:29. | 1 |
2024-07-10 10:21:30. | 2 |
2024-07-10 10:21:31. | 2 |
2024-07-10 10:21:32. | 2 |
2024-07-10 10:21:40. | 2 |
2024-07-10 10:21:49. | 2 |
2024-07-10 10:21:55. | 3 |
2024-07-10 10:21:56. | 3 |
2024-07-10 10:22:01. | 3 |
2024-07-10 10:22:40. | 4 |
If you want to have groups starting at 0, 20 and 40 seconds past the minute then:
SELECT data_timestamp,
DENSE_RANK() OVER (
ORDER BY TRUNC(data_timestamp, 'MI')
+ TRUNC(EXTRACT(SECOND FROM data_timestamp) / 20)
* INTERVAL '20' SECOND
) AS grp
FROM table1
ORDER BY data_timestamp
Which outputs:
DATA_TIMESTAMP | GRP |
---|---|
2024-07-10 10:21:10. | 1 |
2024-07-10 10:21:18. | 1 |
2024-07-10 10:21:21. | 2 |
2024-07-10 10:21:22. | 2 |
2024-07-10 10:21:23. | 2 |
2024-07-10 10:21:25. | 2 |
2024-07-10 10:21:26. | 2 |
2024-07-10 10:21:29. | 2 |
2024-07-10 10:21:30. | 2 |
2024-07-10 10:21:31. | 2 |
2024-07-10 10:21:32. | 2 |
2024-07-10 10:21:40. | 3 |
2024-07-10 10:21:49. | 3 |
2024-07-10 10:21:55. | 3 |
2024-07-10 10:21:56. | 3 |
2024-07-10 10:22:01. | 4 |
2024-07-10 10:22:40. | 5 |
If you want to count 20-second groups starting from 0, 20 and 40 seconds past the minute from the first value and not skipping groups when there are zero rows then:
SELECT data_timestamp,
ROUND(
(
( TRUNC(data_timestamp, 'MI')
+ TRUNC(EXTRACT(SECOND FROM data_timestamp) / 20) * INTERVAL '20' SECOND
)
- MIN(
TRUNC(data_timestamp, 'MI')
+ TRUNC(EXTRACT(SECOND FROM data_timestamp) / 20) * INTERVAL '20' SECOND
) OVER ()
) * 4320
) + 1 AS grp
FROM table1
Which outputs the same as the previous except that the last group is 6
not 5
(since there are zero rows in the 2024-07-10 10:22:20
- 2024-07-10 10:22:39
range, which would be the 5th group).
If you want to count 20-second intervals starting from the minimum time then:
SELECT data_timestamp,
ROUND(
(CAST(data_timestamp AS DATE) - MIN(CAST(data_timestamp AS DATE)) OVER ())
* 4320
) + 1 AS grp
FROM table1
Which is similar to the previous (but some rows shuffle groups given the slight change of start from 2024-07-10 10:21:00
in the previous to 2024-07-10 10:21:10
in this query).