Search code examples
sqloracle

Group records based on Timestamp for every 20 second window


I have below SQL query returning me some list of timestamps based on input range of timestamps.

select data_timestamp 
  from table1
 where data_timestamp > :input_start_ts
   and data_timestamp < :input_end_ts;

**Output : **

2024-07-10 10:21:10
2024-07-10 10:21:18
2024-07-10 10:21:21
2024-07-10 10:21:22
2024-07-10 10:21:23
2024-07-10 10:21:25
2024-07-10 10:21:26
2024-07-10 10:21:29
2024-07-10 10:21:30
2024-07-10 10:21:31
2024-07-10 10:21:32
2024-07-10 10:21:40
2024-07-10 10:21:49
2024-07-10 10:21:55
2024-07-10 10:21:56
2024-07-10 10:22:01
2024-07-10 10:22:40

I am trying to group these timestamp for 20 seconds window and add a new columns for that group indicator :

  • Expected Output:

      2024-07-10 10:21:10    1
      2024-07-10 10:21:18    1
      2024-07-10 10:21:21    1
      2024-07-10 10:21:22    1
      2024-07-10 10:21:23    1
      2024-07-10 10:21:25    1
      2024-07-10 10:21:26    1
      2024-07-10 10:21:29    1
      2024-07-10 10:21:30    2
      2024-07-10 10:21:31    2
      2024-07-10 10:21:32    2
      2024-07-10 10:21:40    2
      2024-07-10 10:21:49    2
      2024-07-10 10:21:55    3
      2024-07-10 10:21:56    3
      2024-07-10 10:22:01    3
      2024-07-10 10:22:40    4
    

Can you please guide me to right direction


Solution

  • It depends on what you mean by a 20-second window.

    If you want to perform row-by-row pattern matching and match the first row and all the rows within 20 seconds of it as the first group and then repeat for the next unmatched row then you can use MATCH_RECOGNIZE:

    SELECT *
    FROM   table1
           MATCH_RECOGNIZE(
             ORDER BY data_timestamp
             MEASURES
               MATCH_NUMBER() AS grp
             ALL ROWS PER MATCH
             PATTERN (within_20_seconds+)
             DEFINE
               within_20_seconds AS data_timestamp < FIRST(data_timestamp)
                                                     + INTERVAL '20' SECOND
           )
    

    Which, for the sample data:

    CREATE TABLE table1 (data_timestamp TIMESTAMP(0))
    
    INSERT INTO table1 (data_timestamp)
    SELECT TIMESTAMP '2024-07-10 10:21:10' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:18' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:21' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:22' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:23' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:25' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:26' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:29' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:30' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:31' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:32' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:40' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:49' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:55' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:21:56' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:22:01' FROM DUAL UNION ALL
    SELECT TIMESTAMP '2024-07-10 10:22:40' FROM DUAL;
    

    Outputs:

    DATA_TIMESTAMP GRP
    2024-07-10 10:21:10. 1
    2024-07-10 10:21:18. 1
    2024-07-10 10:21:21. 1
    2024-07-10 10:21:22. 1
    2024-07-10 10:21:23. 1
    2024-07-10 10:21:25. 1
    2024-07-10 10:21:26. 1
    2024-07-10 10:21:29. 1
    2024-07-10 10:21:30. 2
    2024-07-10 10:21:31. 2
    2024-07-10 10:21:32. 2
    2024-07-10 10:21:40. 2
    2024-07-10 10:21:49. 2
    2024-07-10 10:21:55. 3
    2024-07-10 10:21:56. 3
    2024-07-10 10:22:01. 3
    2024-07-10 10:22:40. 4

    If you want to have groups starting at 0, 20 and 40 seconds past the minute then:

    SELECT data_timestamp,
           DENSE_RANK() OVER (
             ORDER BY TRUNC(data_timestamp, 'MI')
                      + TRUNC(EXTRACT(SECOND FROM data_timestamp) / 20)
                      * INTERVAL '20' SECOND
           ) AS grp
    FROM   table1
    ORDER BY data_timestamp
    

    Which outputs:

    DATA_TIMESTAMP GRP
    2024-07-10 10:21:10. 1
    2024-07-10 10:21:18. 1
    2024-07-10 10:21:21. 2
    2024-07-10 10:21:22. 2
    2024-07-10 10:21:23. 2
    2024-07-10 10:21:25. 2
    2024-07-10 10:21:26. 2
    2024-07-10 10:21:29. 2
    2024-07-10 10:21:30. 2
    2024-07-10 10:21:31. 2
    2024-07-10 10:21:32. 2
    2024-07-10 10:21:40. 3
    2024-07-10 10:21:49. 3
    2024-07-10 10:21:55. 3
    2024-07-10 10:21:56. 3
    2024-07-10 10:22:01. 4
    2024-07-10 10:22:40. 5

    If you want to count 20-second groups starting from 0, 20 and 40 seconds past the minute from the first value and not skipping groups when there are zero rows then:

    SELECT data_timestamp,
           ROUND(
             (
               ( TRUNC(data_timestamp, 'MI')
                 + TRUNC(EXTRACT(SECOND FROM data_timestamp) / 20) * INTERVAL '20' SECOND
               )
               - MIN(
                 TRUNC(data_timestamp, 'MI')
                 + TRUNC(EXTRACT(SECOND FROM data_timestamp) / 20) * INTERVAL '20' SECOND
               ) OVER ()
             ) * 4320
           ) + 1 AS grp
    FROM   table1
    

    Which outputs the same as the previous except that the last group is 6 not 5 (since there are zero rows in the 2024-07-10 10:22:20 - 2024-07-10 10:22:39 range, which would be the 5th group).

    If you want to count 20-second intervals starting from the minimum time then:

    SELECT data_timestamp,
           ROUND(
             (CAST(data_timestamp AS DATE) - MIN(CAST(data_timestamp AS DATE)) OVER ())
             * 4320
           ) + 1 AS grp
    FROM   table1
    

    Which is similar to the previous (but some rows shuffle groups given the slight change of start from 2024-07-10 10:21:00 in the previous to 2024-07-10 10:21:10 in this query).

    fiddle