Search code examples
sqlpostgresqlgaps-and-islands

Get start and end date time based on sequence of rows


I have a table below with sample data.

Date&Time Message
13/07/2022 8:59:09 Normal
13/07/2022 8:59:10 Normal
13/07/2022 8:59:11 Normal
13/07/2022 8:59:12 Warning
13/07/2022 8:59:13 Warning
13/07/2022 8:59:14 Warning
13/07/2022 8:59:15 Warning
13/07/2022 8:59:16 Error
13/07/2022 8:59:17 Error
13/07/2022 8:59:18 Warning
13/07/2022 8:59:19 Warning
13/07/2022 8:59:20 Warning
13/07/2022 8:59:21 Normal
13/07/2022 8:59:22 Normal
13/07/2022 8:59:23 Normal
13/07/2022 8:59:24 Warning
13/07/2022 8:59:25 Warning
13/07/2022 8:59:26 Warning
13/07/2022 8:59:27 Error
13/07/2022 8:59:28 Error

I need to write a Postgres query which will give me output like:

Start Date&Time End Date&Time
13/07/2022 8:59:12 13/07/2022 8:59:15
13/07/2022 8:59:24 13/07/2022 8:59:26

The scenario is - if there is any 'Error' message, I need to take the start time of 'Warning' and end time of Warning, and if there is no 'Error' message after 'Warning' ignore it e.g. after 13/07/2022 8:59:20 Warning' there is no error, so query should ignore that range. Please help to find a query for this.

Setup table queries:

CREATE TABLE test_data (
  id integer PRIMARY KEY
, message VARCHAR(10)
, datetimestamp timestamp without time zone NOT NULL
);

SET datestyle = 'DMY';
INSERT INTO test_data VALUES
  (09, 'Normal' , '13/07/2022 8:59:09')
, (10, 'Normal' , '13/07/2022 8:59:10')
, (11, 'Normal' , '13/07/2022 8:59:11')
, (12, 'Warning', '13/07/2022 8:59:12')
, (13, 'Warning', '13/07/2022 8:59:13')
, (14, 'Warning', '13/07/2022 8:59:14')
, (15, 'Warning', '13/07/2022 8:59:15')
, (16, 'Error'  , '13/07/2022 8:59:16')
, (17, 'Error'  , '13/07/2022 8:59:17')
, (18, 'Warning', '13/07/2022 8:59:18')
, (19, 'Warning', '13/07/2022 8:59:19')
, (20, 'Warning', '13/07/2022 8:59:20')
, (21, 'Normal' , '13/07/2022 8:59:21')
, (22, 'Normal' , '13/07/2022 8:59:22')
, (23, 'Normal' , '13/07/2022 8:59:23')
, (24, 'Warning', '13/07/2022 8:59:24')
, (25, 'Warning', '13/07/2022 8:59:25')
, (26, 'Warning', '13/07/2022 8:59:26')
, (27, 'Error'  , '13/07/2022 8:59:27')
, (28, 'Error'  , '13/07/2022 8:59:28')
;

Solution

  • This is a typical problem.

    SELECT min(datetimestamp) FILTER (WHERE message = 'Warning') AS start_warning
         , max(datetimestamp) FILTER (WHERE message = 'Warning') AS end_warning
    FROM  (
       SELECT *
            , count(*) FILTER (WHERE message = 'Warning' AND last_msg <> 'Warning')
                       OVER (ORDER BY datetimestamp) AS grp
       FROM  (
          SELECT datetimestamp, message
               , lag(message) OVER (ORDER BY datetimestamp) AS last_msg
          FROM   test_data
          ) sub1
       WHERE  message IN ('Warning', 'Error')
       ) sub2
    GROUP  BY grp
    HAVING bool_or(message = 'Error' AND last_msg = 'Warning');
    

    db<>fiddle here

    The query starts a new group whenever a 'Warning' is preceded by something else. And it only reports times when the 'Warning' block is followed by an 'Error'. You can have any number of different message types (except for NULL), these are the only conditions.

    The expensive part is the window function. (In particular when using different window frames.) This query makes do with a single window function and a single sort order. So it should be as fast (and simple) as it gets.

    Related:

    About the aggregate FILTER: