I have a table below with sample data.
Date&Time | Message |
---|---|
13/07/2022 8:59:09 | Normal |
13/07/2022 8:59:10 | Normal |
13/07/2022 8:59:11 | Normal |
13/07/2022 8:59:12 | Warning |
13/07/2022 8:59:13 | Warning |
13/07/2022 8:59:14 | Warning |
13/07/2022 8:59:15 | Warning |
13/07/2022 8:59:16 | Error |
13/07/2022 8:59:17 | Error |
13/07/2022 8:59:18 | Warning |
13/07/2022 8:59:19 | Warning |
13/07/2022 8:59:20 | Warning |
13/07/2022 8:59:21 | Normal |
13/07/2022 8:59:22 | Normal |
13/07/2022 8:59:23 | Normal |
13/07/2022 8:59:24 | Warning |
13/07/2022 8:59:25 | Warning |
13/07/2022 8:59:26 | Warning |
13/07/2022 8:59:27 | Error |
13/07/2022 8:59:28 | Error |
I need to write a Postgres query which will give me output like:
Start Date&Time | End Date&Time |
---|---|
13/07/2022 8:59:12 | 13/07/2022 8:59:15 |
13/07/2022 8:59:24 | 13/07/2022 8:59:26 |
The scenario is - if there is any 'Error' message, I need to take the start time of 'Warning' and end time of Warning, and if there is no 'Error' message after 'Warning' ignore it e.g. after 13/07/2022 8:59:20 Warning' there is no error, so query should ignore that range. Please help to find a query for this.
Setup table queries:
CREATE TABLE test_data (
id integer PRIMARY KEY
, message VARCHAR(10)
, datetimestamp timestamp without time zone NOT NULL
);
SET datestyle = 'DMY';
INSERT INTO test_data VALUES
(09, 'Normal' , '13/07/2022 8:59:09')
, (10, 'Normal' , '13/07/2022 8:59:10')
, (11, 'Normal' , '13/07/2022 8:59:11')
, (12, 'Warning', '13/07/2022 8:59:12')
, (13, 'Warning', '13/07/2022 8:59:13')
, (14, 'Warning', '13/07/2022 8:59:14')
, (15, 'Warning', '13/07/2022 8:59:15')
, (16, 'Error' , '13/07/2022 8:59:16')
, (17, 'Error' , '13/07/2022 8:59:17')
, (18, 'Warning', '13/07/2022 8:59:18')
, (19, 'Warning', '13/07/2022 8:59:19')
, (20, 'Warning', '13/07/2022 8:59:20')
, (21, 'Normal' , '13/07/2022 8:59:21')
, (22, 'Normal' , '13/07/2022 8:59:22')
, (23, 'Normal' , '13/07/2022 8:59:23')
, (24, 'Warning', '13/07/2022 8:59:24')
, (25, 'Warning', '13/07/2022 8:59:25')
, (26, 'Warning', '13/07/2022 8:59:26')
, (27, 'Error' , '13/07/2022 8:59:27')
, (28, 'Error' , '13/07/2022 8:59:28')
;
This is a typical gaps-and-islands problem.
SELECT min(datetimestamp) FILTER (WHERE message = 'Warning') AS start_warning
, max(datetimestamp) FILTER (WHERE message = 'Warning') AS end_warning
FROM (
SELECT *
, count(*) FILTER (WHERE message = 'Warning' AND last_msg <> 'Warning')
OVER (ORDER BY datetimestamp) AS grp
FROM (
SELECT datetimestamp, message
, lag(message) OVER (ORDER BY datetimestamp) AS last_msg
FROM test_data
) sub1
WHERE message IN ('Warning', 'Error')
) sub2
GROUP BY grp
HAVING bool_or(message = 'Error' AND last_msg = 'Warning');
db<>fiddle here
The query starts a new group whenever a 'Warning' is preceded by something else. And it only reports times when the 'Warning' block is followed by an 'Error'. You can have any number of different message types (except for NULL
), these are the only conditions.
The expensive part is the window function. (In particular when using different window frames.) This query makes do with a single window function and a single sort order. So it should be as fast (and simple) as it gets.
Related:
About the aggregate FILTER
: