Search code examples
sqlpostgresqlgreatest-n-per-groupgaps-and-islands

Get data from first and last row of groups based on sequence of rows


Taking it a step further than in my previous question: I have a table below with sample data.

datetimestamp message server system
2022-07-13 08:59:09 Normal Server 1 System 1
2022-07-13 08:59:10 Normal Server 4 System 2
2022-07-13 08:59:11 Normal Server 3 System 3
2022-07-13 08:59:12 Warning Server 35 System 27
2022-07-13 08:59:13 Warning Server 5 System 5
2022-07-13 08:59:14 Warning Server 9 System 6
2022-07-13 08:59:15 Warning Server 8 System 7
2022-07-13 08:59:16 Error Server 12 System 8
2022-07-13 08:59:17 Error Server 15 System 9
2022-07-13 08:59:18 Warning Server 29 System 10
2022-07-13 08:59:19 Warning Server 22 System 11
2022-07-13 08:59:20 Warning Server 13 System 12
2022-07-13 08:59:21 Normal Server 16 System 13
2022-07-13 08:59:22 Normal Server 19 System 14
2022-07-13 08:59:23 Normal Server 21 System 15
2022-07-13 08:59:24 Warning Server 24 System 16
2022-07-13 08:59:25 Warning Server 27 System 17
2022-07-13 08:59:26 Warning Server 25 System 18
2022-07-13 08:59:27 Error Server 30 System 23
2022-07-13 08:59:28 Error Server 31 System 20

I need to write a Postgres query which will give me output like:

Start_Datetime End_Datetime Server_Start Server_End System_Start System_End
2022-07-13 08:59:12 2022-07-13 08:59:15 Server 35 Server 8 System 27 System 7
2022-07-13 08:59:24 2022-07-13 08:59:26 Server 24 Server 25 System 16 System 18

If there is any 'Error' message, I need to take the start time of 'Warning' and end time of 'Warning' - and corresponding server and system! If there is no 'Error' message after 'Warning' ignore it e.g. after the warning at 2022-07-13 08:59:20 there is no error, so the query should ignore that range.

Setup table queries:

CREATE TABLE test_data (
  id integer PRIMARY KEY
, message varchar(10)
, datetimestamp timestamp NOT NULL
, server varchar(10)
, system varchar(10)
);

INSERT INTO test_data VALUES
  (09, 'Normal' , '2022-07-13 08:59:09', 'Server 1' , 'System 1')
, (10, 'Normal' , '2022-07-13 08:59:10', 'Server 4' , 'System 2')
, (11, 'Normal' , '2022-07-13 08:59:11', 'Server 3' , 'System 3')
, (12, 'Warning', '2022-07-13 08:59:12', 'Server 35', 'System 27')
, (13, 'Warning', '2022-07-13 08:59:13', 'Server 5' , 'System 5')
, (14, 'Warning', '2022-07-13 08:59:14', 'Server 9' , 'System 6')
, (15, 'Warning', '2022-07-13 08:59:15', 'Server 8' , 'System 7')
, (16, 'Error'  , '2022-07-13 08:59:16', 'Server 12', 'System 8')
, (17, 'Error'  , '2022-07-13 08:59:17', 'Server 15', 'System 9')
, (18, 'Warning', '2022-07-13 08:59:18', 'Server 29', 'System 10')
, (19, 'Warning', '2022-07-13 08:59:19', 'Server 22', 'System 11')
, (20, 'Warning', '2022-07-13 08:59:20', 'Server 13', 'System 12')
, (21, 'Normal' , '2022-07-13 08:59:21', 'Server 16', 'System 13')
, (22, 'Normal' , '2022-07-13 08:59:22', 'Server 19', 'System 14')
, (23, 'Normal' , '2022-07-13 08:59:23', 'Server 21', 'System 15')
, (24, 'Warning', '2022-07-13 08:59:24', 'Server 24', 'System 16')
, (25, 'Warning', '2022-07-13 08:59:25', 'Server 27', 'System 17')
, (26, 'Warning', '2022-07-13 08:59:26', 'Server 25', 'System 18')
, (27, 'Error'  , '2022-07-13 08:59:27', 'Server 30', 'System 23')
, (28, 'Error'  , '2022-07-13 08:59:28', 'Server 31', 'System 20')
;

Solution

  • You may try the following:

    With cte As 
    (
      Select D.id,D.message,D.ld,D.lg,
      Coalesce(Lag(D.id) over (order by D.datetimestamp,D.id),D.id) lastWarningId,
      Coalesce(Lag(D.id, 2) over (order by D.datetimestamp,D.id),
               Lag(D.id) over (order by D.datetimestamp,D.id)) firstWarningId
      From 
      (
         Select id, message,datetimestamp, 
                Coalesce(Lead(message) Over (Order By datetimestamp,id) , 'last') as ld,
                Coalesce(lag(message) Over (Order By datetimestamp,id) , 'first') as lg
         From test_data) D
         Where D.message<> D.ld or D.message <> D.lg
     ),
    cte2 As
    (
      Select id,message,ld,lg, lastWarningId, 
             Case When (Select Count(*) From test_data T Where T.id Between firstWarningId and lastWarningId and T.message='Warning')=1
             Then
                  lastWarningId 
             Else 
                  firstWarningId 
             End as firstwarningid2
      From cte
     )
    Select T.datetimestamp as Start_DateTime, T2.datetimestamp as End_DateTime, 
           T.server as Server_Start, T2.server as Server_End,
           T.system as System_Start, T2.system as System_End
      From cte2 C
      Join test_data T On T.id = C.firstwarningid2
      Join test_data T2 On T2.id = C.lastwarningid
      Where C.message='Error' And C.lg='Warning'
    

    See a demo from uk<>fiddle.

    The idea of this query is to find the first and last id for each group of messages, which is done in cet2. Where:

    Lag(id) over (order by datetimestamp,id) is the lastid of each group.

    Lag(id, 2) over (order by datetimestamp,id) is the first id of each group.

    Then for each group of message='Error' find the first and last id of the nearest group of message='Warning', which done by Where C.message='Error' And C.lg='Warning'.

    To get more details about how it works, try to select * from cet2 Where message='Error' And lg='Warning'.