
SQL - Convert Time Series Events into On/Off Pairs (handling potential missing On's or Off's)


In SQL Server, I have a set of time series on/off events which look like this (for simplicity I have only shown one Alarm Number, but there are many in the same table):

'Alarms' Table:
AlarmNumber   Time                      AlarmState
1592          2020-01-02 01:52:02       1
1592          2020-01-02 01:58:07       0
1592          2020-04-28 03:46:49       1
1592          2020-04-28 06:19:10       0
1592          2020-06-04 00:25:22       1
1592          2020-08-27 01:57:03       1
1592          2020-08-27 05:16:32       0
1592          2020-09-17 02:51:57       0

I am trying to convert these into On/Off pairs.

Output I am trying to achieve, ideally as an SQL View:
AlarmNumber   StartTime                 EndTime
1592          2020-01-02 01:52:02       2020-01-02 01:58:07
1592          2020-04-28 03:46:49       2020-04-28 06:19:10
1592          2020-06-04 00:25:22       NULL
1592          2020-08-27 01:57:03       2020-08-27 05:16:32
1592          NULL                      2020-09-17 02:51:57

If I had a clean data set, with no missing 'On' or 'Off' events, I could achieve this with:

select tOn.AlarmNumber, tOn.Time StartTime, tOff.Time EndTime
from (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 1
) tOn
LEFT JOIN (
select AlarmNumber, Time, 
       ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
from Alarms where AlarmState = 0
) tOff
on (tOn.AlarmNumber = tOff.AlarmNumber and tOn.EventID = tOff.EventID)

(Code modified from Adriano Carneiro's answer to "T-SQL Start and end date times from a single column".)
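
The pairing above matches the n-th ON with the n-th OFF, so a single missing event shifts every later pair. In the sample data, the ON at 2020-06-04 00:25:22 has no OFF, so it would be paired with the OFF at 2020-08-27 05:16:32, and the ON at 2020-08-27 01:57:03 with the OFF at 2020-09-17 02:51:57. A minimal sketch for listing the events that reveal such gaps, assuming SQL Server 2012 or later (for LAG), that AlarmState only holds 0 or 1, and that an alarm never has two events with the same Time:

```sql
-- Sketch: list events that repeat the previous state for the same alarm,
-- plus any OFF that is the very first event of an alarm.
-- Each row returned marks a spot where an ON or an OFF is missing.
SELECT e.AlarmNumber
      ,e.[Time]
      ,e.AlarmState
FROM (
    SELECT AlarmNumber
          ,[Time]
          ,AlarmState
          ,LAG(AlarmState) OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS PrevState
    FROM Alarms
) e
WHERE e.AlarmState = e.PrevState                       -- ON after ON, or OFF after OFF
   OR (e.PrevState IS NULL AND e.AlarmState = 0);      -- alarm history starts with an OFF
```

If this query returns no rows for a given data set, the ROW_NUMBER pairing above should be safe to use on it, apart from a trailing ON that has not been cleared yet (which the LEFT JOIN already reports with a NULL EndTime).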

My Question: Can anyone think of an efficient way of processing the 'Alarms' table to achieve my sample Output, one which handles missing on/off events (shown as the NULLs in the sample Output)?

My backup is to use a cursor and a WHILE loop, but I was hoping there might be a way of doing it by grouping pairs of On/Off events together; I just haven't been able to get it working. I have 500k+ events, so it is a large data set to iterate over.

Any ideas welcome!

Thank you, Thomas

------ UPDATE 1st Nov 2020 ------

Two great solutions have been provided; both work correctly and give identical results on a sample of 80,000 rows of messy real-world data.

  • GMB's solution is simpler to read, but a bit slower to run
  • gotqn's solution is more lines of code, but runs about 50% faster on my test server

Solution

  • Once the rows are numbered in time order, SELECT them in parts and combine the results using UNION ALL:

    DECLARE @DataSource TABLE
    (
        [AlarmNumber] INT
       ,[Time] DATETIME2(0)
       ,[AlarmState] INT
    );
    
    INSERT INTO @DataSource ([AlarmNumber], [Time], [AlarmState])
    VALUES (1592, '2020-01-02 01:52:02', 1)
          ,(1592, '2020-01-02 01:58:07', 0)
          ,(1592, '2020-04-28 03:46:49', 1)
          ,(1592, '2020-04-28 06:19:10', 0)
          ,(1592, '2020-06-04 00:25:22', 1)
          ,(1592, '2020-08-27 01:57:03', 1)
          ,(1592, '2020-08-27 05:16:32', 0)
          ,(1592, '2020-09-17 02:51:57', 0);
    
    -- Add a rowID column to the data
    WITH DataSource AS
    (
        SELECT * ,ROW_NUMBER() Over(Partition by AlarmNumber order by [Time]) rowID
        FROM @DataSource
    )
    
    -- This is just here so we can sort the result at the end
    SELECT * FROM (
    
    -- Select rows of DataSource where there is an ON and subsequent OFF event (DS1 Alarm is ON and DS2 Alarm is OFF)
    -- This also catches where there is an ON, but no subsequent OFF (DS2.Time will be NULL)
        SELECT DS1.AlarmNumber
                ,DS1.Time As StartTime
                ,DS2.Time As EndTime
        FROM DataSource DS1
        LEFT JOIN DataSource DS2
            ON DS1.[rowID] = DS2.[rowID] - 1
            AND DS1.AlarmNumber = DS2.AlarmNumber
            AND DS2.[AlarmState] = 0
        WHERE DS1.[AlarmState] = 1
    
        UNION ALL
    
        -- Select OFF rows whose immediately preceding event is also an OFF (an OFF with no matching ON)
        SELECT DS2.AlarmNumber
                ,NULL As StartTime
                ,DS2.Time As EndTime
        FROM DataSource DS2
    
        INNER JOIN DataSource DS1
            ON DS2.[rowID] -1 = DS1.[rowID]
            AND DS1.[AlarmState] = 0
            AND DS2.AlarmNumber = DS1.AlarmNumber
        
        WHERE DS2.[AlarmState] = 0
    
        UNION ALL
    
        -- Select rows of DataSource where the first event for this alarm number is an OFF (it would otherwise be missed by the above)
        SELECT DS1.AlarmNumber
                ,NULL As StartTime
                ,DS1.Time As EndTime
        FROM DataSource DS1
        WHERE DS1.[AlarmState] = 0 AND DS1.rowID = 1
    ) z
    ORDER BY COALESCE(StartTime,EndTime), AlarmNumber
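
Since the question asks for a view, the same pairing can also be sketched without self-joins by using LAG and LEAD. This is an alternative sketch, not one of the two answers referred to above. The view name AlarmPairs is hypothetical, and the sketch assumes SQL Server 2012 or later, that AlarmState is never NULL and only takes the values 0 and 1, and that an alarm never has two events with the same Time.

```sql
-- Hypothetical view over the Alarms table from the question.
CREATE VIEW AlarmPairs
AS
WITH Ordered AS
(
    SELECT AlarmNumber
          ,[Time]
          ,AlarmState
          ,LAG(AlarmState)  OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS PrevState
          ,LEAD(AlarmState) OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS NextState
          ,LEAD([Time])     OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS NextTime
    FROM Alarms
)
SELECT AlarmNumber
      -- An ON row always starts a pair; an orphan OFF has no start time
      ,CASE WHEN AlarmState = 1 THEN [Time] END AS StartTime
      ,CASE
           WHEN AlarmState = 0 THEN [Time]     -- orphan OFF: its own time is the end
           WHEN NextState = 0  THEN NextTime   -- ON followed directly by an OFF
       END AS EndTime                          -- ON with no following OFF yields NULL
FROM Ordered
WHERE AlarmState = 1          -- keep every ON
   OR PrevState = 0           -- keep an OFF that directly follows another OFF
   OR PrevState IS NULL;      -- keep an OFF that is the first event of its alarm
GO

-- A view cannot carry its own ORDER BY (without TOP), so sort when querying it.
SELECT AlarmNumber, StartTime, EndTime
FROM AlarmPairs
ORDER BY COALESCE(StartTime, EndTime), AlarmNumber;
```

An OFF that directly follows an ON is filtered out because it is already reported as the EndTime of that ON row. On the sample rows from the question, this should return the five rows listed as the desired output. Whether it is faster than the UNION ALL version on 500k+ events would need to be measured on real data.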
    
