In SQL Server, I have a set of time series on/off events which look like this (for simplicity I have only shown one Alarm Number, but there are many in the same table):
'Alarms' Table:
AlarmNumber Time AlarmState
1592 2020-01-02 01:52:02 1
1592 2020-01-02 01:58:07 0
1592 2020-04-28 03:46:49 1
1592 2020-04-28 06:19:10 0
1592 2020-06-04 00:25:22 1
1592 2020-08-27 01:57:03 1
1592 2020-08-27 05:16:32 0
1592 2020-09-17 02:51:57 0
I am trying to convert these into On/Off pairs.
Output I am trying to achieve, ideally as an SQL View:
AlarmNumber StartTime EndTime
1592 2020-01-02 01:52:02 2020-01-02 01:58:07
1592 2020-04-28 03:46:49 2020-04-28 06:19:10
1592 2020-06-04 00:25:22 NULL
1592 2020-08-27 01:57:03 2020-08-27 05:16:32
1592 NULL 2020-09-17 02:51:57
If I had a clean data set, with no missing 'On' or 'Off' events, I could achieve this with:
select tOn.AlarmNumber, tOn.Time StartTime, tOff.Time EndTime
from (
    select AlarmNumber, Time,
           ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
    from Alarms where AlarmState = 1
) tOn
LEFT JOIN (
    select AlarmNumber, Time,
           ROW_NUMBER() Over(Partition by AlarmNumber order by Time) EventID
    from Alarms where AlarmState = 0
) tOff
on (tOn.AlarmNumber = tOff.AlarmNumber and tOn.EventID = tOff.EventID)
(Code modified from Adriano Carneiro's answer to "T-SQL Start and end date times from a single column".)
My Question: Can anyone think of an efficient way of processing the 'Alarms' table to achieve my sample Output, handling missing on/off events (shown as the NULLs in the sample Output)?
My backup is to use a cursor and a WHILE loop, but I was hoping there may be a way of doing it by grouping pairs of On/Off events together; I just haven't been able to get it working. I have 500k+ events, so it is a large data set to be iterating over.
Any ideas welcome!
Thank you, Thomas
------ UPDATE 1st Nov 2020 ------
Two great solutions have been provided. Both work correctly and give identical results on a sample of 80,000 rows of messy real-world data.
Once the rows are ordered, just SELECT them in parts and combine the results using UNION ALL:
DECLARE @DataSource TABLE
(
[AlarmNumber] INT
,[Time] DATETIME2(0)
,[AlarmState] INT
);
INSERT INTO @DataSource ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
,(1592, '2020-01-02 01:58:07', 0)
,(1592, '2020-04-28 03:46:49', 1)
,(1592, '2020-04-28 06:19:10', 0)
,(1592, '2020-06-04 00:25:22', 1)
,(1592, '2020-08-27 01:57:03', 1)
,(1592, '2020-08-27 05:16:32', 0)
,(1592, '2020-09-17 02:51:57', 0);
-- Add a rowID column to the data
WITH DataSource AS
(
    SELECT *
          ,ROW_NUMBER() OVER (PARTITION BY AlarmNumber ORDER BY [Time]) rowID
    FROM @DataSource
)
-- This is just here so we can sort the result at the end
SELECT * FROM (
    -- Select rows of DataSource where there is an ON and subsequent OFF event (DS1 Alarm is ON and DS2 Alarm is OFF)
    -- This also catches where there is an ON, but no subsequent OFF (DS2.Time will be NULL)
    SELECT DS1.AlarmNumber
          ,DS1.[Time] AS StartTime
          ,DS2.[Time] AS EndTime
    FROM DataSource DS1
    LEFT JOIN DataSource DS2
        ON DS1.[rowID] = DS2.[rowID] - 1
        AND DS1.AlarmNumber = DS2.AlarmNumber
        AND DS2.[AlarmState] = 0
    WHERE DS1.[AlarmState] = 1
    UNION ALL
    -- Select rows of DataSource where there is an OFF and there is no matching ON (aka it turned OFF without ever turning ON)
    SELECT DS2.AlarmNumber
          ,NULL AS StartTime
          ,DS2.[Time] AS EndTime
    FROM DataSource DS2
    INNER JOIN DataSource DS1
        ON DS2.[rowID] - 1 = DS1.[rowID]
        AND DS1.[AlarmState] = 0
        AND DS2.AlarmNumber = DS1.AlarmNumber
    WHERE DS2.[AlarmState] = 0
    UNION ALL
    -- Select rows of DataSource where the first event for this alarm number is an OFF (it would otherwise be missed by the above)
    SELECT DS1.AlarmNumber
          ,NULL AS StartTime
          ,DS1.[Time] AS EndTime
    FROM DataSource DS1
    WHERE DS1.[AlarmState] = 0 AND DS1.rowID = 1
) z
ORDER BY COALESCE(StartTime, EndTime), AlarmNumber;
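For comparison, the same pairing rules can probably be expressed in a single pass with LAG and LEAD instead of self-joins. This is only a sketch, not a tested replacement for the query above. It assumes SQL Server 2012 or later, that AlarmState is never NULL and only holds 0 or 1, and that no two events for one alarm share the same timestamp. The names @AlarmEvents, Ordered, PrevState, NextState and NextTime are made up for the illustration.

```sql
-- Hypothetical sample table; the name @AlarmEvents is an assumption for this sketch.
DECLARE @AlarmEvents TABLE
(
    [AlarmNumber] INT
   ,[Time] DATETIME2(0)
   ,[AlarmState] INT
);
INSERT INTO @AlarmEvents ([AlarmNumber], [Time], [AlarmState])
VALUES (1592, '2020-01-02 01:52:02', 1)
      ,(1592, '2020-01-02 01:58:07', 0)
      ,(1592, '2020-06-04 00:25:22', 1)  -- ON with no OFF before the next ON
      ,(1592, '2020-08-27 01:57:03', 1)
      ,(1592, '2020-08-27 05:16:32', 0)
      ,(1592, '2020-09-17 02:51:57', 0); -- OFF with no preceding ON

-- Look at each event's neighbours within the same alarm, ordered by time
WITH Ordered AS
(
    SELECT AlarmNumber
          ,[Time]
          ,AlarmState
          ,LAG(AlarmState)  OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS PrevState
          ,LEAD(AlarmState) OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS NextState
          ,LEAD([Time])     OVER (PARTITION BY AlarmNumber ORDER BY [Time]) AS NextTime
    FROM @AlarmEvents
)
SELECT AlarmNumber
       -- Only ON events supply a StartTime
      ,CASE WHEN AlarmState = 1 THEN [Time] END AS StartTime
       -- An ON event ends at the next event only if that event is an OFF;
       -- an unmatched OFF event is its own EndTime
      ,CASE WHEN AlarmState = 1
            THEN CASE WHEN NextState = 0 THEN NextTime END
            ELSE [Time]
       END AS EndTime
FROM Ordered
WHERE AlarmState = 1                                     -- every ON starts a row
   OR (AlarmState = 0 AND (PrevState = 0                 -- OFF directly after an OFF
                           OR PrevState IS NULL))        -- OFF as the first event
ORDER BY [Time], AlarmNumber;
```

An OFF that directly follows an ON is filtered out because it is already reported as the EndTime of that ON row. To turn either query into a view, it would need to read from the real Alarms table rather than a table variable, and the final ORDER BY would have to move to the query that selects from the view.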